Cleanlab

Founded in 2021. Privately Held.

External links: homepage | docs | blog | jobs | youtube | twitter | github | linkedin

Data quality issue identification and remediation.

Blog content published by word count

Blog content

Post title | Authors | Published | Words | HN
CleanVision: Audit your Image Data for better Computer Vision | Sanjana Garg, Ulyana Tkachenko, Yiming Chen, Elías Snorrason, Jonas Mueller | Mar. 22, 2023 | 1729 | 4
Assessing the Quality of Synthetic Data with Cleanlab Studio | Elías Snorrason | Jul. 12, 2023 | 2176 | 2
Overcoming Hallucinations with the Trustworthy Language Model | Anish Athalye, Jonas Mueller, Curtis Northcutt, Hui Wen Goh, Ulyana Tkachenko | Apr. 25, 2024 | 4782 | 2
Letter from the CEO: Announcing our Series A and Cleanlab's Trustworthy Language Model | Curtis Northcutt | Oct. 10, 2023 | 742 | -
Detecting Dataset Drift and Non-IID Sampling: A k-Nearest Neighbors approach that works for Image/Text/Audio/Numeric Data | Jesse Cummings, Elías Snorrason, Jonas Mueller | May 30, 2023 | 2203 | 4
Detecting Label Errors in Entity Recognition Data | Wei-Chen (Eric) Wang, Elías Snorrason, Jonas Mueller | Oct. 12, 2022 | 1066 | -
Effectively Annotate Text Data for Transformers via Active Learning + Re-labeling | Chris Mauck | May 22, 2023 | 1802 | -
Training Transformer Networks in Scikit-Learn?! | Hui Wen Goh | Mar. 08, 2023 | 1677 | 4
Improving any OpenAI Language Model by Systematically Improving its Data | Chris Mauck, Jonas Mueller | Jun. 01, 2023 | 1898 | -
Ensuring Reliable Few-Shot Prompt Selection for LLMs | Chris Mauck, Jonas Mueller | Aug. 15, 2023 | 1678 | 3
How To Train and Deploy Reliable Models on Messy Real-World Data With a Few Clicks | Hui Wen Goh, Jonas Mueller, Anish Athalye | Jul. 24, 2023 | 1518 | 5
Detecting Annotation Errors in Semantic Segmentation Data | Vedang Lad, Jonas Mueller | Nov. 02, 2023 | 845 | 1
cleanlab 2.1 adds Multi-Annotator Analysis and Outlier Detection: toward a broad framework for Data-Centric AI | Curtis Northcutt, Jonas Mueller | Sep. 21, 2022 | 974 | -
Comparing tools for Data Science, Data Quality, Data Annotation, and AI/ML | Jonas Mueller | Feb. 09, 2024 | 1916 | -
Automatically Detect Problematic Content in any Text Dataset | Hui Wen Goh | Dec. 19, 2023 | 1220 | -
Announcing Auto-Labeling Agent: Your Assistant for Rapid and High Quality Labeling | Emily Barry | Jul. 17, 2024 | 776 | -
Finding Label Issues in Image Classification Datasets | Wei Jing Lok, Jonas Mueller | Apr. 21, 2022 | 1696 | -
The Stanford Cars Dataset aka Cars196 (cited in 1000+ papers) contains many Fine-Grained Errors | Chris Mauck | May 24, 2023 | 592 | -
Reduce Legal Discovery Work by 10x with AI that Curates Documents and Fixes Errors | Chris Mauck | Aug. 03, 2023 | 1356 | 2
Whisking Away Errors: How Cleanlab Studio Served Up Fixes for the Food-101N Computer Vision Dataset | Chris Mauck | Sep. 11, 2023 | 546 | -
cleanlab 2.3 adds support for Active Learning, Tensorflow/Keras models made sklearn-compatible, and highly scalable Label Error Detection | Jonas Mueller | Mar. 01, 2023 | 1045 | -
How to detect bad data in your instruction tuning dataset (for better LLM fine-tuning) | Jimming He, Sanjana Garg, Jonas Mueller | Feb. 07, 2024 | 2278 | -
Use Cleanlab to Improve LLMs: Find Errors in Human Feedback in the Anthropic RLHF Dataset | Chris Mauck, Jonas Mueller | Apr. 11, 2023 | 351 | -
An open-source platform to catch all sorts of issues in all sorts of datasets | Elías Snorrason, Jonas Mueller | Feb. 21, 2024 | 1082 | -
ActiveLab: Active Learning with Data Re-Labeling | Hui Wen Goh, Jonas Mueller | Mar. 02, 2023 | 1720 | 4
Enhancing Product Analytics and E-commerce with Data-Centric AI | Sanjana Garg | Jul. 06, 2023 | 1484 | 2
The Fashion MNIST Dataset (cited in 2,200+ papers) contains Hundreds of Miscategorized Items | Ganesh Tata, Chris Mauck | Jun. 09, 2023 | 446 | -
Don’t Let Your Messy Documents Run You RAG-Ged. Announcing Document Curation in Cleanlab Studio | Emily Barry | Jun. 07, 2024 | 311 | -
Automated Correction of Satellite Imagery Data | Chris Mauck, Aditya Thyagarajan | Sep. 20, 2023 | 673 | 2
Ensure high-quality data quickly via AI validation of which data is Well Labeled | Ulyana Tkachenko, Jonas Mueller | Aug. 28, 2023 | 1544 | -
Letter from the CEO: Announcing Our Seed Funding and the Launch of Cleanlab Studio for Enterprise | Curtis Northcutt | Jul. 20, 2023 | 1074 | -
Detecting Errors in Numerical Data via any Regression Model | Jonas Mueller, Mayank Kumar, Hui Wen Goh, Hang Zhou | Sep. 18, 2023 | 1108 | 2
Accelerate Time Series Modeling with Cleanlab Studio AutoML: Train and Deploy in Minutes | Matt Turk | Jul. 11, 2024 | 2053 | -
The Office-Home Dataset (cited by 600+ papers) contains hundreds of incorrect labels and outliers. | Chris Mauck, Jonas Mueller | Apr. 21, 2023 | 478 | -
Datalab: A Linter for ML Datasets | Elías Snorrason, Sanjana Garg, Hui Wen Goh, Jesse Cummings, Jonas Mueller | May 16, 2023 | 1879 | 2
Finding Label Issues in Audio Classification Datasets | Johnson Kuan, Jonas Mueller, Anish Athalye | Apr. 27, 2022 | 2173 | -
Automatically Find and Fix Issues in Image/Document Tags and other Multi-Label Datasets | Chris Mauck, Ulyana Tkachenko | Oct. 17, 2023 | 990 | 2
Most AI & Analytics are impaired by data issues. Now AI can help you fix them. | Jonas Mueller, Curtis Northcutt, Anish Athalye | Jul. 31, 2023 | 1948 | 1
How we built Cleanlab Vizzy | Caleb Chiam, Luke Mainwaring, Yiming Chen | Aug. 17, 2022 | 2388 | -
cleanlab now supports all major ML tasks — including Regression, Object Detection, and Image Segmentation | Chris Mauck, Curtis Northcutt, Jonas Mueller | Sep. 14, 2023 | 1200 | -
Automated Quality Assurance for Object Detection Datasets | Ulyana Tkachenko, Aditya Thyagarajan, Jonas Mueller | Sep. 26, 2023 | 1370 | 1
Handling Label Errors in Text Classification Datasets | Wei Jing Lok, Jonas Mueller, Hui Wen Goh | May 10, 2022 | 3490 | -
How to Filter Unsafe and Low-Quality Images from any Dataset: A Product Catalog Case Study | Sanjana Garg, Jonas Mueller | Jan. 22, 2024 | 1505 | -
How to Generate Better Synthetic Image Datasets with Stable Diffusion | Elías Snorrason, Jonas Mueller | Oct. 05, 2023 | 2071 | 1
CROWDLAB: Simple and effective algorithms to handle data labeled by multiple annotators | Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller | Oct. 05, 2022 | 1320 | 2
Cleanlab: The History, Present, and Future | Curtis Northcutt (Co-Founder & CEO) | Apr. 01, 2022 | 1849 | -
cleanlab 2.0: Automatically Find Errors in ML Datasets | Curtis Northcutt, Jonas Mueller, Anish Athalye | Apr. 21, 2022 | 841 | 2
Automated Data Quality at Scale | Anish Athalye, Angela Liu | Jul. 27, 2023 | 1155 | 1
Automatic Error Detection for Image/Text Tagging and Multi-label Datasets | Aditya Thyagarajan, Elías Snorrason, Curtis Northcutt, Jonas Mueller | Nov. 29, 2022 | 1434 | 1
Out-of-Distribution Detection via Embeddings or Predictions | Ulyana Tkachenko, Jonas Mueller | Oct. 19, 2022 | 1264 | -
Improving Legal Judgement Prediction with Data-Centric AI | Hui Wen Goh | Jun. 27, 2023 | 1658 | -
A Simple Adjustment Improves Out-of-Distribution Detection for Any Classifier | Ulyana Tkachenko, Jonas Mueller, Curtis Northcutt | Oct. 19, 2022 | 1523 | -
Handling Mislabeled Tabular Data to Improve Your XGBoost Model | Chris Mauck | Feb. 06, 2023 | 1877 | 2
Beware of Unreliable Data in Model Evaluation: A LLM Prompt Selection case study with Flan-T5 | Chris Mauck, Jonas Mueller | Jun. 29, 2023 | 1366 | 66

By Matt Makai. 2021-2024.