Cleanlab Blog - Plushcap

54 blog posts published by month since the start of 2021.

Posts year-to-date

0 (5 posts by this month last year.)

Average posts per month since 2021

0.9

Post details (2021 to today)

Title	Author	Date	Word count	HN points
CleanVision: Audit your Image Data for better Computer Vision	Sanjana Garg, Ulyana Tkachenko, Yiming Chen, Elías Snorrason, Jonas Mueller	Mar 22, 2023	1729	4
Assessing the Quality of Synthetic Data with Cleanlab Studio	Elías Snorrason	Jul 12, 2023	2176	2
Overcoming Hallucinations with the Trustworthy Language Model	Anish Athalye, Jonas Mueller, Curtis Northcutt, Hui Wen Goh, Ulyana Tkachenko	Apr 25, 2024	4782	2
Letter from the CEO: Announcing our Series A and Cleanlab's Trustworthy Language Model	Curtis Northcutt	Oct 10, 2023	742	-
Detecting Dataset Drift and Non-IID Sampling: A k-Nearest Neighbors approach that works for Image/Text/Audio/Numeric Data	Jesse Cummings, Elías Snorrason, Jonas Mueller	May 30, 2023	2203	4
Detecting Label Errors in Entity Recognition Data	Wei-Chen (Eric) Wang, Elías Snorrason, Jonas Mueller	Oct 12, 2022	1066	-
Effectively Annotate Text Data for Transformers via Active Learning + Re-labeling	Chris Mauck	May 22, 2023	1802	-
Training Transformer Networks in Scikit-Learn?!	Hui Wen Goh	Mar 08, 2023	1677	4
Improving any OpenAI Language Model by Systematically Improving its Data	Chris Mauck, Jonas Mueller	Jun 01, 2023	1898	-
Ensuring Reliable Few-Shot Prompt Selection for LLMs	Chris Mauck, Jonas Mueller	Aug 15, 2023	1678	3
How To Train and Deploy Reliable Models on Messy Real-World Data With a Few Clicks	Hui Wen Goh, Jonas Mueller, Anish Athalye	Jul 24, 2023	1518	5
Detecting Annotation Errors in Semantic Segmentation Data	Vedang Lad, Jonas Mueller	Nov 02, 2023	845	1
cleanlab 2.1 adds Multi-Annotator Analysis and Outlier Detection: toward a broad framework for Data-Centric AI	Curtis Northcutt, Jonas Mueller	Sep 21, 2022	974	-
Comparing tools for Data Science, Data Quality, Data Annotation, and AI/ML	Jonas Mueller	Feb 09, 2024	1916	-
Automatically Detect Problematic Content in any Text Dataset	Hui Wen Goh	Dec 19, 2023	1220	-
Announcing Auto-Labeling Agent: Your Assistant for Rapid and High Quality Labeling	Emily Barry	Jul 17, 2024	776	-
Finding Label Issues in Image Classification Datasets	Wei Jing Lok, Jonas Mueller	Apr 21, 2022	1696	-
The Stanford Cars Dataset aka Cars196 (cited in 1000+ papers) contains many Fine-Grained Errors	Chris Mauck	May 24, 2023	592	-
Reduce Legal Discovery Work by 10x with AI that Curates Documents and Fixes Errors	Chris Mauck	Aug 03, 2023	1356	2
Whisking Away Errors: How Cleanlab Studio Served Up Fixes for the Food-101N Computer Vision Dataset	Chris Mauck	Sep 11, 2023	546	-
cleanlab 2.3 adds support for Active Learning, Tensorflow/Keras models made sklearn-compatible, and highly scalable Label Error Detection	Jonas Mueller	Mar 01, 2023	1045	-
How to detect bad data in your instruction tuning dataset (for better LLM fine-tuning)	Jimming He, Sanjana Garg, Jonas Mueller	Feb 07, 2024	2278	-
Use Cleanlab to Improve LLMs: Find Errors in Human Feedback in the Anthropic RLHF Dataset	Chris Mauck, Jonas Mueller	Apr 11, 2023	351	-
An open-source platform to catch all sorts of issues in all sorts of datasets	Elías Snorrason, Jonas Mueller	Feb 21, 2024	1082	-
ActiveLab: Active Learning with Data Re-Labeling	Hui Wen Goh, Jonas Mueller	Mar 02, 2023	1720	4
Enhancing Product Analytics and E-commerce with Data-Centric AI	Sanjana Garg	Jul 06, 2023	1484	2
The Fashion MNIST Dataset (cited in 2,200+ papers) contains Hundreds of Miscategorized Items	Ganesh Tata, Chris Mauck	Jun 09, 2023	446	-
Don’t Let Your Messy Documents Run You RAG-Ged. Announcing Document Curation in Cleanlab Studio	Emily Barry	Jun 07, 2024	311	-
Automated Correction of Satellite Imagery Data	Chris Mauck, Aditya Thyagarajan	Sep 20, 2023	673	2
Ensure high-quality data quickly via AI validation of which data is Well Labeled	Ulyana Tkachenko, Jonas Mueller	Aug 28, 2023	1544	-
Letter from the CEO: Announcing Our Seed Funding and the Launch of Cleanlab Studio for Enterprise	Curtis Northcutt	Jul 20, 2023	1074	-
Detecting Errors in Numerical Data via any Regression Model	Jonas Mueller, Mayank Kumar, Hui Wen Goh, Hang Zhou	Sep 18, 2023	1108	2
Accelerate Time Series Modeling with Cleanlab Studio AutoML: Train and Deploy in Minutes	Matt Turk	Jul 11, 2024	2053	-
The Office-Home Dataset (cited by 600+ papers) contains hundreds of incorrect labels and outliers.	Chris Mauck, Jonas Mueller	Apr 21, 2023	478	-
Datalab: A Linter for ML Datasets	Elías Snorrason, Sanjana Garg, Hui Wen Goh, Jesse Cummings, Jonas Mueller	May 16, 2023	1879	2
Finding Label Issues in Audio Classification Datasets	Johnson Kuan, Jonas Mueller, Anish Athalye	Apr 27, 2022	2173	-
Automatically Find and Fix Issues in Image/Document Tags and other Multi-Label Datasets	Chris Mauck, Ulyana Tkachenko	Oct 17, 2023	990	2
Most AI & Analytics are impaired by data issues. Now AI can help you fix them.	Jonas Mueller, Curtis Northcutt, Anish Athalye	Jul 31, 2023	1948	1
How we built Cleanlab Vizzy	Caleb Chiam, Luke Mainwaring, Yiming Chen	Aug 17, 2022	2388	-
cleanlab now supports all major ML tasks — including Regression, Object Detection, and Image Segmentation	Chris Mauck, Curtis Northcutt, Jonas Mueller	Sep 14, 2023	1200	-
Automated Quality Assurance for Object Detection Datasets	Ulyana Tkachenko, Aditya Thyagarajan, Jonas Mueller	Sep 26, 2023	1370	1
Handling Label Errors in Text Classification Datasets	Wei Jing Lok, Jonas Mueller, Hui Wen Goh	May 10, 2022	3490	-
How to Filter Unsafe and Low-Quality Images from any Dataset: A Product Catalog Case Study	Sanjana Garg, Jonas Mueller	Jan 22, 2024	1505	-
How to Generate Better Synthetic Image Datasets with Stable Diffusion	Elías Snorrason, Jonas Mueller	Oct 05, 2023	2071	1
CROWDLAB: Simple and effective algorithms to handle data labeled by multiple annotators	Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller	Oct 05, 2022	1320	2
Cleanlab: The History, Present, and Future	Curtis Northcutt (Co-Founder & CEO)	Apr 01, 2022	1849	-
cleanlab 2.0: Automatically Find Errors in ML Datasets	Curtis Northcutt, Jonas Mueller, Anish Athalye	Apr 21, 2022	841	2
Automated Data Quality at Scale	Anish Athalye, Angela Liu	Jul 27, 2023	1155	1
Automatic Error Detection for Image/Text Tagging and Multi-label Datasets	Aditya Thyagarajan, Elías Snorrason, Curtis Northcutt, Jonas Mueller	Nov 29, 2022	1434	1
Out-of-Distribution Detection via Embeddings or Predictions	Ulyana Tkachenko, Jonas Mueller	Oct 19, 2022	1264	-
Improving Legal Judgement Prediction with Data-Centric AI	Hui Wen Goh	Jun 27, 2023	1658	-
A Simple Adjustment Improves Out-of-Distribution Detection for Any Classifier	Ulyana Tkachenko, Jonas Mueller, Curtis Northcutt	Oct 19, 2022	1523	-
Handling Mislabeled Tabular Data to Improve Your XGBoost Model	Chris Mauck	Feb 06, 2023	1877	2
Beware of Unreliable Data in Model Evaluation: A LLM Prompt Selection case study with Flan-T5	Chris Mauck, Jonas Mueller	Jun 29, 2023	1366	66
