cleanlab 2.0: Automatically Find Errors in ML Datasets
Cleanlab is an open-source framework for machine learning and analytics with messy, real-world data. It identifies errors in datasets, measures dataset quality, trains reliable models on noisy data, and helps curate high-quality datasets. By automating these workflows, it helps users practice data-centric AI, and it reduces the pain of data cleaning by automatically flagging only the small subset of data that requires attention. Its workflows find and fix example-level, class-level, and dataset-level issues, measure and track overall dataset quality, and provide cleaned data for machine learning pipelines.
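To make "flagging only the small subset of data that requires attention" concrete, here is a minimal pure-Python sketch of one way example-level label issues could be flagged. It is not cleanlab's implementation. It assumes a classification dataset with integer labels and out-of-sample predicted probabilities from some model, neither of which the summary above specifies. The function name `find_label_issues_sketch` and the per-class threshold rule are illustrative assumptions.

```python
from statistics import mean


def find_label_issues_sketch(labels, pred_probs):
    """Return indices of likely label errors, least self-confident first.

    Hypothetical helper for illustration only; not part of cleanlab.
    labels:     list of given integer class labels, one per example
    pred_probs: list of per-class probability lists, one per example
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # over the examples whose given label is k (assumed rule).
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(mean(probs_k) if probs_k else 1.0)

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose predicted probability clears that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        likely_class = max(confident, key=lambda k: p[k])
        if likely_class != y:
            issues.append(i)

    # Rank flagged examples by the probability assigned to their given label.
    return sorted(issues, key=lambda i: pred_probs[i][labels[i]])


labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],  # labeled 0, but the model strongly predicts 1
    [0.20, 0.80],
    [0.10, 0.90],
    [0.85, 0.15],  # labeled 1, but the model strongly predicts 0
]
print(find_label_issues_sketch(labels, pred_probs))
```

In this toy input only two of the six examples are flagged, which is the point of the approach: a reviewer inspects the short ranked list rather than the whole dataset.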
Company
Cleanlab
Date published
April 21, 2022
Author(s)
Curtis Northcutt, Jonas Mueller, Anish Athalye
Word count
841
Language
English
Hacker News points
2