cleanlab 2.0: Automatically Find Errors in ML Datasets

What's this blog post about?

Cleanlab is an open-source framework for machine learning and analytics with messy, real-world data. It identifies errors in datasets, measures dataset quality, trains reliable models on noisy data, and helps curate high-quality datasets. By automating these workflows, it helps users practice more data-centric AI and significantly reduces the pain of data cleaning, because it flags only the small subset of data that truly requires attention. It finds and fixes issues at the example, class, and dataset level, measures and tracks overall dataset quality, and provides cleaned data for machine learning pipelines.
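
To make the idea of "flagging only the small subset of data that truly requires attention" concrete, here is a minimal sketch in plain Python. It is not cleanlab's API or its actual algorithm: the function name `flag_suspect_labels`, the `threshold` parameter, and the toy data are all hypothetical, and the rule used (flag an example when a model disagrees with its given label and assigns that label a low probability) is a deliberately simplified stand-in.

```python
from typing import List, Sequence


def flag_suspect_labels(
    labels: Sequence[int],
    pred_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff, chosen only for illustration
) -> List[int]:
    """Return indices of examples whose given label looks doubtful.

    An example is flagged when the model's top prediction differs from the
    given label AND the probability assigned to the given label is below
    `threshold`. Results are ordered from least to most confident.
    """
    suspects = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        self_confidence = probs[label]  # probability of the given label
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label and self_confidence < threshold:
            suspects.append((self_confidence, i))
    # Sort so the most suspicious examples come first.
    return [i for _, i in sorted(suspects)]


# Toy data (made up): five examples, two classes.
labels = [0, 1, 1, 0, 1]
pred_probs = [
    [0.90, 0.10],
    [0.20, 0.80],
    [0.95, 0.05],  # labeled 1, but the model strongly prefers class 0
    [0.60, 0.40],
    [0.70, 0.30],  # labeled 1, but the model prefers class 0
]

suspect_indices = flag_suspect_labels(labels, pred_probs)
print(suspect_indices)
```

In this toy case only two of the five examples are returned for review, which is the kind of triage the summary describes: a reviewer inspects the short flagged list instead of the whole dataset.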

Company
Cleanlab

Date published
April 21, 2022

Author(s)
Curtis Northcutt, Jonas Mueller, Anish Athalye

Word count
841

Language
English

Hacker News points
2


By Matt Makai. 2021-2024.