
Automatic Error Detection for Image/Text Tagging and Multi-label Datasets

What's this blog post about?

Image/document tagging is an important instance of multi-label classification, where each example can belong to multiple classes at once. These datasets often contain many label errors that harm the performance of machine learning (ML) models. The authors present algorithms, available in the open-source cleanlab package, that detect incorrect annotations in any multi-label classification dataset. The algorithms are model-agnostic: they work with any existing or future ML model and can be used to efficiently find and fix errors in a training set, clean a test set for more reliable benchmarking, reduce the number of annotations needed, and support other data-centric tasks. To score each example, they compute an exponential moving average (EMA) over the model's self-confidences for every tag/annotation of that example; the post presents this EMA label quality score as a robust way to rank examples by how likely their labels are to be wrong. Benchmarked against nine other approaches, cleanlab's multi-label algorithms proved effective both at detecting examples with any error in their annotations and at detecting examples that are severely mislabeled.
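The EMA scoring idea can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not cleanlab's implementation: it assumes the per-class self-confidence is the predicted probability p_k when class k is among the given tags and 1 - p_k when it is absent, that these scores are sorted from highest to lowest so the average ends on (and weights most heavily) the least-confident annotations, and an assumed smoothing factor alpha = 0.8. The function names `per_class_self_confidence`, `ema_label_quality`, and `rank_by_label_quality` are hypothetical.

```python
from typing import List, Sequence


def per_class_self_confidence(given: Sequence[int], probs: Sequence[float]) -> List[float]:
    """Model confidence that each class's annotation (tagged or untagged) is correct.

    Assumption: for a class in the given tag set this is p_k; for a class
    that was left untagged it is 1 - p_k.
    """
    tagged = set(given)
    return [p if k in tagged else 1.0 - p for k, p in enumerate(probs)]


def ema_label_quality(given: Sequence[int], probs: Sequence[float], alpha: float = 0.8) -> float:
    """Aggregate per-class self-confidences with an exponential moving average.

    Scores are sorted highest to lowest, so the running average finishes on
    the least-confident annotations and is pulled toward them.
    alpha = 0.8 is an assumed default chosen for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scores = sorted(per_class_self_confidence(given, probs), reverse=True)
    ema = scores[0]
    for s in scores[1:]:
        ema = alpha * s + (1.0 - alpha) * ema
    return ema


def rank_by_label_quality(
    labels: Sequence[Sequence[int]],
    pred_probs: Sequence[Sequence[float]],
    alpha: float = 0.8,
) -> List[int]:
    """Return example indices ordered from most to least likely to be mislabeled."""
    scores = [ema_label_quality(g, p, alpha) for g, p in zip(labels, pred_probs)]
    return sorted(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Two examples, both tagged with class 0 only, over three classes.
    labels = [[0], [0]]
    pred_probs = [
        [0.9, 0.1, 0.2],   # model agrees with the given tags
        [0.1, 0.95, 0.2],  # model thinks class 1, not class 0, applies
    ]
    for g, p in zip(labels, pred_probs):
        print(g, round(ema_label_quality(g, p), 3))
    print("review order:", rank_by_label_quality(labels, pred_probs))
```

In this toy run the first example scores 0.82 and the second 0.088, so the second is ranked first for review. Sorting before averaging is what lets a single badly wrong tag drag an example's score down even when most of its other classes look fine. With real data, `pred_probs` should be out-of-sample predictions (for instance from cross-validation); consult the cleanlab documentation for the library's actual multi-label functions.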

Company
Cleanlab

Date published
Nov. 29, 2022

Author(s)
Aditya Thyagarajan, Elías Snorrason, Curtis Northcutt, Jonas Mueller

Word count
1434

Language
English

Hacker News points
1

