Review – TOXIGEN & Knowledge Distillation Meets Open-Set Semi-Supervised Learning

Post Details

Company

AssemblyAI

Date Published

June 16, 2022

Author

Domenic Donato, Dillon Pulliam

Word Count

437

Language

English

Hacker News Points

-

Source URL

www.assemblyai.com/blog/review-toxigen-knowledge-distillation-meets-open-set-semi-supervised-learning

Summary

The paper "TOXIGEN: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection" introduces a machine-generated dataset of 274k toxic and benign statements, which the post describes as the largest hate speech detection dataset to date. The authors show that fine-tuning on this dataset alongside other implicit-toxicity datasets improves detection performance. The second paper, "Knowledge Distillation Meets Open-Set Semi-Supervised Learning", explores how knowledge distillation can compress powerful deep learning models: a smaller student model's representations are trained against a larger teacher model's outputs, which also improves generalization on unseen data. Together, the two papers offer useful insights for training content moderation models and for making deep learning models more efficient through knowledge distillation.
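
To make the idea of a student learning from a teacher's outputs concrete, here is a minimal sketch of the classic soft-target distillation loss (temperature-scaled KL divergence between teacher and student predictions). This is an assumption chosen for illustration, not the specific method proposed in the reviewed paper; the function names `softmax` and `distillation_loss` and the temperature value are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    Illustrative soft-target loss only; the reviewed paper's
    objective may differ.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student's softened distribution matches the teacher's and grows as the two diverge, so minimizing it pushes the smaller model toward the larger model's behavior.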