Review – TOXIGEN & Knowledge Distillation Meets Open-Set Semi-Supervised Learning
The paper "TOXIGEN: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection" presents a large machine-generated dataset of 274k toxic and benign statements, making it the largest hate speech detection dataset to date. The authors show that, when used alongside other implicit toxicity datasets, it improves the performance of fine-tuned toxicity classifiers.

The paper "Knowledge Distillation Meets Open-Set Semi-Supervised Learning" explores how knowledge distillation can compress powerful deep learning models into smaller ones. In its approach, the student's representations are passed through the teacher's classifier so that the student learns from the teacher's outputs, which improves generalization on unseen data. Together, the two papers offer insights into training content moderation models and into making deep learning models more efficient through knowledge distillation.
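To make the distillation idea concrete, here is a minimal sketch in plain Python of a "student representation through teacher classifier" loss. It makes several assumptions. The teacher's classifier is a single linear layer. A hypothetical `projection` matrix maps student features into the teacher's feature space. The loss is the standard temperature-scaled KL divergence from classic knowledge distillation (Hinton et al.). All function names are illustrative, and the paper's exact formulation may differ. Because the loss needs no ground-truth labels, a setup like this can in principle be evaluated on unlabeled data, which is what makes the semi-supervised setting possible.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_classifier(features: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Assumed frozen linear classifier head of the teacher: logits = W h + b."""
    return [z + b for z, b in zip(matvec(weights, features), bias)]


def cross_network_distillation_loss(
    student_features: Sequence[float],
    teacher_features: Sequence[float],
    projection: Matrix,   # hypothetical student-to-teacher feature projection
    cls_weights: Matrix,
    cls_bias: Sequence[float],
    temperature: float = 4.0,
) -> float:
    # Map the student's representation into the teacher's feature space.
    projected = matvec(projection, student_features)
    # Score both representations with the same frozen teacher classifier.
    cross_logits = teacher_classifier(projected, cls_weights, cls_bias)
    teacher_logits = teacher_classifier(teacher_features, cls_weights, cls_bias)
    # Compare the softened distributions; no ground-truth label is used.
    p_teacher = softmax(teacher_logits, temperature)
    p_cross = softmax(cross_logits, temperature)
    # T^2 scaling keeps gradient magnitudes roughly comparable across temperatures.
    return temperature ** 2 * kl_divergence(p_teacher, p_cross)


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.2], [-0.3, 0.8, 0.1]]  # 2 classes, 3-dim teacher features
    b = [0.0, 0.1]
    P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2-dim student -> 3-dim teacher space
    student = [1.0, 2.0]
    print(cross_network_distillation_loss(student, [1.0, 2.0, 3.0], P, W, b))  # matching features
    print(cross_network_distillation_loss(student, [2.0, 0.0, 1.0], P, W, b))  # mismatched features
```

When the projected student features coincide with the teacher's features, both sides produce the same logits and the loss is zero. Any mismatch that changes the teacher classifier's prediction yields a positive loss. In a real training loop, the student and the projection would be updated by gradient descent on this loss while the teacher stays frozen.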
Company
AssemblyAI
Date published
June 16, 2022
Author(s)
Domenic Donato, Dillon Pulliam
Word count
437
Language
English