Deep Learning Paper Recap - Language Models
The paper "Prune Once For All: Sparse Pre-Trained Language Models" introduces an architecture-agnostic method of training sparse pre-trained language models, allowing for pruning only during the pre-training phase. This technique results in better compression-to-accuracy ratios and eliminates the need to reconsider the model's architecture or task when applying pruning techniques during fine-tuning. The best scores were achieved with 85% and 90% weight pruning, while Quantized Aware Training (QAT) with 85% pruning led to an even more accurate and smaller model.
Company: AssemblyAI
Date published: July 7, 2022
Author(s): Taufiquzzaman Peyash
Word count: 273
Language: English
Hacker News points: None found.