BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
The paper presents a novel approach to parameter-efficient fine-tuning called BitFit, which aims to update as few parameters as possible while maintaining high accuracy. The method freezes all weights of the transformer encoder and fine-tunes only the bias terms. Surprisingly, this achieves results comparable to full fine-tuning on the GLUE benchmark while training only about 0.08% of the model's parameters. BitFit is particularly effective on small to medium-sized datasets, where it can sometimes outperform full fine-tuning. The authors also explore an even smaller subset of parameters, updating only the biases of the query projection and the second MLP layer, which still performs well but falls short of full BitFit. Overall, the approach opens up possibilities for easier deployment and memory efficiency, since a single frozen base model can be reused across multiple tasks with only a small set of task-specific bias parameters.
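As a rough illustration of the core idea, here is a minimal sketch of BitFit-style freezing using a Hugging Face BERT classifier. The model name, label count, and the choice to keep the newly initialized classification head trainable are assumptions for the example, not details taken from the paper.

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical setup: a BERT-base classifier to be fine-tuned BitFit-style.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

# Freeze everything except bias terms; also keep the task head trainable,
# since it is randomly initialized and must be learned for the new task.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```

With only the encoder biases unfrozen, the trainable fraction lands in the sub-one-percent range reported in the paper, and any standard training loop or Trainer can then be run on the model unchanged.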
Company
AssemblyAI
Date published
Feb. 25, 2022
Author(s)
Taufiquzzaman Peyash
Word count
311
Hacker News points
None found.
Language
English