Company
Date Published
Aug. 28, 2024
Author
James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun
Word count
1056
Language
English
Hacker News points
None

Summary

TEAL (Training-Free Activation Sparsity in Large Language Models) presents a simple, training-free approach to activation sparsification, achieving 40-50% model-wide activation sparsity with minimal degradation. This yields significant inference speedups, particularly in single-batch decoding, with wall-clock gains of 1.53x to 1.8x. TEAL sparsifies the entire model, including tensors not previously targeted, and outperforms existing methods such as CATS by optimizing sparsity levels at the transformer-block level. TEAL is also compatible with quantization techniques, offering a promising direction for efficient LLM inference. The approach is flexible and adaptable to a range of applications, particularly resource-constrained edge settings.
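
The sketch below illustrates the general mechanism behind training-free activation sparsification: zero out low-magnitude activation entries using a threshold calibrated offline, with no retraining. The function names (`calibrate_threshold`, `sparsify`), tensor shapes, and quantile-based calibration here are illustrative assumptions, not TEAL's actual implementation; in practice the wall-clock speedups also require a kernel that skips the weight rows or columns corresponding to zeroed activations, which this sketch omits.

```python
# Minimal sketch (assumed, not TEAL's implementation) of training-free,
# magnitude-based activation sparsification.

import torch


def calibrate_threshold(activations: torch.Tensor, sparsity: float) -> float:
    """Pick the magnitude cutoff that zeroes `sparsity` fraction of entries.

    `activations` is a calibration batch of hidden states for one tensor;
    `sparsity` is the target fraction to zero (e.g. 0.5 for 50%).
    """
    return torch.quantile(activations.abs().float().flatten(), sparsity).item()


def sparsify(x: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero activation entries whose magnitude falls below the threshold."""
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))


if __name__ == "__main__":
    # Calibrate once per tensor, then apply at every decoding step.
    calib = torch.randn(1024, 4096)        # stand-in calibration activations
    thr = calibrate_threshold(calib, 0.5)  # target ~50% sparsity
    x = torch.randn(1, 4096)               # one decoding-step activation
    x_sparse = sparsify(x, thr)
    print(f"achieved sparsity: {(x_sparse == 0).float().mean().item():.2f}")
```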