TEAL (Training-Free Activation Sparsity in Large Language Models) presents a simple, training-free approach to activation sparsification, achieving 40-50% model-wide activation sparsity with minimal degradation. This translates into significant inference speedups, particularly in single-batch decoding, where it delivers 1.53x to 1.8x wall-clock improvements. TEAL sparsifies the entire model, including tensors not previously sparsified, and outperforms existing methods such as CATS by optimizing sparsity levels at the transformer-block level. Additionally, TEAL is compatible with quantization techniques, offering a promising direction for efficient LLM inference. The approach is flexible and adaptable to a range of applications, particularly in resource-constrained edge settings.
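The core idea behind activation sparsification can be illustrated with a minimal sketch: zero out the lowest-magnitude entries of a hidden state before a matrix multiply, so those multiply-adds can be skipped. This is a simplified, hypothetical illustration using NumPy; TEAL itself calibrates per-tensor thresholds offline and applies per-block sparsity levels, which this sketch does not model.

```python
import numpy as np

def sparsify_activations(x: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the lowest-magnitude fraction of entries in an activation tensor.

    Hypothetical illustration of magnitude-based activation sparsification:
    the threshold is computed on the fly here, whereas methods like TEAL
    calibrate thresholds ahead of time.
    """
    k = int(sparsity * x.size)
    if k == 0:
        return x.copy()
    # Threshold = k-th smallest absolute value; entries at or below it are dropped.
    threshold = np.partition(np.abs(x).ravel(), k - 1)[k - 1]
    out = x.copy()
    out[np.abs(out) <= threshold] = 0.0
    return out

# Example: sparsify a hidden state to 50% before a projection matmul,
# so roughly half of the multiply-adds become skippable.
rng = np.random.default_rng(0)
h = rng.standard_normal((1, 4096)).astype(np.float32)
h_sparse = sparsify_activations(h, 0.5)
```

The speedups reported for single-batch decoding come from custom kernels that exploit this sparsity to avoid loading and multiplying the corresponding weight rows; simply zeroing entries, as above, demonstrates the accuracy side but not the runtime gain.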