Company
Date Published
Author
Michelle Chen, Jesse Kipp, Nikhil Kothari
Word count
914
Language
English
Hacker News points
None

Summary

Cloudflare has partnered with Meta to make Llama 4, Meta's latest and most powerful model, available on Cloudflare's Workers AI platform. Llama 4 is an industry-leading release that combines a Mixture of Experts (MoE) architecture with an early-fusion backbone, making it natively multimodal. The release comprises two models, Llama 4 Scout and Llama 4 Maverick; Scout is available on Workers AI today. Llama 4 Scout has a context window of up to 10 million tokens, one of the largest available in an open-source model. Although the model has a large total parameter count, its MoE architecture activates only a fraction of those parameters for each token during inference, which makes responses faster. An MoE model is built from individual specialized neural networks called "experts"; each token is routed to the most relevant experts, which improves output quality and shortens inference time. Cloudflare Workers AI offers a way to run Llama 4 models without managing infrastructure or hardware.
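
As a rough illustration of the expert-routing idea described above, the following Python sketch shows top-k expert selection on a toy scale. Everything in it is an assumption made for illustration: the names `moe_forward`, `make_expert`, and `router_weights` are hypothetical, the "experts" are simple scaling functions rather than neural networks, and none of it reflects Llama 4's actual router, weights, or layer layout.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    def expert(x):
        return [scale * v for v in x]
    return expert


def moe_forward(x, router_weights, experts, top_k=1):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    router_weights holds one weight row per expert; an expert's score is the
    dot product of its row with x. Experts that are not chosen never run.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(scores)

    # Indices of the top_k experts, highest probability first.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Renormalize the gate weights over the chosen experts only.
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm
        y = experts[i](x)  # only the selected experts do any work
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


# Toy setup (assumed values): four experts, a 2-dimensional input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, chosen = moe_forward([0.5, 2.0], router_weights, experts, top_k=1)
print(chosen, output)  # [1] [1.0, 4.0]
```

With `top_k=1`, only one of the four experts runs for this input, which is the sense in which an MoE model can hold many parameters yet spend compute on only a fraction of them per token.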