Minions is a method for collaboration between small on-device language models and frontier cloud models that reduces cloud costs while maintaining performance. Small LMs are improving rapidly and can already handle real tasks, but they still struggle with long contexts and multi-step instructions. The MinionS protocol addresses these weaknesses by decomposing a task into smaller subtasks, executing those subtasks in parallel on device, and aggregating the on-device outputs in the cloud. This approach delivers 97.9% of the accuracy of a remote-only solution at just 17.5% of the cost. By treating on-device hardware utilization, sequential communication, and model choice as cost-performance levers, Minions offers a cost-effective and efficient way to distribute AI workloads between small devices and cloud APIs.
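The decompose-execute-aggregate loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model functions are stubs standing in for a real cloud API and a local inference runtime, and all names (`cloud_model`, `local_model`, `minions_round`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub model calls. In practice these would wrap a frontier cloud API
# and a small on-device model; here they just echo their inputs.
def cloud_model(prompt: str) -> str:
    return f"[cloud] {prompt[:60]}"

def local_model(prompt: str) -> str:
    return f"[local] {prompt[:60]}"

def minions_round(task: str, chunks: list[str]) -> str:
    # 1) Decompose: derive one narrow subtask per chunk of the long
    #    context, so each on-device call sees only a short input.
    subtasks = [f"{task} (chunk {i})" for i in range(len(chunks))]

    # 2) Execute in parallel on device: each subtask is answered by
    #    the small local model over its own chunk.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(
            lambda pair: local_model(f"{pair[0]}\n{pair[1]}"),
            zip(subtasks, chunks),
        ))

    # 3) Aggregate in the cloud: the frontier model synthesizes the
    #    local outputs into a final answer. Only these short outputs
    #    cross the network, which is where the cost savings come from.
    return cloud_model("Aggregate: " + " | ".join(outputs))

answer = minions_round("Summarize the findings", ["chunk A text", "chunk B text"])
```

The key design point is that the long context never leaves the device; the cloud model only sees the compact subtask outputs, so the number of expensive cloud tokens stays small.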