Minions is a method for collaboration between small on-device language models and frontier cloud models that reduces cloud costs while maintaining performance. Small LMs are improving rapidly and can already handle real tasks, but they still struggle with long contexts and multi-step instructions. The MinionS protocol addresses these weaknesses by decomposing a task into smaller subtasks, executing those subtasks in parallel on device, and aggregating the on-device outputs in the cloud. This approach delivers 97.9% of the accuracy of a remote-only solution at just 17.5% of the cost. By treating on-device hardware utilization, sequential communication, and model choice as cost-performance levers, Minions offers a cost-effective and efficient way to distribute AI workloads between small devices and cloud APIs.
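The decompose-execute-aggregate loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model functions are stubs standing in for a real cloud API and a local inference runtime, and all names (`cloud_model`, `local_model`, `minions_round`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub model calls. In practice these would wrap a frontier cloud API
# and a small on-device model; here they just echo their inputs.
def cloud_model(prompt: str) -> str:
    return f"[cloud] {prompt[:60]}"

def local_model(prompt: str) -> str:
    return f"[local] {prompt[:60]}"

def minions_round(task: str, chunks: list[str]) -> str:
    # 1) Decompose: derive one narrow subtask per chunk of the long
    #    context, so each on-device call sees only a short input.
    subtasks = [f"{task} (chunk {i})" for i in range(len(chunks))]

    # 2) Execute in parallel on device: each subtask is answered by
    #    the small local model over its own chunk.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(
            lambda pair: local_model(f"{pair[0]}\n{pair[1]}"),
            zip(subtasks, chunks),
        ))

    # 3) Aggregate in the cloud: the frontier model synthesizes the
    #    local outputs into a final answer. Only these short outputs
    #    cross the network, which is where the cost savings come from.
    return cloud_model("Aggregate: " + " | ".join(outputs))

answer = minions_round("Summarize the findings", ["chunk A text", "chunk B text"])
```

The key design point is that the long context never leaves the device; the cloud model only sees the compact subtask outputs, so the number of expensive cloud tokens stays small.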