Company
Date Published
Dec. 18, 2024
Author
Together AI
Word count
1224
Language
English

Summary

Serverless LoRA inference with pay-per-token pricing lets users upload their own LoRA adapters and run inference on them on top of a compatible serverless base model, including popular models such as Llama 3.1 and Qwen 2.5. The platform switches adapters dynamically at scale, so hundreds of fine-tuned models can run for the same price as a single base model. This enables cost-efficient model customization, faster iteration and experimentation, optimized performance at scale, and straightforward training of custom LoRA adapters with the Together Fine-tuning API.
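The workflow above can be sketched as a standard OpenAI-compatible chat-completions request where the uploaded adapter's model ID is passed in place of the base model name. This is a minimal sketch, not the official documentation: the adapter ID `your-account/llama-3.1-8b-my-lora-adapter` is a placeholder, and the exact request fields should be checked against the Together API reference.

```python
import json

# Placeholder adapter ID -- with serverless LoRA inference, you reference
# your uploaded adapter instead of the bare base model (assumption: the
# adapter is addressed by an account-scoped model ID).
ADAPTER_MODEL = "your-account/llama-3.1-8b-my-lora-adapter"

def build_lora_request(adapter_model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload that targets
    a user-uploaded LoRA adapter rather than the base model directly."""
    return {
        "model": adapter_model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_lora_request(ADAPTER_MODEL, "Summarize LoRA in one sentence.")
print(json.dumps(payload, indent=2))

# Sending the request (requires a TOGETHER_API_KEY; shown for illustration):
# import os, requests
# resp = requests.post(
#     "https://api.together.xyz/v1/chat/completions",
#     headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
#     json=payload,
# )
```

Because the adapter is selected per request by model ID, switching between hundreds of fine-tuned variants is just a change to the `model` field, which is what makes dynamic adapter switching cost the same as serving a single base model.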