
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale

What's this blog post about?

Serverless LoRA inference with pay-per-token pricing lets users upload their own LoRA adapters and run inference on them on top of a compatible serverless base model, including popular models like Llama 3.1 and Qwen 2.5. The platform switches adapters dynamically at scale, so hundreds of customized models can be served at the same price as a single base model. This enables cost-efficient model customization, faster iteration and experimentation, optimized performance at scale, and straightforward fine-tuning of custom LoRA adapters with the Together Fine-tuning API.
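
As an illustration of how this might look from client code, below is a minimal sketch using the Together Python SDK. The adapter name ("my-org/llama-3.1-8b-support-lora") and the prompt are hypothetical placeholders rather than values from the post, and the exact way an uploaded adapter is referenced may differ from this sketch.

# Minimal sketch of serverless LoRA inference with the Together Python SDK.
# "my-org/llama-3.1-8b-support-lora" is a hypothetical placeholder for a LoRA
# adapter you have uploaded or fine-tuned; substitute your own adapter name.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    # Referencing the adapter (instead of the base model) is assumed here to
    # route the request through serverless LoRA inference on the compatible
    # base model (e.g. Llama 3.1), with billing per token as for the base.
    model="my-org/llama-3.1-8b-support-lora",
    messages=[
        {"role": "user", "content": "Summarize this support ticket in one sentence."}
    ],
)

print(response.choices[0].message.content)

Because adapters are swapped dynamically on the serverless backend, each request can name a different adapter without provisioning a dedicated deployment per customized model.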

Company
Together AI

Date published
Dec. 18, 2024

Author(s)
Together AI

Word count
1224

Language
English

Hacker News points
None found.
