Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale
Serverless LoRA inference with pay-per-token pricing lets users upload their own LoRA adapters and run inference on them alongside a compatible serverless base model, including popular models like Llama 3.1 and Qwen 2.5. The platform switches adapters dynamically at scale, so hundreds of customized models run for the same price as a single base model. This enables cost-efficient model customization, faster iteration and experimentation, optimized performance at scale, and straightforward fine-tuning of custom LoRA adapters with the Together Fine-tuning API.
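As a rough illustration of the workflow described above, the sketch below builds a chat-completion request that targets an uploaded LoRA adapter through Together's OpenAI-compatible endpoint. The adapter name, prompt, and helper function here are hypothetical placeholders, not values from the announcement; the request is only sent if an API key is configured.

```python
import os
import json
import urllib.request

# Hypothetical adapter name -- substitute the model string returned after
# uploading your own LoRA adapter. The base model it was trained against
# must be a compatible serverless model (e.g. a Llama 3.1 variant).
ADAPTER_MODEL = "my-org/llama-3.1-8b-my-lora-adapter"


def build_lora_request(adapter_model: str, prompt: str) -> dict:
    """Build a chat-completion payload for a serverless LoRA adapter.

    The platform routes the request to the shared base-model deployment
    and applies the named adapter dynamically, which is how hundreds of
    adapters can share the cost of a single base model.
    """
    return {
        "model": adapter_model,  # the adapter name selects which LoRA to apply
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


payload = build_lora_request(ADAPTER_MODEL, "Summarize LoRA in one sentence.")

# Send the request only when an API key is available in the environment.
api_key = os.environ.get("TOGETHER_API_KEY")
if api_key:
    req = urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because adapter switching happens server-side, pointing `model` at a different adapter name is all that is needed to query a different fine-tuned variant.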
Company
Together AI
Date published
Dec. 18, 2024
Author(s)
Together AI