Company
Date Published
Dec. 18, 2024
Author
Together AI
Word count
1224
Language
English

Summary

Serverless LoRA inference with pay-per-token pricing lets users upload their own LoRA adapters and run inference on them on top of a compatible serverless base model, including popular models such as Llama 3.1 and Qwen 2.5. The platform switches adapters dynamically at scale, so hundreds of fine-tuned models can run for the same price as a single base model. This enables cost-efficient model customization, faster iteration and experimentation, optimized performance at scale, and straightforward training of custom LoRA adapters with the Together Fine-tuning API.
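The workflow above can be sketched as a standard OpenAI-compatible chat-completions request where the uploaded adapter's model ID is passed in place of the base model name. This is a minimal sketch, not the official documentation: the adapter ID `your-account/llama-3.1-8b-my-lora-adapter` is a placeholder, and the exact request fields should be checked against the Together API reference.

```python
import json

# Placeholder adapter ID -- with serverless LoRA inference, you reference
# your uploaded adapter instead of the bare base model (assumption: the
# adapter is addressed by an account-scoped model ID).
ADAPTER_MODEL = "your-account/llama-3.1-8b-my-lora-adapter"

def build_lora_request(adapter_model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload that targets
    a user-uploaded LoRA adapter rather than the base model directly."""
    return {
        "model": adapter_model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_lora_request(ADAPTER_MODEL, "Summarize LoRA in one sentence.")
print(json.dumps(payload, indent=2))

# Sending the request (requires a TOGETHER_API_KEY; shown for illustration):
# import os, requests
# resp = requests.post(
#     "https://api.together.xyz/v1/chat/completions",
#     headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
#     json=payload,
# )
```

Because the adapter is selected per request by model ID, switching between hundreds of fine-tuned variants is just a change to the `model` field, which is what makes dynamic adapter switching cost the same as serving a single base model.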