287 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024-07-11
165 | Based: Simple linear attention language models | 2024-03-05
143 | Dragonfly: A large vision-language model with multi-resolution zoom | 2024-06-06
80 | A practitioner's guide to testing and running GPU clusters | 2024-08-13
4 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | 2024-09-09
3 | Fine-tuning Llama-3 to get 90% of GPT-4's performance at a fraction of the cost | 2024-07-19
3 | Together Inference Engine 2.0 with new Turbo and Lite endpoints | 2024-07-18
2 | Speculative decoding for high-throughput long-context inference | 2024-09-05
2 | Together MoA–collective intelligence of open-source models pushing LLM frontier | 2024-06-15
2 | Evo: Long-context modeling from molecular to genome scale | 2024-02-27
1 | Flux API available on Together AI: FLUX1.1 [pro] and free access FLUX.1 [schnell] | 2024-10-03
1 | Together AI embeddings endpoint with higher quality, 4x lower cost than OpenAI | 2024-01-11
1 | Linearizing LLMs with LoLCATs | 2024-10-15
1 | Free Llama 3.2 vision API | 2024-09-25
1 | New SOTA Reranker from Salesforce | 2024-09-10
1 | RedPajama-Data-v2: An open dataset with 30T tokens (2023) | 2024-04-22
4 | LlamaTutor | 2024-07-24
2 | Generate react apps with Llama 3.1 | 2024-08-02
3 | Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive | 2024-11-27
3 | Together AI acquires CodeSandbox to launch code interpreter for generative AI | 2024-12-12