Evo: Long-context modeling from molecular to genome scale | Eric Nguyen, Michael Poli, Matthew Durrant, Patrick Hsu, Brian Hie | Feb 27, 2024 | 1310 | 2
Introducing the Together Embeddings endpoint — Higher accuracy, longer context, and lower cost | Together AI | Jan 11, 2024 | 745 | 1
TEAL: Training-Free Activation Sparsity in Large Language Models | James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun | Aug 28, 2024 | 1056 | -
Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search | Together AI | Aug 26, 2024 | 1582 | 1
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI) | Jul 11, 2024 | 1753 | 287
Building your own RAG application using Together AI and Langchain | Together AI | Jan 11, 2024 | 610 | -
Building a personalized code assistant with open-source LLMs using RAG Fine-tuning | Kezhen Chen, Linda He, Ben Athiwaratkun, Jue Wang, Maurice Weber, Heejin Jeong, Yonatan Oren, Michael Poli | Jun 24, 2024 | 1333 | -
Long context retrieval models with Monarch Mixer | Jon Saad-Falcon, Dan Fu, Simran Arora | Jan 11, 2024 | 2583 | -
Building your own RAG application using Together AI and LlamaIndex | Together AI | Jan 11, 2024 | 615 | -
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen | Mar 12, 2024 | 616 | -
Together AI partners with Meta to release Meta Llama 3 for inference and fine-tuning | Together AI | Apr 18, 2024 | 602 | -
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin | Jun 18, 2024 | 1308 | -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao | Sep 09, 2024 | 2582 | 4
Together AI partners with Meta to release Llama 3.1 models for inference and fine-tuning with accelerated performance at full accuracy | Together AI | Jul 23, 2024 | 933 | -
Flash Attention received the inaugural Stanford open source software award | Together AI | May 22, 2024 | 445 | -
Together AI welcomes Kai Mak as CRO to accelerate gen AI adoption for AI natives and enterprises globally | Vipul Ved Prakash | Sep 10, 2024 | 706 | -
Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities | Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou | Jun 11, 2024 | 1422 | 2
Dragonfly: A large vision-language model with multi-resolution zoom | Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou | Jun 06, 2024 | 1061 | 143
Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud | Together AI | Jul 23, 2024 | 612 | -
Announcing v1 of our Python SDK | Together AI | Apr 22, 2024 | 361 | -
Announcing $106M round led by Salesforce Ventures | Vipul Ved Prakash | Mar 13, 2024 | 999 | -
Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost | Hassan El Mghari | Jul 12, 2024 | 1292 | 3
Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection | Together AI | Sep 05, 2024 | 1781 | -
ThunderKittens: A Simple Embedded DSL for AI kernels | Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Ré | May 12, 2024 | 659 | -
Llama 3.1: Same model, different results. The impact of a percentage point. | Together AI | Jul 31, 2024 | 5632 | -
A practitioner's guide to testing and running large GPU clusters for training generative AI models | Ryan Lucchese, Niki Birkner, Yaron Hagai, Virginia Adams | Aug 13, 2024 | 2068 | 80
Speculative decoding for high-throughput long-context inference | Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Yunho Jin, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Beidi Chen | Sep 05, 2024 | 2002 | 2
BitDelta: Your Fine-Tune May Only Be Worth One Bit | James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai | Feb 20, 2024 | 1690 | -
BASED: Simple linear attention language models balance the recall-throughput tradeoff | Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré | Mar 04, 2024 | 2303 | 165
FAQ: Building LLMs with RedPajama-v2, a 30 trillion token web dataset | Together AI | May 01, 2024 | 2248 | -
Together AI partners with Snowflake to bring Arctic LLM to Enterprise customers | Together AI | Apr 25, 2024 | 422 | -
Building your own RAG application using Together AI and MongoDB Atlas | Together AI | Jan 11, 2024 | 1249 | -
Announcing Together Inference Engine 2.0 with new Turbo and Lite endpoints | Together AI | Jul 18, 2024 | 1802 | 3
Using Axiomic to build multi-agent chat with Together API | Together AI | Jun 05, 2024 | 1169 | -
Announcing function calling and JSON mode | Together AI | Jan 31, 2024 | 1861 | -
Introducing The Together Enterprise Platform: Run GenAI securely in any environment, with 2x faster inference and continuous model optimization | Together AI | Sep 23, 2024 | 1356 | -
Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps | Together AI | Sep 25, 2024 | 1482 | 1
FLUX API is now available on Together AI: New FLUX1.1 [pro] and free access to FLUX.1 [schnell] | Together AI | Oct 03, 2024 | 694 | 1
Multimodal Document RAG with Llama 3.2 Vision and ColQwen2 | Zain Hasan | Oct 08, 2024 | 1613 | -
How to build a real-time image generator with Flux and Together AI | Hassan El Mghari | Oct 11, 2024 | 1197 | -
Linearizing LLMs with LoLCATs | Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré | Oct 14, 2024 | 2462 | 1
Even Better, Even Faster Quantized LLMs with QTIP | Albert Tseng, Qingyao Sun, David Hou, Chris De Sa | Oct 30, 2024 | 3170 | -
Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud | Together AI | Nov 18, 2024 | 1230 | -
[COMING SOON] FLUX Tools now available via Together APIs: Get greater control over image generation using Canny and Depth | Together AI | Nov 21, 2024 | 216 | -
Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive | Artem Chumachenko, Zain Hasan, Max Ryabinin | Nov 25, 2024 | 2206 | 3
Long Context Fine-Tuning: A Technical Deep Dive | George Grigorev, Zain Hasan, Max Ryabinin | Nov 25, 2024 | 1435 | -
Fine-tuning API: Introducing long-context training, conversation data support and more configuration options | Max Ryabinin, Artem Chumachenko, George Grigorev, Arsh Zahed, Gleb Vazhenin | Nov 25, 2024 | 1726 | -
AWS Marketplace now offering Together AI to accelerate enterprise AI development | Together AI | Dec 02, 2024 | 415 | -
Announcing Llama 3.3 70B, with enhanced reasoning, mathematics, and instruction-following on Together AI | Together AI | Dec 06, 2024 | 500 | -
Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI | Together AI | Dec 12, 2024 | 932 | 3
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale | Together AI | Dec 18, 2024 | 1224 | -
Build ultra low latency voice AI applications with Together AI and Cartesia Sonic | Together AI | Jan 23, 2025 | 829 | -