52 blog posts published since the start of 2024.

Posts year-to-date: 1 (6 posts by this month last year)
Average posts per month since 2024: 2.2

Post details (2024 to today)

Title | Author | Date | Word count | HN (Hacker News) points
Evo: Long-context modeling from molecular to genome scale | Eric Nguyen, Michael Poli, Matthew Durrant, Patrick Hsu, Brian Hie | Feb 27, 2024 | 1310 | 2
Introducing the Together Embeddings endpoint — Higher accuracy, longer context, and lower cost | Together AI | Jan 11, 2024 | 745 | 1
TEAL: Training-Free Activation Sparsity in Large Language Models | James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun | Aug 28, 2024 | 1056 | -
Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search | Together AI | Aug 26, 2024 | 1582 | 1
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI) | Jul 11, 2024 | 1753 | 287
Building your own RAG application using Together AI and Langchain | Together AI | Jan 11, 2024 | 610 | -
Building a personalized code assistant with open-source LLMs using RAG Fine-tuning | Kezhen Chen, Linda He, Ben Athiwaratkun, Jue Wang, Maurice Weber, Heejin Jeong, Yonatan Oren, Michael Poli | Jun 24, 2024 | 1333 | -
Long context retrieval models with Monarch Mixer | Jon Saad-Falcon, Dan Fu, Simran Arora | Jan 11, 2024 | 2583 | -
Building your own RAG application using Together AI and LlamaIndex | Together AI | Jan 11, 2024 | 615 | -
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen | Mar 12, 2024 | 616 | -
Together AI partners with Meta to release Meta Llama 3 for inference and fine-tuning | Together AI | Apr 18, 2024 | 602 | -
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin | Jun 18, 2024 | 1308 | -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao | Sep 09, 2024 | 2582 | 4
Together AI partners with Meta to release Llama 3.1 models for inference and fine-tuning with accelerated performance at full accuracy | Together AI | Jul 23, 2024 | 933 | -
Flash Attention received the inaugural Stanford open source software award | Together AI | May 22, 2024 | 445 | -
Together AI welcomes Kai Mak as CRO to accelerate gen AI adoption for AI natives and enterprises globally | Vipul Ved Prakash | Sep 10, 2024 | 706 | -
Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities | Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou | Jun 11, 2024 | 1422 | 2
Dragonfly: A large vision-language model with multi-resolution zoom | Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou | Jun 06, 2024 | 1061 | 143
Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud | Together AI | Jul 23, 2024 | 612 | -
Announcing v1 of our Python SDK | Together AI | Apr 22, 2024 | 361 | -
Announcing $106M round led by Salesforce Ventures | Vipul Ved Prakash | Mar 13, 2024 | 999 | -
Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost | Hassan El Mghari | Jul 12, 2024 | 1292 | 3
Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection | Together AI | Sep 05, 2024 | 1781 | -
ThunderKittens: A Simple Embedded DSL for AI kernels | Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Re | May 12, 2024 | 659 | -
Llama 3.1: Same model, different results. The impact of a percentage point. | Together AI | Jul 31, 2024 | 5632 | -
A practitioner's guide to testing and running large GPU clusters for training generative AI models | Ryan Lucchese, Niki Birkner, Yaron Hagai, Virginia Adams | Aug 13, 2024 | 2068 | 80
Speculative decoding for high-throughput long-context inference | Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Yunho Jin, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Beidi Chen | Sep 05, 2024 | 2002 | 2
BitDelta: Your Fine-Tune May Only Be Worth One Bit | James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai | Feb 20, 2024 | 1690 | -
BASED: Simple linear attention language models balance the recall-throughput tradeoff | Simran, Sabri, Michael, Aman, Silas, Dylan, James, Atri, Chris | Mar 04, 2024 | 2303 | 165
FAQ: Building LLMs with RedPajama-v2, a 30 trillion token web dataset | Together AI | May 01, 2024 | 2248 | -
Together AI partners with Snowflake to bring Arctic LLM to Enterprise customers | Together AI | Apr 25, 2024 | 422 | -
Building your own RAG application using Together AI and MongoDB Atlas | Together AI | Jan 11, 2024 | 1249 | -
Announcing Together Inference Engine 2.0 with new Turbo and Lite endpoints | Together AI | Jul 18, 2024 | 1802 | 3
Using Axiomic to build multi agent chat with Together API | Together AI | Jun 05, 2024 | 1169 | -
Announcing function calling and JSON mode | Together AI | Jan 31, 2024 | 1861 | -
Introducing The Together Enterprise Platform: Run GenAI securely in any environment, with 2x faster inference and continuous model optimization | Together AI | Sep 23, 2024 | 1356 | -
Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps | Together AI | Sep 25, 2024 | 1482 | 1
FLUX API is now available on Together AI: New FLUX1.1 [pro] and free access to FLUX.1 [schnell] | Together AI | Oct 03, 2024 | 694 | 1
Multimodal Document RAG with Llama 3.2 Vision and ColQwen2 | Zain Hasan | Oct 08, 2024 | 1613 | -
How to build a real-time image generator with Flux and Together AI | Hassan El Mghari | Oct 11, 2024 | 1197 | -
Linearizing LLMs with LoLCATs | Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré | Oct 14, 2024 | 2462 | 1
Even Better, Even Faster Quantized LLMs with QTIP | Albert Tseng, Qingyao Sun, David Hou, Chris De Sa | Oct 30, 2024 | 3170 | -
Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud | Together AI | Nov 18, 2024 | 1230 | -
[COMING SOON] FLUX Tools now available via Together APIs: Get greater control over image generation using Canny and Depth | Together AI | Nov 21, 2024 | 216 | -
Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive | Artem Chumachenko, Zain Hasan, Max Ryabinin | Nov 25, 2024 | 2206 | 3
Long Context Fine-Tuning: A Technical Deep Dive | George Grigorev, Zain Hasan, Max Ryabinin | Nov 25, 2024 | 1435 | -
Fine-tuning API: Introducing long-context training, conversation data support and more configuration options | Max Ryabinin, Artem Chumachenko, George Grigorev, Arsh Zahed, Gleb Vazhenin | Nov 25, 2024 | 1726 | -
AWS Marketplace now offering Together AI to accelerate enterprise AI development | Together AI | Dec 02, 2024 | 415 | -
Announcing Llama 3.3 70B, with enhanced reasoning, mathematics, and instruction-following on Together AI | Together AI | Dec 06, 2024 | 500 | -
Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI | Together AI | Dec 12, 2024 | 932 | 3
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale | Together AI | Dec 18, 2024 | 1224 | -
Build ultra low latency voice AI applications with Together AI and Cartesia Sonic | Together AI | Jan 23, 2025 | 829 | -