Together AI Blog - Plushcap

71 blog posts published since the start of 2024.

Blog URL: www.together.ai/blog
Posts year-to-date: 19 (14 posts by this month last year)
Average posts per month since 2024: 3.0

Post details (2024 to today)

Title	Author	Date	Word count	Hacker News points
Evo: Long-context modeling from molecular to genome scale	Eric Nguyen, Michael Poli, Matthew Durrant, Patrick Hsu, Brian Hie	Feb 27, 2024	1310	2
Introducing the Together Embeddings endpoint — Higher accuracy, longer context, and lower cost	Together AI	Jan 11, 2024	745	1
TEAL: Training-Free Activation Sparsity in Large Language Models	James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun	Aug 28, 2024	1056	-
Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search	Together AI	Aug 26, 2024	1582	1
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision	Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI)	Jul 11, 2024	1753	287
Building your own RAG application using Together AI and Langchain	Together AI	Jan 11, 2024	610	-
Building a personalized code assistant with open-source LLMs using RAG Fine-tuning	Kezhen Chen, Linda He, Ben Athiwaratkun, Jue Wang, Maurice Weber, Heejin Jeong, Yonatan Oren, Michael Poli	Jun 24, 2024	1333	-
Long context retrieval models with Monarch Mixer	Jon Saad-Falcon, Dan Fu, Simran Arora	Jan 11, 2024	2583	-
Building your own RAG application using Together AI and LlamaIndex	Together AI	Jan 11, 2024	615	-
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding	Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen	Mar 12, 2024	616	-
Together AI partners with Meta to release Meta Llama 3 for inference and fine-tuning	Together AI	Apr 18, 2024	602	-
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices	Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin	Jun 18, 2024	1308	-
The Mamba in the Llama: Distilling and Accelerating Hybrid Models	Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao	Sep 09, 2024	2582	4
Together AI partners with Meta to release Llama 3.1 models for inference and fine-tuning with accelerated performance at full accuracy	Together AI	Jul 23, 2024	933	-
Flash Attention received the inaugural Stanford open source software award	Together AI	May 22, 2024	445	-
Together AI welcomes Kai Mak as CRO to accelerate gen AI adoption for AI natives and enterprises globally	Vipul Ved Prakash	Sep 10, 2024	706	-
Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities	Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou	Jun 11, 2024	1422	2
Dragonfly: A large vision-language model with multi-resolution zoom	Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou	Jun 06, 2024	1061	143
Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud	Together AI	Jul 23, 2024	612	-
Announcing v1 of our Python SDK	Together AI	Apr 22, 2024	361	-
Announcing $106M round led by Salesforce Ventures	Vipul Ved Prakash	Mar 13, 2024	999	-
Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost	Hassan El Mghari	Jul 12, 2024	1292	3
Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection	Together AI	Sep 05, 2024	1781	-
ThunderKittens: A Simple Embedded DSL for AI kernels	Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Re	May 12, 2024	659	-
Llama 3.1: Same model, different results. The impact of a percentage point.	Together AI	Jul 31, 2024	5632	-
A practitioner's guide to testing and running large GPU clusters for training generative AI models	Ryan Lucchese, Niki Birkner, Yaron Hagai, Virginia Adams	Aug 13, 2024	2068	80
Speculative decoding for high-throughput long-context inference	Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Yunho Jin, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Beidi Chen	Sep 05, 2024	2002	2
BitDelta: Your Fine-Tune May Only Be Worth One Bit	James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai	Feb 20, 2024	1690	-
BASED: Simple linear attention language models balance the recall-throughput tradeoff	Simran, Sabri, Michael, Aman, Silas, Dylan, James, Atri, Chris	Mar 04, 2024	2303	165
FAQ: Building LLMs with RedPajama-v2, a 30 trillion token web dataset	Together AI	May 01, 2024	2248	-
Together AI partners with Snowflake to bring Arctic LLM to Enterprise customers	Together AI	Apr 25, 2024	422	-
Building your own RAG application using Together AI and MongoDB Atlas	Together AI	Jan 11, 2024	1249	-
Announcing Together Inference Engine 2.0 with new Turbo and Lite endpoints	Together AI	Jul 18, 2024	1802	3
Using Axiomic to build multi agent chat with Together API	Together AI	Jun 05, 2024	1169	-
Announcing function calling and JSON mode	Together AI	Jan 31, 2024	1861	-
Introducing The Together Enterprise Platform: Run GenAI securely in any environment, with 2x faster inference and continuous model optimization	Together AI	Sep 23, 2024	1356	-
Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps	Together AI	Sep 25, 2024	1482	1
FLUX API is now available on Together AI: New FLUX1.1 [pro] and free access to FLUX.1 [schnell]	Together AI	Oct 03, 2024	694	1
Multimodal Document RAG with Llama 3.2 Vision and ColQwen2	Zain Hasan	Oct 08, 2024	1613	-
How to build a real-time image generator with Flux and Together AI	Hassan El Mghari	Oct 11, 2024	1197	-
Linearizing LLMs with LoLCATs	Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré	Oct 14, 2024	2462	1
Even Better, Even Faster Quantized LLMs with QTIP	Albert Tseng, Qingyao Sun, David Hou, Chris De Sa	Oct 30, 2024	3170	-
Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud	Together AI	Nov 18, 2024	1230	-
[COMING SOON] FLUX Tools now available via Together APIs: Get greater control over image generation using Canny and Depth	Together AI	Nov 21, 2024	216	-
Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive	Artem Chumachenko, Zain Hasan, Max Ryabinin	Nov 25, 2024	2206	3
Long Context Fine-Tuning: A Technical Deep Dive	George Grigorev, Zain Hasan, Max Ryabinin	Nov 25, 2024	1435	-
Fine-tuning API: Introducing long-context training, conversation data support and more configuration options	Max Ryabinin, Artem Chumachenko, George Grigorev, Arsh Zahed, Gleb Vazhenin	Nov 25, 2024	1726	-
AWS Marketplace now offering Together AI to accelerate enterprise AI development	Together AI	Dec 02, 2024	415	-
Announcing Llama 3.3 70B, with enhanced reasoning, mathematics, and instruction-following on Together AI	Together AI	Dec 06, 2024	500	-
Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI	Together AI	Dec 12, 2024	932	3
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale	Together AI	Dec 18, 2024	1224	-
Build ultra low latency voice AI applications with Together AI and Cartesia Sonic	Together AI	Jan 23, 2025	829	-
How to deploy DeepSeek-R1 and distilled models securely on Together AI	Together AI	Jan 31, 2025	1004	-
Mistral Small 3 API now available on Together AI: A new category leader in small models	Together AI	Jan 30, 2025	712	-
Generate images with specific styles using Flux LoRAs on Together AI	Together AI	Jan 27, 2025	891	-
Deploy DeepSeek-R1 at scale: Fast, secure serverless APIs and large-scale Together Reasoning Clusters	Together AI	Feb 12, 2025	984	-
Together AI Achieves 90% Faster BF16 Training with NVIDIA Blackwell Platform and Together Kernel Collection	Together AI	Feb 13, 2025	1422	-
How Zomato built an AI customer support bot that doubled customer satisfaction and scaled to over 1,000 messages per minute	Together AI	Oct 03, 2024	1921	-
Minions: embracing small LMs, shifting compute on-device, and cutting cloud costs in the process	Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré	Feb 25, 2025	1257	-
Together AI Announces $305M Series B to Scale AI Acceleration Cloud for Open Source and Enterprise AI	Together AI	Feb 20, 2025	808	-
Together AI becomes NVIDIA Cloud Partner to bolster accelerated AI offerings	Together AI	Mar 11, 2025	744	-
ThunderKittens Now Optimized for NVIDIA Blackwell GPUs	Benjamin Spector, Aaryan Singhal, Dan Fu, Chris Ré	Mar 15, 2025	1573	-
On-demand dedicated endpoints: run inference with unmatched price-performance & control at scale	Together AI	Mar 13, 2025	1191	-
Introducing Together Instant GPU Clusters Accelerated by NVIDIA GPUs, with Self-Service Provisioning in Minutes	Together AI	Mar 18, 2025	800	-
Together AI Powers Pioneers at GTC: NVIDIA Blackwell GPUs, Instant GPU Clusters, and A Full-Stack for AI Innovation	Together AI	Mar 18, 2025	1836	-
Deploy Leading AI Models Accelerated by NVIDIA NIM on Together AI	Together AI	Mar 18, 2025	744	-
Introducing Together Chat: use DeepSeek R1 for free, hosted in North America	Hassan El Mghari	Mar 24, 2025	648	-
Together AI Awarded ClusterMAX™ Gold Rating by SemiAnalysis	Together AI	Mar 27, 2025	973	-
Together AI partners with Meta to offer Llama 4: SOTA Multimodal MoE Models	Together AI	Apr 05, 2025	608	-
Scaling AI Companions: How Dippy AI Reached Over 4 Million Tokens/Minute with Together Dedicated Endpoints	Together AI	Apr 01, 2025	1074	-
DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level	Michael Luo, Sijun Tan, Roy Huang, Ameen Patel, Alpay Ariyak, Qingyang Wu, Xiaoxiang Shi, Rachel Xin, Colin Cai, Maurice Weber, Ce Zhang, Li Erran Li, Raluca Ada Popa, Ion Stoica	Apr 08, 2025	2870	-
