Together AI Blog - Plushcap

111 blog posts published by month since the start of 2022.

Blog URL

www.together.ai/blog

Posts year-to-date

19 (14 posts by this month last year.)

Average posts per month since 2022

2.3

Post details (2022 to today)

Title	Author	Date	Word count	HN points
Evo: Long-context modeling from molecular to genome scale	Eric Nguyen, Michael Poli, Matthew Durrant, Patrick Hsu, Brian Hie	Feb 27, 2024	1310	2
Can you feel the MoE? Mixtral available with over 100 tokens per second through Together Platform!	Together	Dec 11, 2023	323	-
Introducing the Together Embeddings endpoint — Higher accuracy, longer context, and lower cost	Together AI	Jan 11, 2024	745	1
Filter responses of any model with Llama Guard or your own safety model	Together	Dec 10, 2023	356	-
Announcing OpenChatKit	Together	Mar 10, 2023	2765	-
TEAL: Training-Free Activation Sparsity in Large Language Models	James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun	Aug 28, 2024	1056	-
How Together and Crusoe are reducing the carbon impact of generative AI	Together	Apr 20, 2023	737	-
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores	Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré	Nov 13, 2023	1804	-
Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search	Together AI	Aug 26, 2024	1582	1
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision	Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI)	Jul 11, 2024	1753	287
Building your own RAG application using Together AI and Langchain	Together AI	Jan 11, 2024	610	-
Building a personalized code assistant with open-source LLMs using RAG Fine-tuning	Kezhen Chen, Linda He, Ben Athiwaratkun, Jue Wang, Maurice Weber, Heejin Jeong, Yonatan Oren, Michael Poli	Jun 24, 2024	1333	-
Preparing for the era of 32K context: Early learnings and explorations	Together	Jul 28, 2023	1831	-
Long context retrieval models with Monarch Mixer	Jon Saad-Falcon, Dan Fu, Simran Arora	Jan 11, 2024	2583	-
Building your own RAG application using Together AI and LlamaIndex	Together AI	Jan 11, 2024	615	-
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding	Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen	Mar 12, 2024	616	-
NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (1/2)	Together	Nov 30, 2022	2211	-
Introducing Together AI Chief Scientist Tri Dao, as he releases FlashAttention-2 to speed up model training and inference	Together	Jul 17, 2023	2001	-
Together AI partners with Meta to release Meta Llama 3 for inference and fine-tuning	Together AI	Apr 18, 2024	602	-
RedPajama-INCITE-3B, an LLM for everyone	Together	May 09, 2023	2281	-
Announcing Together Custom Models. Build a state-of-the-art LLM with Together AI — and own the model.	Together	Nov 13, 2023	1289	-
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices	Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin	Jun 18, 2024	1308	-
The Mamba in the Llama: Distilling and Accelerating Hybrid Models	Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao	Sep 09, 2024	2582	4
Together AI launches full stack for developers to build with open-source AI	Together	Jul 14, 2023	645	-
Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API	Together	Aug 18, 2023	1092	-
Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers	Together	Dec 08, 2023	1712	221
FlashConv: Speeding up state space models	Dan Fu and Tri Dao	Jan 23, 2023	1100	-
Together AI partners with Meta to release Llama 3.1 models for inference and fine-tuning with accelerated performance at full accuracy	Together AI	Jul 23, 2024	933	-
Flash Attention received the inaugural Stanford open source software award	Together AI	May 22, 2024	445	-
NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (2/2)	Together	Dec 05, 2022	2188	-
Together AI and Snorkel AI empower enterprises to build proprietary LLMs	Together	Jul 17, 2023	664	-
Announcing Together Inference Engine – the fastest inference available	Together AI	Nov 13, 2023	880	2
FlashAttention: Fast and memory-efficient exact attention with IO-Awareness	Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré	May 17, 2023	347	-
RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models	Together	Oct 30, 2023	2223	1
Together AI welcomes Kai Mak as CRO to accelerate gen AI adoption for AI natives and enterprises globally	Vipul Ved Prakash	Sep 10, 2024	706	-
Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities	Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou	Jun 11, 2024	1422	2
CocktailSGD: Fine-tuning foundation models over 500Mbps networks	Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang	Apr 24, 2023	234	-
RedPajama 7B now available, instruct model outperforms all open 7B models on HELM benchmarks	Together	Jun 06, 2023	1595	-
Dragonfly: A large vision-language model with multi-resolution zoom	Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou	Jun 06, 2024	1061	143
Mamba-3B-SlimPJ: State-space models rivaling the best Transformer architecture	Tri Dao, Albert Gu	Dec 12, 2023	550	-
Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud	Together AI	Jul 23, 2024	612	-
Our $102.5M Series A	Vipul Ved Prakash	Nov 29, 2023	895	70
Announcing v1 of our Python SDK	Together AI	Apr 22, 2024	361	-
Announcing $106M round led by Salesforce Ventures	Vipul Ved Prakash	Mar 13, 2024	999	-
Growing to 20 exaflops, Together GPU Clusters help startups and enterprises accelerate generative AI development	Together	Nov 13, 2023	929	-
Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost	Hassan El Mghari	Jul 12, 2024	1292	3
Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection	Together AI	Sep 05, 2024	1781	-
Faster inference enables up to 5x price reduction on Together API	Together	Aug 11, 2023	379	-
Releasing GPT-JT powered by open-source AI	Together	Nov 29, 2022	895	-
ThunderKittens: A Simple Embedded DSL for AI kernels	Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Re	May 12, 2024	659	-
Llama 3.1: Same model, different results. The impact of a percentage point.	Together AI	Jul 31, 2024	5632	-
A practitioner's guide to testing and running large GPU clusters for training generative AI models	Ryan Lucchese, Niki Birkner, Yaron Hagai, Virginia Adams	Aug 13, 2024	2068	80
Hungry Hungry Hippos: Towards language modeling with state space models	Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré	Dec 28, 2022	384	-
Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models	Together	May 05, 2023	3989	-
Speculative decoding for high-throughput long-context inference	Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Yunho Jin, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Beidi Chen	Sep 05, 2024	2002	2
BitDelta: Your Fine-Tune May Only Be Worth One Bit	James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai	Feb 20, 2024	1690	-
Hyena Hierarchy: Towards larger convolutional language models	Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré	Feb 02, 2023	291	-
BASED: Simple linear attention language models balance the recall-throughput tradeoff	Simran, Sabri, Michael, Aman, Silas, Dylan, James, Atri, Chris	Mar 04, 2024	2303	165
Fine-tuning language models over slow networks using activation compression with guarantees	Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang	Jun 02, 2023	336	-
HELM: benchmarking large language models on the Together Research Computer	Together	Nov 17, 2022	1045	-
RedPajama training progress at 440 billion tokens	Together	Apr 24, 2023	1090	-
RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens	Together	Apr 17, 2023	1032	-
FAQ: Building LLMs with RedPajama-v2, a 30 trillion token web dataset	Together AI	May 01, 2024	2248	-
FlexGen: High-throughput generative inference of large language models with a single GPU	Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang	Mar 13, 2023	317	-
Flash-Decoding for long-context inference	Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov	Oct 12, 2023	1271	-
Together AI partners with Snowflake to bring Arctic LLM to Enterprise customers	Together AI	Apr 25, 2024	422	-
Decentralized training of foundation models in heterogeneous environments	Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, Ce Zhang	Jun 02, 2023	340	-
Building your own RAG application using Together AI and MongoDB Atlas	Together AI	Jan 11, 2024	1249	-
Announcing Together Inference Engine 2.0 with new Turbo and Lite endpoints	Together AI	Jul 18, 2024	1802	3
Using Axiomic to build multi agent chat with Together API	Together AI	Jun 05, 2024	1169	-
Monarch Mixer: A new model architecture for increased efficiency	Dan Fu, Simran Arora, Chris Ré	Jul 25, 2023	1981	-
Medusa: Simple framework for accelerating LLM generation with multiple decoding heads	Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Tri Dao (* Equal contribution)	Sep 11, 2023	2817	-
OpenChatKit now runs on consumer GPUs with a new 7B parameter model	Together	Mar 30, 2023	2310	-
Announcing function calling and JSON mode	Together AI	Jan 31, 2024	1861	-
Together’s $20M seed funding to build open-source AI and cloud platform	Vipul Ved Prakash	May 15, 2023	602	-
Introducing The Together Enterprise Platform: Run GenAI securely in any environment, with 2x faster inference and continuous model optimization	Together AI	Sep 23, 2024	1356	-
Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps	Together AI	Sep 25, 2024	1482	1
FLUX API is now available on Together AI: New FLUX1.1 [pro] and free access to FLUX.1 [schnell]	Together AI	Oct 03, 2024	694	1
Multimodal Document RAG with Llama 3.2 Vision and ColQwen2	Zain Hasan	Oct 08, 2024	1613	-
How to build a real-time image generator with Flux and Together AI	Hassan El Mghari	Oct 11, 2024	1197	-
Linearizing LLMs with LoLCATs	Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré	Oct 14, 2024	2462	1
Even Better, Even Faster Quantized LLMs with QTIP	Albert Tseng, Qingyao Sun, David Hou, Chris De Sa	Oct 30, 2024	3170	-
Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud	Together AI	Nov 18, 2024	1230	-
[COMING SOON] FLUX Tools now available via Together APIs: Get greater control over image generation using Canny and Depth	Together AI	Nov 21, 2024	216	-
Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive	Artem Chumachenko, Zain Hasan, Max Ryabinin	Nov 25, 2024	2206	3
Long Context Fine-Tuning: A Technical Deep Dive	George Grigorev, Zain Hasan, Max Ryabinin	Nov 25, 2024	1435	-
Fine-tuning API: Introducing long-context training, conversation data support and more configuration options	Max Ryabinin, Artem Chumachenko, George Grigorev, Arsh Zahed, Gleb Vazhenin	Nov 25, 2024	1726	-
AWS Marketplace now offering Together AI to accelerate enterprise AI development	Together AI	Dec 02, 2024	415	-
Announcing Llama 3.3 70B, with enhanced reasoning, mathematics, and instruction-following on Together AI	Together AI	Dec 06, 2024	500	-
Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI	Together AI	Dec 12, 2024	932	3
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale	Together AI	Dec 18, 2024	1224	-
Build ultra low latency voice AI applications with Together AI and Cartesia Sonic	Together AI	Jan 23, 2025	829	-
How to deploy DeepSeek-R1 and distilled models securely on Together AI	Together AI	Jan 31, 2025	1004	-
Mistral Small 3 API now available on Together AI: A new category leader in small models	Together AI	Jan 30, 2025	712	-
Generate images with specific styles using Flux LoRAs on Together AI	Together AI	Jan 27, 2025	891	-
Deploy DeepSeek-R1 at scale: Fast, secure serverless APIs and large-scale Together Reasoning Clusters	Together AI	Feb 12, 2025	984	-
Together AI Achieves 90% Faster BF16 Training with NVIDIA Blackwell Platform and Together Kernel Collection	Together AI	Feb 13, 2025	1422	-
How Zomato built an AI customer support bot that doubled customer satisfaction and scaled to over 1,000 messages per minute	Together AI	Oct 03, 2024	1921	-
Minions: embracing small LMs, shifting compute on-device, and cutting cloud costs in the process	Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré	Feb 25, 2025	1257	-
Together AI Announces $305M Series B to Scale AI Acceleration Cloud for Open Source and Enterprise AI	Together AI	Feb 20, 2025	808	-
Together AI becomes NVIDIA Cloud Partner to bolster accelerated AI offerings	Together AI	Mar 11, 2025	744	-
ThunderKittens Now Optimized for NVIDIA Blackwell GPUs	Benjamin Spector, Aaryan Singhal, Dan Fu, Chris Ré	Mar 15, 2025	1573	-
On-demand dedicated endpoints: run inference with unmatched price-performance & control at scale	Together AI	Mar 13, 2025	1191	-
Introducing Together Instant GPU Clusters Accelerated by NVIDIA GPUs, with Self-Service Provisioning in Minutes	Together AI	Mar 18, 2025	800	-
Together AI Powers Pioneers at GTC: NVIDIA Blackwell GPUs, Instant GPU Clusters, and A Full-Stack for AI Innovation	Together AI	Mar 18, 2025	1836	-
Deploy Leading AI Models Accelerated by NVIDIA NIM on Together AI	Together AI	Mar 18, 2025	744	-
Introducing Together Chat: use DeepSeek R1 for free, hosted in North America	Hassan El Mghari	Mar 24, 2025	648	-
Together AI Awarded ClusterMAX™ Gold Rating by SemiAnalysis	Together AI	Mar 27, 2025	973	-
Together AI partners with Meta to offer Llama 4: SOTA Multimodal MoE Models	Together AI	Apr 05, 2025	608	-
Scaling AI Companions: How Dippy AI Reached Over 4 Million Tokens/Minute with Together Dedicated Endpoints	Together AI	Apr 01, 2025	1074	-
DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level	Michael Luo, Sijun Tan, Roy Huang, Ameen Patel, Alpay Ariyak, Qingyang Wu, Xiaoxiang Shi, Rachel Xin, Colin Cai, Maurice Weber, Ce Zhang, Li Erran Li, Raluca Ada Popa, Ion Stoica	Apr 08, 2025	2870	-
