92 blog posts published by month since the start of 2022.

Posts year-to-date: 1 (6 posts by this month last year)
Average posts per month since 2022: 1.9

Post details (2022 to today)

Title | Author | Date | Word count | HN points
Evo: Long-context modeling from molecular to genome scale | Eric Nguyen, Michael Poli, Matthew Durrant, Patrick Hsu, Brian Hie | Feb 27, 2024 | 1310 | 2
Can you feel the MoE? Mixtral available with over 100 tokens per second through Together Platform! | Together | Dec 11, 2023 | 323 | -
Introducing the Together Embeddings endpoint — Higher accuracy, longer context, and lower cost | Together AI | Jan 11, 2024 | 745 | 1
Filter responses of any model with Llama Guard or your own safety model | Together | Dec 10, 2023 | 356 | -
Announcing OpenChatKit | Together | Mar 10, 2023 | 2765 | -
TEAL: Training-Free Activation Sparsity in Large Language Models | James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun | Aug 28, 2024 | 1056 | -
How Together and Crusoe are reducing the carbon impact of generative AI | Together | Apr 20, 2023 | 737 | -
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores | Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré | Nov 13, 2023 | 1804 | -
Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search | Together AI | Aug 26, 2024 | 1582 | 1
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI) | Jul 11, 2024 | 1753 | 287
Building your own RAG application using Together AI and Langchain | Together AI | Jan 11, 2024 | 610 | -
Building a personalized code assistant with open-source LLMs using RAG Fine-tuning | Kezhen Chen, Linda He, Ben Athiwaratkun, Jue Wang, Maurice Weber, Heejin Jeong, Yonatan Oren, Michael Poli | Jun 24, 2024 | 1333 | -
Preparing for the era of 32K context: Early learnings and explorations | Together | Jul 28, 2023 | 1831 | -
Long context retrieval models with Monarch Mixer | Jon Saad-Falcon, Dan Fu, Simran Arora | Jan 11, 2024 | 2583 | -
Building your own RAG application using Together AI and LlamaIndex | Together AI | Jan 11, 2024 | 615 | -
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen | Mar 12, 2024 | 616 | -
NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (1/2) | Together | Nov 30, 2022 | 2211 | -
Introducing Together AI Chief Scientist Tri Dao, as he releases FlashAttention-2 to speed up model training and inference | Together | Jul 17, 2023 | 2001 | -
Together AI partners with Meta to release Meta Llama 3 for inference and fine-tuning | Together AI | Apr 18, 2024 | 602 | -
RedPajama-INCITE-3B, an LLM for everyone | Together | May 09, 2023 | 2281 | -
Announcing Together Custom Models. Build a state-of-the-art LLM with Together AI — and own the model. | Together | Nov 13, 2023 | 1289 | -
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin | Jun 18, 2024 | 1308 | -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao | Sep 09, 2024 | 2582 | 4
Together AI launches full stack for developers to build with open-source AI | Together | Jul 14, 2023 | 645 | -
Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API | Together | Aug 18, 2023 | 1092 | -
Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers | Together | Dec 08, 2023 | 1712 | 221
FlashConv: Speeding up state space models | Dan Fu and Tri Dao | Jan 23, 2023 | 1100 | -
Together AI partners with Meta to release Llama 3.1 models for inference and fine-tuning with accelerated performance at full accuracy | Together AI | Jul 23, 2024 | 933 | -
Flash Attention received the inaugural Stanford open source software award | Together AI | May 22, 2024 | 445 | -
NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (2/2) | Together | Dec 05, 2022 | 2188 | -
Together AI and Snorkel AI empower enterprises to build proprietary LLMs | Together | Jul 17, 2023 | 664 | -
Announcing Together Inference Engine – the fastest inference available | Together AI | Nov 13, 2023 | 880 | 2
FlashAttention: Fast and memory-efficient exact attention with IO-Awareness | Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré | May 17, 2023 | 347 | -
RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models | Together | Oct 30, 2023 | 2223 | 1
Together AI welcomes Kai Mak as CRO to accelerate gen AI adoption for AI natives and enterprises globally | Vipul Ved Prakash | Sep 10, 2024 | 706 | -
Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities | Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou | Jun 11, 2024 | 1422 | 2
CocktailSGD: Fine-tuning foundation models over 500Mbps networks | Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang | Apr 24, 2023 | 234 | -
RedPajama 7B now available, instruct model outperforms all open 7B models on HELM benchmarks | Together | Jun 06, 2023 | 1595 | -
Dragonfly: A large vision-language model with multi-resolution zoom | Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou | Jun 06, 2024 | 1061 | 143
Mamba-3B-SlimPJ: State-space models rivaling the best Transformer architecture | Tri Dao, Albert Gu | Dec 12, 2023 | 550 | -
Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud | Together AI | Jul 23, 2024 | 612 | -
Our $102.5M Series A | Vipul Ved Prakash | Nov 29, 2023 | 895 | 70
Announcing v1 of our Python SDK | Together AI | Apr 22, 2024 | 361 | -
Announcing $106M round led by Salesforce Ventures | Vipul Ved Prakash | Mar 13, 2024 | 999 | -
Growing to 20 exaflops, Together GPU Clusters help startups and enterprises accelerate generative AI development | Together | Nov 13, 2023 | 929 | -
Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost | Hassan El Mghari | Jul 12, 2024 | 1292 | 3
Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection | Together AI | Sep 05, 2024 | 1781 | -
Faster inference enables up to 5x price reduction on Together API | Together | Aug 11, 2023 | 379 | -
Releasing GPT-JT powered by open-source AI | Together | Nov 29, 2022 | 895 | -
ThunderKittens: A Simple Embedded DSL for AI kernels | Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Re | May 12, 2024 | 659 | -
Llama 3.1: Same model, different results. The impact of a percentage point. | Together AI | Jul 31, 2024 | 5632 | -
A practitioner's guide to testing and running large GPU clusters for training generative AI models | Ryan Lucchese, Niki Birkner, Yaron Hagai, Virginia Adams | Aug 13, 2024 | 2068 | 80
Hungry Hungry Hippos: Towards language modeling with state space models | Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré | Dec 28, 2022 | 384 | -
Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | Together | May 05, 2023 | 3989 | -
Speculative decoding for high-throughput long-context inference | Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Yunho Jin, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Beidi Chen | Sep 05, 2024 | 2002 | 2
BitDelta: Your Fine-Tune May Only Be Worth One Bit | James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai | Feb 20, 2024 | 1690 | -
Hyena Hierarchy: Towards larger convolutional language models | Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré | Feb 02, 2023 | 291 | -
BASED: Simple linear attention language models balance the recall-throughput tradeoff | Simran, Sabri, Michael, Aman, Silas, Dylan, James, Atri, Chris | Mar 04, 2024 | 2303 | 165
Fine-tuning language models over slow networks using activation compression with guarantees | Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang | Jun 02, 2023 | 336 | -
HELM: benchmarking large language models on the Together Research Computer | Together | Nov 17, 2022 | 1045 | -
RedPajama training progress at 440 billion tokens | Together | Apr 24, 2023 | 1090 | -
RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens | Together | Apr 17, 2023 | 1032 | -
FAQ: Building LLMs with RedPajama-v2, a 30 trillion token web dataset | Together AI | May 01, 2024 | 2248 | -
FlexGen: High-throughput generative inference of large language models with a single GPU | Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang | Mar 13, 2023 | 317 | -
Flash-Decoding for long-context inference | Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov | Oct 12, 2023 | 1271 | -
Together AI partners with Snowflake to bring Arctic LLM to Enterprise customers | Together AI | Apr 25, 2024 | 422 | -
Decentralized training of foundation models in heterogeneous environments | Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, Ce Zhang | Jun 02, 2023 | 340 | -
Building your own RAG application using Together AI and MongoDB Atlas | Together AI | Jan 11, 2024 | 1249 | -
Announcing Together Inference Engine 2.0 with new Turbo and Lite endpoints | Together AI | Jul 18, 2024 | 1802 | 3
Using Axiomic to build multi agent chat with Together API | Together AI | Jun 05, 2024 | 1169 | -
Monarch Mixer: A new model architecture for increased efficiency | Dan Fu, Simran Arora, Chris Ré | Jul 25, 2023 | 1981 | -
Medusa: Simple framework for accelerating LLM generation with multiple decoding heads | Tianle Cai*, Yuhong Li*, Zhengyang Geng, Hongwu Peng, Tri Dao (* Equal contribution) | Sep 11, 2023 | 2817 | -
OpenChatKit now runs on consumer GPUs with a new 7B parameter model | Together | Mar 30, 2023 | 2310 | -
Announcing function calling and JSON mode | Together AI | Jan 31, 2024 | 1861 | -
Together’s $20M seed funding to build open-source AI and cloud platform | Vipul Ved Prakash | May 15, 2023 | 602 | -
Introducing The Together Enterprise Platform: Run GenAI securely in any environment, with 2x faster inference and continuous model optimization | Together AI | Sep 23, 2024 | 1356 | -
Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps | Together AI | Sep 25, 2024 | 1482 | 1
FLUX API is now available on Together AI: New FLUX1.1 [pro] and free access to FLUX.1 [schnell] | Together AI | Oct 03, 2024 | 694 | 1
Multimodal Document RAG with Llama 3.2 Vision and ColQwen2 | Zain Hasan | Oct 08, 2024 | 1613 | -
How to build a real-time image generator with Flux and Together AI | Hassan El Mghari | Oct 11, 2024 | 1197 | -
Linearizing LLMs with LoLCATs | Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré | Oct 14, 2024 | 2462 | 1
Even Better, Even Faster Quantized LLMs with QTIP | Albert Tseng, Qingyao Sun, David Hou, Chris De Sa | Oct 30, 2024 | 3170 | -
Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud | Together AI | Nov 18, 2024 | 1230 | -
[COMING SOON] FLUX Tools now available via Together APIs: Get greater control over image generation using Canny and Depth | Together AI | Nov 21, 2024 | 216 | -
Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive | Artem Chumachenko, Zain Hasan, Max Ryabinin | Nov 25, 2024 | 2206 | 3
Long Context Fine-Tuning: A Technical Deep Dive | George Grigorev, Zain Hasan, Max Ryabinin | Nov 25, 2024 | 1435 | -
Fine-tuning API: Introducing long-context training, conversation data support and more configuration options | Max Ryabinin, Artem Chumachenko, George Grigorev, Arsh Zahed, Gleb Vazhenin | Nov 25, 2024 | 1726 | -
AWS Marketplace now offering Together AI to accelerate enterprise AI development | Together AI | Dec 02, 2024 | 415 | -
Announcing Llama 3.3 70B, with enhanced reasoning, mathematics, and instruction-following on Together AI | Together AI | Dec 06, 2024 | 500 | -
Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI | Together AI | Dec 12, 2024 | 932 | 3
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale | Together AI | Dec 18, 2024 | 1224 | -
Build ultra low latency voice AI applications with Together AI and Cartesia Sonic | Together AI | Jan 23, 2025 | 829 | -