Together AI

Founded in 2022. Privately held.

Cloud platform to train, fine-tune, and deploy AI models.

[Chart: blog posts published by month]

83 total blog posts published.

Blog content

post title | author | published | words | HN points
Evo: Long-context modeling from molecular to genome scale | Eric Nguyen, Michael Poli, Matthew Durrant, Patrick Hsu, Brian Hie | Feb. 27, 2024 | 1310 | 2
Can you feel the MoE? Mixtral available with over 100 tokens per second through Together Platform! | Together | Dec. 11, 2023 | 323 | -
Introducing the Together Embeddings endpoint — Higher accuracy, longer context, and lower cost | Together AI | Jan. 11, 2024 | 745 | 1
Filter responses of any model with Llama Guard or your own safety model | Together | Dec. 10, 2023 | 356 | -
Announcing OpenChatKit | Together | Mar. 10, 2023 | 2765 | -
TEAL: Training-Free Activation Sparsity in Large Language Models | James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun | Aug. 28, 2024 | 1056 | -
How Together and Crusoe are reducing the carbon impact of generative AI | Together | Apr. 20, 2023 | 737 | -
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores | Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré | Nov. 13, 2023 | 1804 | -
Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search | Together AI | Aug. 26, 2024 | 1582 | 1
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI) | Jul. 11, 2024 | 1753 | 287
Building your own RAG application using Together AI and Langchain | Together AI | Jan. 11, 2024 | 610 | -
Building a personalized code assistant with open-source LLMs using RAG Fine-tuning | Kezhen Chen, Linda He, Ben Athiwaratkun, Jue Wang, Maurice Weber, Heejin Jeong, Yonatan Oren, Michael Poli | Jun. 24, 2024 | 1333 | -
Preparing for the era of 32K context: Early learnings and explorations | Together | Jul. 28, 2023 | 1831 | -
Long context retrieval models with Monarch Mixer | Jon Saad-Falcon, Dan Fu, Simran Arora | Jan. 11, 2024 | 2583 | -
Building your own RAG application using Together AI and LlamaIndex | Together AI | Jan. 11, 2024 | 615 | -
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen | Mar. 12, 2024 | 616 | -
NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (1/2) | Together | Nov. 30, 2022 | 2211 | -
Introducing Together AI Chief Scientist Tri Dao, as he releases FlashAttention-2 to speed up model training and inference | Together | Jul. 17, 2023 | 2001 | -
Together AI partners with Meta to release Meta Llama 3 for inference and fine-tuning | Together AI | Apr. 18, 2024 | 602 | -
RedPajama-INCITE-3B, an LLM for everyone | Together | May 09, 2023 | 2281 | -
Announcing Together Custom Models. Build a state-of-the-art LLM with Together AI — and own the model. | Together | Nov. 13, 2023 | 1289 | -
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin | Jun. 18, 2024 | 1308 | -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao | Sep. 09, 2024 | 2582 | 4
Together AI launches full stack for developers to build with open-source AI | Together | Jul. 14, 2023 | 645 | -
Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API | Together | Aug. 18, 2023 | 1092 | -
Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers | Together | Dec. 08, 2023 | 1712 | 221
FlashConv: Speeding up state space models | Dan Fu and Tri Dao | Jan. 23, 2023 | 1100 | -
Together AI partners with Meta to release Llama 3.1 models for inference and fine-tuning with accelerated performance at full accuracy | Together AI | Jul. 23, 2024 | 933 | -
Flash Attention received the inaugural Stanford open source software award | Together AI | May 22, 2024 | 445 | -
NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (2/2) | Together | Dec. 05, 2022 | 2188 | -
Together AI and Snorkel AI empower enterprises to build proprietary LLMs | Together | Jul. 17, 2023 | 664 | -
Announcing Together Inference Engine – the fastest inference available | Together AI | Nov. 13, 2023 | 880 | 2
FlashAttention: Fast and memory-efficient exact attention with IO-Awareness | Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré | May 17, 2023 | 347 | -
RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models | Together | Oct. 30, 2023 | 2223 | 1
Together AI welcomes Kai Mak as CRO to accelerate gen AI adoption for AI natives and enterprises globally | Vipul Ved Prakash | Sep. 10, 2024 | 706 | -
Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities | Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou | Jun. 11, 2024 | 1422 | 2
CocktailSGD: Fine-tuning foundation models over 500Mbps networks | Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang | Apr. 24, 2023 | 234 | -
RedPajama 7B now available, instruct model outperforms all open 7B models on HELM benchmarks | Together | Jun. 06, 2023 | 1595 | -
Dragonfly: A large vision-language model with multi-resolution zoom | Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou | Jun. 06, 2024 | 1061 | 143
Mamba-3B-SlimPJ: State-space models rivaling the best Transformer architecture | Tri Dao, Albert Gu | Dec. 12, 2023 | 550 | -
Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud | Together AI | Jul. 23, 2024 | 612 | -
Our $102.5M Series A | Vipul Ved Prakash | Nov. 29, 2023 | 895 | 70
Announcing v1 of our Python SDK | Together AI | Apr. 22, 2024 | 361 | -
Announcing $106M round led by Salesforce Ventures | Vipul Ved Prakash | Mar. 13, 2024 | 999 | -
Growing to 20 exaflops, Together GPU Clusters help startups and enterprises accelerate generative AI development | Together | Nov. 13, 2023 | 929 | -
Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost | Hassan El Mghari | Jul. 12, 2024 | 1292 | 3
Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection | Together AI | Sep. 05, 2024 | 1781 | -
Faster inference enables up to 5x price reduction on Together API | Together | Aug. 11, 2023 | 379 | -
Releasing GPT-JT powered by open-source AI | Together | Nov. 29, 2022 | 895 | -
ThunderKittens: A Simple Embedded DSL for AI kernels | Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Re | May 12, 2024 | 659 | -
Llama 3.1: Same model, different results. The impact of a percentage point. | Together AI | Jul. 31, 2024 | 5632 | -
A practitioner's guide to testing and running large GPU clusters for training generative AI models | Ryan Lucchese, Niki Birkner, Yaron Hagai, Virginia Adams | Aug. 13, 2024 | 2068 | 80
Hungry Hungry Hippos: Towards language modeling with state space models | Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré | Dec. 28, 2022 | 384 | -
Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | Together | May 05, 2023 | 3989 | -
Speculative decoding for high-throughput long-context inference | Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Yunho Jin, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Beidi Chen | Sep. 05, 2024 | 2002 | 2
BitDelta: Your Fine-Tune May Only Be Worth One Bit | James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai | Feb. 20, 2024 | 1690 | -
Hyena Hierarchy: Towards larger convolutional language models | Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré | Feb. 02, 2023 | 291 | -
BASED: Simple linear attention language models balance the recall-throughput tradeoff | Simran, Sabri, Michael, Aman, Silas, Dylan, James, Atri, Chris | Mar. 04, 2024 | 2303 | 165
Fine-tuning language models over slow networks using activation compression with guarantees | Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang | Jun. 02, 2023 | 336 | -
HELM: benchmarking large language models on the Together Research Computer | Together | Nov. 17, 2022 | 1045 | -
RedPajama training progress at 440 billion tokens | Together | Apr. 24, 2023 | 1090 | -
RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens | Together | Apr. 17, 2023 | 1032 | -
FAQ: Building LLMs with RedPajama-v2, a 30 trillion token web dataset | Together AI | May 01, 2024 | 2248 | -
FlexGen: High-throughput generative inference of large language models with a single GPU | Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang | Mar. 13, 2023 | 317 | -
Flash-Decoding for long-context inference | Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov | Oct. 12, 2023 | 1271 | -
Together AI partners with Snowflake to bring Arctic LLM to Enterprise customers | Together AI | Apr. 25, 2024 | 422 | -
Decentralized training of foundation models in heterogeneous environments | Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, Ce Zhang | Jun. 02, 2023 | 340 | -
Building your own RAG application using Together AI and MongoDB Atlas | Together AI | Jan. 11, 2024 | 1249 | -
Announcing Together Inference Engine 2.0 with new Turbo and Lite endpoints | Together AI | Jul. 18, 2024 | 1802 | 3
Using Axiomic to build multi agent chat with Together API | Together AI | Jun. 05, 2024 | 1169 | -
Monarch Mixer: A new model architecture for increased efficiency | Dan Fu, Simran Arora, Chris Ré | Jul. 25, 2023 | 1981 | -
Medusa: Simple framework for accelerating LLM generation with multiple decoding heads | Tianle Cai*, Yuhong Li*, Zhengyang Geng, Hongwu Peng, Tri Dao (* Equal contribution) | Sep. 11, 2023 | 2817 | -
OpenChatKit now runs on consumer GPUs with a new 7B parameter model | Together | Mar. 30, 2023 | 2310 | -
Announcing function calling and JSON mode | Together AI | Jan. 31, 2024 | 1861 | -
Together’s $20M seed funding to build open-source AI and cloud platform | Vipul Ved Prakash | May 15, 2023 | 602 | -
Introducing The Together Enterprise Platform: Run GenAI securely in any environment, with 2x faster inference and continuous model optimization | Together AI | Sep. 23, 2024 | 1356 | -
Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps | Together AI | Sep. 25, 2024 | 1482 | 1
FLUX API is now available on Together AI: New FLUX1.1 [pro] and free access to FLUX.1 [schnell] | Together AI | Oct. 03, 2024 | 694 | 1
Multimodal Document RAG with Llama 3.2 Vision and ColQwen2 | Zain Hasan | Oct. 08, 2024 | 1613 | -
How to build a real-time image generator with Flux and Together AI | Hassan El Mghari | Oct. 11, 2024 | 1197 | -
Linearizing LLMs with LoLCATs | Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré | Oct. 14, 2024 | 2462 | 1
Even Better, Even Faster Quantized LLMs with QTIP | Albert Tseng, Qingyao Sun, David Hou, Chris De Sa | Oct. 30, 2024 | 3170 | -
Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud | Together AI | Nov. 18, 2024 | 1230 | -

By Matt Makai. 2021-2024.