| Title | Author(s) | Date |
|---|---|---|
| Evo: Long-context modeling from molecular to genome scale | Eric Nguyen, Michael Poli, Matthew Durrant, Patrick Hsu, Brian Hie | Feb. 27, 2024 |
| Can you feel the MoE? Mixtral available with over 100 tokens per second through Together Platform! | Together | Dec. 11, 2023 |
| Introducing the Together Embeddings endpoint — Higher accuracy, longer context, and lower cost | Together AI | Jan. 11, 2024 |
| Filter responses of any model with Llama Guard or your own safety model | Together | Dec. 10, 2023 |
| Announcing OpenChatKit | Together | Mar. 10, 2023 |
| TEAL: Training-Free Activation Sparsity in Large Language Models | James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun | Aug. 28, 2024 |
| How Together and Crusoe are reducing the carbon impact of generative AI | Together | Apr. 20, 2023 |
| FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores | Dan Fu, Hermann Kumbong, Eric Nguyen, Chris Ré | Nov. 13, 2023 |
| Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search | Together AI | Aug. 26, 2024 |
| FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI) | Jul. 11, 2024 |
| Building your own RAG application using Together AI and Langchain | Together AI | Jan. 11, 2024 |
| Building a personalized code assistant with open-source LLMs using RAG Fine-tuning | Kezhen Chen, Linda He, Ben Athiwaratkun, Jue Wang, Maurice Weber, Heejin Jeong, Yonatan Oren, Michael Poli | Jun. 24, 2024 |
| Preparing for the era of 32K context: Early learnings and explorations | Together | Jul. 28, 2023 |
| Long context retrieval models with Monarch Mixer | Jon Saad-Falcon, Dan Fu, Simran Arora | Jan. 11, 2024 |
| Building your own RAG application using Together AI and LlamaIndex | Together AI | Jan. 11, 2024 |
| Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen | Mar. 12, 2024 |
| NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (1/2) | Together | Nov. 30, 2022 |
| Introducing Together AI Chief Scientist Tri Dao, as he releases FlashAttention-2 to speed up model training and inference | Together | Jul. 17, 2023 |
| Together AI partners with Meta to release Meta Llama 3 for inference and fine-tuning | Together AI | Apr. 18, 2024 |
| RedPajama-INCITE-3B, an LLM for everyone | Together | May 09, 2023 |
| Announcing Together Custom Models. Build a state-of-the-art LLM with Together AI — and own the model. | Together | Nov. 13, 2023 |
| SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin | Jun. 18, 2024 |
| The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao | Sep. 09, 2024 |
| Together AI launches full stack for developers to build with open-source AI | Together | Jul. 14, 2023 |
| Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API | Together | Aug. 18, 2023 |
| Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers | Together | Dec. 08, 2023 |
| FlashConv: Speeding up state space models | Dan Fu, Tri Dao | Jan. 23, 2023 |
| Together AI partners with Meta to release Llama 3.1 models for inference and fine-tuning with accelerated performance at full accuracy | Together AI | Jul. 23, 2024 |
| Flash Attention received the inaugural Stanford open source software award | Together AI | May 22, 2024 |
| NeurIPS 2022: Overcoming communication bottlenecks for decentralized training (2/2) | Together | Dec. 05, 2022 |
| Together AI and Snorkel AI empower enterprises to build proprietary LLMs | Together | Jul. 17, 2023 |
| Announcing Together Inference Engine – the fastest inference available | Together AI | Nov. 13, 2023 |
| FlashAttention: Fast and memory-efficient exact attention with IO-Awareness | Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré | May 17, 2023 |
| RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models | Together | Oct. 30, 2023 |
| Together AI welcomes Kai Mak as CRO to accelerate gen AI adoption for AI natives and enterprises globally | Vipul Ved Prakash | Sep. 10, 2024 |
| Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities | Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou | Jun. 11, 2024 |
| CocktailSGD: Fine-tuning foundation models over 500Mbps networks | Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang | Apr. 24, 2023 |
| RedPajama 7B now available, instruct model outperforms all open 7B models on HELM benchmarks | Together | Jun. 06, 2023 |
| Dragonfly: A large vision-language model with multi-resolution zoom | Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou | Jun. 06, 2024 |
| Mamba-3B-SlimPJ: State-space models rivaling the best Transformer architecture | Tri Dao, Albert Gu | Dec. 12, 2023 |
| Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud | Together AI | Jul. 23, 2024 |
| Our $102.5M Series A | Vipul Ved Prakash | Nov. 29, 2023 |
| Announcing v1 of our Python SDK | Together AI | Apr. 22, 2024 |
| Announcing $106M round led by Salesforce Ventures | Vipul Ved Prakash | Mar. 13, 2024 |
| Growing to 20 exaflops, Together GPU Clusters help startups and enterprises accelerate generative AI development | Together | Nov. 13, 2023 |
| Fine-tuning Llama-3 to get 90% of GPT-4’s performance at a fraction of the cost | Hassan El Mghari | Jul. 12, 2024 |
| Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection | Together AI | Sep. 05, 2024 |
| Faster inference enables up to 5x price reduction on Together API | Together | Aug. 11, 2023 |
| Releasing GPT-JT powered by open-source AI | Together | Nov. 29, 2022 |
| ThunderKittens: A Simple Embedded DSL for AI kernels | Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Ré | May 12, 2024 |
| Llama 3.1: Same model, different results. The impact of a percentage point. | Together AI | Jul. 31, 2024 |
| A practitioner's guide to testing and running large GPU clusters for training generative AI models | Ryan Lucchese, Niki Birkner, Yaron Hagai, Virginia Adams | Aug. 13, 2024 |
| Hungry Hungry Hippos: Towards language modeling with state space models | Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré | Dec. 28, 2022 |
| Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | Together | May 05, 2023 |
| Speculative decoding for high-throughput long-context inference | Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Yunho Jin, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Beidi Chen | Sep. 05, 2024 |
| BitDelta: Your Fine-Tune May Only Be Worth One Bit | James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai | Feb. 20, 2024 |
| Hyena Hierarchy: Towards larger convolutional language models | Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré | Feb. 02, 2023 |
| BASED: Simple linear attention language models balance the recall-throughput tradeoff | Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré | Mar. 04, 2024 |
| Fine-tuning language models over slow networks using activation compression with guarantees | Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang | Jun. 02, 2023 |
| HELM: benchmarking large language models on the Together Research Computer | Together | Nov. 17, 2022 |
| RedPajama training progress at 440 billion tokens | Together | Apr. 24, 2023 |
| RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens | Together | Apr. 17, 2023 |
| FAQ: Building LLMs with RedPajama-v2, a 30 trillion token web dataset | Together AI | May 01, 2024 |
| FlexGen: High-throughput generative inference of large language models with a single GPU | Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang | Mar. 13, 2023 |
| Flash-Decoding for long-context inference | Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov | Oct. 12, 2023 |
| Together AI partners with Snowflake to bring Arctic LLM to Enterprise customers | Together AI | Apr. 25, 2024 |
| Decentralized training of foundation models in heterogeneous environments | Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang | Jun. 02, 2023 |
| Building your own RAG application using Together AI and MongoDB Atlas | Together AI | Jan. 11, 2024 |
| Announcing Together Inference Engine 2.0 with new Turbo and Lite endpoints | Together AI | Jul. 18, 2024 |
| Using Axiomic to build multi agent chat with Together API | Together AI | Jun. 05, 2024 |
| Monarch Mixer: A new model architecture for increased efficiency | Dan Fu, Simran Arora, Chris Ré | Jul. 25, 2023 |
| Medusa: Simple framework for accelerating LLM generation with multiple decoding heads | Tianle Cai*, Yuhong Li*, Zhengyang Geng, Hongwu Peng, Tri Dao (* Equal contribution) | Sep. 11, 2023 |
| OpenChatKit now runs on consumer GPUs with a new 7B parameter model | Together | Mar. 30, 2023 |
| Announcing function calling and JSON mode | Together AI | Jan. 31, 2024 |
| Together’s $20M seed funding to build open-source AI and cloud platform | Vipul Ved Prakash | May 15, 2023 |
| Introducing The Together Enterprise Platform: Run GenAI securely in any environment, with 2x faster inference and continuous model optimization | Together AI | Sep. 23, 2024 |
| Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps | Together AI | Sep. 25, 2024 |
| FLUX API is now available on Together AI: New FLUX1.1 [pro] and free access to FLUX.1 [schnell] | Together AI | Oct. 03, 2024 |
| Multimodal Document RAG with Llama 3.2 Vision and ColQwen2 | Zain Hasan | Oct. 08, 2024 |
| How to build a real-time image generator with Flux and Together AI | Hassan El Mghari | Oct. 11, 2024 |
| Linearizing LLMs with LoLCATs | Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré | Oct. 14, 2024 |
| Even Better, Even Faster Quantized LLMs with QTIP | Albert Tseng, Qingyao Sun, David Hou, Chris De Sa | Oct. 30, 2024 |
| Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud | Together AI | Nov. 18, 2024 |