Driving model performance optimization: 2024 highlights | Pankaj Gupta | Jan 14, 2025 | 1530
Private, secure DeepSeek-R1 in production in US & EU data centers | Amir Haghighat, Philip Kiely | Feb 11, 2025 | 1274
Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud | Pankaj Gupta, Philip Kiely | Feb 11, 2025 | 1033
Baseten Chains is now GA for production compound AI systems | Marius Killinger, Tyron Jung, Rachel Rapp | Feb 12, 2025 | 1123
How multi-node inference works for massive LLMs like DeepSeek-R1 | Phil Howes, Philip Kiely | Feb 15, 2025 | 1303
Announcing Baseten’s $75M Series C | Tuhin Srivastava | Feb 26, 2025 | 739
How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM | Michael Feil, Philip Kiely | Mar 28, 2025 | 2035
Introducing Baseten Embeddings Inference: The fastest embeddings solution available | Michael Feil, Rachel Rapp | Mar 28, 2025 | 782