| Title | Author(s) | Date | Word count | # |
| --- | --- | --- | --- | --- |
| Introducing Baseten Self-hosted | Anupreet Walia, Rachel Rapp | Aug 08, 2024 | 670 | - |
| How to benchmark image generation models like Stable Diffusion XL | Philip Kiely | Jan 31, 2024 | 1374 | - |
| Comparing tokens per second across LLMs | Philip Kiely | May 09, 2024 | 769 | - |
| How latent consistency models work | Rachel Rapp | Jun 04, 2024 | 1140 | - |
| Unlocking the full power of NVIDIA H100 GPUs for ML inference with TensorRT | Pankaj Gupta, Philip Kiely | Feb 06, 2024 | 1623 | - |
| New in February 2024 | Baseten | Feb 29, 2024 | 634 | - |
| How to serve 10,000 fine-tuned LLMs from a single GPU | Pankaj Gupta, Philip Kiely | Jul 23, 2024 | 1895 | - |
| Streaming real-time text to speech with XTTS V2 | Het Trivedi, Philip Kiely | Apr 18, 2024 | 1318 | - |
| Continuous vs dynamic batching for AI inference | Matt Howard, Philip Kiely | Apr 05, 2024 | 1350 | - |
| High performance ML inference with NVIDIA TensorRT | Justin Yi, Philip Kiely | Mar 12, 2024 | 1076 | - |
| FP8: Efficient model inference with 8-bit floating point numbers | Pankaj Gupta, Philip Kiely | Mar 07, 2024 | 1021 | 2 |
| The best open source large language model | Philip Kiely | Feb 09, 2024 | 1920 | - |
| New in January 2024 | Baseten | Jan 31, 2024 | 580 | - |
| Using fractional H100 GPUs for efficient model serving | Matt Howard, Vlad Shulman, Pankaj Gupta, Philip Kiely | Mar 28, 2024 | 1086 | - |
| 40% faster Stable Diffusion XL inference with NVIDIA TensorRT | Pankaj Gupta, Justin Yi, Philip Kiely | Feb 22, 2024 | 2403 | - |
| Ten reasons to join Baseten | Dustin Michaels, Philip Kiely | Jul 25, 2024 | 1230 | - |
| Why GPU utilization matters for model inference | Marius Killinger, Philip Kiely | Feb 20, 2024 | 816 | - |
| New in March 2024 | Baseten | Mar 28, 2024 | 553 | - |
| Compound AI systems explained | Rachel Rapp | Aug 06, 2024 | 1338 | - |
| What I learned as a forward-deployed engineer working at an AI startup | Het Trivedi | May 31, 2024 | 1353 | - |
| Introducing Baseten Chains | Bola Malek, Marius Killinger, Sid Shanker, Rachel Rapp, Mike Bilodeau | Jun 27, 2024 | 1132 | 9 |
| The benefits of globally distributed infrastructure for model serving | Phil Howes, Philip Kiely | Mar 01, 2024 | 603 | - |
| 33% faster LLM inference with FP8 quantization | Pankaj Gupta, Philip Kiely | Mar 14, 2024 | 1876 | - |
| Using asynchronous inference in production | Samiksha Pal, Helen Yang, Rachel Rapp | Jul 11, 2024 | 950 | - |
| Introduction to quantizing ML models | Abu Qader, Philip Kiely | Jan 31, 2024 | 1679 | 1 |
| New in April 2024 | Baseten | May 01, 2024 | 552 | - |
| Benchmarking fast Mistral 7B inference | Abu Qader, Pankaj Gupta, Justin Yi, Philip Kiely | Mar 14, 2024 | 1571 | - |
| SPC hackathon winners build with Llama 3.1 on Baseten | Philip Kiely | Aug 16, 2024 | 615 | - |
| Understanding performance benchmarks for LLM inference | Philip Kiely | Jan 12, 2024 | 1459 | - |
| Comparing few-step image generation models | Rachel Rapp | Jun 14, 2024 | 1087 | - |
| Introducing automatic LLM optimization with TensorRT-LLM Engine Builder | Abu Qader, Philip Kiely | Aug 01, 2024 | 939 | 2 |
| Deploying custom ComfyUI workflows as APIs | Het Trivedi, Rachel Rapp | Jul 25, 2024 | 1144 | 1 |
| New in May 2024 | Baseten | Jun 03, 2024 | 598 | - |
| CI/CD for AI model deployments | Vlad Shulman, Samiksha Pal, Sid Shanker, Philip Kiely | Apr 30, 2024 | 914 | - |
| Announcing our Series B | Tuhin Srivastava | Mar 04, 2024 | 629 | 2 |
| Control plane vs workload plane in model serving infrastructure | Colin McGrath, Matt Howard, Philip Kiely | May 29, 2024 | 870 | - |
| Baseten Chains explained: building multi-component AI workflows at scale | Marius Killinger, Rachel Rapp | Jul 02, 2024 | 2424 | - |
| How to double tokens per second for Llama 3 with Medusa | Abu Qader, Philip Kiely | Aug 20, 2024 | 1462 | 2 |
| The best open-source image generation model | Philip Kiely | Aug 29, 2024 | 1409 | - |
| How to build function calling and JSON mode for open-source and fine-tuned LLMs | Bryce Dubayah, Philip Kiely | Sep 12, 2024 | 1339 | 1 |
| Introducing function calling and structured output for open-source and fine-tuned LLMs | Bryce Dubayah, Philip Kiely | Sep 12, 2024 | 604 | - |
| Building high-performance compound AI applications with MongoDB Atlas and Baseten | Philip Kiely | Sep 17, 2024 | 1425 | - |
| Introducing Baseten Hybrid: control and flexibility in your cloud and ours | Mike Bilodeau, Rachel Rapp | Sep 26, 2024 | 633 | - |
| Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience | Mike Bilodeau, Rachel Rapp | Sep 26, 2024 | 688 | - |
| Export your model inference metrics to your favorite observability tool | Helen Yang, Nicolas Gere-lamaysouette, Philip Kiely | Oct 05, 2024 | 493 | - |
| Evaluating NVIDIA H200 GPUs for LLM inference | Pankaj Gupta, Philip Kiely | Oct 23, 2024 | 1294 | - |
| Introducing canary deployments on Baseten | Sid Shanker, Jonathan Rochette, Raymond Cano, Rachel Rapp | Nov 01, 2024 | 932 | - |
| Create custom environments for deployments on Baseten | Samiksha Pal, Raymond Cano, Sid Shanker, Rachel Rapp | Nov 15, 2024 | 621 | - |
| Introducing Custom Servers: Deploy production-ready model servers from Docker images | Tianshu Cheng, Bola Malek, Rachel Rapp | Dec 09, 2024 | 807 | - |
| Generally Available: The fastest, most accurate, and cost-efficient Whisper transcription | William Gao, Derrick Yang, Tianshu Cheng, Rachel Rapp | Dec 12, 2024 | 1145 | - |
| A quick introduction to speculative decoding | Pankaj Gupta, Justin Yi, Philip Kiely | Dec 20, 2024 | 1139 | - |
| Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference | Justin Yi, Abu Qader, Bryce Dubayah, Rachel Rapp | Dec 20, 2024 | 904 | - |
| How we built production-ready speculative decoding with TensorRT-LLM | Pankaj Gupta, Justin Yi, Philip Kiely | Dec 20, 2024 | 2729 | - |
| New observability features: activity logging, LLM metrics, and metrics dashboard customization | Suren Atoyan, Aaron Relph, Marius Killinger, Sid Shanker, Rachel Rapp | Dec 23, 2024 | 540 | - |