Title | Author | Date | Word count | HN points |
--- | --- | --- | --- | --- |
New observability features: activity logging, LLM metrics, a… | Suren Atoyan, Aaron Relph, Marius Killinger, Sid Shanker, Rachel Rapp | Dec 23, 2024 | 540 | - |
How we built production-ready speculative decoding with Tens… | Pankaj Gupta, Justin Yi, Philip Kiely | Dec 20, 2024 | 2729 | - |
Introducing our Speculative Decoding Engine Builder integrat… | Justin Yi, Abu Qader, Bryce Dubayah, Rachel Rapp | Dec 20, 2024 | 904 | - |
A quick introduction to speculative decoding | Pankaj Gupta, Justin Yi, Philip Kiely | Dec 20, 2024 | 1139 | - |
Generally Available: The fastest, most accurate, and cost-ef… | William Gao, Derrick Yang, Tianshu Cheng, Rachel Rapp | Dec 12, 2024 | 1145 | - |
Introducing Custom Servers: Deploy production-ready model se… | Tianshu Cheng, Bola Malek, Rachel Rapp | Dec 09, 2024 | 807 | - |
Create custom environments for deployments on Baseten | Samiksha Pal, Raymond Cano, Sid Shanker, Rachel Rapp | Nov 15, 2024 | 621 | - |
Introducing canary deployments on Baseten | Sid Shanker, Jonathan Rochette, Raymond Cano, Rachel Rapp | Nov 01, 2024 | 932 | - |
Evaluating NVIDIA H200 GPUs for LLM inference | Pankaj Gupta, Philip Kiely | Oct 23, 2024 | 1294 | - |
Export your model inference metrics to your favorite observa… | Helen Yang, Nicolas Gere-lamaysouette, Philip Kiely | Oct 05, 2024 | 493 | - |
Baseten partners with Google Cloud to deliver high-performan… | Mike Bilodeau, Rachel Rapp | Sep 26, 2024 | 688 | - |
Introducing Baseten Hybrid: control and flexibility in your … | Mike Bilodeau, Rachel Rapp | Sep 26, 2024 | 633 | - |
Building high-performance compound AI applications with Mong… | Philip Kiely | Sep 17, 2024 | 1425 | - |
Introducing function calling and structured output for open-… | Bryce Dubayah, Philip Kiely | Sep 12, 2024 | 604 | - |
How to build function calling and JSON mode for open-source … | Bryce Dubayah, Philip Kiely | Sep 12, 2024 | 1339 | 1 |
The best open-source image generation model | Philip Kiely | Aug 29, 2024 | 1409 | - |
How to double tokens per second for Llama 3 with Medusa | Abu Qader, Philip Kiely | Aug 20, 2024 | 1462 | 2 |
SPC hackathon winners build with Llama 3.1 on Baseten | Philip Kiely | Aug 16, 2024 | 615 | - |
Introducing Baseten Self-hosted | Anupreet Walia, Rachel Rapp | Aug 08, 2024 | 670 | - |
Compound AI systems explained | Rachel Rapp | Aug 06, 2024 | 1338 | - |
Introducing automatic LLM optimization with TensorRT-LLM Eng… | Abu Qader, Philip Kiely | Aug 01, 2024 | 939 | 2 |
Deploying custom ComfyUI workflows as APIs | Het Trivedi, Rachel Rapp | Jul 25, 2024 | 1144 | 1 |
Ten reasons to join Baseten | Dustin Michaels, Philip Kiely | Jul 25, 2024 | 1230 | - |
How to serve 10,000 fine-tuned LLMs from a single GPU | Pankaj Gupta, Philip Kiely | Jul 23, 2024 | 1895 | - |
Using asynchronous inference in production | Samiksha Pal, Helen Yang, Rachel Rapp | Jul 11, 2024 | 950 | - |
Baseten Chains explained: building multi-component AI workfl… | Marius Killinger, Rachel Rapp | Jul 02, 2024 | 2424 | - |
Introducing Baseten Chains | Bola Malek, Marius Killinger, Sid Shanker, Rachel Rapp, Mike Bilodeau | Jun 27, 2024 | 1132 | 9 |
Comparing few-step image generation models | Rachel Rapp | Jun 14, 2024 | 1087 | - |
How latent consistency models work | Rachel Rapp | Jun 04, 2024 | 1140 | - |
New in May 2024 | Baseten | Jun 03, 2024 | 598 | - |
What I learned as a forward-deployed engineer working at an … | Het Trivedi | May 31, 2024 | 1353 | - |
Control plane vs workload plane in model serving infrastruct… | Colin McGrath, Matt Howard, Philip Kiely | May 29, 2024 | 870 | - |
Comparing tokens per second across LLMs | Philip Kiely | May 09, 2024 | 769 | - |
New in April 2024 | Baseten | May 01, 2024 | 552 | - |
CI/CD for AI model deployments | Vlad Shulman, Samiksha Pal, Sid Shanker, Philip Kiely | Apr 30, 2024 | 914 | - |
Streaming real-time text to speech with XTTS V2 | Het Trivedi, Philip Kiely | Apr 18, 2024 | 1318 | - |
Continuous vs dynamic batching for AI inference | Matt Howard, Philip Kiely | Apr 05, 2024 | 1350 | - |
New in March 2024 | Baseten | Mar 28, 2024 | 553 | - |
Using fractional H100 GPUs for efficient model serving | Matt Howard, Vlad Shulman, Pankaj Gupta, Philip Kiely | Mar 28, 2024 | 1086 | - |
Benchmarking fast Mistral 7B inference | Abu Qader, Pankaj Gupta, Justin Yi, Philip Kiely | Mar 14, 2024 | 1571 | - |
33% faster LLM inference with FP8 quantization | Pankaj Gupta, Philip Kiely | Mar 14, 2024 | 1876 | - |
High performance ML inference with NVIDIA TensorRT | Justin Yi, Philip Kiely | Mar 12, 2024 | 1076 | - |
FP8: Efficient model inference with 8-bit floating point num… | Pankaj Gupta, Philip Kiely | Mar 07, 2024 | 1021 | 2 |
Announcing our Series B | Tuhin Srivastava | Mar 04, 2024 | 629 | 2 |
The benefits of globally distributed infrastructure for mode… | Phil Howes, Philip Kiely | Mar 01, 2024 | 603 | - |
New in February 2024 | Baseten | Feb 29, 2024 | 634 | - |
40% faster Stable Diffusion XL inference with NVIDIA TensorR… | Pankaj Gupta, Justin Yi, Philip Kiely | Feb 22, 2024 | 2403 | - |
Why GPU utilization matters for model inference | Marius Killinger, Philip Kiely | Feb 20, 2024 | 816 | - |
The best open source large language model | Philip Kiely | Feb 09, 2024 | 1920 | - |
Unlocking the full power of NVIDIA H100 GPUs for ML inferenc… | Pankaj Gupta, Philip Kiely | Feb 06, 2024 | 1623 | - |