Baseten

Founded in 2019. Privately held.

External links: homepage | docs | blog | jobs | youtube | twitter | github | linkedin

Inference platform for AI models.

[Chart: blog posts published by month]

97 total blog posts published.


Blog content

post title | author(s) | published | words | HN points
Introducing Baseten Self-hosted Anupreet Walia, Rachel Rapp Aug. 08, 2024 670 -
Deploying and using Stable Diffusion XL 1.0 Philip Kiely Jul. 26, 2023 286 -
How to serve your ComfyUI model behind an API endpoint Het Trivedi, Philip Kiely Dec. 08, 2023 1326 -
Baseten achieves SOC 2 Type II certification Baseten Mar. 08, 2023 282 -
New in January 2023 Baseten Jan. 31, 2023 538 -
AudioGen: deploy and build today! Jesse Mostipak Aug. 04, 2023 340 -
Open source alternatives for machine learning models Varun Shenoy, Philip Kiely Nov. 21, 2023 1207 -
A guide to LLM inference and performance Varun Shenoy, Philip Kiely Nov. 17, 2023 3038 113
New in July 2023 Baseten Aug. 02, 2023 514 -
Three techniques to adapt LLMs for any use case Philip Kiely Jun. 15, 2023 983 -
New in June 2023 Baseten Jun. 29, 2023 424 -
How we achieved SOC 2 and HIPAA compliance as an early-stage company Baseten Mar. 08, 2023 673 -
How to benchmark image generation models like Stable Diffusion XL Philip Kiely Jan. 31, 2024 1374 -
Comparing tokens per second across LLMs Philip Kiely May. 09, 2024 769 -
What I learned from my AI startup’s internal hackathon Julien Reiman Jun. 12, 2023 719 -
How latent consistency models work Rachel Rapp Jun. 04, 2024 1140 -
New in August 2023 Baseten Aug. 31, 2023 591 -
Comparing NVIDIA GPUs for AI: T4 vs A10 Philip Kiely Apr. 27, 2023 1604 -
Unlocking the full power of NVIDIA H100 GPUs for ML inference with TensorRT Pankaj Gupta, Philip Kiely Feb. 06, 2024 1623 -
Deploy Falcon-40B on Baseten Sid Shanker Jun. 09, 2023 794 -
New in February 2024 Baseten Feb. 29, 2024 634 -
How to choose the right instance size for your ML models Philip Kiely Jan. 18, 2023 597 -
How to serve 10,000 fine-tuned LLMs from a single GPU Pankaj Gupta, Philip Kiely Jul. 23, 2024 1895 -
New in September 2023 Baseten Sep. 29, 2023 605 -
Streaming real-time text to speech with XTTS V2 Het Trivedi, Philip Kiely Apr. 18, 2024 1318 -
Continuous vs dynamic batching for AI inference Matt Howard, Philip Kiely Apr. 05, 2024 1350 -
Models We Love: June 2023 Baseten Jul. 06, 2023 1498 -
High performance ML inference with NVIDIA TensorRT Justin Yi, Philip Kiely Mar. 12, 2024 1076 -
NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference Philip Kiely Sep. 15, 2023 1636 -
FP8: Efficient model inference with 8-bit floating point numbers Pankaj Gupta, Philip Kiely Mar. 07, 2024 1021 2
Deployment and inference for open source text embedding models Philip Kiely Nov. 02, 2023 1706 -
The best open source large language model Philip Kiely Feb. 09, 2024 1920 -
New in January 2024 Baseten Jan. 31, 2024 580 -
Deploy open-source models in a couple clicks from Baseten’s model library Emmiliese von Avis Jun. 08, 2023 888 -
Playground v2 vs Stable Diffusion XL 1.0 for text-to-image generation Philip Kiely Dec. 13, 2023 1075 -
Using fractional H100 GPUs for efficient model serving Matt Howard, Vlad Shulman, Pankaj Gupta, Philip Kiely Mar. 28, 2024 1086 -
Jina AI’s jina-embeddings-v2: an open source text embedding model that matches OpenAI’s ada-002 Philip Kiely Oct. 27, 2023 547 -
40% faster Stable Diffusion XL inference with NVIDIA TensorRT Pankaj Gupta, Justin Yi, Philip Kiely Feb. 22, 2024 2403 -
Ten reasons to join Baseten Dustin Michaels, Philip Kiely Jul. 25, 2024 1230 -
Why GPU utilization matters for model inference Marius Killinger, Philip Kiely Feb. 20, 2024 816 -
New in March 2024 Baseten Mar. 28, 2024 553 -
Build your own open-source ChatGPT with Llama 2 and Chainlit Philip Kiely Aug. 23, 2023 1061 -
SDXL inference in under 2 seconds: the ultimate guide to Stable Diffusion optimization Varun Shenoy, Philip Kiely Aug. 30, 2023 1352 -
A checklist for switching to open source ML models Philip Kiely Nov. 21, 2023 482 -
New in May 2023 Baseten Jun. 02, 2023 384 -
Baseten announces HIPAA compliance Baseten Mar. 28, 2023 167 -
Compound AI systems explained Rachel Rapp Aug. 06, 2024 1338 -
What I learned as a forward-deployed engineer working at an AI startup Het Trivedi May. 31, 2024 1353 -
Introducing Baseten Chains Bola Malek, Marius Killinger, Sid Shanker, Rachel Rapp, Mike Bilodeau Jun. 27, 2024 1132 9
The benefits of globally distributed infrastructure for model serving Phil Howes, Philip Kiely Mar. 01, 2024 603 -
Technical deep dive: Truss live reload Pankaj Gupta Feb. 17, 2023 1852 -
33% faster LLM inference with FP8 quantization Pankaj Gupta, Philip Kiely Mar. 14, 2024 1876 -
Using asynchronous inference in production Samiksha Pal, Helen Yang, Rachel Rapp Jul. 11, 2024 950 -
Introduction to quantizing ML models Abu Qader, Philip Kiely Jan. 31, 2024 1679 1
Understanding NVIDIA’s Datacenter GPU line Philip Kiely May. 23, 2023 708 -
New in April 2024 Baseten May. 01, 2024 552 -
Benchmarking fast Mistral 7B inference Abu Qader, Pankaj Gupta, Justin Yi, Philip Kiely Mar. 14, 2024 1571 -
Comparing GPUs across architectures and tiers Philip Kiely May. 22, 2023 765 -
SPC hackathon winners build with Llama 3.1 on Baseten Philip Kiely Aug. 16, 2024 615 -
Understanding performance benchmarks for LLM inference Philip Kiely Jan. 12, 2024 1459 -
New in December 2023 Baseten Dec. 27, 2023 553 -
Pinning ML model revisions for compatibility and security Philip Kiely Nov. 09, 2023 564 -
Comparing few-step image generation models Rachel Rapp Jun. 14, 2024 1087 -
Choosing the right horizontal scaling setup for high-traffic models Philip Kiely Jan. 19, 2023 628 -
Models We Love: July 2023 Baseten Jul. 26, 2023 1831 -
Faster Mixtral inference with TensorRT-LLM and quantization Pankaj Gupta, Timur Abishev, Philip Kiely Dec. 22, 2023 1467 2
NVIDIA A10 vs A10G for ML model inference Philip Kiely Nov. 28, 2023 1056 -
Stable Video Diffusion now available Sid Shanker, Varun Shenoy Nov. 22, 2023 324 -
New in October 2023 Baseten Oct. 31, 2023 497 -
Introducing automatic LLM optimization with TensorRT-LLM Engine Builder Abu Qader, Philip Kiely Aug. 01, 2024 939 2
New in March 2023 Baseten Mar. 31, 2023 359 -
Deploying custom ComfyUI workflows as APIs Het Trivedi, Rachel Rapp Jul. 25, 2024 1144 -
Deploy StableLM with Truss Tuhin Srivastava Apr. 20, 2023 423 -
Build a chatbot with Llama 2 and LangChain Philip Kiely Jul. 27, 2023 1440 -
Model autoscaling features on Baseten Jesse Mostipak Jul. 07, 2023 890 -
GPT vs Mistral: Migrate to open source LLMs seamlessly Sid Shanker, Philip Kiely Nov. 22, 2023 879 -
New in May 2024 Baseten Jun. 03, 2024 598 -
CI/CD for AI model deployments Vlad Shulman, Samiksha Pal, Sid Shanker, Philip Kiely Apr. 30, 2024 914 -
Getting started with foundation models Jesse Mostipak Jun. 06, 2023 1226 -
AI infrastructure: build vs. buy Baseten Jul. 28, 2023 1040 -
Announcing our Series B Tuhin Srivastava Mar. 04, 2024 629 2
Control plane vs workload plane in model serving infrastructure Colin McGrath, Matt Howard, Philip Kiely May. 29, 2024 870 -
If You Build It, Devs will Come: How to Host an AI Meetup Julien Reiman Apr. 06, 2023 1061 -
New in November 2023 Baseten Nov. 30, 2023 419 -
Baseten Chains explained: building multi-component AI workflows at scale Marius Killinger, Rachel Rapp Jul. 02, 2024 2424 -
New in April 2023 Baseten Apr. 30, 2023 510 -
How to double tokens per second for Llama 3 with Medusa Abu Qader, Philip Kiely Aug. 20, 2024 1462 2
The best open-source image generation model Philip Kiely Aug. 29, 2024 1409 -
How to build function calling and JSON mode for open-source and fine-tuned LLMs Bryce Dubayah, Philip Kiely Sep. 12, 2024 1339 1
Introducing function calling and structured output for open-source and fine-tuned LLMs Bryce Dubayah, Philip Kiely Sep. 12, 2024 604 -
Building high-performance compound AI applications with MongoDB Atlas and Baseten Philip Kiely Sep. 17, 2024 1425 -
Introducing Baseten Hybrid: control and flexibility in your cloud and ours Mike Bilodeau, Rachel Rapp Sep. 26, 2024 633 -
Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience Mike Bilodeau, Rachel Rapp Sep. 26, 2024 688 -
Export your model inference metrics to your favorite observability tool Helen Yang, Nicolas Gere-lamaysouette, Philip Kiely Oct. 05, 2024 493 -
Evaluating NVIDIA H200 GPUs for LLM inference Pankaj Gupta, Philip Kiely Oct. 23, 2024 1294 -
Introducing canary deployments on Baseten Sid Shanker, Jonathan Rochette, Raymond Cano, Rachel Rapp Nov. 01, 2024 932 -
Create custom environments for deployments on Baseten Samiksha Pal, Raymond Cano, Sid Shanker, Rachel Rapp Nov. 15, 2024 621 -

By Matt Makai. 2021-2024.