Baseten

Founded in 2019. Privately held.

External links: homepage | docs | blog | jobs | youtube | twitter | github | linkedin

Inference platform for AI models.

[Chart: blog content published by word count]

Blog content

| post title | author(s) | published | words | HN points |
| --- | --- | --- | --- | --- |
| New in October: Find community with The DSC | Baseten | Oct. 31, 2022 | 408 | - |
| New in May 2022: Off-site but on-track | Baseten | May 26, 2022 | 432 | - |
| Introducing Baseten Self-hosted | Anupreet Walia, Rachel Rapp | Aug. 08, 2024 | 670 | - |
| Four ML models that accelerate content creation | Philip Kiely | Jun. 02, 2022 | 945 | - |
| New in December 2021 | Emmiliese von Avis | Jan. 07, 2022 | 494 | - |
| Deploying and using Stable Diffusion XL 1.0 | Philip Kiely | Jul. 26, 2023 | 286 | - |
| How to serve your ComfyUI model behind an API endpoint | Het Trivedi, Philip Kiely | Dec. 08, 2023 | 1326 | - |
| New in July: A seamless bridge from model development to deployment | Baseten | Jul. 29, 2022 | 414 | - |
| Baseten achieves SOC 2 Type II certification | Baseten | Mar. 08, 2023 | 282 | - |
| New in January 2023 | Baseten | Jan. 31, 2023 | 538 | - |
| AudioGen: deploy and build today! | Jesse Mostipak | Aug. 04, 2023 | 340 | - |
| Open source alternatives for machine learning models | Varun Shenoy, Philip Kiely | Nov. 21, 2023 | 1207 | - |
| A guide to LLM inference and performance | Varun Shenoy, Philip Kiely | Nov. 17, 2023 | 3038 | 113 |
| New in July 2023 | Baseten | Aug. 02, 2023 | 514 | - |
| Three techniques to adapt LLMs for any use case | Philip Kiely | Jun. 15, 2023 | 983 | - |
| StartupML AMA: Nikhil Harithas | Derek Kim | Aug. 09, 2022 | 1774 | - |
| New in June 2023 | Baseten | Jun. 29, 2023 | 424 | - |
| Build with OpenAI’s Whisper model in five minutes | Justin Yi | Oct. 18, 2022 | 712 | - |
| Go from machine learning models to full-stack applications | Tuhin Srivastava | May 03, 2022 | 1026 | - |
| How we achieved SOC 2 and HIPAA compliance as an early-stage company | Baseten | Mar. 08, 2023 | 673 | - |
| How to benchmark image generation models like Stable Diffusion XL | Philip Kiely | Jan. 31, 2024 | 1374 | - |
| Comparing tokens per second across LLMs | Philip Kiely | May 09, 2024 | 769 | - |
| What I learned from my AI startup’s internal hackathon | Julien Reiman | Jun. 12, 2023 | 719 | - |
| New in August: Deploy, deploy, deploy | Baseten | Aug. 31, 2022 | 430 | - |
| How latent consistency models work | Rachel Rapp | Jun. 04, 2024 | 1140 | - |
| New in August 2023 | Baseten | Aug. 31, 2023 | 591 | - |
| Comparing NVIDIA GPUs for AI: T4 vs A10 | Philip Kiely | Apr. 27, 2023 | 1604 | - |
| Unlocking the full power of NVIDIA H100 GPUs for ML inference with TensorRT | Pankaj Gupta, Philip Kiely | Feb. 06, 2024 | 1623 | - |
| Deploy Falcon-40B on Baseten | Sid Shanker | Jun. 09, 2023 | 794 | - |
| New in February 2024 | Baseten | Feb. 29, 2024 | 634 | - |
| StartupML AMA: Daniel Whitenack | Derek Kim | Aug. 30, 2022 | 1706 | - |
| How to choose the right instance size for your ML models | Philip Kiely | Jan. 18, 2023 | 597 | - |
| How to serve 10,000 fine-tuned LLMs from a single GPU | Pankaj Gupta, Philip Kiely | Jul. 23, 2024 | 1895 | - |
| New in September 2023 | Baseten | Sep. 29, 2023 | 605 | - |
| Streaming real-time text to speech with XTTS V2 | Het Trivedi, Philip Kiely | Apr. 18, 2024 | 1318 | - |
| Continuous vs dynamic batching for AI inference | Matt Howard, Philip Kiely | Apr. 05, 2024 | 1350 | - |
| Models We Love: June 2023 | Baseten | Jul. 06, 2023 | 1498 | - |
| High performance ML inference with NVIDIA TensorRT | Justin Yi, Philip Kiely | Mar. 12, 2024 | 1076 | - |
| Why we built and open-sourced a model serving solution | Phil Howes | Aug. 05, 2022 | 1030 | - |
| NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference | Philip Kiely | Sep. 15, 2023 | 1636 | - |
| New in September: Increasing flexibility and robustness | Baseten | Sep. 29, 2022 | 461 | - |
| Baseten achieves SOC 2 Type 1 certification | Baseten | Mar. 16, 2022 | 280 | - |
| FP8: Efficient model inference with 8-bit floating point numbers | Pankaj Gupta, Philip Kiely | Mar. 07, 2024 | 1021 | 2 |
| Deployment and inference for open source text embedding models | Philip Kiely | Nov. 02, 2023 | 1706 | - |
| The best open source large language model | Philip Kiely | Feb. 09, 2024 | 1920 | - |
| New in January 2024 | Baseten | Jan. 31, 2024 | 580 | - |
| How to deploy Stable Diffusion using Truss | Abu Qader | Sep. 01, 2022 | 1038 | - |
| Deploy open-source models in a couple clicks from Baseten’s model library | Emmiliese von Avis | Jun. 08, 2023 | 888 | - |
| Playground v2 vs Stable Diffusion XL 1.0 for text-to-image generation | Philip Kiely | Dec. 13, 2023 | 1075 | - |
| Using fractional H100 GPUs for efficient model serving | Matt Howard, Vlad Shulman, Pankaj Gupta, Philip Kiely | Mar. 28, 2024 | 1086 | - |
| New in November 2021 | Emmiliese von Avis | Nov. 22, 2021 | 372 | - |
| Jina AI’s jina-embeddings-v2: an open source text embedding model that matches OpenAI’s ada-002 | Philip Kiely | Oct. 27, 2023 | 547 | - |
| Accelerating model deployment: 100X faster dev loops with development deployments | Baseten | Dec. 08, 2022 | 810 | - |
| 40% faster Stable Diffusion XL inference with NVIDIA TensorRT | Pankaj Gupta, Justin Yi, Philip Kiely | Feb. 22, 2024 | 2403 | - |
| New in June: Full-stack superpowers | Baseten | Jun. 30, 2022 | 463 | - |
| Ten reasons to join Baseten | Dustin Michaels, Philip Kiely | Jul. 25, 2024 | 1230 | - |
| Why GPU utilization matters for model inference | Marius Killinger, Philip Kiely | Feb. 20, 2024 | 816 | - |
| New in March 2024 | Baseten | Mar. 28, 2024 | 553 | - |
| Build your own open-source ChatGPT with Llama 2 and Chainlit | Philip Kiely | Aug. 23, 2023 | 1061 | - |
| Designing parental leave at an early stage startup | Paige Pauli | Feb. 02, 2022 | 844 | - |
| SDXL inference in under 2 seconds: the ultimate guide to Stable Diffusion optimization | Varun Shenoy, Philip Kiely | Aug. 30, 2023 | 1352 | - |
| A checklist for switching to open source ML models | Philip Kiely | Nov. 21, 2023 | 482 | - |
| New in May 2023 | Baseten | Jun. 02, 2023 | 384 | - |
| Baseten announces HIPAA compliance | Baseten | Mar. 28, 2023 | 167 | - |
| Compound AI systems explained | Rachel Rapp | Aug. 06, 2024 | 1338 | - |
| What I learned as a forward-deployed engineer working at an AI startup | Het Trivedi | May 31, 2024 | 1353 | - |
| Introducing Baseten Chains | Bola Malek, Marius Killinger, Sid Shanker, Rachel Rapp, Mike Bilodeau | Jun. 27, 2024 | 1132 | 9 |
| Introducing Baseten | Tuhin Srivastava | May 20, 2021 | 1088 | - |
| The benefits of globally distributed infrastructure for model serving | Phil Howes, Philip Kiely | Mar. 01, 2024 | 603 | - |
| Technical deep dive: Truss live reload | Pankaj Gupta | Feb. 17, 2023 | 1852 | - |
| 33% faster LLM inference with FP8 quantization | Pankaj Gupta, Philip Kiely | Mar. 14, 2024 | 1876 | - |
| Using asynchronous inference in production | Samiksha Pal, Helen Yang, Rachel Rapp | Jul. 11, 2024 | 950 | - |
| Introduction to quantizing ML models | Abu Qader, Philip Kiely | Jan. 31, 2024 | 1679 | 1 |
| Understanding NVIDIA’s Datacenter GPU line | Philip Kiely | May 23, 2023 | 708 | - |
| New in April 2024 | Baseten | May 01, 2024 | 552 | - |
| Benchmarking fast Mistral 7B inference | Abu Qader, Pankaj Gupta, Justin Yi, Philip Kiely | Mar. 14, 2024 | 1571 | - |
| Comparing GPUs across architectures and tiers | Philip Kiely | May 22, 2023 | 765 | - |
| SPC hackathon winners build with Llama 3.1 on Baseten | Philip Kiely | Aug. 16, 2024 | 615 | - |
| Understanding performance benchmarks for LLM inference | Philip Kiely | Jan. 12, 2024 | 1459 | - |
| New in December 2023 | Baseten | Dec. 27, 2023 | 553 | - |
| Pinning ML model revisions for compatibility and security | Philip Kiely | Nov. 09, 2023 | 564 | - |
| Comparing few-step image generation models | Rachel Rapp | Jun. 14, 2024 | 1087 | - |
| Choosing the right horizontal scaling setup for high-traffic models | Philip Kiely | Jan. 19, 2023 | 628 | - |
| Models We Love: July 2023 | Baseten | Jul. 26, 2023 | 1831 | - |
| Faster Mixtral inference with TensorRT-LLM and quantization | Pankaj Gupta, Timur Abishev, Philip Kiely | Dec. 22, 2023 | 1467 | 2 |
| NVIDIA A10 vs A10G for ML model inference | Philip Kiely | Nov. 28, 2023 | 1056 | - |
| Stable Video Diffusion now available | Sid Shanker, Varun Shenoy | Nov. 22, 2023 | 324 | - |
| Serving four million Riffusion requests in two days | Phil Howes | Dec. 21, 2022 | 757 | - |
| Announcing our Series A | Tuhin Srivastava | Apr. 26, 2022 | 727 | - |
| Create an API endpoint for an ML model | Philip Kiely | Apr. 22, 2022 | 339 | - |
| New in October 2023 | Baseten | Oct. 31, 2023 | 497 | - |
| Introducing automatic LLM optimization with TensorRT-LLM Engine Builder | Abu Qader, Philip Kiely | Aug. 01, 2024 | 939 | 2 |
| New in March 2023 | Baseten | Mar. 31, 2023 | 359 | - |
| Deploying custom ComfyUI workflows as APIs | Het Trivedi, Rachel Rapp | Jul. 25, 2024 | 1144 | - |
| Deploy StableLM with Truss | Tuhin Srivastava | Apr. 20, 2023 | 423 | - |
| Build a chatbot with Llama 2 and LangChain | Philip Kiely | Jul. 27, 2023 | 1440 | - |
| Model autoscaling features on Baseten | Jesse Mostipak | Jul. 07, 2023 | 890 | - |
| Part 1: Working at an early stage company as an early stage engineer | Samiksha Pal | Nov. 29, 2021 | 1538 | - |
| GPT vs Mistral: Migrate to open source LLMs seamlessly | Sid Shanker, Philip Kiely | Nov. 22, 2023 | 879 | - |
| New in May 2024 | Baseten | Jun. 03, 2024 | 598 | - |
| CI/CD for AI model deployments | Vlad Shulman, Samiksha Pal, Sid Shanker, Philip Kiely | Apr. 30, 2024 | 914 | - |
| Getting started with foundation models | Jesse Mostipak | Jun. 06, 2023 | 1226 | - |
| How Baseten is using "docs as code" to build best-in-class documentation | Philip Kiely | Mar. 09, 2022 | 1014 | - |
| AI infrastructure: build vs. buy | Baseten | Jul. 28, 2023 | 1040 | - |
| Announcing our Series B | Tuhin Srivastava | Mar. 04, 2024 | 629 | 2 |
| New in December 2022 | Baseten | Dec. 23, 2022 | 554 | - |
| Control plane vs workload plane in model serving infrastructure | Colin McGrath, Matt Howard, Philip Kiely | May 29, 2024 | 870 | - |
| If You Build It, Devs will Come: How to Host an AI Meetup | Julien Reiman | Apr. 06, 2023 | 1061 | - |
| New in November 2023 | Baseten | Nov. 30, 2023 | 419 | - |
| Baseten Chains explained: building multi-component AI workflows at scale | Marius Killinger, Rachel Rapp | Jul. 02, 2024 | 2424 | - |
| New in April 2023 | Baseten | Apr. 30, 2023 | 510 | - |
| How to double tokens per second for Llama 3 with Medusa | Abu Qader, Philip Kiely | Aug. 20, 2024 | 1462 | 2 |
| The best open-source image generation model | Philip Kiely | Aug. 29, 2024 | 1409 | - |
| How to build function calling and JSON mode for open-source and fine-tuned LLMs | Bryce Dubayah, Philip Kiely | Sep. 12, 2024 | 1339 | 1 |
| Introducing function calling and structured output for open-source and fine-tuned LLMs | Bryce Dubayah, Philip Kiely | Sep. 12, 2024 | 604 | - |
| Building high-performance compound AI applications with MongoDB Atlas and Baseten | Philip Kiely | Sep. 17, 2024 | 1425 | - |
| Introducing Baseten Hybrid: control and flexibility in your cloud and ours | Mike Bilodeau, Rachel Rapp | Sep. 26, 2024 | 633 | - |
| Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience | Mike Bilodeau, Rachel Rapp | Sep. 26, 2024 | 688 | - |
| Export your model inference metrics to your favorite observability tool | Helen Yang, Nicolas Gere-lamaysouette, Philip Kiely | Oct. 05, 2024 | 493 | - |
| Evaluating NVIDIA H200 GPUs for LLM inference | Pankaj Gupta, Philip Kiely | Oct. 23, 2024 | 1294 | - |
| Introducing canary deployments on Baseten | Sid Shanker, Jonathan Rochette, Raymond Cano, Rachel Rapp | Nov. 01, 2024 | 932 | - |
| Create custom environments for deployments on Baseten | Samiksha Pal, Raymond Cano, Sid Shanker, Rachel Rapp | Nov. 15, 2024 | 621 | - |

By Matt Makai. 2021-2024.