Baseten

Founded in 2019. Privately held.

External links: homepage | docs | blog | jobs | youtube | twitter | github | linkedin

Inference platform for AI models.

[Chart: blog posts published by month]

97 total blog posts published.


Blog content

post title | author(s) | published | words | HN points
Introducing Baseten Self-hosted Anupreet Walia, Rachel Rapp Aug. 08, 2024 670 -
Deploying and using Stable Diffusion XL 1.0 Philip Kiely Jul. 26, 2023 286 -
How to serve your ComfyUI model behind an API endpoint Het Trivedi, Philip Kiely Dec. 08, 2023 1326 -
Baseten achieves SOC 2 Type II certification Baseten Mar. 08, 2023 282 -
New in January 2023 Baseten Jan. 31, 2023 538 -
AudioGen: deploy and build today! Jesse Mostipak Aug. 04, 2023 340 -
Open source alternatives for machine learning models Varun Shenoy, Philip Kiely Nov. 21, 2023 1207 -
A guide to LLM inference and performance Varun Shenoy, Philip Kiely Nov. 17, 2023 3038 113
New in July 2023 Baseten Aug. 02, 2023 514 -
Three techniques to adapt LLMs for any use case Philip Kiely Jun. 15, 2023 983 -
New in June 2023 Baseten Jun. 29, 2023 424 -
How we achieved SOC 2 and HIPAA compliance as an early-stage company Baseten Mar. 08, 2023 673 -
How to benchmark image generation models like Stable Diffusion XL Philip Kiely Jan. 31, 2024 1374 -
Comparing tokens per second across LLMs Philip Kiely May. 09, 2024 769 -
What I learned from my AI startup’s internal hackathon Julien Reiman Jun. 12, 2023 719 -
How latent consistency models work Rachel Rapp Jun. 04, 2024 1140 -
New in August 2023 Baseten Aug. 31, 2023 591 -
Comparing NVIDIA GPUs for AI: T4 vs A10 Philip Kiely Apr. 27, 2023 1604 -
Unlocking the full power of NVIDIA H100 GPUs for ML inference with TensorRT Pankaj Gupta, Philip Kiely Feb. 06, 2024 1623 -
Deploy Falcon-40B on Baseten Sid Shanker Jun. 09, 2023 794 -
New in February 2024 Baseten Feb. 29, 2024 634 -
How to choose the right instance size for your ML models Philip Kiely Jan. 18, 2023 597 -
How to serve 10,000 fine-tuned LLMs from a single GPU Pankaj Gupta, Philip Kiely Jul. 23, 2024 1895 -
New in September 2023 Baseten Sep. 29, 2023 605 -
Streaming real-time text to speech with XTTS V2 Het Trivedi, Philip Kiely Apr. 18, 2024 1318 -
Continuous vs dynamic batching for AI inference Matt Howard, Philip Kiely Apr. 05, 2024 1350 -
Models We Love: June 2023 Baseten Jul. 06, 2023 1498 -
High performance ML inference with NVIDIA TensorRT Justin Yi, Philip Kiely Mar. 12, 2024 1076 -
NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference Philip Kiely Sep. 15, 2023 1636 -
FP8: Efficient model inference with 8-bit floating point numbers Pankaj Gupta, Philip Kiely Mar. 07, 2024 1021 2
Deployment and inference for open source text embedding models Philip Kiely Nov. 02, 2023 1706 -
The best open source large language model Philip Kiely Feb. 09, 2024 1920 -
New in January 2024 Baseten Jan. 31, 2024 580 -
Deploy open-source models in a couple clicks from Baseten’s model library Emmiliese von Avis Jun. 08, 2023 888 -
Playground v2 vs Stable Diffusion XL 1.0 for text-to-image generation Philip Kiely Dec. 13, 2023 1075 -
Using fractional H100 GPUs for efficient model serving Matt Howard, Vlad Shulman, Pankaj Gupta, Philip Kiely Mar. 28, 2024 1086 -
Jina AI’s jina-embeddings-v2: an open source text embedding model that matches OpenAI’s ada-002 Philip Kiely Oct. 27, 2023 547 -
40% faster Stable Diffusion XL inference with NVIDIA TensorRT Pankaj Gupta, Justin Yi, Philip Kiely Feb. 22, 2024 2403 -
Ten reasons to join Baseten Dustin Michaels, Philip Kiely Jul. 25, 2024 1230 -
Why GPU utilization matters for model inference Marius Killinger, Philip Kiely Feb. 20, 2024 816 -
New in March 2024 Baseten Mar. 28, 2024 553 -
Build your own open-source ChatGPT with Llama 2 and Chainlit Philip Kiely Aug. 23, 2023 1061 -
SDXL inference in under 2 seconds: the ultimate guide to Stable Diffusion optimization Varun Shenoy, Philip Kiely Aug. 30, 2023 1352 -
A checklist for switching to open source ML models Philip Kiely Nov. 21, 2023 482 -
New in May 2023 Baseten Jun. 02, 2023 384 -
Baseten announces HIPAA compliance Baseten Mar. 28, 2023 167 -
Compound AI systems explained Rachel Rapp Aug. 06, 2024 1338 -
What I learned as a forward-deployed engineer working at an AI startup Het Trivedi May. 31, 2024 1353 -
Introducing Baseten Chains Bola Malek, Marius Killinger, Sid Shanker, Rachel Rapp, Mike Bilodeau Jun. 27, 2024 1132 9
The benefits of globally distributed infrastructure for model serving Phil Howes, Philip Kiely Mar. 01, 2024 603 -
Technical deep dive: Truss live reload Pankaj Gupta Feb. 17, 2023 1852 -
33% faster LLM inference with FP8 quantization Pankaj Gupta, Philip Kiely Mar. 14, 2024 1876 -
Using asynchronous inference in production Samiksha Pal, Helen Yang, Rachel Rapp Jul. 11, 2024 950 -
Introduction to quantizing ML models Abu Qader, Philip Kiely Jan. 31, 2024 1679 1
Understanding NVIDIA’s Datacenter GPU line Philip Kiely May. 23, 2023 708 -
New in April 2024 Baseten May. 01, 2024 552 -
Benchmarking fast Mistral 7B inference Abu Qader, Pankaj Gupta, Justin Yi, Philip Kiely Mar. 14, 2024 1571 -
Comparing GPUs across architectures and tiers Philip Kiely May. 22, 2023 765 -
SPC hackathon winners build with Llama 3.1 on Baseten Philip Kiely Aug. 16, 2024 615 -
Understanding performance benchmarks for LLM inference Philip Kiely Jan. 12, 2024 1459 -
New in December 2023 Baseten Dec. 27, 2023 553 -
Pinning ML model revisions for compatibility and security Philip Kiely Nov. 09, 2023 564 -
Comparing few-step image generation models Rachel Rapp Jun. 14, 2024 1087 -
Choosing the right horizontal scaling setup for high-traffic models Philip Kiely Jan. 19, 2023 628 -
Models We Love: July 2023 Baseten Jul. 26, 2023 1831 -
Faster Mixtral inference with TensorRT-LLM and quantization Pankaj Gupta, Timur Abishev, Philip Kiely Dec. 22, 2023 1467 2
NVIDIA A10 vs A10G for ML model inference Philip Kiely Nov. 28, 2023 1056 -
Stable Video Diffusion now available Sid Shanker, Varun Shenoy Nov. 22, 2023 324 -
New in October 2023 Baseten Oct. 31, 2023 497 -
Introducing automatic LLM optimization with TensorRT-LLM Engine Builder Abu Qader, Philip Kiely Aug. 01, 2024 939 2
New in March 2023 Baseten Mar. 31, 2023 359 -
Deploying custom ComfyUI workflows as APIs Het Trivedi, Rachel Rapp Jul. 25, 2024 1144 -
Deploy StableLM with Truss Tuhin Srivastava Apr. 20, 2023 423 -
Build a chatbot with Llama 2 and LangChain Philip Kiely Jul. 27, 2023 1440 -
Model autoscaling features on Baseten Jesse Mostipak Jul. 07, 2023 890 -
GPT vs Mistral: Migrate to open source LLMs seamlessly Sid Shanker, Philip Kiely Nov. 22, 2023 879 -
New in May 2024 Baseten Jun. 03, 2024 598 -
CI/CD for AI model deployments Vlad Shulman, Samiksha Pal, Sid Shanker, Philip Kiely Apr. 30, 2024 914 -
Getting started with foundation models Jesse Mostipak Jun. 06, 2023 1226 -
AI infrastructure: build vs. buy Baseten Jul. 28, 2023 1040 -
Announcing our Series B Tuhin Srivastava Mar. 04, 2024 629 2
Control plane vs workload plane in model serving infrastructure Colin McGrath, Matt Howard, Philip Kiely May. 29, 2024 870 -
If You Build It, Devs will Come: How to Host an AI Meetup Julien Reiman Apr. 06, 2023 1061 -
New in November 2023 Baseten Nov. 30, 2023 419 -
Baseten Chains explained: building multi-component AI workflows at scale Marius Killinger, Rachel Rapp Jul. 02, 2024 2424 -
New in April 2023 Baseten Apr. 30, 2023 510 -
How to double tokens per second for Llama 3 with Medusa Abu Qader, Philip Kiely Aug. 20, 2024 1462 2
The best open-source image generation model Philip Kiely Aug. 29, 2024 1409 -
How to build function calling and JSON mode for open-source and fine-tuned LLMs Bryce Dubayah, Philip Kiely Sep. 12, 2024 1339 1
Introducing function calling and structured output for open-source and fine-tuned LLMs Bryce Dubayah, Philip Kiely Sep. 12, 2024 604 -
Building high-performance compound AI applications with MongoDB Atlas and Baseten Philip Kiely Sep. 17, 2024 1425 -
Introducing Baseten Hybrid: control and flexibility in your cloud and ours Mike Bilodeau, Rachel Rapp Sep. 26, 2024 633 -
Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience Mike Bilodeau, Rachel Rapp Sep. 26, 2024 688 -
Export your model inference metrics to your favorite observability tool Helen Yang, Nicolas Gere-lamaysouette, Philip Kiely Oct. 05, 2024 493 -
Evaluating NVIDIA H200 GPUs for LLM inference Pankaj Gupta, Philip Kiely Oct. 23, 2024 1294 -
Introducing canary deployments on Baseten Sid Shanker, Jonathan Rochette, Raymond Cano, Rachel Rapp Nov. 01, 2024 932 -
Create custom environments for deployments on Baseten Samiksha Pal, Raymond Cano, Sid Shanker, Rachel Rapp Nov. 15, 2024 621 -

By Matt Makai. 2021-2024.