Baseten

Founded in 2019. Privately held.

External links: homepage | docs | blog | jobs | youtube | twitter | github | linkedin

Inference platform for AI models.

[Chart: blog content published by word count]

Blog content

| post title | author(s) | published | words | HN points |
| --- | --- | --- | --- | --- |
| New in October: Find community with The DSC | Baseten | Oct. 31, 2022 | 408 | - |
| New in May 2022: Off-site but on-track | Baseten | May 26, 2022 | 432 | - |
| Introducing Baseten Self-hosted | Anupreet Walia, Rachel Rapp | Aug. 08, 2024 | 670 | - |
| Four ML models that accelerate content creation | Philip Kiely | Jun. 02, 2022 | 945 | - |
| New in December 2021 | Emmiliese von Avis | Jan. 07, 2022 | 494 | - |
| Deploying and using Stable Diffusion XL 1.0 | Philip Kiely | Jul. 26, 2023 | 286 | - |
| How to serve your ComfyUI model behind an API endpoint | Het Trivedi, Philip Kiely | Dec. 08, 2023 | 1326 | - |
| New in July: A seamless bridge from model development to deployment | Baseten | Jul. 29, 2022 | 414 | - |
| Baseten achieves SOC 2 Type II certification | Baseten | Mar. 08, 2023 | 282 | - |
| New in January 2023 | Baseten | Jan. 31, 2023 | 538 | - |
| AudioGen: deploy and build today! | Jesse Mostipak | Aug. 04, 2023 | 340 | - |
| Open source alternatives for machine learning models | Varun Shenoy, Philip Kiely | Nov. 21, 2023 | 1207 | - |
| A guide to LLM inference and performance | Varun Shenoy, Philip Kiely | Nov. 17, 2023 | 3038 | 113 |
| New in July 2023 | Baseten | Aug. 02, 2023 | 514 | - |
| Three techniques to adapt LLMs for any use case | Philip Kiely | Jun. 15, 2023 | 983 | - |
| StartupML AMA: Nikhil Harithas | Derek Kim | Aug. 09, 2022 | 1774 | - |
| New in June 2023 | Baseten | Jun. 29, 2023 | 424 | - |
| Build with OpenAI’s Whisper model in five minutes | Justin Yi | Oct. 18, 2022 | 712 | - |
| Go from machine learning models to full-stack applications | Tuhin Srivastava | May 03, 2022 | 1026 | - |
| How we achieved SOC 2 and HIPAA compliance as an early-stage company | Baseten | Mar. 08, 2023 | 673 | - |
| How to benchmark image generation models like Stable Diffusion XL | Philip Kiely | Jan. 31, 2024 | 1374 | - |
| Comparing tokens per second across LLMs | Philip Kiely | May 09, 2024 | 769 | - |
| What I learned from my AI startup’s internal hackathon | Julien Reiman | Jun. 12, 2023 | 719 | - |
| New in August: Deploy, deploy, deploy | Baseten | Aug. 31, 2022 | 430 | - |
| How latent consistency models work | Rachel Rapp | Jun. 04, 2024 | 1140 | - |
| New in August 2023 | Baseten | Aug. 31, 2023 | 591 | - |
| Comparing NVIDIA GPUs for AI: T4 vs A10 | Philip Kiely | Apr. 27, 2023 | 1604 | - |
| Unlocking the full power of NVIDIA H100 GPUs for ML inference with TensorRT | Pankaj Gupta, Philip Kiely | Feb. 06, 2024 | 1623 | - |
| Deploy Falcon-40B on Baseten | Sid Shanker | Jun. 09, 2023 | 794 | - |
| New in February 2024 | Baseten | Feb. 29, 2024 | 634 | - |
| StartupML AMA: Daniel Whitenack | Derek Kim | Aug. 30, 2022 | 1706 | - |
| How to choose the right instance size for your ML models | Philip Kiely | Jan. 18, 2023 | 597 | - |
| How to serve 10,000 fine-tuned LLMs from a single GPU | Pankaj Gupta, Philip Kiely | Jul. 23, 2024 | 1895 | - |
| New in September 2023 | Baseten | Sep. 29, 2023 | 605 | - |
| Streaming real-time text to speech with XTTS V2 | Het Trivedi, Philip Kiely | Apr. 18, 2024 | 1318 | - |
| Continuous vs dynamic batching for AI inference | Matt Howard, Philip Kiely | Apr. 05, 2024 | 1350 | - |
| Models We Love: June 2023 | Baseten | Jul. 06, 2023 | 1498 | - |
| High performance ML inference with NVIDIA TensorRT | Justin Yi, Philip Kiely | Mar. 12, 2024 | 1076 | - |
| Why we built and open-sourced a model serving solution | Phil Howes | Aug. 05, 2022 | 1030 | - |
| NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference | Philip Kiely | Sep. 15, 2023 | 1636 | - |
| New in September: Increasing flexibility and robustness | Baseten | Sep. 29, 2022 | 461 | - |
| Baseten achieves SOC 2 Type 1 certification | Baseten | Mar. 16, 2022 | 280 | - |
| FP8: Efficient model inference with 8-bit floating point numbers | Pankaj Gupta, Philip Kiely | Mar. 07, 2024 | 1021 | 2 |
| Deployment and inference for open source text embedding models | Philip Kiely | Nov. 02, 2023 | 1706 | - |
| The best open source large language model | Philip Kiely | Feb. 09, 2024 | 1920 | - |
| New in January 2024 | Baseten | Jan. 31, 2024 | 580 | - |
| How to deploy Stable Diffusion using Truss | Abu Qader | Sep. 01, 2022 | 1038 | - |
| Deploy open-source models in a couple clicks from Baseten’s model library | Emmiliese von Avis | Jun. 08, 2023 | 888 | - |
| Playground v2 vs Stable Diffusion XL 1.0 for text-to-image generation | Philip Kiely | Dec. 13, 2023 | 1075 | - |
| Using fractional H100 GPUs for efficient model serving | Matt Howard, Vlad Shulman, Pankaj Gupta, Philip Kiely | Mar. 28, 2024 | 1086 | - |
| New in November 2021 | Emmiliese von Avis | Nov. 22, 2021 | 372 | - |
| Jina AI’s jina-embeddings-v2: an open source text embedding model that matches OpenAI’s ada-002 | Philip Kiely | Oct. 27, 2023 | 547 | - |
| Accelerating model deployment: 100X faster dev loops with development deployments | Baseten | Dec. 08, 2022 | 810 | - |
| 40% faster Stable Diffusion XL inference with NVIDIA TensorRT | Pankaj Gupta, Justin Yi, Philip Kiely | Feb. 22, 2024 | 2403 | - |
| New in June: Full-stack superpowers | Baseten | Jun. 30, 2022 | 463 | - |
| Ten reasons to join Baseten | Dustin Michaels, Philip Kiely | Jul. 25, 2024 | 1230 | - |
| Why GPU utilization matters for model inference | Marius Killinger, Philip Kiely | Feb. 20, 2024 | 816 | - |
| New in March 2024 | Baseten | Mar. 28, 2024 | 553 | - |
| Build your own open-source ChatGPT with Llama 2 and Chainlit | Philip Kiely | Aug. 23, 2023 | 1061 | - |
| Designing parental leave at an early stage startup | Paige Pauli | Feb. 02, 2022 | 844 | - |
| SDXL inference in under 2 seconds: the ultimate guide to Stable Diffusion optimization | Varun Shenoy, Philip Kiely | Aug. 30, 2023 | 1352 | - |
| A checklist for switching to open source ML models | Philip Kiely | Nov. 21, 2023 | 482 | - |
| New in May 2023 | Baseten | Jun. 02, 2023 | 384 | - |
| Baseten announces HIPAA compliance | Baseten | Mar. 28, 2023 | 167 | - |
| Compound AI systems explained | Rachel Rapp | Aug. 06, 2024 | 1338 | - |
| What I learned as a forward-deployed engineer working at an AI startup | Het Trivedi | May 31, 2024 | 1353 | - |
| Introducing Baseten Chains | Bola Malek, Marius Killinger, Sid Shanker, Rachel Rapp, Mike Bilodeau | Jun. 27, 2024 | 1132 | 9 |
| Introducing Baseten | Tuhin Srivastava | May 20, 2021 | 1088 | - |
| The benefits of globally distributed infrastructure for model serving | Phil Howes, Philip Kiely | Mar. 01, 2024 | 603 | - |
| Technical deep dive: Truss live reload | Pankaj Gupta | Feb. 17, 2023 | 1852 | - |
| 33% faster LLM inference with FP8 quantization | Pankaj Gupta, Philip Kiely | Mar. 14, 2024 | 1876 | - |
| Using asynchronous inference in production | Samiksha Pal, Helen Yang, Rachel Rapp | Jul. 11, 2024 | 950 | - |
| Introduction to quantizing ML models | Abu Qader, Philip Kiely | Jan. 31, 2024 | 1679 | 1 |
| Understanding NVIDIA’s Datacenter GPU line | Philip Kiely | May 23, 2023 | 708 | - |
| New in April 2024 | Baseten | May 01, 2024 | 552 | - |
| Benchmarking fast Mistral 7B inference | Abu Qader, Pankaj Gupta, Justin Yi, Philip Kiely | Mar. 14, 2024 | 1571 | - |
| Comparing GPUs across architectures and tiers | Philip Kiely | May 22, 2023 | 765 | - |
| SPC hackathon winners build with Llama 3.1 on Baseten | Philip Kiely | Aug. 16, 2024 | 615 | - |
| Understanding performance benchmarks for LLM inference | Philip Kiely | Jan. 12, 2024 | 1459 | - |
| New in December 2023 | Baseten | Dec. 27, 2023 | 553 | - |
| Pinning ML model revisions for compatibility and security | Philip Kiely | Nov. 09, 2023 | 564 | - |
| Comparing few-step image generation models | Rachel Rapp | Jun. 14, 2024 | 1087 | - |
| Choosing the right horizontal scaling setup for high-traffic models | Philip Kiely | Jan. 19, 2023 | 628 | - |
| Models We Love: July 2023 | Baseten | Jul. 26, 2023 | 1831 | - |
| Faster Mixtral inference with TensorRT-LLM and quantization | Pankaj Gupta, Timur Abishev, Philip Kiely | Dec. 22, 2023 | 1467 | 2 |
| NVIDIA A10 vs A10G for ML model inference | Philip Kiely | Nov. 28, 2023 | 1056 | - |
| Stable Video Diffusion now available | Sid Shanker, Varun Shenoy | Nov. 22, 2023 | 324 | - |
| Serving four million Riffusion requests in two days | Phil Howes | Dec. 21, 2022 | 757 | - |
| Announcing our Series A | Tuhin Srivastava | Apr. 26, 2022 | 727 | - |
| Create an API endpoint for an ML model | Philip Kiely | Apr. 22, 2022 | 339 | - |
| New in October 2023 | Baseten | Oct. 31, 2023 | 497 | - |
| Introducing automatic LLM optimization with TensorRT-LLM Engine Builder | Abu Qader, Philip Kiely | Aug. 01, 2024 | 939 | 2 |
| New in March 2023 | Baseten | Mar. 31, 2023 | 359 | - |
| Deploying custom ComfyUI workflows as APIs | Het Trivedi, Rachel Rapp | Jul. 25, 2024 | 1144 | - |
| Deploy StableLM with Truss | Tuhin Srivastava | Apr. 20, 2023 | 423 | - |
| Build a chatbot with Llama 2 and LangChain | Philip Kiely | Jul. 27, 2023 | 1440 | - |
| Model autoscaling features on Baseten | Jesse Mostipak | Jul. 07, 2023 | 890 | - |
| Part 1: Working at an early stage company as an early stage engineer | Samiksha Pal | Nov. 29, 2021 | 1538 | - |
| GPT vs Mistral: Migrate to open source LLMs seamlessly | Sid Shanker, Philip Kiely | Nov. 22, 2023 | 879 | - |
| New in May 2024 | Baseten | Jun. 03, 2024 | 598 | - |
| CI/CD for AI model deployments | Vlad Shulman, Samiksha Pal, Sid Shanker, Philip Kiely | Apr. 30, 2024 | 914 | - |
| Getting started with foundation models | Jesse Mostipak | Jun. 06, 2023 | 1226 | - |
| How Baseten is using "docs as code" to build best-in-class documentation | Philip Kiely | Mar. 09, 2022 | 1014 | - |
| AI infrastructure: build vs. buy | Baseten | Jul. 28, 2023 | 1040 | - |
| Announcing our Series B | Tuhin Srivastava | Mar. 04, 2024 | 629 | 2 |
| New in December 2022 | Baseten | Dec. 23, 2022 | 554 | - |
| Control plane vs workload plane in model serving infrastructure | Colin McGrath, Matt Howard, Philip Kiely | May 29, 2024 | 870 | - |
| If You Build It, Devs will Come: How to Host an AI Meetup | Julien Reiman | Apr. 06, 2023 | 1061 | - |
| New in November 2023 | Baseten | Nov. 30, 2023 | 419 | - |
| Baseten Chains explained: building multi-component AI workflows at scale | Marius Killinger, Rachel Rapp | Jul. 02, 2024 | 2424 | - |
| New in April 2023 | Baseten | Apr. 30, 2023 | 510 | - |
| How to double tokens per second for Llama 3 with Medusa | Abu Qader, Philip Kiely | Aug. 20, 2024 | 1462 | 2 |
| The best open-source image generation model | Philip Kiely | Aug. 29, 2024 | 1409 | - |
| How to build function calling and JSON mode for open-source and fine-tuned LLMs | Bryce Dubayah, Philip Kiely | Sep. 12, 2024 | 1339 | 1 |
| Introducing function calling and structured output for open-source and fine-tuned LLMs | Bryce Dubayah, Philip Kiely | Sep. 12, 2024 | 604 | - |
| Building high-performance compound AI applications with MongoDB Atlas and Baseten | Philip Kiely | Sep. 17, 2024 | 1425 | - |
| Introducing Baseten Hybrid: control and flexibility in your cloud and ours | Mike Bilodeau, Rachel Rapp | Sep. 26, 2024 | 633 | - |
| Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience | Mike Bilodeau, Rachel Rapp | Sep. 26, 2024 | 688 | - |
| Export your model inference metrics to your favorite observability tool | Helen Yang, Nicolas Gere-lamaysouette, Philip Kiely | Oct. 05, 2024 | 493 | - |
| Evaluating NVIDIA H200 GPUs for LLM inference | Pankaj Gupta, Philip Kiely | Oct. 23, 2024 | 1294 | - |
| Introducing canary deployments on Baseten | Sid Shanker, Jonathan Rochette, Raymond Cano, Rachel Rapp | Nov. 01, 2024 | 932 | - |
| Create custom environments for deployments on Baseten | Samiksha Pal, Raymond Cano, Sid Shanker, Rachel Rapp | Nov. 15, 2024 | 621 | - |

By Matt Makai. 2021-2024.