Company: Baseten
Date Published: May 1, 2024
Author: Baseten
Word count: 552
Language: English
Hacker News points: None

Summary

In April 2024, several best-in-class large language models (LLMs) became available on Baseten in a range of sizes, from 3.8 billion to 141 billion parameters, offering flexibility to trade off cost against output quality. These models can be deployed with optimized inference techniques such as TensorRT-LLM engines, FP8 quantization, and continuous batching for improved efficiency. Baseten also introduced streaming endpoints, enabling real-time applications such as text-to-speech synthesis. In addition, the company emphasized the importance of CI/CD pipelines for AI models, offering a model management API for building customized deployment tooling and a new feature for more reliable deployment status tracking. Together, these updates expand the possibilities for building with AI and improve the stability of production deployments.
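As a rough illustration of the streaming endpoints mentioned above, the sketch below consumes a model's output incrementally over HTTP. The endpoint URL pattern, model ID, payload shape, and "stream" flag are illustrative assumptions, not Baseten's documented API.

```python
# Minimal sketch: read a streamed model response token-by-token as it arrives.
# URL format, model ID, and request body are assumptions for illustration only.
import os

import requests

MODEL_ID = "abc123"  # hypothetical model ID
API_KEY = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"prompt": "Explain continuous batching in one sentence.", "stream": True},
    stream=True,  # keep the connection open and read chunks as they are produced
)
resp.raise_for_status()

for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    # Print each partial piece of output immediately instead of waiting
    # for the full response, which is what enables real-time use cases.
    print(chunk, end="", flush=True)
```

Reading the response incrementally is what makes latency-sensitive applications such as real-time text-to-speech practical, since downstream synthesis can begin before the full output is generated.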
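To make the CI/CD point concrete, here is a hedged sketch of a pipeline gate that polls deployment status before promoting a release. The management endpoint path, response fields, and status values are assumptions for illustration, not Baseten's documented API.

```python
# Hypothetical CI/CD gate: wait until a deployment reports a healthy status
# before letting the pipeline proceed. Endpoint path, field names, and status
# strings are illustrative assumptions.
import os
import time

import requests

API_KEY = os.environ["BASETEN_API_KEY"]
MODEL_ID = "abc123"       # hypothetical model ID
DEPLOYMENT_ID = "dep456"  # hypothetical deployment ID


def wait_until_active(timeout_s: int = 600, poll_s: int = 10) -> bool:
    """Poll the deployment's status until it is active, fails, or times out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = requests.get(
            f"https://api.baseten.co/v1/models/{MODEL_ID}/deployments/{DEPLOYMENT_ID}",
            headers={"Authorization": f"Api-Key {API_KEY}"},
        )
        resp.raise_for_status()
        status = resp.json().get("status")
        if status == "ACTIVE":
            return True
        if status in ("FAILED", "UNHEALTHY"):
            return False
        time.sleep(poll_s)  # back off before the next status check
    return False


if __name__ == "__main__":
    # Exit nonzero so the CI job fails if the deployment never becomes active.
    raise SystemExit(0 if wait_until_active() else 1)
```

A gate like this is one way customized tooling built on a model management API can make deployment status tracking part of an automated release process rather than a manual check.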