Company: Baseten
Date Published: May 1, 2024
Author: Baseten
Word count: 552
Language: English
Hacker News points: None

Summary

In April 2024, several best-in-class large language models (LLMs) became available on Baseten in a range of sizes, from 3.8 billion to 141 billion parameters, offering flexibility to trade off cost against output quality. These models can be deployed with optimized inference techniques such as TensorRT-LLM engines, FP8 quantization, and continuous batching for improved efficiency. Baseten also introduced streaming endpoints, enabling real-time applications such as text-to-speech synthesis. In addition, the company emphasized the importance of CI/CD pipelines for AI models, offering a model management API for building customized deployment tooling and a new feature for more reliable deployment status tracking. Together, these updates expand the possibilities for building with AI and improve the stability of production deployments.
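As a rough illustration of the streaming endpoints mentioned above, the sketch below consumes a model's output incrementally over HTTP. The endpoint URL pattern, model ID, payload shape, and "stream" flag are illustrative assumptions, not Baseten's documented API.

```python
# Minimal sketch: read a streamed model response token-by-token as it arrives.
# URL format, model ID, and request body are assumptions for illustration only.
import os

import requests

MODEL_ID = "abc123"  # hypothetical model ID
API_KEY = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"prompt": "Explain continuous batching in one sentence.", "stream": True},
    stream=True,  # keep the connection open and read chunks as they are produced
)
resp.raise_for_status()

for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    # Print each partial piece of output immediately instead of waiting
    # for the full response, which is what enables real-time use cases.
    print(chunk, end="", flush=True)
```

Reading the response incrementally is what makes latency-sensitive applications such as real-time text-to-speech practical, since downstream synthesis can begin before the full output is generated.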
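To make the CI/CD point concrete, here is a hedged sketch of a pipeline gate that polls deployment status before promoting a release. The management endpoint path, response fields, and status values are assumptions for illustration, not Baseten's documented API.

```python
# Hypothetical CI/CD gate: wait until a deployment reports a healthy status
# before letting the pipeline proceed. Endpoint path, field names, and status
# strings are illustrative assumptions.
import os
import time

import requests

API_KEY = os.environ["BASETEN_API_KEY"]
MODEL_ID = "abc123"       # hypothetical model ID
DEPLOYMENT_ID = "dep456"  # hypothetical deployment ID


def wait_until_active(timeout_s: int = 600, poll_s: int = 10) -> bool:
    """Poll the deployment's status until it is active, fails, or times out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = requests.get(
            f"https://api.baseten.co/v1/models/{MODEL_ID}/deployments/{DEPLOYMENT_ID}",
            headers={"Authorization": f"Api-Key {API_KEY}"},
        )
        resp.raise_for_status()
        status = resp.json().get("status")
        if status == "ACTIVE":
            return True
        if status in ("FAILED", "UNHEALTHY"):
            return False
        time.sleep(poll_s)  # back off before the next status check
    return False


if __name__ == "__main__":
    # Exit nonzero so the CI job fails if the deployment never becomes active.
    raise SystemExit(0 if wait_until_active() else 1)
```

A gate like this is one way customized tooling built on a model management API can make deployment status tracking part of an automated release process rather than a manual check.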