New in March 2024

Company

Baseten

Date Published

March 28, 2024

Author

Baseten

Word count

553

Language

English

Hacker News points

None

URL

www.baseten.co/blog/new-in-march-2024

Summary

In March 2024, Baseten improved both model performance and developer experience on its platform, including models optimized with FP8 quantization, Multi-Instance GPUs, and TensorRT-LLM. The company also introduced a new REST API endpoint for automating key model management tasks, so that users can manage models and workspace properties through the API. Additionally, Baseten released a benchmarking guide for Mistral 7B and is actively researching new techniques for faster inference, which it says offer substantial cost savings on high-performance model deployments.