Baseten Embeddings Inference (BEI) is the fastest embeddings solution available for high-throughput, low-latency production workloads, delivering over 2x higher throughput and 10% lower latency than previous industry standards. This makes it well suited to latency-sensitive applications such as search and retrieval, agents, and recommender systems. BEI provides optimized inference performance out of the box for embedding, reranker, and classification models, with a low memory footprint and strong scalability. It works with open-source, custom, and fine-tuned models, and it fits naturally into compound AI systems, making it an ideal choice for companies that serve embeddings in production.
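As a minimal sketch of what calling a deployed embedding model might look like, the snippet below sends a request to a hypothetical BEI deployment over an OpenAI-compatible embeddings endpoint. The URL, model identifier, and API key are placeholders, not values from this document; check your own deployment's details before adapting it.

```python
# Placeholder values for illustration; substitute the URL and API key
# from your own BEI deployment.
BASE_URL = "https://model-abc123.api.baseten.co/environments/production/sync/v1"
API_KEY = "YOUR_API_KEY"


def build_embedding_request(texts):
    """Build an OpenAI-compatible /embeddings request payload."""
    return {
        "input": texts,                 # one string or a list of strings
        "model": "my-embedding-model",  # placeholder model identifier
    }


def embed(texts):
    """POST the payload to the deployment (requires the `requests` package)."""
    import requests  # third-party; pip install requests

    resp = requests.post(
        f"{BASE_URL}/embeddings",
        headers={"Authorization": f"Api-Key {API_KEY}"},
        json=build_embedding_request(texts),
        timeout=30,
    )
    resp.raise_for_status()
    # An OpenAI-compatible response carries vectors under data[i]["embedding"]
    return [item["embedding"] for item in resp.json()["data"]]
```

The same request shape works for batches: passing a list of strings as `input` returns one embedding per string, which is how high-throughput pipelines typically drive an embeddings server.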