We are excited to announce that Anyscale's LLM API offering, including Private Endpoints for self-hosted LLMs, is now available as part of the Anyscale Platform. This is a significant milestone in our effort to democratize access to large language models. With the new endpoint offering, developers can integrate open-source embedding models into applications such as retrieval-augmented generation (RAG), priced at $0.05 per million tokens for the gte-large model. We plan to add more models and invite users to request newer embedding models through a Google form. We have also extended fine-tuning to our Llama-2 70B model, so developers can improve model quality while reducing costs and improving performance; the fine-tuned model can be used for inference at $1 per million tokens. Finally, users can now get started with Anyscale Endpoints without a credit card: they receive free credits up front, and a credit card can be added to their account later.
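To make the per-token pricing concrete, here is a minimal sketch that turns the two quoted prices into a cost estimate. Only the prices ($0.05 and $1 per million tokens) come from the announcement. The function name `estimate_cost_usd`, the constant names, and the workload sizes are hypothetical and chosen purely for illustration; actual billing may differ, for example in how tokens are counted.

```python
# Prices quoted in the announcement, in US dollars per one million tokens.
GTE_LARGE_EMBEDDING_PRICE_PER_M = 0.05
FINE_TUNED_LLAMA2_70B_INFERENCE_PRICE_PER_M = 1.00


def estimate_cost_usd(num_tokens: int, price_per_million: float) -> float:
    """Estimate the cost in USD of processing `num_tokens` tokens.

    Hypothetical helper: plain arithmetic on the quoted price,
    not part of any Anyscale SDK.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Assumed RAG corpus: 20,000 chunks of about 500 tokens each.
    corpus_tokens = 20_000 * 500
    embedding_cost = estimate_cost_usd(
        corpus_tokens, GTE_LARGE_EMBEDDING_PRICE_PER_M
    )
    print(f"Embedding {corpus_tokens:,} tokens: ${embedding_cost:.2f}")

    # Assumed inference workload on the fine-tuned model: 2 million tokens.
    inference_tokens = 2_000_000
    inference_cost = estimate_cost_usd(
        inference_tokens, FINE_TUNED_LLAMA2_70B_INFERENCE_PRICE_PER_M
    )
    print(f"Inference on {inference_tokens:,} tokens: ${inference_cost:.2f}")
```

Under these assumed workloads, embedding a 10-million-token corpus with gte-large would cost about $0.50, and 2 million tokens of inference on the fine-tuned model would cost about $2.00.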