Caching LLM Queries for Performance and Cost Improvements
GPTCache is an open-source semantic cache that improves the efficiency and speed of GPT-based applications by storing responses generated by language models. Users can customize the cache to their needs, with configurable embedding functions, similarity evaluation functions, storage backends, and eviction policies. The tool supports several popular databases for cache storage and offers a range of vector store options for finding the cached request most similar to an incoming one, based on embeddings extracted from the request text. By supporting multiple APIs and vector stores, GPTCache aims to stay flexible and cover a wide range of use cases.
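The core lookup flow described above (embed the incoming request, compare against cached embeddings, return the stored response if similarity clears a threshold) can be sketched in plain Python. This is an illustrative toy, not GPTCache's actual API: the bag-of-words embedding, the `SemanticCache` class, and the 0.7 threshold are all assumptions for demonstration, whereas GPTCache plugs in real embedding models, vector stores, and eviction policies.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only; a real semantic
    # cache such as GPTCache would use a learned embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Hypothetical minimal semantic cache: linear scan over stored
    (embedding, response) pairs, hit if similarity >= threshold."""

    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, query: str):
        q = embed(query)
        best_resp, best_sim = None, 0.0
        for emb, resp in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best_resp, best_sim = resp, sim
        return best_resp if best_sim >= self.threshold else None

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.7)
cache.put("what is a vector database", "A vector database stores embeddings.")

# A near-identical rephrasing hits the cache, skipping the LLM call entirely.
hit = cache.get("what is a vector database?")
# An unrelated query misses, so the application would fall through to the LLM.
miss = cache.get("how do I bake bread")
```

In a production cache the linear scan would be replaced by an approximate-nearest-neighbor search in a vector store, which is exactly the pluggable component GPTCache exposes.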
Company
Zilliz
Date published
April 10, 2023
Author(s)
Chris Churilo
Word count
1079
Language
English
Hacker News points
None found.