Semantic caching optimizes systems built on large language models (LLMs) by using vector embeddings to match incoming queries against previously answered ones, returning a stored response when a sufficiently similar query is found. Implementing it effectively is not trivial: developers must set a distance threshold that separates genuine duplicates from merely related queries, and choose an embedding model accurate enough to make that separation reliable. To address these challenges, the study built evaluation datasets and assessed model performance with precision, recall, F1 score, and average latency. It found that the sentence-transformers all-mpnet-base-v2 embedding model performed well across precision, recall, memory use, latency, and F1 score for semantic caching applications. Even so, room for improvement remains in separating true duplicates from semantically similar but non-duplicate queries, and future research aims to explore advanced techniques such as training custom embedding models and incorporating a query-rewriting step.
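To make the mechanism concrete, here is a minimal sketch of a semantic cache in Python. It assumes the sentence-transformers library is installed; the `SemanticCache` class, its method names, and the 0.85 similarity threshold are illustrative choices, not details taken from the study.

```python
# A minimal semantic-cache sketch. The threshold value is illustrative
# and would in practice be tuned on a labeled evaluation dataset.
import numpy as np
from sentence_transformers import SentenceTransformer


class SemanticCache:
    def __init__(self, threshold: float = 0.85):
        # all-mpnet-base-v2 is the embedding model the study highlights;
        # the cosine-similarity threshold is the key tunable hyperparameter.
        self.model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
        self.threshold = threshold
        self.embeddings: list[np.ndarray] = []
        self.responses: list[str] = []

    def _embed(self, text: str) -> np.ndarray:
        vec = self.model.encode(text)
        # Normalize so a dot product equals cosine similarity.
        return vec / np.linalg.norm(vec)

    def get(self, query: str) -> str | None:
        """Return a cached response if a stored query is similar enough."""
        if not self.embeddings:
            return None
        q = self._embed(query)
        sims = np.stack(self.embeddings) @ q
        best = int(np.argmax(sims))
        if sims[best] >= self.threshold:
            return self.responses[best]  # cache hit
        return None  # cache miss: caller falls through to the LLM

    def put(self, query: str, response: str) -> None:
        self.embeddings.append(self._embed(query))
        self.responses.append(response)
```

On a lookup, the cache embeds the query, compares it against stored query embeddings by cosine similarity, and returns the cached response only when the best match clears the threshold; otherwise the caller calls the LLM and stores the new pair with `put()`.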
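The threshold itself is what the evaluation metrics are used to tune. As a hedged sketch of how a labeled dataset and those metrics could fit together: mark query pairs as duplicates or non-duplicates, sweep candidate thresholds, and keep the one that maximizes F1. The toy `pairs` list and the sweep range below are assumptions for illustration, not the study's dataset or results.

```python
# A sketch of threshold tuning, assuming a labeled dataset of query pairs
# marked as duplicates (1) or non-duplicates (0). Toy data for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

pairs = [  # (query_a, query_b, is_duplicate)
    ("How do I reset my password?", "I forgot my password, what now?", 1),
    ("How do I reset my password?", "How do I delete my account?", 0),
]


def metrics_at_threshold(threshold: float) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for one candidate threshold."""
    tp = fp = fn = 0
    for a, b, label in pairs:
        ea, eb = model.encode([a, b], normalize_embeddings=True)
        predicted = float(ea @ eb) >= threshold  # cache would treat as duplicate
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Sweep candidate thresholds and keep the one with the best F1 score.
best = max(np.arange(0.5, 1.0, 0.05), key=lambda t: metrics_at_threshold(t)[2])
print(f"best threshold: {best:.2f}, (precision, recall, f1): {metrics_at_threshold(best)}")
```

A sweep like this makes the precision/recall trade-off visible: a low threshold raises recall but serves wrong cached answers (false positives), while a high one raises precision but forfeits cache hits, which is why F1 is a natural selection criterion here.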