Prompt caching with Claude
The Anthropic API now offers prompt caching, a feature that enables developers to cache frequently used context between API calls, reducing costs by up to 90% and latency by up to 85%. The feature is available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon. Prompt caching is effective in situations such as conversational agents, coding assistants, large document processing, detailed instruction sets, agentic search and tool use, and conversing with long-form content like books and papers. Early customers have seen substantial speed and cost improvements in use cases such as including a full knowledge base in the prompt, providing 100-shot examples, and carrying on multi-turn conversations. Pricing for cached prompts is based on the number of input tokens cached and how frequently they are used: writing to the cache costs 25% more than the base input token price, while reading cached content costs only 10% of it.
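As a concrete illustration of the API described above, the sketch below marks a large, stable system prompt as cacheable using the Python SDK. The model ID, beta header, and `cache_control` placement reflect the public beta as documented at launch and should be treated as assumptions here, not a definitive integration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, stable context block (e.g. a book or knowledge base) reused across
# many requests. Marking it with cache_control asks the API to cache it: the
# first call pays the 25% cache-write premium, and later calls that begin with
# the same prefix pay the much cheaper cache-read rate.
LONG_CONTEXT = "<full text of the reference document goes here>"  # placeholder

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # one of the beta-supported models
    max_tokens=1024,
    # Beta header required while prompt caching is in public beta (assumed
    # from the launch documentation).
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {"type": "text", "text": "Answer questions using the reference text below."},
        {
            "type": "text",
            "text": LONG_CONTEXT,
            "cache_control": {"type": "ephemeral"},  # cache the prefix up to here
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key argument of chapter 1."}],
)

print(response.content[0].text)
```

Subsequent calls that repeat the same system prefix hit the cache, so the large block is billed at the cache-read rate while only the short, changing user turn is billed at the full input price. Taking Claude 3.5 Sonnet's launch base input price of $3 per million tokens as an example, cache writes cost $3.75 and cache reads $0.30 per million tokens.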
Company
Anthropic
Date published
Aug. 14, 2024
Author(s)
-
Word count
619
Language
English
Hacker News points
13