Prompt caching with Claude
The Anthropic API now offers prompt caching, a feature that enables developers to cache frequently used context between API calls, reducing costs by up to 90% and latency by up to 85%. The feature is available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon. Prompt caching is effective in situations such as conversational agents, coding assistants, large document processing, detailed instruction sets, agentic search and tool use, and conversing with long-form content like books and papers. Early customers have seen substantial speed and cost improvements in use cases such as including a full knowledge base in the prompt, providing 100-shot examples, and carrying on multi-turn conversations. Pricing for cached prompts is based on the number of input tokens cached and how frequently they are used: writing to the cache costs 25% more than the base input token price, while reading cached content costs only 10% of it.
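As a concrete illustration of the API described above, the sketch below marks a large, stable system prompt as cacheable using the Python SDK. The model ID, beta header, and `cache_control` placement reflect the public beta as documented at launch and should be treated as assumptions here, not a definitive integration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, stable context block (e.g. a book or knowledge base) reused across
# many requests. Marking it with cache_control asks the API to cache it: the
# first call pays the 25% cache-write premium, and later calls that begin with
# the same prefix pay the much cheaper cache-read rate.
LONG_CONTEXT = "<full text of the reference document goes here>"  # placeholder

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # one of the beta-supported models
    max_tokens=1024,
    # Beta header required while prompt caching is in public beta (assumed
    # from the launch documentation).
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {"type": "text", "text": "Answer questions using the reference text below."},
        {
            "type": "text",
            "text": LONG_CONTEXT,
            "cache_control": {"type": "ephemeral"},  # cache the prefix up to here
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key argument of chapter 1."}],
)

print(response.content[0].text)
```

Subsequent calls that repeat the same system prefix hit the cache, so the large block is billed at the cache-read rate while only the short, changing user turn is billed at the full input price. Taking Claude 3.5 Sonnet's launch base input price of $3 per million tokens as an example, cache writes cost $3.75 and cache reads $0.30 per million tokens.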
Company
Anthropic
Date published
Aug. 14, 2024
Author(s)
-
Word count
619
Language
English
Hacker News points
13