Built with Claude, for Claude.
Stop paying full price twice.
Automatic prompt caching for Claude Code. Detects stable content — system prompts, tool definitions, file reads — and marks it with cache breakpoints, so those tokens cost 90% less on every repeated turn.
# Two commands. That's it.
/plugin marketplace add https://github.com/flightlesstux/prompt-caching
/plugin install prompt-caching@ercan-ermis
# No npm. No config file. No restart.
# Savings start on the next turn.
How prompt-caching works
Anthropic's caching API stores stable content server-side for 5 minutes. Cache reads cost 0.1× instead of 1×. This plugin places the breakpoints automatically.
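What the plugin automates can be sketched in a few lines against the Anthropic SDK's message format. `mark_stable` here is a hypothetical helper for illustration, not part of this plugin's API:

```python
# Sketch of automatic breakpoint placement: attach a cache_control
# marker to the last stable system block, so the API caches the whole
# prefix up to and including that block. (mark_stable is hypothetical.)

def mark_stable(system_blocks):
    """Attach an ephemeral cache breakpoint to the last system block.

    Everything up to and including the marked block becomes a cached
    prefix; subsequent reads of that prefix bill at 0.1x instead of 1x.
    """
    blocks = [dict(b) for b in system_blocks]
    if blocks:
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks

system = mark_stable([
    {"type": "text", "text": "You are a coding assistant."},
    {"type": "text", "text": "<several thousand tokens of tool definitions>"},
])
# Pass `system=system` to anthropic.Anthropic().messages.create(...)
```

The breakpoint goes on the *last* stable block because the API caches everything before it; one well-placed marker covers the entire prefix.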
BugFix Mode
Detects stack traces in your messages. Caches the buggy file + error context once. Every follow-up only pays for the new question.
Refactor Mode
Detects refactor keywords + file lists. Caches the before-pattern, style guides, and type definitions. Only per-file instructions are re-sent.
File Tracking
Tracks read counts per file. On the second read, injects a cache breakpoint. All future reads cost 0.1× instead of 1×.
Conversation Freeze
After N turns, freezes all messages before turn (N−3) as a cached prefix. Only the last 3 turns are sent fresh. Savings compound.
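The Conversation Freeze idea can be sketched roughly like this, assuming messages in the Anthropic SDK's list-of-dicts shape. `freeze_prefix` is a hypothetical name, not the plugin's actual implementation:

```python
def freeze_prefix(messages, keep_fresh=3):
    """Mark everything before the last `keep_fresh` messages as a cached prefix.

    A cache_control breakpoint on the last frozen message tells the API
    to cache the whole prefix up to and including that message.
    """
    msgs = [dict(m) for m in messages]
    cut = len(msgs) - keep_fresh
    if cut > 0:
        frozen = dict(msgs[cut - 1])
        content = frozen["content"]
        if isinstance(content, str):  # normalize plain strings to block lists
            content = [{"type": "text", "text": content}]
        content = [dict(c) for c in content]
        content[-1]["cache_control"] = {"type": "ephemeral"}
        frozen["content"] = content
        msgs[cut - 1] = frozen
    return msgs

history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
frozen = freeze_prefix(history)  # breakpoint lands on message index 2
```

Because the cached prefix only grows, each new turn re-reads the same prefix at 0.1×, which is why the savings compound over long sessions.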
Benchmarks
Measured on real Claude Code sessions with Sonnet. Break-even at turn 2.
| Session type | Turns | Without caching | With caching | Savings |
|---|---|---|---|---|
| Bug fix (single file) | 20 | 184,000 tokens | 28,400 tokens | 85% |
| Refactor (5 files) | 15 | 310,000 tokens | 61,200 tokens | 80% |
| General coding | 40 | 890,000 tokens | 71,200 tokens | 92% |
| Repeated file reads (5×5) | — | 50,000 tokens | 5,100 tokens | 90% |
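The savings column follows directly from the token counts; a quick check, using the numbers from the table above:

```python
# (uncached tokens, cached tokens) per session type, from the benchmark table
sessions = {
    "Bug fix (single file)": (184_000, 28_400),
    "Refactor (5 files)": (310_000, 61_200),
    "General coding": (890_000, 71_200),
    "Repeated file reads (5x5)": (50_000, 5_100),
}

savings = {
    name: round(100 * (1 - cached / uncached))
    for name, (uncached, cached) in sessions.items()
}
# savings: 85, 80, 92, and 90 percent respectively
```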
Install prompt-caching
Two commands for Claude Code. For Cursor, Windsurf, ChatGPT, Perplexity, Zed, Continue.dev, and any other MCP-compatible client — use the npm install path below.
⏳ Pending approval in the official Claude Code plugin marketplace. Install directly from GitHub in the meantime:
Run these inside Claude Code — no npm, no config file, no restart needed:
/plugin marketplace add https://github.com/flightlesstux/prompt-caching
/plugin install prompt-caching@ercan-ermis
Claude Code's plugin system handles everything automatically. The get_cache_stats tool is available immediately after install.
Install globally via npm
npm install -g prompt-caching-mcp
Add to your client's MCP config
Cursor → .cursor/mcp.json · Windsurf → MCP settings · Any MCP client → stdio
{
"mcpServers": {
"prompt-caching-mcp": {
"command": "prompt-caching-mcp"
}
}
}
FAQ
Common questions — especially the skeptical ones.
"Claude Code already does prompt caching automatically — why does this exist?"
You're right, and it's a fair question. Claude Code does handle prompt caching automatically for its own API calls — system prompts, tool definitions, and conversation history are cached out of the box. You don't need this plugin for that.
This plugin is for a different layer: when you build your own apps or agents with the Anthropic SDK. Raw SDK calls don't get automatic caching unless you place cache_control breakpoints yourself. This plugin does that automatically, plus gives you visibility into what's being cached, hit rates, and real savings — which Claude Code doesn't expose.
Short version:
Claude Code using the API → already cached ✅
Your app calling the API → this plugin handles it ✅
"Hasn't Anthropic's new auto-caching feature solved this?"
Largely, yes — Anthropic's automatic caching (passing "cache_control": {"type": "ephemeral"} at the top level) handles breakpoint placement automatically now. This plugin predates that feature and originally filled that gap.
What it still adds: observability. get_cache_stats tracks hit rate and cumulative savings across turns. analyze_cacheability lets you dry-run a prompt to see exactly what would be cached and where breakpoints land — useful when debugging why you're getting cache misses. If you just want set-and-forget, Anthropic's auto-caching is enough.
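Hit rate and savings can also be derived from the usage counters the Messages API returns (`input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`). The sketch below is a hypothetical helper, assuming the standard 0.1× read and 1.25× write multipliers for the 5-minute TTL:

```python
def turn_savings(usage):
    """Estimate the fraction of input cost saved on one turn.

    Cache reads bill at 0.1x and cache writes at 1.25x (5-minute TTL),
    relative to the base input-token price.
    """
    fresh = usage.get("input_tokens", 0)                  # uncached input, 1x
    read = usage.get("cache_read_input_tokens", 0)        # billed at 0.1x
    write = usage.get("cache_creation_input_tokens", 0)   # billed at 1.25x
    billed = fresh + 0.1 * read + 1.25 * write
    baseline = fresh + read + write  # same tokens with no caching at all
    return 1 - billed / baseline if baseline else 0.0

# A follow-up turn that reuses a 9,000-token cached prefix
# plus 500 fresh tokens is ~85% cheaper than sending it all uncached:
turn_savings({"input_tokens": 500, "cache_read_input_tokens": 9000})
```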
"I checked ccusage and my cache reads are already huge. Does this help me?"
If you're seeing high cache reads in Claude Code, that's the built-in caching working perfectly — this plugin won't improve those numbers further. It can't intercept Claude Code's own API calls.
It would help you if you're also running your own scripts or agents via the Anthropic SDK alongside Claude Code. Those calls are separate and don't benefit from Claude Code's automatic caching.
"Which models support prompt caching?"
Claude Opus 4.6/4.5/4.1/4, Sonnet 4.6/4.5/4/3.7, Haiku 4.5, Haiku 3.5, and Haiku 3. Minimum cacheable prompt size varies by model (1024–4096 tokens). See the official pricing table for full details.
"What's the cache lifetime?"
5 minutes by default (ephemeral), extended for free on every cache hit. A 1-hour TTL is also available at 2× the base input token price — useful for long-running agents or infrequent sessions. See Anthropic's prompt caching docs for the full breakdown.
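Per Anthropic's `cache_control` format, selecting the 1-hour TTL is a one-key change on the breakpoint itself:

```python
# 5-minute default (lifetime refreshed for free on every cache hit):
cache_control = {"type": "ephemeral"}

# 1-hour TTL (cache writes billed at 2x the base input price):
cache_control_1h = {"type": "ephemeral", "ttl": "1h"}
```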
Open source · MIT · Zero lock-in