Built with Claude, for Claude.
Stop paying full price twice.
Automatic prompt caching for Claude Code. Detects stable content — system prompts, tool definitions, file reads — and marks it with cache breakpoints, so those tokens cost 90% less on every repeated turn.
# Two commands. That's it.
/plugin marketplace add https://github.com/flightlesstux/prompt-caching
/plugin install prompt-caching@ercan-ermis
# No npm. No config file. No restart.
# Savings start on the next turn.
How prompt-caching works
Anthropic's caching API stores stable content server-side for 5 minutes. Cache reads cost 0.1× instead of 1×. This plugin places the breakpoints automatically.
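What the plugin automates can be sketched in a few lines against the Anthropic SDK's message format. `mark_stable` here is a hypothetical helper for illustration, not part of this plugin's API:

```python
# Sketch of automatic breakpoint placement: attach a cache_control
# marker to the last stable system block, so the API caches the whole
# prefix up to and including that block. (mark_stable is hypothetical.)

def mark_stable(system_blocks):
    """Attach an ephemeral cache breakpoint to the last system block.

    Everything up to and including the marked block becomes a cached
    prefix; subsequent reads of that prefix bill at 0.1x instead of 1x.
    """
    blocks = [dict(b) for b in system_blocks]
    if blocks:
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks

system = mark_stable([
    {"type": "text", "text": "You are a coding assistant."},
    {"type": "text", "text": "<several thousand tokens of tool definitions>"},
])
# Pass `system=system` to anthropic.Anthropic().messages.create(...)
```

The breakpoint goes on the *last* stable block because the API caches everything before it; one well-placed marker covers the entire prefix.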
BugFix Mode
Detects stack traces in your messages. Caches the buggy file + error context once. Every follow-up only pays for the new question.
Refactor Mode
Detects refactor keywords + file lists. Caches the before-pattern, style guides, and type definitions. Only per-file instructions are re-sent.
File Tracking
Tracks read counts per file. On the second read, injects a cache breakpoint. All future reads cost 0.1× instead of 1×.
Conversation Freeze
After N turns, freezes all messages before turn (N−3) as a cached prefix. Only the last 3 turns are sent fresh. Savings compound.
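The Conversation Freeze idea can be sketched roughly like this, assuming messages in the Anthropic SDK's list-of-dicts shape. `freeze_prefix` is a hypothetical name, not the plugin's actual implementation:

```python
def freeze_prefix(messages, keep_fresh=3):
    """Mark everything before the last `keep_fresh` messages as a cached prefix.

    A cache_control breakpoint on the last frozen message tells the API
    to cache the whole prefix up to and including that message.
    """
    msgs = [dict(m) for m in messages]
    cut = len(msgs) - keep_fresh
    if cut > 0:
        frozen = dict(msgs[cut - 1])
        content = frozen["content"]
        if isinstance(content, str):  # normalize plain strings to block lists
            content = [{"type": "text", "text": content}]
        content = [dict(c) for c in content]
        content[-1]["cache_control"] = {"type": "ephemeral"}
        frozen["content"] = content
        msgs[cut - 1] = frozen
    return msgs

history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
frozen = freeze_prefix(history)  # breakpoint lands on message index 2
```

Because the cached prefix only grows, each new turn re-reads the same prefix at 0.1×, which is why the savings compound over long sessions.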
Benchmarks
Measured on real Claude Code sessions with Sonnet. Break-even at turn 2.
| Session type | Turns | Without caching | With caching | Savings |
|---|---|---|---|---|
| Bug fix (single file) | 20 | 184,000 tokens | 28,400 tokens | 85% |
| Refactor (5 files) | 15 | 310,000 tokens | 61,200 tokens | 80% |
| General coding | 40 | 890,000 tokens | 71,200 tokens | 92% |
| Repeated file reads (5×5) | — | 50,000 tokens | 5,100 tokens | 90% |
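The savings column follows directly from the token counts; a quick check, using the numbers from the table above:

```python
# (uncached tokens, cached tokens) per session type, from the benchmark table
sessions = {
    "Bug fix (single file)": (184_000, 28_400),
    "Refactor (5 files)": (310_000, 61_200),
    "General coding": (890_000, 71_200),
    "Repeated file reads (5x5)": (50_000, 5_100),
}

savings = {
    name: round(100 * (1 - cached / uncached))
    for name, (uncached, cached) in sessions.items()
}
# savings: 85, 80, 92, and 90 percent respectively
```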
Install prompt-caching
Two commands for Claude Code. For Cursor, Windsurf, ChatGPT, Perplexity, Zed, Continue.dev, and any other MCP-compatible client — use the npm install path below.
⏳ Pending approval in the official Claude Code plugin marketplace. Install directly from GitHub in the meantime:
Run these inside Claude Code — no npm, no config file, no restart needed:
/plugin marketplace add https://github.com/flightlesstux/prompt-caching
/plugin install prompt-caching@ercan-ermis
Claude Code's plugin system handles everything automatically. The get_cache_stats tool is available immediately after install.
Install globally via npm
npm install -g prompt-caching-mcp
Add to your client's MCP config
Cursor → .cursor/mcp.json · Windsurf → MCP settings · Any MCP client → stdio
{
"mcpServers": {
"prompt-caching-mcp": {
"command": "prompt-caching-mcp"
}
}
}
FAQ
Common questions — especially the skeptical ones.
"Claude Code already does prompt caching automatically — why does this exist?"
You're right, and it's a fair question. Claude Code does handle prompt caching automatically for its own API calls — system prompts, tool definitions, and conversation history are cached out of the box. You don't need this plugin for that.
This plugin is for a different layer: when you build your own apps or agents with the Anthropic SDK. Raw SDK calls don't get automatic caching unless you place cache_control breakpoints yourself. This plugin does that automatically, plus gives you visibility into what's being cached, hit rates, and real savings — which Claude Code doesn't expose.
Short version:
Claude Code using the API → already cached ✅
Your app calling the API → this plugin handles it ✅
"Hasn't Anthropic's new auto-caching feature solved this?"
Largely, yes — Anthropic's automatic caching (passing "cache_control": {"type": "ephemeral"} at the top level) handles breakpoint placement automatically now. This plugin predates that feature and originally filled that gap.
What it still adds: observability. get_cache_stats tracks hit rate and cumulative savings across turns. analyze_cacheability lets you dry-run a prompt to see exactly what would be cached and where breakpoints land — useful when debugging why you're getting cache misses. If you just want set-and-forget, Anthropic's auto-caching is enough.
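Hit rate and savings can also be derived from the usage counters the Messages API returns (`input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`). The sketch below is a hypothetical helper, assuming the standard 0.1× read and 1.25× write multipliers for the 5-minute TTL:

```python
def turn_savings(usage):
    """Estimate the fraction of input cost saved on one turn.

    Cache reads bill at 0.1x and cache writes at 1.25x (5-minute TTL),
    relative to the base input-token price.
    """
    fresh = usage.get("input_tokens", 0)                  # uncached input, 1x
    read = usage.get("cache_read_input_tokens", 0)        # billed at 0.1x
    write = usage.get("cache_creation_input_tokens", 0)   # billed at 1.25x
    billed = fresh + 0.1 * read + 1.25 * write
    baseline = fresh + read + write  # same tokens with no caching at all
    return 1 - billed / baseline if baseline else 0.0

# A follow-up turn that reuses a 9,000-token cached prefix
# plus 500 fresh tokens is ~85% cheaper than sending it all uncached:
turn_savings({"input_tokens": 500, "cache_read_input_tokens": 9000})
```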
"I checked ccusage and my cache reads are already huge. Does this help me?"
If you're seeing high cache reads in Claude Code, that's the built-in caching working perfectly — this plugin won't improve those numbers further. It can't intercept Claude Code's own API calls.
It would help you if you're also running your own scripts or agents via the Anthropic SDK alongside Claude Code. Those calls are separate and don't benefit from Claude Code's automatic caching.
"Which models support prompt caching?"
Claude Opus 4.6/4.5/4.1/4, Sonnet 4.6/4.5/4/3.7, Haiku 4.5, Haiku 3.5, and Haiku 3. Minimum cacheable prompt size varies by model (1024–4096 tokens). See the official pricing table for full details.
"What's the cache lifetime?"
5 minutes by default (ephemeral), extended for free on every cache hit. A 1-hour TTL is also available at 2× the base input token price — useful for long-running agents or infrequent sessions. See Anthropic's prompt caching docs for the full breakdown.
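Per Anthropic's `cache_control` format, selecting the 1-hour TTL is a one-key change on the breakpoint itself:

```python
# 5-minute default (lifetime refreshed for free on every cache hit):
cache_control = {"type": "ephemeral"}

# 1-hour TTL (cache writes billed at 2x the base input price):
cache_control_1h = {"type": "ephemeral", "ttl": "1h"}
```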
Open source · MIT · Zero lock-in