We Taught Our AI Agents to Take Coffee Breaks (And It Saved Us Real Money)
Here's something nobody tells you about Claude Code: when you step away from
your keyboard, your session is quietly getting more expensive.
The prompt cache — the mechanism that lets the API recognize "you've already
sent me this context, don't charge full price again" — has a time-to-live.
On a good day, it's 1 hour. On a bad day (more on that in our
cache investigation series), it's 5
minutes. Either way, if you go to lunch and come back, your entire context
rebuilds from scratch on the next turn.
We built a slash command to fix that. Then we accidentally discovered it's
also a fleet management tool.
The Cold Start Tax
Every Claude Code session carries context — system prompt, tool definitions,
conversation history, tool results. A typical working session sits around
120,000 tokens. A large one can hit 600K or more.
When the cache is warm, sending that context on each turn is cheap. When it
expires, the API has to re-cache everything, and you pay the write premium:
| Operation | Opus 4.6 Cost | Relative |
|---|---|---|
| Cache read (warm) | $0.50/MTok | 1x |
| Cache write (5m TTL) | $6.25/MTok | 12.5x |
| Cache write (1h TTL) | $10.00/MTok | 20x |
For a 120k token session, one cold start costs $0.75 to $1.20 depending on
your TTL tier. A cache read — keeping it warm — costs $0.06.
That's a 92-95% difference. For doing nothing except reminding the API that
your session still exists.
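The arithmetic behind those numbers is simple enough to check yourself. A minimal sketch using the Opus pricing from the table above (figures are illustrative; verify against current Anthropic pricing):

```python
# Per-token prices from the pricing table above.
CACHE_READ = 0.50 / 1_000_000   # $/token, warm cache read
WRITE_5M = 6.25 / 1_000_000     # $/token, cache write at 5-minute TTL
WRITE_1H = 10.00 / 1_000_000    # $/token, cache write at 1-hour TTL

tokens = 120_000                 # a typical working session

cold_5m = tokens * WRITE_5M      # one cold start on the 5m tier
cold_1h = tokens * WRITE_1H      # one cold start on the 1h tier
warm = tokens * CACHE_READ       # one keepalive read

print(f"cold start (5m TTL): ${cold_5m:.2f}")        # $0.75
print(f"cold start (1h TTL): ${cold_1h:.2f}")        # $1.20
print(f"warm read:           ${warm:.2f}")           # $0.06
print(f"savings vs 1h cold:  {1 - warm / cold_1h:.0%}")  # 95%
```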
/coffee
We built a Claude Code skill called /coffee. You tell it how long you'll be
gone, and it keeps your cache warm while you're away.
```
> /coffee 30

Coffee break: 30 minutes
Cache TTL tier: 1 hour (Q5h at 56%)
Est. context: ~120k tokens
Ping interval: every 50min (1 ping total)

Cost estimate:
  Keepalive pings: ~$0.06
  Cold start:      ~$1.20
  Savings:         ~$1.14 (95%)

Keepalive active! 1 ping scheduled over 30min.
Cache will stay warm until auto-cleanup at 2:30 PM.
```
It reads your current quota state from ~/.claude/quota-status.json (written
by our cache-fix interceptor), figures out whether
you're on the 1-hour or 5-minute TTL tier, and sets the ping interval
accordingly:
- 1h TTL: ping every 50 minutes (a comfortable margin under the 60-minute expiry)
- 5m TTL: ping every 4 minutes (tight, but cache writes at this tier are cheaper)
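That tier detection can be sketched in a few lines. This is a guess at the logic, not the skill's actual source, and the `ttl_tier` field name in `quota-status.json` is an assumption, not the real schema:

```python
import json
import pathlib

def ping_interval_minutes(status_path="~/.claude/quota-status.json"):
    """Pick a keepalive interval based on the detected cache TTL tier."""
    path = pathlib.Path(status_path).expanduser()
    try:
        # "ttl_tier" is a hypothetical field name for illustration.
        ttl = json.loads(path.read_text()).get("ttl_tier", "5m")
    except (FileNotFoundError, json.JSONDecodeError):
        ttl = "5m"  # no interceptor installed: assume the conservative tier
    # 50 min leaves a margin under the 60-min expiry; 4 min stays under 5.
    return 50 if ttl == "1h" else 4
```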
Each ping is minimal — a "say ok" prompt that triggers a cache read and
nothing else. About 10 output tokens, roughly $0.001. The cache stays warm.
You come back to a session that picks up where you left off, no rebuild tax.
The metaphor writes itself: /coffee keeps the pot warm.
The Break Menu
We leaned into the coffee metaphor because, honestly, it's just fun:
- `/coffee 15` — espresso shot. Quick standup, check Slack, come back.
- `/coffee 30` — regular coffee. The default break.
- `/coffee 1h` — long meeting. Go present that quarterly review.
- `/coffee 2h` — afternoon off. The cache will be here when you get back.
When the timer expires, the pings stop automatically. No cleanup, no dangling
cron jobs. The command literally keeps things warm and turns the burner off
when you're done.
Then We Discovered Fleet Management
This is the part we didn't plan.
We were running 5 concurrent Claude Code agents — our web manager, a research
agent, the blog agent writing this series, and two development agents on
kanfei-nowcast. We watched our 5-hour quota climb from 56% to 79% in 22
minutes. About 1% per minute.
At that rate, we'd hit the 100% overage boundary in roughly 25 minutes. And
as we documented in Part 4 of our cache investigation,
crossing that boundary triggers a silent TTL downgrade from 1 hour to 5
minutes — a feedback loop that makes everything more expensive.
The solution was obvious once we thought of it: put the idle agents on
coffee break.
We sent /coffee 180 to every agent that wasn't actively needed. The burn
rate dropped immediately:
| Time (UTC) | Q5h% | Note |
|---|---|---|
| 16:03 | 56% | Before coffee breaks |
| 16:25 | 77% | 5 agents active (+21% in 22min) |
| 16:30 | 79% | Only web agent active, rest on coffee (+2% in 5min) |
| 16:40 | 81% | Slow climb |
| 16:55 | 85% | Holding steady before window reset |
| 17:00 | 0% | Fresh 5h window |
| 17:10 | 1% | Keepalive pings only |
| 17:40 | 10% | Gentle coast, all agents sustainable |
From ~1%/minute with everyone active to ~0.4%/minute with one agent working
and four on coffee breaks. We coasted into the window reset at 85% instead of
blowing past 100% and triggering the TTL downgrade.
| Scenario | Burn Rate | Time to Overage from 75% |
|---|---|---|
| 5 agents active | ~1%/min | ~25 minutes |
| 1 active + 4 on coffee | ~0.4%/min | Safely rode to reset |
| All on coffee (post-reset) | ~1% per 3 min | Sustainable indefinitely |
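The "time to overage" column is just linear extrapolation. A back-of-the-envelope check of the burn rates in the table:

```python
def time_to_overage(current_pct, burn_pct_per_min, limit=100.0):
    """Minutes until the quota window hits the overage boundary."""
    if burn_pct_per_min <= 0:
        return float("inf")
    return (limit - current_pct) / burn_pct_per_min

print(time_to_overage(75, 1.0))   # 5 agents active: 25 minutes
print(time_to_overage(75, 0.4))   # 1 active + 4 on coffee: 62.5 minutes
```

At 0.4%/min from 75%, the roughly hour-long runway was enough to coast into the window reset.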
"Put the agents on coffee break" became our standard operating procedure for
quota management. It's not sophisticated. It's barely even engineering. But
it works, and it costs almost nothing.
How It Works Under the Hood
/coffee is implemented as a Claude Code skill — a slash command you can
grab from the GitHub repo
and drop into ~/.claude/skills/coffee/. When invoked:
- Reads quota state from `~/.claude/quota-status.json` (if available) to determine the current TTL tier. If you don't have our cache-fix interceptor installed, `/coffee` assumes a 5-minute TTL and pings accordingly — it works standalone; you just won't get the TTL-aware interval optimization
- Calculates the economics — ping cost vs. cold-start cost, and the number of pings needed for the requested duration
- Schedules recurring pings via `CronCreate` — minimal prompts at the right interval for the detected TTL tier
- Sets a one-shot cleanup timer that cancels the pings when the break is over
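The economics step can be sketched as a small planner. This is an illustrative reconstruction, not the skill's source; `plan_break` is a hypothetical helper using the prices and intervals quoted earlier in the post:

```python
def plan_break(duration_min, ttl_tier, context_tokens=120_000):
    """Estimate ping count and cost for a coffee break of a given length."""
    interval = 50 if ttl_tier == "1h" else 4   # minutes between pings
    n_pings = max(1, duration_min // interval)
    read_price = 0.50 / 1_000_000              # $/token, cache read
    write_price = (10.00 if ttl_tier == "1h" else 6.25) / 1_000_000
    ping_cost = n_pings * context_tokens * read_price
    cold_start = context_tokens * write_price  # one full re-cache
    return {
        "pings": n_pings,
        "interval_min": interval,
        "ping_cost": round(ping_cost, 2),
        "cold_start": round(cold_start, 2),
        "savings": round(cold_start - ping_cost, 2),
    }

print(plan_break(30, "1h"))
# {'pings': 1, 'interval_min': 50, 'ping_cost': 0.06,
#  'cold_start': 1.2, 'savings': 1.14}
```

Those numbers match the `/coffee 30` transcript shown earlier.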
The pings themselves are deliberately boring: "Respond with exactly: ok. Do
not use tools." About 10 output tokens each. The goal is to touch the cache
without doing any actual work — the minimum viable API call.
Everything runs as session-only cron jobs. Nothing persists to disk. When
Claude Code exits, the jobs disappear. When the break timer fires, it cleans
up after itself.
The Math That Makes It Worth It
The cost comparison is lopsided enough that /coffee pays for itself on the
first avoided cold start:
Say you step away for 2 hours with a 120k-token context:
If the session is truly idle while you're gone, there's just one cold start
when you come back — regardless of TTL tier. The cache expires once; you pay
once.
| | 1h TTL | 5m TTL |
|---|---|---|
| Without /coffee | 1 cold start ($1.20) | 1 cold start ($0.75) |
| With /coffee | 1 ping ($0.06) | 24 pings ($1.44) |
| Savings | $1.14 (95%) | −$0.69 (not worth it) |
On 1h TTL, the math is clear: one ping during a 2-hour break saves you $1.14.
Over a week of daily lunch breaks, that's $5-8 in avoided cold starts.
On 5m TTL for a truly idle session, /coffee actually costs more than just
eating the one cold start when you return. The keepalive is only worth it on
5m TTL when you'd be coming back to multiple cold starts — frequent returns
to the session, or the fleet scenario below.
Where 5m TTL keepalive shines: active agents.
If your agents are doing background work while you're away — running tests,
processing queues, monitoring systems — each one hits a cold start every time
the 5-minute cache expires between tasks. An agent making a call every 10
minutes on 5m TTL pays a cold start on every single call: the cache has always
expired by the time the next task fires. Over 2 hours, that's
12 cold starts ($9.00) vs 24 pings ($1.44). That's the fleet case where
/coffee on 5m TTL pays for itself many times over.
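The break-even condition is worth making explicit. A sketch, assuming one ping per 5-minute TTL window (the 24-pings-in-2-hours figure from the table above) and the pricing quoted earlier:

```python
def keepalive_worth_it(hours, cold_starts_expected, context_tokens=120_000):
    """On the 5m TTL tier: do the expected cold starts cost more than pings?"""
    n_pings = hours * 60 // 5                       # one ping per TTL window
    ping_cost = n_pings * context_tokens * 0.50 / 1_000_000
    cold_cost = cold_starts_expected * context_tokens * 6.25 / 1_000_000
    return cold_cost > ping_cost

print(keepalive_worth_it(2, 1))    # idle session, one return: False
print(keepalive_worth_it(2, 12))   # agent calling every 10 min: True
```

For an idle session the single $0.75 cold start is cheaper than $1.44 of pings; for the busy agent, $9.00 of cold starts dwarfs them.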
What We Actually Learned
The /coffee command started as a cost optimization hack. It turned into
something more interesting: a simple, human-readable interface for managing
AI agent fleet economics.
"Put the agents on coffee break" is a sentence anyone on the team can
understand. It doesn't require knowing about TTL tiers, cache prefix matching,
or quota utilization boundaries. It just means "these agents should idle
cheaply until we need them again."
That's the kind of abstraction we think AI tooling needs more of. The
underlying economics of prompt caching, TTL management, and quota throttling
are genuinely complex. But the user-facing controls don't have to be. Sometimes
the right interface is a coffee metaphor and a timer.
This post is a companion to our six-part Claude Code Cache Investigation series. The /coffee skill depends on the quota monitoring built into our cache-fix interceptor, which writes TTL tier data to a local file for tools like this to consume.
*Built by Veritas Supera IT Solutions (VSITS LLC). We build AI-augmented
systems for technical teams — including the kind that know when to take a
break.*