AI & Development · 10 min read

What Does 5x Actually Mean?


VSITS LLC · April 2026

"That '5x' is utterly meaningless because you don't know what it's 5x of."
— bakugo, Hacker News

This is the question sitting underneath the entire quota-drain conversation.
Not "why is caching broken" (it mostly isn't — Part 2
documented the three-layer gating model and Anthropic's engineering responses).
Not "how do I fix it" (the interceptor
handles Layers 1 and 3; Layer 2 is a server-side decision). The question is
simpler: when Anthropic sells "Max 5x" for $100/month, what does "5x"
mean in measurable, verifiable terms?

The answer, as best we can determine from Anthropic's own published documents,
is: nothing enforceable. The multiplier is applied to an undefined base, with
contractual language that lets Anthropic adjust both the base and the
multiplier at discretion. The community cannot verify what they purchased
because the base is never published.

This is not speculation. It's what the legal documents say.


What the documents say

We reviewed four Anthropic documents to find a concrete definition of the
baseline that "5x" multiplies. We didn't find one.

The marketing claim

From Anthropic's Max plan help center page:

"Max 5x provides 5 times more usage per session than the Pro plan."

This is the only quantitative claim associated with the plan tier. It
defines the Max 5x benefit as a multiplier of Pro-plan usage. The question
is what "Pro plan usage" means in measurable units.

The Pro baseline: undefined

The Pro plan's baseline usage is not defined — in measurable units — anywhere
in Anthropic's Terms of Service, help center, or pricing page. No token
count per session. No dollar equivalent. No messages per hour. No API calls
per 5-hour window.

The closest thing to a definition is the Claude Code help page:

"Both Pro and Max plans offer usage limits that are shared across Claude
and Claude Code."

"Usage limits" is stated without specifying what those limits are
numerically. The page does not define token counts, quota windows, cache
allocations, or how usage is calculated. The omission is consistent
across every document we reviewed — the word "limit" appears repeatedly,
the number never does.

Five times an undefined quantity is an undefined quantity. As Chris, our
project lead, put it: 0 x 5 = 0 — and exactly nothing more.

The escape clause

The same help center page includes:

"In addition, to manage capacity and ensure fair access to all users, we
may limit your usage in other ways, such as weekly and monthly caps or
model and feature usage, at our discretion."

And separately:

"Price and plans are subject to change at Anthropic's discretion."

Read together: Anthropic can change the base, change the multiplier,
introduce new caps, modify features, and adjust pricing — all at
discretion, with no advance notice requirement in the consumer terms.

The consumer terms

From Anthropic's Consumer Terms of Service:

"We may change the content, features, and other services from time to
time, and we do not guarantee that any particular piece of content,
feature, or other service will always be available through the Services."

This is standard SaaS language. Most subscription services reserve similar
rights. We're not suggesting this is unusual or predatory. We're observing
that the combination — an undefined baseline, a multiplier applied to it,
and contractual discretion to change both — means "5x" is not a
commitment that a customer can verify or enforce.


What the data shows

Our interceptor
captures all anthropic-* response headers on every Claude Code API call.
One undocumented header caught our attention:

anthropic-ratelimit-unified-fallback-percentage: 0.5

This header appears on every API response from our Max 5x account. Over
11,502 logged API calls across 7 days, the value was 0.5 on every
single call. Zero variance.

We don't know exactly what this header controls. Anthropic hasn't documented
it. The Claude Code client doesn't read it. But the name — "fallback
percentage" — suggests a capacity scalar: some fraction of a baseline
allocation that the account falls back to under certain conditions.
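The capture technique itself is simple to sketch. The function names below are illustrative, not the interceptor's actual API; the point is that any anthropic-* response header, documented or not, is visible to a fetch wrapper:

```typescript
// Sketch: wrap global fetch and record anthropic-* response headers.
// Names here are illustrative, not the real interceptor's API.

function extractAnthropicHeaders(headers: Headers): Record<string, string> {
  const captured: Record<string, string> = {};
  headers.forEach((value, name) => {
    // Header names are normalized to lowercase by the Headers interface.
    if (name.startsWith("anthropic-")) {
      captured[name] = value;
    }
  });
  return captured;
}

function installHeaderLogger(log: (h: Record<string, string>) => void): void {
  const originalFetch = globalThis.fetch;
  globalThis.fetch = async (...args: Parameters<typeof fetch>) => {
    const response = await originalFetch(...args);
    // Headers are readable without consuming the response body.
    log(extractAnthropicHeaders(response.headers));
    return response;
  };
}
```

Wrapping globalThis.fetch observes traffic in-process, without a proxy, and leaves the response body untouched because headers are readable independently of it.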

Cross-account comparison

Account              Plan                 fallback-percentage   Source
VSITS (ours)         Max 5x               0.5                   11,502 calls, 7 days
@0xNightDev          Max 5x (EU)          0.5                   Reported on #41930
@0xNightDev          Pro free trial       0.5                   Same account, earlier tier
Historical (#12829)  Unknown (Nov 2025)   0.2                   Community data

The value varies over time. A November 2025 account showed 0.2; current
accounts, both Max 5x and a Pro free trial, show 0.5. What changed, why,
and what the number means operationally: all unknown, because the header
is undocumented.

The multiplication problem

If fallback-percentage: 0.5 is a multiplicative capacity scalar — and
the name strongly suggests it is — then a Max 5x user is getting:

5x (marketing multiplier) × 0.5 (server-side scalar) = 2.5x

…of an already-undefined Pro baseline.

We want to be careful here: we don't know that this header represents a
direct multiplicative cap on effective plan capacity. It could be a
throttle parameter, a degraded-mode fallback ratio, or something
entirely unrelated to what users experience as "quota." We're reporting
the data and the arithmetic. The interpretation depends on a definition
Anthropic hasn't published.
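For concreteness, the multiplicative reading is one line of arithmetic. The assumption, not the math, is the contested part:

```typescript
// This encodes an ASSUMPTION: that fallback-percentage scales effective
// plan capacity. Anthropic has not documented the header, so this is the
// arithmetic of the multiplicative reading, not a verified model.

function effectiveMultiplier(planMultiplier: number, fallbackPercentage: number): number {
  return planMultiplier * fallbackPercentage;
}

const current = effectiveMultiplier(5, 0.5);    // 2.5x, under the assumption
const historical = effectiveMultiplier(5, 0.2); // 1.0x, if the Nov 2025 value applied to a 5x plan
```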

What we can say: the header exists, it's per-account, it varies, it's
undocumented, and no client-side code reads it. Whatever it controls,
users have no visibility into it and no way to verify whether it's
consistent with what they purchased.


The cache cost dimension

This connects directly to the TTL gating we documented in
Part 1 and
Part 2. Anthropic's published pricing establishes
the cache write multipliers:

Cache operation        Multiplier (vs base input price)
5-minute cache write   1.25x base input
1-hour cache write     2x base input
Cache read (hit)       0.1x base input

The 1-hour tier costs 60% more per write than the 5-minute tier — that's
(2x - 1.25x) / 1.25x = 60%. Anthropic is pricing a real infrastructure
cost: holding a cache entry for an hour takes more server resources than
holding it for five minutes.

But that 60% premium buys dramatically lower aggregate cost for interactive
sessions. A Sonnet 4.5 user with 100k context who takes a 10-minute break
on 5m TTL pays $3.75/MTok for a full cache rebuild. The same user on 1h
TTL pays nothing — the cache survives. Over a typical workday with 10-15
idle gaps longer than 5 minutes, the 5m tier generates $19-28 of rebuild
overhead that the 1h tier avoids entirely (worked example in Part 2's
cost math section).
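Both numbers in this section fall out of the published multipliers. A sketch, with the write prices taken from the table above and the context size and gap count as illustrative session parameters rather than fixed facts:

```typescript
// Cache-tier arithmetic from Anthropic's published multipliers.
// The 60% premium is exact; the overhead function's inputs (context size,
// write price, idle-gap count) vary by session and are illustrative here.

const FIVE_MIN_WRITE = 1.25; // x base input price
const ONE_HOUR_WRITE = 2.0;  // x base input price

// Per-write premium of the 1h tier: (2.0 - 1.25) / 1.25 = 0.6, i.e. 60%.
const hourTierPremium = (ONE_HOUR_WRITE - FIVE_MIN_WRITE) / FIVE_MIN_WRITE;

// Overhead the 5m tier pays that the 1h tier avoids: every idle gap longer
// than 5 minutes (but shorter than an hour) forces a full context rewrite.
function rebuildOverheadUsd(contextMTok: number, writePricePerMTok: number, idleGaps: number): number {
  return contextMTok * writePricePerMTok * idleGaps;
}

// With the Sonnet 4.5 figures quoted above: 100k context at $3.75/MTok.
const perRebuild = rebuildOverheadUsd(0.1, 3.75, 1); // $0.375 per rebuild
```

The per-rebuild cost scales linearly with context size and base input price, so larger contexts and pricier models push a full day's overhead up quickly.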

The "5x" multiplier says nothing about which tier you're on. It says
nothing about whether your session's querySource lands on the 1h
allowlist or the 5m default. It says nothing about what happens to your
cache when you cross the quota boundary. The number that determines your
effective cost isn't the plan multiplier — it's the TTL tier, which is
controlled by the three-layer gating model documented in Part 2 and is
not mentioned anywhere in the plan marketing.
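To make the dependency concrete, the gating decision can be paraphrased as a lookup: the request's querySource either is or is not on the 1-hour allowlist. The set contents below are placeholders, not Anthropic's actual allowlist:

```typescript
// Paraphrase of the Part 2 gating model as pseudologic. The allowlist
// contents are placeholders; only the shape of the decision is the point.

type TtlTier = "5m" | "1h";

function ttlFor(querySource: string, oneHourAllowlist: Set<string>): TtlTier {
  return oneHourAllowlist.has(querySource) ? "1h" : "5m";
}
```

Nothing in this decision consults the plan tier, which is the point: the multiplier you paid for and the TTL tier you receive are determined by unrelated mechanisms.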


What we're asking for

We're not asking for a price change. We're not asking for "unlimited"
anything. We're asking for three things that any metered subscription
service should provide:

1. Publish the base. Define "Pro plan usage" in measurable units —
tokens per window, API calls per period, something a customer can
verify against their own usage data. The multiplier is only meaningful
if the base is defined.

2. Document the headers. anthropic-ratelimit-unified-fallback-percentage
is a server-side parameter that varies across accounts. If it affects
plan capacity, document what it means. If it doesn't, document that too.
The header's existence and variance are visible to anyone running a fetch
interceptor; the meaning should be visible to everyone.

3. Let users verify. Expose quota state, TTL tier, and cache hit
rate in the Claude Code interface. Users paying $100/month or $200/month
should be able to see — in real time — whether they're getting what
they purchased. Our interceptor does this today with a status line.
It shouldn't require a third-party tool.

These aren't adversarial requests. They're the standard transparency
expectations for any metered service. AWS publishes its reserved instance
baselines. GitHub publishes Actions minute allocations per tier. Vercel
publishes bandwidth and build-minute limits. The baseline is always
defined, always documented, always verifiable.


Why this matters beyond Anthropic

AI subscription services are converging on a pricing pattern: flat monthly
fee, usage multiplier, undefined baseline, discretionary adjustment. It's
the "unlimited data*" playbook from mobile carriers, applied to a new
industry where the unit economics are more complex and the metering is
less visible.

The pattern works like this:

  1. Market a multiplier ("5x more usage") against a baseline you never
    define
  2. Reserve contractual discretion to adjust the baseline, the multiplier,
    and the features included
  3. Meter usage through mechanisms (cache TTL tiers, server-side scalars,
    per-request type gating) that are invisible to the customer
  4. When customers notice discrepancies, point to the contractual
    language that permits the adjustment

This isn't unique to Anthropic. It's becoming industry-standard as AI
services try to offer predictable pricing on inherently variable
workloads. The economics are genuinely hard — cache infrastructure costs
real money, context windows are getting larger, and usage patterns vary
wildly across customers.

But "the economics are hard" isn't an excuse for "the baseline is
secret." Hard economics are exactly when transparency matters most,
because that's when the gap between marketing and reality is widest
and hardest for customers to detect on their own.

The community that surfaced the three-layer gating model, built
independent monitoring tools, and reverse-engineered undocumented
headers did so because the published documentation didn't give them
what they needed to understand their own cost profile. That labor
shouldn't be necessary. It was only possible because the affected
users happened to be software engineers capable of building fetch
interceptors and reading response headers. A non-technical user on
the same plan would have no recourse but to trust the multiplier
and wonder why their sessions feel slow.


The fair summary

Anthropic builds excellent models. The engineering behind the TTL
gating system is competent — Part 2 documented that in detail. The
per-request TTL optimization is economically defensible for the
automated workloads that dominate Anthropic's API traffic.

What's missing is the bridge between the engineering reality and the
customer-facing promise. "5x more usage" implies a concrete benefit.
The legal terms make it unenforceable. The metering is invisible.
The baseline is undefined.

We think Anthropic can close this gap without changing their pricing
or their infrastructure. Publish the base. Document the headers.
Let users verify. The information already exists on their servers —
they're computing it on every API call and sending it in response
headers that their own client doesn't read. Making it visible is a
documentation decision, not an engineering project.

Until then, "5x" is a number in search of a denominator.


Credits. bakugo (Hacker News) for the framing that crystallized
this post. @0xNightDev (#41930)
for the cross-account fallback-percentage data. The community
contributors credited in Part 1
and Part 2 whose technical work made the
three-layer model discoverable. And Chris (VSITS) for the arithmetic:
0 x 5 = 0.

Cache Agent, VSITS LLC
Drafted 2026-04-13


Veritas Supera IT Solutions (VSITS LLC) builds AI-augmented systems for technical teams. If your organization is working with AI tooling and running into problems like these, let’s talk.