Claude Code Usage Limit Hit Too Fast: Why + 7 Fixes (2026)

Claude Code usage limit gone by lunch? Opus burns several times Sonnet’s quota, subagents can use ~7x the tokens, and MCP tool definitions can eat 33% of your context. See it with /usage, then apply 7 fixes.


You opened Claude Code at 9am, gave it a refactor, and by lunch it told you that you’ve hit your usage limit. On a paid plan. This is one of the most common complaints in the Claude Code issue tracker right now, and the cause is almost never a billing bug. It’s how the quota is structured, plus a few default behaviors that quietly burn through it.

The fastest way to drain a Claude Code plan is to run Opus on routine work while three subagents fan out behind you. Each of those multiplies your token spend well beyond what you think you’re using.

This is a troubleshooting guide for the subscription plan limit: the “You’ve hit your usage limit” wall on Pro and Max. If you’re on an API key and getting a 429 Rate Limit Reached instead, that’s a different failure with different fixes, covered in Claude Code rate limit reached error: causes and fixes.

The 30-Second Diagnosis

Run two commands, then match your symptom to the cause below.

| Step | Command | What it tells you |
|---|---|---|
| 1 | /usage | Consumption against both the 5-hour session limit and the weekly limit(s) |
| 2 | /context | What’s loaded into the current window. Watch the “MCP tools” line |

| Symptom | Most likely cause | Fastest fix |
|---|---|---|
| Limit hit by midday, session bar high | Defaulting to Opus on routine work | /model to Sonnet |
| Session at 2% but “limit reached” | Weekly cap exhausted, not session | Wait for 7-day reset or switch to API |
| Limit drains during agent runs | Subagent fan-out (~7x tokens) | Pin subagents to Sonnet/Haiku in frontmatter |
| Quota gone before you type much | MCP servers loading huge tool definitions | /context, then trim unused servers |
| Bill or quota spikes overnight | Auto-accept / background loops | Cap effort, kill idle background tasks |

If your usage is under control and you just want headroom now, jump to the escalation path. Everything between here and there is about making a plan last.

When to Fix This, When to Switch, and When to Stop

Not every limit is worth fighting, so set your strategy before you spend an afternoon optimizing. The right move depends entirely on whether you’re hitting the session cap or the weekly cap.

  • Fix it in-session when /usage shows the 5-hour session cap maxed during heavy bursts but the weekly bar still has room. Default to Sonnet, compact aggressively, and trim MCP. You’ll usually stay inside the session window after that.
  • Switch models when a single Opus-driven workflow is the whole problem. Pinning subagents and the default to Sonnet often doubles or triples how long the plan lasts, with no other change needed.
  • Switch billing when /usage shows the weekly all-models cap exhausted and the 7-day reset is days away. No in-session optimization brings a weekly cap back early. At that point you either upgrade a tier or move to pay-as-you-go.
  • Stop optimizing when you’ve trimmed MCP and routed models and /usage still empties fast. Your real volume has outgrown the plan, and the escalation path is the answer.

A quick gut check: if your error and retry rate is low and you’re only hitting the cap during occasional heavy sprints, the in-session fixes are enough. If you hit the wall every single day by mid-afternoon, you have a billing-model problem, not a hygiene problem.

Why the Limit Drains So Fast: Two Caps, Not One

The core reason people get surprised is that Claude Code enforces two independent limits, and the weekly one is invisible until it bites. The 5-hour limit is a rolling session window that starts with your first message and resets five hours later. The weekly limit is a separate 7-day cap, and on Max plans there are two of them: one across all models and one for Sonnet only, per Anthropic’s own usage docs.

These reset on different clocks. A heavy weekend can leave you session-capped on Monday with weekly headroom to spare, or weekly-capped with sessions to spare. Hitting a weekly cap locks usage until its 7-day reset even if your current five-hour session still has allowance. Waiting five hours does nothing.
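
A minimal Python sketch makes the two clocks concrete. It assumes each window resets exactly one window-length after it started; `first_message` and `week_start` are hypothetical timestamps you would read off /usage, since Anthropic doesn’t document how the weekly window is anchored. Treat it as a model of the behavior, not the real scheduler.

```python
from datetime import datetime, timedelta

SESSION_WINDOW = timedelta(hours=5)
WEEKLY_WINDOW = timedelta(days=7)


def resume_at(now, first_message, week_start, session_capped, weekly_capped):
    """Return the earliest time you can send another message.

    Assumption: each cap clears exactly one window-length after its window began.
    """
    resets = []
    if session_capped:
        resets.append(first_message + SESSION_WINDOW)
    if weekly_capped:
        resets.append(week_start + WEEKLY_WINDOW)
    # Every active cap must clear, so you wait for the *latest* reset.
    return max(resets, default=now)


if __name__ == "__main__":
    now = datetime(2026, 5, 18, 10, 0)           # Monday 10:00
    first_message = datetime(2026, 5, 18, 9, 0)  # session opened at 09:00
    week_start = datetime(2026, 5, 14, 12, 0)    # weekly window opened Thursday noon
    # Weekly cap hit with session headroom: waiting until 14:00 would not help.
    print(resume_at(now, first_message, week_start, False, True))  # 2026-05-21 12:00:00
```

The `max` is the point: when both caps are hit, the earlier session reset buys you nothing.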

One more thing that throws people: Anthropic stopped publishing fixed prompts-per-window and hours-per-week numbers. What it publishes now is relative capacity. Max 5x ($100/mo) gives 5x Pro’s per-session usage, and Max 20x ($200/mo) gives 20x, per the pricing breakdown at FrankX. So there’s no public token figure to budget against. You have to read /usage and learn your own ceiling.

flowchart TD
    A[First message] --> B[5-hour session window starts]
    B --> C{Session cap hit?}
    C -->|Yes| D[Locked until session resets in 5h]
    C -->|No| E{Weekly all-models cap hit?}
    E -->|Yes| F[Locked until 7-day reset<br/>even with session headroom]
    E -->|No| G{Sonnet-only weekly cap hit?<br/>Max plans only}
    G -->|Yes| H[Sonnet locked, other models may continue]
    G -->|No| I[Keep working]
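
The flowchart reads naturally as a decision function. This Python sketch assumes you copy the three percentages from /usage by hand. Following the chart, it treats non-Sonnet models as usable when only the Sonnet weekly cap is spent, although the chart itself only says they “may” continue.

```python
def blocking_limit(session_pct, weekly_all_pct, weekly_sonnet_pct=None, model="sonnet"):
    """Walk the flowchart and name the cap that blocks the next message.

    Percentages are /usage readings from 0 to 100. Pass weekly_sonnet_pct=None
    on plans without a Sonnet-only weekly cap.
    """
    if session_pct >= 100:
        return "session"        # wait for the rolling 5-hour reset
    if weekly_all_pct >= 100:
        return "weekly_all"     # locked until the 7-day reset
    if weekly_sonnet_pct is not None and weekly_sonnet_pct >= 100 and model == "sonnet":
        return "weekly_sonnet"  # switch models or wait for the 7-day reset
    return None                 # nothing blocking: keep working


if __name__ == "__main__":
    # The "phantom limit": a fresh-looking session bar with the weekly cap spent.
    print(blocking_limit(session_pct=2, weekly_all_pct=100))  # weekly_all
```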

If /usage shows your session low but you still hit a wall, it’s the weekly cap. There’s a known Claude Code issue where the session bar reads 2% while the limit fires at 32% weekly. The fix there is to trust the weekly number, not the session bar.

The Limit States You’ll See and What Each One Means

These aren’t HTTP error codes; they’re the limit states Claude Code surfaces. Reading them correctly tells you whether to wait, switch models, or switch billing.

| State you see | What it means | Scope | When it clears |
|---|---|---|---|
| “You’ve hit your usage limit” | 5-hour session cap reached | Current session, all models | On the rolling 5-hour reset |
| “Weekly limit reached” | 7-day all-models cap reached | Every model, all sessions | On the 7-day reset only |
| “Sonnet weekly limit reached” (Max) | Sonnet-only 7-day cap reached | Sonnet only, other models continue | On the 7-day reset only |
| “Limit reached” at low session % | Almost always the weekly cap firing | Weekly, not session | On the 7-day reset only |
| Session bar stuck at 100% on light use | Known display bug | Cosmetic, check /usage for the real numbers | Restart, or trust the /usage numbers |

The trap is the fourth row. People see a fresh session bar, read “limit reached,” and file a bug. In nearly every case it’s the weekly cap, which the session bar doesn’t show. Run /usage and read the weekly line before assuming anything is broken.

Symptom, Cause, Fix

This table is the whole article in one place. Each cause maps to a fix section below.

| Symptom | Cause | Fix |
|---|---|---|
| Plan empties by lunch | Opus is the default model | Set Sonnet as default with /model, reserve Opus for hard tasks |
| “Limit reached” at 2% session | Weekly cap exhausted | Read weekly numbers in /usage, wait for reset or move to API |
| Drains fastest during agent work | Subagents inherit Opus and run separate contexts (~7x) | Route each subagent’s model in its frontmatter |
| Quota burns before real work | MCP tool definitions ~33% of a 200k window | Trim servers, let MCP Tool Search defer-load |
| Context keeps re-billing | Cache prefix invalidated mid-session | Lock tools and model at start, /compact and /clear cleanly |
| Overnight spikes | Auto-accept and background loops run unattended | Set per-prompt effort, kill idle background tasks |

Fix 1: Default to Sonnet, Reserve Opus for the Hard Parts

Switching your default model is the single biggest lever. Opus costs several times more per turn than Sonnet, and Sonnet more than Haiku. At API rates that’s Opus 4.8 at $5/$25 per million input/output tokens, Sonnet 4.6 at $3/$15, and Haiku 4.5 at $1/$5, per FrankX pricing. The sticker ratio alone puts Opus at only about 1.7x Sonnet; the real-world per-turn gap runs wider once Opus’s heavier reasoning is counted.
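
To see how those sticker prices become per-turn cost, here is a back-of-envelope Python sketch using the API rates quoted above. Subscription quota isn’t metered in dollars, so the assumption is that only the ratios carry over. The token counts are hypothetical, and the second Opus call assumes three times the output tokens as a stand-in for heavier reasoning.

```python
# USD per million tokens (input, output), as quoted in this section.
PRICES = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}


def turn_cost(model, input_tokens, output_tokens):
    """Dollar cost of one turn at raw API rates (no caching discounts)."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical turn: 40k tokens of context in, 2k tokens out.
    for model in PRICES:
        print(f"{model:>6}: ${turn_cost(model, 40_000, 2_000):.2f}")
    # Same turn, but Opus writes 3x the output (a stand-in for heavier reasoning).
    ratio = turn_cost("opus", 40_000, 6_000) / turn_cost("sonnet", 40_000, 2_000)
    print(f"Opus/Sonnet per turn: {ratio:.2f}x")
```

This prints $0.25, $0.15 and $0.05 for the identical turn, then about 2.33x once Opus produces more output: the gap grows with the tokens Opus spends, not just its rate.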

Set Sonnet as your working default and pull Opus out only for the genuinely hard parts: architecture decisions, gnarly debugging, anything where a wrong answer costs you an hour. Use /model to switch. For most edits, refactors, and test writing, Sonnet’s output is hard to tell apart from Opus’s, and it stretches your weekly cap several times further. The deeper mechanics of when Opus actually earns its cost are in our Claude Code token optimization guide.

One more setting worth changing: reasoning effort. Default reasoning burns roughly 2x the tokens of medium for most tasks. Set effort per prompt instead of leaving it on a global high, and reserve high effort for problems that genuinely need it.

Fix 2: Stop Subagents From Burning Opus in Parallel

Subagent fan-out is the quietest drain because it doesn’t show up while you’re typing. Each subagent runs its own API requests in a fresh context window. It doesn’t inherit your session, so it re-reads what it needs and bills its own calls. Agent teams can use about 7x the tokens of a standard session when teammates run in plan mode. One developer who blew through Max 20x found 85% of the usage came from subagent-heavy sessions.
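
A rough Python model shows how a multiple like ~7x can arise from cold starts alone. The inputs are illustrative, not measurements: it assumes each subagent re-reads its own slice of the repo before doing its work, and it ignores caching.

```python
def fanout_tokens(parent_tokens, n_subagents, reread_tokens, work_tokens):
    """Total tokens for a parent session plus cold-start subagents.

    Each subagent starts in a fresh context window, so it pays for its own
    re-read of files (reread_tokens) on top of the work itself (work_tokens).
    """
    per_subagent = reread_tokens + work_tokens
    return parent_tokens + n_subagents * per_subagent


if __name__ == "__main__":
    solo = 100_000  # hypothetical single-thread session
    total = fanout_tokens(solo, n_subagents=3, reread_tokens=150_000, work_tokens=50_000)
    print(f"{total:,} tokens, {total / solo:.1f}x the single-thread session")
```

With these made-up numbers, three workers turn 100,000 tokens into 700,000, a 7.0x multiple, and the re-read term is the largest share of it.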

The trap: most setups have every subagent inherit the main session’s model, which is usually Opus. So every worker pays Opus prices on tasks that don’t need Opus quality. Route each worker explicitly in its frontmatter:

---
name: test-writer
model: sonnet   # not the parent's Opus
---

Model routing alone cuts the subagent line item by roughly 30%. For mechanical work like renaming, repetitive edits, or doc lookups, drop those workers to Haiku. The pattern for splitting heavy planning from cheap execution is covered in our Claude Code hybrid routing pattern.
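
Before routing, it helps to find which workers aren’t pinned yet. This Python sketch assumes project subagents live as Markdown files under `.claude/agents/` with a `---`-delimited frontmatter block like the one above, and flags files with no `model:` key or with `model: inherit`. It’s a line-based parser, not a full YAML reader, and whether an unpinned subagent actually ends up on Opus depends on your setup.

```python
from pathlib import Path


def frontmatter_model(text):
    """Return the `model:` value from a ---delimited frontmatter block, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model":
            return value.split("#", 1)[0].strip() or None  # drop trailing comments
    return None


def unpinned_subagents(agents_dir=".claude/agents"):
    """List subagent files that don't pin a model (missing or `inherit`)."""
    flagged = []
    for path in sorted(Path(agents_dir).glob("*.md")):
        model = frontmatter_model(path.read_text(encoding="utf-8"))
        if model in (None, "inherit"):
            flagged.append(path.name)
    return flagged


if __name__ == "__main__":
    for name in unpinned_subagents():
        print(f"{name}: no pinned model; add e.g. `model: sonnet` to its frontmatter")
```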

There’s a second, sneakier subagent cost: re-reads. Because a subagent starts cold, it re-reads files the parent already loaded. Keep subagent prompts narrow so they don’t re-scan half the repo. A worker told to “fix the failing test in auth_test.py” reads one file; a worker told to “improve test coverage” reads twenty.

Fix 3: Trim MCP Servers Before They Eat Your Window

MCP servers charge you before you do anything. Seven connected MCP servers can consume 67,300 tokens of tool definitions, which is 33.7% of a 200k context window, at session start. Each tool’s catalog runs 200 to 800 tokens of prose plus schema, multiplied across roughly 50 tools per server, per Async Let’s analysis. That overhead rides along on every turn, so it compounds against your weekly cap fast.
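
The arithmetic behind that figure is easy to check. This Python sketch uses the 67,300-token total for seven servers cited above; the 40-turn session is hypothetical, and it counts raw re-sent tokens before any prompt-caching discount.

```python
CONTEXT_WINDOW = 200_000  # tokens


def window_share(definition_tokens, window=CONTEXT_WINDOW):
    """Fraction of the context window consumed by tool definitions at start."""
    return definition_tokens / window


def resent_tokens(definition_tokens, turns):
    """Definition tokens carried along over a session, ignoring caching."""
    return definition_tokens * turns


if __name__ == "__main__":
    mcp = 67_300  # seven servers, per the analysis cited above
    print(f"share of window: {window_share(mcp):.2%}")
    print(f"per server: ~{mcp / 7:,.0f} tokens")
    print(f"re-sent over 40 turns: {resent_tokens(mcp, 40):,} tokens")
```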

Two moves:

  1. Audit and disable. Run /context and look at the “MCP tools” line. Disable any server you haven’t used in two weeks. Use project-level config so only servers relevant to the current repo load.
  2. Let Tool Search defer. Claude Code’s MCP Tool Search (v2.1.7+) auto-defers tool loading when active MCP tool descriptions exceed 10% of the context budget. After it kicks in, the “MCP tools” line in /context should drop sharply. You can confirm it’s working right there.
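
The audit and the 10% threshold can be sketched together. In this Python sketch the per-server token counts and idle days are hypothetical numbers you would note down from /context and your own memory; the 10% threshold and two-week cutoff come from this section, and the real Tool Search trigger may measure things differently.

```python
TOOL_SEARCH_THRESHOLD = 0.10  # share of context that triggers deferred loading
STALE_AFTER_DAYS = 14


def audit_mcp(servers, window=200_000):
    """servers: list of (name, definition_tokens, days_since_last_use).

    Returns the servers to disable plus the definition share of the window
    before and after disabling them.
    """
    stale = [name for name, _, idle in servers if idle >= STALE_AFTER_DAYS]
    before = sum(tokens for _, tokens, _ in servers)
    after = sum(tokens for name, tokens, _ in servers if name not in stale)
    return stale, before / window, after / window


if __name__ == "__main__":
    servers = [  # hypothetical readings
        ("github", 18_000, 1),
        ("browser", 12_000, 30),
        ("jira", 9_500, 21),
        ("docs", 6_000, 3),
    ]
    stale, before, after = audit_mcp(servers)
    print("disable:", ", ".join(stale))
    print(f"share before {before:.0%}, after {after:.0%}")
    print("still above 10%: expect Tool Search to defer" if after > TOOL_SEARCH_THRESHOLD
          else "under the 10% threshold")
```

In this made-up case, dropping two stale servers cuts the overhead from 23% to 12% of the window, which is still above the threshold, so the two moves complement each other.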

If you run a handful of servers full-time, /context is the fastest audit you have. A common result is finding two or three servers you forgot you connected, each quietly costing you five figures of tokens per session.

Fix 4: Keep the Cache Warm With /compact and /clear Discipline

Context hygiene protects your prompt cache, and a warm cache is most of your savings. Prompt caching bills cache reads at a fraction of the normal input rate (a tenth of it on Anthropic’s API pricing). Cache hit rates of ~90% are healthy on the 5-minute default TTL and climb to ~97-99% on the 1-hour TTL, per the Product Compass cost guide. The thing that kills it: adding or removing a tool mid-session invalidates the cached prefix and forces a full re-read. Lock your tools and model at session start.
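
A simplified cost model shows why hit rate matters so much, and why thrashing can be worse than no cache at all. This Python sketch uses Anthropic’s published prompt-caching multipliers (cache reads at 0.1x the base input price, cache writes at 1.25x for the 5-minute TTL and 2x for the 1-hour TTL) and assumes, as a simplification, that every cache miss is written back to the cache. The $3 base is Sonnet’s input rate quoted earlier.

```python
CACHE_READ = 0.10                      # multiplier on base input price
CACHE_WRITE = {"5m": 1.25, "1h": 2.0}  # multiplier by cache TTL


def effective_input_price(base_price, hit_rate, ttl="5m"):
    """Blended $/M input tokens when `hit_rate` of input is served from cache.

    Simplification: every miss is written back to the cache at the write price.
    """
    return base_price * (hit_rate * CACHE_READ + (1 - hit_rate) * CACHE_WRITE[ttl])


if __name__ == "__main__":
    sonnet_input = 3.00  # $/M input tokens
    print(f"no caching:      ${sonnet_input:.2f}")
    print(f"90% hit, 5m TTL: ${effective_input_price(sonnet_input, 0.90):.3f}")
    print(f"97% hit, 1h TTL: ${effective_input_price(sonnet_input, 0.97, '1h'):.3f}")
    # A mid-session tool change that invalidates the prefix: every turn misses.
    print(f"0% hit, 5m TTL:  ${effective_input_price(sonnet_input, 0.0):.3f}")
```

Under this model a 90% hit rate cuts Sonnet’s effective input price from $3.00 to about $0.65 per million tokens, while a prefix that never survives costs $3.75, more than not caching at all.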

Then manage the window deliberately:

  • /compact at around 50% context usage or after each discrete task, so old turns get summarized instead of re-sent in full on every turn.
  • /clear between unrelated pieces of work. Starting a fresh window beats dragging an hour of stale context into a new task.
  • Watch auto-accept and background loops. An unattended loop that keeps re-prompting can drain a session overnight while you’re asleep. Cap effort and kill idle background tasks before you walk away.
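
Those window rules fit in a few lines. This Python sketch is a hypothetical helper, not a Claude Code feature: it takes the context percentage shown by /context plus two judgments only you can make, and returns the command the rules above suggest.

```python
def hygiene_command(context_pct, task_finished, next_task_related):
    """Suggest /clear, /compact, or nothing, following this section's rules.

    context_pct: how full the context window is (0-100), read from /context.
    """
    if task_finished and not next_task_related:
        return "/clear"    # unrelated work next: start a fresh window
    if task_finished or context_pct >= 50:
        return "/compact"  # summarize old turns instead of re-sending them
    return None            # keep going


if __name__ == "__main__":
    print(hygiene_command(62, task_finished=False, next_task_related=True))  # /compact
    print(hygiene_command(30, task_finished=True, next_task_related=False))  # /clear
```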

Fixes by Plan Tier

The same drains apply on every plan, but the right lever shifts as you move up tiers. Here’s where each tier should focus first.

Free / Pro Tier

On Pro you have the tightest weekly room, so model discipline matters most. Default to Sonnet, skip subagent fan-out entirely on a Pro plan (it’s the fastest way to empty a small cap), and run /compact early. Pro also can’t lean on Opus for routine work without paying for it twice over. If you hit the weekly wall here, you’re either due for Max or due for metered API billing for the heavy days.

Max 5x / Max 20x Tier

Max plans have the headroom to run subagents, but that’s exactly where Max users get burned. The single biggest Max-specific fix is routing every subagent’s model in frontmatter so they stop inheriting Opus. Max also carries the second, Sonnet-only weekly cap, so if you switch everything to Sonnet to save the all-models cap, watch the Sonnet line in /usage too. You can exhaust the Sonnet cap while the all-models cap still has room.

Team / Enterprise Tier

On seat-based plans, usage draws from a pool reset on a rolling window, per Anthropic’s docs. The fixes here are organizational: a shared model-routing convention so the whole team defaults to Sonnet, a trimmed MCP config checked into the repo so nobody loads ten servers, and a fallback API key for the days the pool runs dry. For teams that hit the pool ceiling regularly, a metered gateway gives you an overflow lane without renegotiating seats.

Common Failure Patterns We’ve Observed

There’s no public outage history for plan limits because this isn’t a service-down problem. It’s a set of repeating behaviors that drain quota faster than people expect. These are the patterns that show up again and again.

| Pattern | What it looks like | Why it drains fast |
|---|---|---|
| Opus-by-default | Plan empties by early afternoon on routine edits | Opus costs several times as much as Sonnet per turn |
| Subagent fan-out | Quota plunges during agent runs, fine while typing | Each subagent runs its own context, ~7x tokens |
| MCP bloat | Quota gone before real work starts | Tool definitions can be ~33% of the window at start |
| Cache thrash | Token use stays high even on small turns | Mid-session tool changes invalidate the cached prefix |
| Phantom limit | “Limit reached” with a near-empty session bar | Weekly cap firing, sometimes a display bug |
| Overnight loop | Quota gone by morning, no one was at the keyboard | Auto-accept loop kept re-prompting unattended |

Most “my limit is broken” reports map to one of these six rows. The phantom-limit pattern in particular accounts for a large share of confusion, because the session bar and the weekly cap are different numbers and only one of them is visible by default.

The June 2026 Billing Change: What Actually Happened

Nothing changed, but the scare was real and worth understanding. Anthropic announced that Agent SDK and claude -p usage would move off your subscription into a separate, dollar-denominated monthly credit billed at standard API rates: $20 for Pro, $100 for Max 5x, $200 for Max 20x, with no rollover.

Then, before the June 15 effective date, Anthropic paused it. Those surfaces still draw from your Pro and Max subscription limits exactly as before. There’s no credit to claim and your limits are unchanged. Anthropic says it’s reworking the plan and will give advance notice before any future revision.

What this means for you: don’t restructure your workflow around a credit pool that doesn’t exist yet. The fixes above target the limits that are actually live today.

| Date (2026) | Change | Status |
|---|---|---|
| May 6 | 5-hour limits doubled (Pro/Max/Team/Enterprise), peak-hour throttling removed | Live |
| May 13 | Weekly limits raised 50% through July 13 (promo) | Live, time-boxed |
| June 15 | Agent SDK and claude -p moved to a separate credit pool | Paused, did not take effect |

When the Plan Still Isn’t Enough: The Pay-As-You-Go Route

If /usage shows your weekly cap exhausted and you can’t wait for the reset, the only real escape from a plan ceiling is metered billing. Subscriptions are cheaper when Claude Code is your primary daily tool: at that load, the same tokens would cost roughly 2 to 2.5 times the $100 Max 5x flat fee at raw API rates. But subscriptions hard-stop you at the weekly cap, and pay-as-you-go has no weekly cap at all.
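
The cost side of that comparison reduces to one inequality. This Python sketch assumes you estimate your monthly API-equivalent spend yourself (for example by tallying /cost across sessions) and compares it with the flat fees quoted in this article. It deliberately ignores the weekly cap, which is the one thing a subscription can’t buy back, and the example spends are hypothetical.

```python
PLAN_FEES = {"pro": 20, "max5x": 100, "max20x": 200}  # USD per month


def cheaper_billing(monthly_api_equivalent, plan="max5x"):
    """Return which billing model costs less for a month of usage."""
    return "subscription" if monthly_api_equivalent > PLAN_FEES[plan] else "pay-as-you-go"


if __name__ == "__main__":
    # Daily primary tool: roughly 2 to 2.5x the Max 5x fee at API rates.
    for spend in (200, 250):
        print(spend, cheaper_billing(spend))  # subscription
    # Bursty use that never reaches the fee.
    print(60, cheaper_billing(60))            # pay-as-you-go
```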

For bursty, headless, or unpredictable work, metered billing wins outright. An OpenAI-compatible gateway like ofox lets you point Claude Code (or any OpenAI-SDK client) at a single key, billed per token with no plan ceiling, and switch between Claude, GPT, and Gemini models without juggling provider accounts:

export ANTHROPIC_BASE_URL="https://api.ofox.ai/v1"
export ANTHROPIC_API_KEY="sk-ofox-..."
# Claude Code now bills per token, no weekly cap

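From your own code, the same key works through the OpenAI Python SDK. A minimal sketch, assuming the gateway serves its OpenAI-compatible endpoint at the same base URL:

from openai import OpenAI
client = OpenAI(base_url="https://api.ofox.ai/v1", api_key="sk-ofox-...")
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",  # one key, swap to opus/gpt/gemini freely
    messages=[{"role": "user", "content": "refactor this module"}],
)
print(response.choices[0].message.content)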

The honest tradeoff: if you’re a daily heavy user inside one plan’s limits, keep the subscription. Metered billing earns its keep when your usage is spiky enough that you’d never reach a subscription’s monthly value, or when you keep slamming into the weekly wall and the reset is too far away to wait.

Alternatives When You’re Capped

When the weekly wall hits mid-task, here are the realistic ways forward, with ofox listed first because it’s the only option here with no weekly cap and many models behind one key.

| Option | No weekly cap | One key, many models | Best for |
|---|---|---|---|
| ofox API (pay-as-you-go) | Yes | Yes (Claude/GPT/Gemini) | Bursty, headless, multi-model work, escaping the weekly wall |
| Anthropic Max 20x | No | No (Anthropic only) | Daily heavy use that stays inside one plan |
| Direct Anthropic API key | Yes | No (Anthropic only) | Anthropic-only automation, CI jobs |
| Wait for reset | n/a | n/a | Light users near the end of a 7-day window |

A plan limit isn’t a bug to file. It’s a budget you can read. Run /usage, default to Sonnet, and the wall you hit by lunch moves to the end of the week.

If you’re locked out mid-task and the reset is days away, the practical move is to keep the same workflow and change only where the tokens get billed. For the broader question of whether you’re looking at a limit or an outright error, and a safer default setup, see the Claude Code safe mode guide.

How to Monitor Your Usage Before You Hit the Wall

The point of all of this is to never be surprised again. Three commands give you everything you need, no external tool required.

  • /usage is your dashboard. Check it at the start of a session and again before any heavy agent run. Read both the session line and the weekly line, since the weekly one is the one that ambushes people.
  • /context shows what’s loaded right now. If the “MCP tools” line is large, you have a trimming opportunity before you’ve spent a single turn on real work.
  • /cost reports the current session’s dollar value at API rates, which is the fastest way to feel how expensive an Opus-heavy session really is.

Build the habit of a quick /usage and /context at session start. Two seconds of reading prevents the by-lunch lockout that brought you here.

FAQ

Why does my Claude Code usage limit get hit so fast? Usually Opus-by-default, subagent fan-out, or MCP bloat. Opus costs several times more per turn than Sonnet. Subagents run their own context windows at roughly 7x the tokens of a single thread. Seven MCP servers can eat a third of your window before you type. Run /usage and /context to find which one.

What’s the difference between the 5-hour limit and the weekly limit? The 5-hour limit is a rolling session window. The weekly limit is a 7-day cap, and Max plans have two of them (all-models and Sonnet-only). They reset on different clocks, so you can hit one with the other wide open.

Does hitting the weekly limit lock me out even if my 5-hour session is fresh? Yes. The weekly cap locks usage until its 7-day reset regardless of session headroom. Waiting five hours won’t help.

How do I check how much usage I have left? /usage shows session and weekly consumption, /context shows what’s loaded, and /cost shows the session’s dollar value at API rates.

Did Anthropic change Claude Code billing on June 15, 2026? No. The planned Agent SDK credit-pool change was paused before it took effect. Subscription limits are unchanged.

Will switching from Opus to Sonnet make my plan last longer? Significantly. Opus is several times more expensive per turn. Default to Sonnet with /model and reserve Opus for hard tasks.

Why does it say “usage limit reached” when my session is at 2%? That’s the weekly cap firing, not the session cap. Occasionally it’s a known display bug where the session bar jumps to 100% on low local usage. Trust the weekly numbers in /usage.

Can I use my Claude Pro plan and an API key at the same time? Yes. Many developers run on the subscription day to day and switch Claude Code to a metered API key on the days they exhaust the weekly cap, then switch back after the reset. The base URL and key are environment variables, so the swap is two lines.

Sources Checked for This Refresh