Claude Fable 5 vs Sonnet 5 (2026): 5x Pricier, When It Pays
Fable 5 lists $10/$50, 5x Sonnet 5's $2/$10, but hits 80.3% SWE-bench Pro vs 63.2%. When the ceiling model pays, and how to route both on ofox.
TL;DR Fable 5 is Anthropic’s capability ceiling and Sonnet 5 is its value floor, and for the first time both are reachable through one endpoint. Fable 5 lists $10/$50 per million tokens (input/output), exactly 5x Sonnet 5’s introductory $2/$10 (3.3x after August 31, when Sonnet 5 moves to $3/$15). It earns that premium on the numbers: 80.3% on SWE-bench Pro against Sonnet 5’s 63.2%, and 91/100 on Every’s Senior Engineer test where Opus 4.8 scores 63. There are two catches. The 5x price gap is the floor, not the ceiling, because Fable 5’s always-on thinking emits more output tokens per task. And availability differs: Sonnet 5 is always listed, while Fable 5 comes and goes in access windows. Below: the specs, the benchmark table, cost-per-solved-issue math, and a 10-line way to A/B both on your own traffic.
The 5x sticker gap is the smallest the difference ever gets. Fable 5’s thinking is always on, so on the same task it emits more output tokens than Sonnet 5, and output is the line that bills at $50.
TL;DR: Which One Should You Pick?
For most teams the answer is “Sonnet 5 as the default, Fable 5 for the hard tail you cannot afford to get wrong.” Here is the one-line verdict by scenario.
| Scenario | Pick | Why |
|---|---|---|
| Classification, extraction, chat, RAG answers | Sonnet 5 | Bounded output, capability is plenty, a fifth of the price |
| Routine coding: edits, refactors, test scaffolds | Sonnet 5 | 63.2% SWE-bench Pro clears everyday work |
| Frontier agentic coding where a failed patch is expensive | Fable 5 | 80.3% SWE-bench Pro, 91/100 senior-engineer test |
| Long-horizon autonomous runs that must land first try | Fable 5 | Fewer retries when correctness is the bottleneck |
| Cost-sensitive default across a mixed workload | Route both | Cheap work to Sonnet 5, the hard tail to Fable 5 |
| Cybersecurity, bio, or distillation work | Neither, use Opus 4.8 | Fable 5 auto-routes these to Opus 4.8 anyway |
The rest of this piece is the evidence behind that table, plus the honest version of “when does the $50 tier actually pay.”
What Changed: Fable 5 Came Back, Sonnet 5 Arrived
Two releases three weeks apart reset the top and the middle of the Claude line.
Claude Fable 5 shipped on June 9, 2026 as Anthropic’s first generally available Mythos-class model, the family Anthropic previously held back over cybersecurity capability. It is the Mythos model with three safety classifiers layered on top. Anthropic put it in Pro, Max, and Team subscription plans for two weeks, then removed it from those plans on June 23, leaving the API rate of $10/$50 as the way in. It has been rotating in and out of access windows since, which matters for how you architect around it.
Claude Sonnet 5 shipped on June 30, 2026 at introductory pricing of $2/$10 (standard $3/$15 after August 31). It is Anthropic’s most agentic Sonnet-tier model and the new default for professional work that is not at the frontier. We covered the head-to-head with the middle tier in Sonnet 5 vs Opus 4.8.
The reason to compare the two ends directly, rather than each against Opus 4.8, is that they answer different questions. Sonnet 5 answers “what is the cheapest model that clears my everyday bar.” Fable 5 answers “what is the best model money can buy when the task is hard enough that being wrong is the expensive outcome.” Most teams need both answers, and the interesting decision is where you draw the line between them. If you want the full three-way coding shootout with GPT-5.5 in the mix, that lives in Fable 5 vs Opus 4.8 vs GPT-5.5; this piece is narrower and more practical: two tiers, one routing decision.
Quick Specs Comparison
Both models share the same nominal 1M context window and 128K max output. The real differences are price, availability, and the fact that Fable 5 cannot turn thinking off.
| Spec | Claude Fable 5 | Claude Sonnet 5 |
|---|---|---|
| ofox model ID | anthropic/claude-fable-5 | anthropic/claude-sonnet-5 |
| Input | $10/M | $2/M (intro), $3/M (standard) |
| Output | $50/M | $10/M (intro), $15/M (standard) |
| Cached input read | $1/M (0.1x ratio) | $0.2/M |
| Context window | 1M | 1M |
| Max output | 128K | 128K |
| Thinking | Always on, cannot disable | Adaptive, on by default, can disable |
| Sampling params | Non-default values return 400 | Non-default values return 400 |
| Safety routing | Cyber / bio / distillation to Opus 4.8 | Real-time cyber refusals |
| ofox availability | Windowed, not always listed | Permanent listing |
The intro Sonnet 5 prices ($2/$10) and the cached read ($0.2/M) match the ofox model page for anthropic/claude-sonnet-5 as of July 2, 2026. Fable 5’s $10/$50 is the Anthropic API rate from Anthropic’s Fable 5 announcement; its cached read is the standard 0.1x-of-input ratio Anthropic applies across the line. Fable 5’s ofox listing was not live at the time of writing, so its numbers here are Anthropic-sourced, not read off a live ofox page. Check the ofox catalog for the current Fable 5 listing before you build against it.
The Price Gap, and Why It Is Bigger Than 5x
On per-token rates the gap is clean: Fable 5 is 5x Sonnet 5 during the introductory window, on input, output, and cached reads alike. After August 31, when Sonnet 5 moves to $3/$15, the multiple drops to about 3.3x. Either way, Sonnet 5 is dramatically cheaper per token.
The sticker understates the real difference for one structural reason. Fable 5’s thinking is always on and you cannot turn it off, so on any non-trivial task it produces a chunk of thinking and output tokens that a leaner call would not. Sonnet 5 has adaptive thinking on by default too, but you can dial it down with the effort parameter or disable it outright for bounded work. Output is the line that bills at $50/M on Fable 5 versus $10/M on Sonnet 5, so more output tokens on the pricier model widens the effective gap beyond the 5x sticker. This is the opposite of the Sonnet-versus-Opus story, where the cheaper model’s own thinking narrows the discount. Here the pricier model thinks harder by default, so the gap only grows.
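To make the widening concrete, here is a minimal sketch that computes the effective cost multiple for a single task. The prices are the list rates from the specs table (Sonnet 5 at intro pricing); the helper names and the token counts in the usage lines are hypothetical, chosen only to show the shape of the effect.

```python
# Per-million-token list prices from the specs table (Sonnet 5 at intro rates).
PRICES = {
    "fable-5": {"input": 10.0, "output": 50.0},
    "sonnet-5": {"input": 2.0, "output": 10.0},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at list price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def effective_multiple(input_tokens: int, sonnet_output: int, fable_output: int) -> float:
    """How many times more the Fable 5 call costs than the Sonnet 5 call."""
    fable = call_cost("fable-5", input_tokens, fable_output)
    sonnet = call_cost("sonnet-5", input_tokens, sonnet_output)
    return fable / sonnet


# Identical token counts on both models: the multiple equals the sticker.
print(round(effective_multiple(40_000, 8_000, 8_000), 2))
# Hypothetical: Fable 5 emits twice the output tokens on the same task.
print(round(effective_multiple(40_000, 8_000, 16_000), 2))
```

With equal output the multiple is the 5x sticker; doubling Fable 5's output in this example pushes it to 7.5x. Swap in token counts from your own logs rather than trusting these placeholders.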
Cached reads are the one place the ratio is a straight 5x with no asterisk. If your prompts carry a large stable prefix (a system prompt, a tool schema, a repeated document set), a cache read is $0.2/M on Sonnet 5 against $1/M on Fable 5. For a cache-heavy production endpoint, that line alone can dominate the monthly bill, and it never favors Fable 5.
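A rough way to size that line is to multiply the stable prefix by request volume. The sketch below uses the cached-read rates from the specs table; the 50K-token prefix and the one million requests per month are hypothetical, and it deliberately ignores cache writes and uncached tokens, which the figures above do not cover.

```python
# Cached-read prices per million tokens, from the specs table.
CACHED_READ_PER_M = {"fable-5": 1.00, "sonnet-5": 0.20}


def monthly_cache_read_cost(model: str, prefix_tokens: int, requests_per_month: int) -> float:
    """Dollars per month spent re-reading a cached prefix on every request."""
    total_tokens = prefix_tokens * requests_per_month
    return total_tokens * CACHED_READ_PER_M[model] / 1_000_000


# Hypothetical workload: 50K-token stable prefix, 1M requests per month.
for model in CACHED_READ_PER_M:
    cost = monthly_cache_read_cost(model, 50_000, 1_000_000)
    print(f"{model}: ${cost:,.0f}/month in cached reads")
```

Whatever the volume, the two results stay exactly 5x apart, which is why this line never favors Fable 5.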
Coding Benchmark: The Capability Gap Is Real
Benchmarks are noisy, but the gap between these two is wide enough to survive the noise. Here is where they land on the tests that map to production coding, with Opus 4.8 as the middle-tier reference.
| Benchmark | Fable 5 | Sonnet 5 | Opus 4.8 |
|---|---|---|---|
| SWE-bench Verified | 95.0% | n/a | 88.6% |
| SWE-bench Pro (agentic coding) | 80.3% | 63.2% | 69.2% |
| Every Senior Engineer (/100) | 91 | not published | 63 |
| Terminal-Bench 2.1 | 80.5% | n/a | 74.6% |
Two rows carry the decision.
SWE-bench Pro is the production read. It runs models against real GitHub issues end to end: read the repo, write a patch, the patch either passes the hidden test suite or it does not, no partial credit. Fable 5’s 80.3% against Sonnet 5’s 63.2% is a 17-point spread, and every one of those points is an issue that closes on the first run instead of failing. On a hard multi-file issue, a first-pass miss means a retry loop or a human picking up the pieces, and both cost more than tokens.
Every’s Senior Engineer benchmark is the ceiling read. Every runs it on the hardest problems they can write, the kind a senior engineer takes a working day to solve. Fable 5 at 91/100 lands in human-senior-engineer range. Opus 4.8 sits at 63. Anthropic has not published a Sonnet 5 figure for this test, but Sonnet 5 already trails Opus 4.8 on SWE-bench Pro (63.2% vs 69.2%), so on a harder benchmark it most likely lands at or below Opus, not near Fable 5. That inference, not a published score, is the basis for the Sonnet 5 claims here. The gap is what the price premium buys: not “a bit better on average,” but “can do a class of task the cheaper model mostly fails.” Treat these leaderboard scores as a snapshot and check Anthropic’s Transparency Hub for the per-benchmark source; the direction is what matters for routing, not the last decimal.
The honest summary of the table: for everyday coding, the extra points do not change the outcome, because Sonnet 5 already closes the issue. For frontier coding, the extra points are the difference between shipping and stalling.
Pricing Math: When the $50 Tier Actually Pays
Sticker price is one number, cost-per-solved-issue is another, and they can point in different directions. Here are two workloads with the assumptions stated so you can swap in your own.
Scenario A, an everyday coding fleet. 5 developers, 20 tasks/day each, 20 workdays (2,000 tasks/month). Per routine task: 40K input, and output of 8K on Sonnet 5 (thinking dialed low) versus 25K on Fable 5 (thinking always on). Assume the task is well within both models’ reach, so first-pass success is near 1 on both.
| Line | Sonnet 5 (intro) | Fable 5 |
|---|---|---|
| Input per task (40K) | $0.08 | $0.40 |
| Output per task | $0.08 (8K) | $1.25 (25K) |
| Cost per task | $0.16 | $1.65 |
| Monthly (2,000 tasks) | $320 | $3,300 |
| Relative cost | baseline | ~10x |
On routine work Fable 5 is not 5x more expensive, it is roughly 10x, because the always-on thinking piles onto the $50 output line. Paying that for work Sonnet 5 already closes is pure waste.
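The Scenario A table can be reproduced with a few lines, which also makes it easy to swap in your own fleet size and token counts. This is a sketch under the assumptions stated above (40K input, 8K versus 25K output, 2,000 tasks per month, Sonnet 5 at intro pricing); the function and variable names are ours.

```python
# Per-million-token list prices (input, output); Sonnet 5 at intro rates.
PRICES = {"sonnet-5": (2.0, 10.0), "fable-5": (10.0, 50.0)}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at list price."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Scenario A assumptions: 5 developers x 20 tasks/day x 20 workdays.
TASKS_PER_MONTH = 5 * 20 * 20

sonnet = task_cost("sonnet-5", 40_000, 8_000)   # thinking dialed low
fable = task_cost("fable-5", 40_000, 25_000)    # thinking always on

for name, cost in (("Sonnet 5", sonnet), ("Fable 5", fable)):
    print(f"{name}: ${cost:.2f}/task, ${cost * TASKS_PER_MONTH:,.0f}/month")
print(f"Effective multiple: {fable / sonnet:.1f}x")
```

The output-token assumption drives the result: at equal output the multiple would fall back to 5x, so measure the real ratio on your own tasks before trusting the 10x figure.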
Scenario B, the hard tail. Now take genuinely hard, multi-file issues where first-pass success is the whole game. Use the SWE-bench Pro rates as a stand-in: 80.3% for Fable 5, 63.2% for Sonnet 5. Per attempt: 60K input, 40K output on Fable 5, 30K output on Sonnet 5.
| Line | Sonnet 5 (intro) | Fable 5 |
|---|---|---|
| Cost per attempt | $0.42 | $2.60 |
| First-pass success | 63.2% | 80.3% |
| Expected attempts to solve | ~1.58 | ~1.25 |
| Cost per solved issue (tokens only) | ~$0.66 | ~$3.24 |
On tokens alone, Sonnet 5 is still cheaper per solved issue even after retries, because a fifth of the per-attempt price buys a lot of retries. So the case for Fable 5 is not a token-cost case. It is this: the SWE-bench Pro rate probably flatters Sonnet 5 on the hardest tasks. On the class of problem Every’s benchmark targets (where Fable 5 scores 91 and Opus 4.8 only 63), expect Sonnet 5’s real-world solve rate to fall well below its 63.2% headline, its retry count to climb, and some issues never to close. Once a failed patch costs an hour of senior engineer time or ships a bug, the roughly $2.60 token delta per solved issue stops being the number that matters. That is when Fable 5 pays: not because it is cheaper, but because being wrong is expensive and it is wrong less often.
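The Scenario B figures follow from a simple retry model: if every attempt succeeds independently with probability p, the expected number of attempts is 1/p and the expected token cost per solved issue is the per-attempt cost divided by p. The sketch below applies that to the table's inputs. The independence assumption is ours and is optimistic, since a model that fails a hard issue once often fails it again.

```python
# Per-million-token list prices (input, output); Sonnet 5 at intro rates.
PRICES = {"sonnet-5": (2.0, 10.0), "fable-5": (10.0, 50.0)}


def attempt_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single attempt at list price."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def expected_attempts(success_rate: float) -> float:
    """Mean attempts until the first success, assuming independent retries."""
    return 1.0 / success_rate


def cost_per_solved(model: str, input_tokens: int, output_tokens: int, success_rate: float) -> float:
    """Expected token spend to close one issue under the retry model."""
    return attempt_cost(model, input_tokens, output_tokens) * expected_attempts(success_rate)


# Scenario B inputs: 60K input per attempt, SWE-bench Pro rates as a stand-in.
sonnet = cost_per_solved("sonnet-5", 60_000, 30_000, 0.632)
fable = cost_per_solved("fable-5", 60_000, 40_000, 0.803)
print(f"Sonnet 5: ${sonnet:.2f} per solved issue")
print(f"Fable 5:  ${fable:.2f} per solved issue")
```

Lowering Sonnet 5's success rate in this function shows how far its solve rate would have to fall before tokens alone favor Fable 5: with these token counts the crossover sits near a 13% first-pass rate, which is why the argument for the ceiling model rests on human cost rather than token cost.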
Put a number on it. A senior engineer at a loaded cost of $120/hour is $2/minute. If routing a hard issue to Fable 5 instead of Sonnet 5 saves even fifteen minutes of a human untangling a wrong patch, that is $30 of engineer time against a token delta measured in single dollars. The break-even is not close. The trap is applying that logic to the everyday 80%, where there is no wrong-patch cost to avoid because Sonnet 5 was going to close the issue anyway. The whole discipline of tiering is keeping the Fable 5 share small enough that its 10x effective cost lands only on the tasks where a saved engineer-hour is on the table. Size that share by measuring, not by taste: the genuine frontier is often a single-digit percentage of traffic, and everything routed to Fable 5 above that percentage is money spent on capability the task did not require.
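The break-even is a one-line division, sketched here so you can plug in your own loaded rate. The $120/hour rate and the fifteen minutes come from the paragraph above; the $2.58 token delta is the difference between the two Scenario B cost-per-solved figures ($3.24 minus $0.66). The function names are ours.

```python
def engineer_cost(minutes: float, loaded_hourly_rate: float) -> float:
    """Dollar value of a given number of engineer minutes."""
    return minutes * loaded_hourly_rate / 60


def break_even_minutes(token_delta: float, loaded_hourly_rate: float) -> float:
    """Engineer minutes that must be saved per issue to cover the extra token spend."""
    return token_delta / (loaded_hourly_rate / 60)


RATE = 120.0         # loaded cost per hour, from the text
TOKEN_DELTA = 2.58   # Scenario B: $3.24 - $0.66 per solved issue

print(f"15 minutes saved is worth ${engineer_cost(15, RATE):.2f}")
print(f"Break-even: {break_even_minutes(TOKEN_DELTA, RATE):.2f} minutes saved per issue")
```

At these numbers Fable 5 covers its token premium if it saves a little over one minute of senior time per hard issue. The same arithmetic returns zero benefit on tasks Sonnet 5 closes anyway, because there are no minutes to save.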
When to Pick Claude Sonnet 5
Pick anthropic/claude-sonnet-5 for the large majority of work:
- High-volume bounded output. Classification, extraction, routing, moderation. Short outputs, big input volume, often cache-heavy. Sonnet 5’s $2/$10 and $0.2/M cached reads cut these bills to a fraction of Fable 5’s.
- RAG answers and summarization. Retrieval does the heavy lifting; the model writes a bounded response. Capability is plenty.
- Routine coding. Single-file edits, boilerplate, test scaffolds, review comments. 63.2% SWE-bench Pro clears work that is not at the frontier.
- Anything latency-sensitive and interactive. Sonnet-tier speed and price fit chat and assistant surfaces better than a ceiling model that always thinks first.
When to Pick Claude Fable 5
Pick anthropic/claude-fable-5 when the task is at the capability frontier and a wrong answer is the expensive outcome:
- Frontier agentic coding. Hard, multi-file issues where the 17-point SWE-bench Pro lead is the difference between one run and a retry loop, and where a shipped-wrong patch costs real engineer time.
- Long-horizon autonomous runs. Overnight refactors and multi-step agent loops that have to hold together without a human catching a wrong turn on step 12.
- Senior-engineer-class problems. The work Every’s benchmark targets, where Sonnet 5’s real solve rate drops and Fable 5’s 91/100 is the reason to reach for it.
- When you have access. Fable 5’s availability is windowed, so architect it as the tier you route to when it is live, not a permanent dependency.
When Not to Pick Either (and What to Use Instead)
Two cases call for neither model.
The first is cybersecurity, biology and chemistry, or model distillation work. Fable 5 detects these and routes them to Opus 4.8 anyway, so calling Fable 5 for them just adds a routing hop. Call anthropic/claude-opus-4.8 directly and skip the hop.
The second is the middle of the difficulty range, the tasks that are too hard for Sonnet 5 to close reliably but not hard enough to justify Fable 5’s 10x effective cost. That is exactly where Opus 4.8 lives: $5/$25, 69.2% SWE-bench Pro, and no availability window to plan around. For a lot of teams the real routing tree has three tiers, not two, with Opus 4.8 as the everyday-hard workhorse and Fable 5 reserved for the genuine frontier. The Sonnet 5 vs Opus 4.8 breakdown covers the lower boundary; the Opus 4.8 release review covers the middle.
flowchart TD
A[Incoming task] --> B{Cyber / bio / distillation?}
B -->|Yes| C[anthropic/claude-opus-4.8]
B -->|No| D{Frontier-hard?<br/>failed answer is expensive}
D -->|No| E[anthropic/claude-sonnet-5]
D -->|Yes| F{Fable 5 in an access window?}
F -->|Yes| G[anthropic/claude-fable-5]
F -->|No| H[anthropic/claude-opus-4.8]
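The same decision tree can be written as a small routing function. This is a sketch, not a production router: the `Task` fields and the `fable_available` flag are hypothetical inputs you would have to derive yourself (for example from a task-type tag and a check of the ofox catalog), and only the model IDs come from this article.

```python
from dataclasses import dataclass

OPUS = "anthropic/claude-opus-4.8"
SONNET = "anthropic/claude-sonnet-5"
FABLE = "anthropic/claude-fable-5"


@dataclass
class Task:
    # Hypothetical flags; how you classify a task is up to your pipeline.
    restricted_domain: bool  # cyber / bio / distillation work
    frontier_hard: bool      # a failed answer is expensive


def pick_model(task: Task, fable_available: bool) -> str:
    """Mirror the flowchart: restricted domains first, then difficulty, then availability."""
    if task.restricted_domain:
        return OPUS
    if not task.frontier_hard:
        return SONNET
    return FABLE if fable_available else OPUS


print(pick_model(Task(restricted_domain=False, frontier_hard=False), fable_available=True))
print(pick_model(Task(restricted_domain=False, frontier_hard=True), fable_available=False))
```

Checking the restricted-domain branch first keeps those requests off Fable 5 entirely, and the availability check last means the router degrades to Opus 4.8 instead of failing when the window closes.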
Try Both via ofox: A/B in 10 Lines
The honest way to settle the routing line is to run both on your own tasks and read the token counts. ofox exposes the Claude line on one OpenAI-compatible endpoint (https://api.ofox.ai/v1), so the only thing that changes between runs is the model ID string, and one key covers all three tiers with no separate Anthropic billing. Two gotchas before you run it: both models reject non-default temperature, top_p, and top_k with a 400, so leave sampling params alone (the examples do). And Fable 5 must be live in an ofox access window for its line to resolve; when it is not listed, either wait for the window or point that one call at Anthropic’s own API.
Python: A/B both models in one loop
from openai import OpenAI
client = OpenAI(base_url="https://api.ofox.ai/v1", api_key="YOUR_OFOX_KEY")
prompt = "Fix the race condition in this worker pool: ..."
for model in ["anthropic/claude-fable-5", "anthropic/claude-sonnet-5"]:
    # No temperature / top_p / top_k: both models 400 on non-default values.
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    u = r.usage
    print(model, u.prompt_tokens, u.completion_tokens)
Watch the completion_tokens column. Fable 5’s always-on thinking shows up there, and multiplied by $50/M it is where the effective cost gap lives.
Node: same shape
// Top-level await: run this as an ES module (.mjs or "type": "module").
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "https://api.ofox.ai/v1", apiKey: process.env.OFOX_KEY });
const prompt = "Fix the race condition in this worker pool: ...";
for (const model of ["anthropic/claude-fable-5", "anthropic/claude-sonnet-5"]) {
  const r = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  console.log(model, r.usage.prompt_tokens, r.usage.completion_tokens);
}
Run this on 20 or 30 of your genuinely hard tasks, sum input and output tokens per model, multiply by the specs-table rates, and divide by how many each model actually solved. That solved-issue cost, not the sticker, is the number that decides where the routing line goes. For the routing plumbing itself, the Claude Code hybrid routing pattern writeup covers picking the signal (input length, task-type tag, or a confidence check that escalates only on failure).
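One way to do that aggregation is sketched below. It assumes you have logged each run as a record with the token counts from `usage` plus a `solved` flag produced by your own check (a test suite, a reviewer verdict); the `Run` type and the sample records are hypothetical, and the rates are the list prices from the specs table.

```python
from dataclasses import dataclass

# Per-million-token list prices (input, output); Sonnet 5 at intro rates.
RATES = {
    "anthropic/claude-fable-5": (10.0, 50.0),
    "anthropic/claude-sonnet-5": (2.0, 10.0),
}


@dataclass
class Run:
    model: str
    prompt_tokens: int
    completion_tokens: int
    solved: bool  # verdict from your own check, not from the API


def cost_per_solved_issue(runs: list[Run]) -> dict[str, float]:
    """Total token spend per model divided by the number of issues it closed."""
    spent: dict[str, float] = {}
    solved: dict[str, int] = {}
    for run in runs:
        in_rate, out_rate = RATES[run.model]
        cost = (run.prompt_tokens * in_rate + run.completion_tokens * out_rate) / 1_000_000
        spent[run.model] = spent.get(run.model, 0.0) + cost
        solved[run.model] = solved.get(run.model, 0) + int(run.solved)
    # A model that solved nothing has no finite cost per solved issue.
    return {m: (spent[m] / solved[m] if solved[m] else float("inf")) for m in spent}


# Hypothetical log: Sonnet 5 needed two attempts, Fable 5 one.
sample = [
    Run("anthropic/claude-sonnet-5", 60_000, 30_000, solved=False),
    Run("anthropic/claude-sonnet-5", 60_000, 30_000, solved=True),
    Run("anthropic/claude-fable-5", 60_000, 40_000, solved=True),
]
for model, cost in cost_per_solved_issue(sample).items():
    print(f"{model}: ${cost:.2f} per solved issue")
```

Failed attempts count toward spend but not toward the denominator, which is what makes this number diverge from the sticker.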
Migration Gotchas: Same Shape, Three 400s
Both models keep the Messages API shape, but the same request that worked on an older Claude can 400 on either of these.
| Change | Old behavior | On Fable 5 / Sonnet 5 |
|---|---|---|
| Sampling params | temperature / top_p / top_k accepted | Non-default values return 400 on both |
| Manual thinking | budget_tokens accepted on some models | Returns 400 on both; use effort |
| Disable thinking | thinking: {type: "disabled"} accepted | Works on Sonnet 5; 400 on Fable 5 (omit the param) |
| Refusals | thrown as errors | HTTP 200 with stop_reason: "refusal" on both; handle it |
The Fable 5 rows are the ones that trip people. Thinking is always on, so there is no disable switch, and the safety classifiers can hand a request to Opus 4.8 mid-flight. On the API, opt into a fallback so a refusal does not just stop the request; Anthropic’s server-side fallbacks parameter re-serves a declined request on Opus 4.8 in the same call. If you are moving a Sonnet 5 workload up to Fable 5 for the hard tail, budget for more output tokens per task, not fewer, because the always-on thinking works against the intuition that a smarter model finishes faster.
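A defensive pre-flight step can strip the fields the table lists before a request goes out, and a small check can catch refusals that arrive as HTTP 200. The sketch below works on plain dicts and makes several assumptions: that requests are Messages-API-shaped dicts with a `model` key, that a manual budget appears as `thinking.budget_tokens`, and that a response exposes a top-level `stop_reason`. On an OpenAI-compatible endpoint the field names may differ, so treat this as a template for the checks rather than a drop-in.

```python
FABLE = "anthropic/claude-fable-5"
SONNET = "anthropic/claude-sonnet-5"
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def sanitize_request(params: dict) -> dict:
    """Return a copy of params without the fields the migration table says can 400."""
    # Dropping sampling params outright is conservative: only non-default values 400.
    clean = {k: v for k, v in params.items() if k not in SAMPLING_KEYS}
    thinking = clean.get("thinking")
    if isinstance(thinking, dict):
        if "budget_tokens" in thinking:
            # Manual budgets 400 on both models; drop the block rather than guess an effort level.
            clean.pop("thinking")
        elif thinking.get("type") == "disabled" and clean.get("model") == FABLE:
            # Fable 5 cannot disable thinking; omit the param instead.
            clean.pop("thinking")
    return clean


def is_refusal(response: dict) -> bool:
    """True when a response came back 200 but the model declined."""
    return response.get("stop_reason") == "refusal"


legacy = {"model": FABLE, "temperature": 0.2, "thinking": {"type": "disabled"}, "messages": []}
print(sanitize_request(legacy))
print(is_refusal({"stop_reason": "refusal", "content": []}))
```

The function returns a new dict and leaves the caller's request untouched, so it can sit in front of an existing client without side effects.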
The routing test is not the benchmark score, it is cost-per-solved-issue: run both on your real hard tasks, count the tokens, and count how many each one actually closed.
Alternatives
- ofox puts Sonnet 5, Opus 4.8, and Fable 5 (when in-window) on one OpenAI-compatible endpoint, so routing between tiers is a one-string change rather than three integrations. Real-time pricing is on the model catalog.
- Opus 4.8 is the middle tier worth naming explicitly: $5/$25, 69.2% SWE-bench Pro, always available, no window to plan around. For the tasks between Sonnet 5’s ceiling and Fable 5’s floor, it is often the right pick.
- Anthropic direct is the fallback for Fable 5 specifically. When Fable 5 is not listed on an aggregator, its own API keeps the $10/$50 rate available, at the cost of a second key and separate billing.
FAQ
Is Claude Fable 5 worth 5x the price of Sonnet 5? Only for the hardest tasks. Fable 5 buys a real capability jump (80.3% SWE-bench Pro vs 63.2%, and 91/100 on Every’s Senior Engineer test where Opus 4.8 scores 63), but on cost-per-solved-issue Sonnet 5 stays cheaper even after retries. Fable 5 pays when a wrong first answer costs more than the token difference.
How much does Claude Fable 5 cost compared to Sonnet 5? $10/$50 per million tokens versus Sonnet 5’s $2/$10 intro ($3/$15 standard). That is 5x during the intro window, about 3.3x after August 31. Cached reads are $1/M vs $0.2/M.
Is Claude Fable 5 available on ofox? Intermittently. Sonnet 5 is a permanent listing at anthropic/claude-sonnet-5; Fable 5 is offered in access windows, so confirm it is live on the ofox catalog before building against it.
Is Fable 5 better than Sonnet 5 for coding? At the frontier, clearly (80.3% SWE-bench Pro, 91/100 senior-engineer test). For routine coding, Sonnet 5 is already enough at a fifth of the cost.
Why does Fable 5 refuse or route to Opus 4.8? Its safety classifiers hand cybersecurity, bio, and distillation requests to Opus 4.8. A refusal returns as HTTP 200 with stop_reason: "refusal", so check the stop reason before reading content.
Can I set temperature on Fable 5 or Sonnet 5? No. Non-default sampling params 400 on both, as does budget_tokens. Fable 5 also 400s on thinking: {type: "disabled"} because thinking is always on.
What is the context window of Fable 5 and Sonnet 5? Both are 1M tokens, 128K max output. For this choice the window is a wash; price and capability decide it.
Should I switch from Sonnet 5 to Fable 5? Not wholesale. Keep Sonnet 5 as the default and escalate to Fable 5 only when Sonnet 5’s output fails a check. Wholesale switching pays at least 5x, and closer to 10x once always-on thinking is counted, for capability most requests do not need.
Sources Checked for This Refresh
- Anthropic, “Claude Fable 5 and Mythos 5” announcement (Fable 5 $10/$50, Mythos-class, safety classifiers routing to Opus 4.8), verified July 2, 2026: https://www.anthropic.com/news/claude-fable-5-mythos-5
- Anthropic, “Introducing Claude Fable 5” docs (always-on thinking, no sampling params, refusal handling): https://platform.claude.com/docs/en/about-claude/models/introducing-claude-fable-5
- Anthropic, “Introducing Claude Sonnet 5” launch post, June 30, 2026: https://www.anthropic.com/news/claude-sonnet-5
- Anthropic, “What’s new in Claude Sonnet 5” docs (behavior changes, pricing): https://platform.claude.com/docs/en/about-claude/models/whats-new-sonnet-5
- Anthropic Transparency Hub (per-benchmark source): https://www.anthropic.com/transparency
- ofox model page for anthropic/claude-sonnet-5 ($2/$10 intro, $0.2/M cached read, 1M context), verified July 2, 2026: https://ofox.io/models/anthropic/claude-sonnet-5
- ofox model catalog (Fable 5 listing status, real-time pricing), checked July 2, 2026: https://ofox.io/models
- SWE-bench Pro / SWE-bench Verified / Every Senior Engineer figures from Anthropic launch materials and Every’s published benchmark, cross-referenced with our own Fable 5 vs Opus 4.8 vs GPT-5.5 writeup


