Claude Sonnet 5 in Cline: Setup, Thinking, vs Fable 5 (2026)

Set up Claude Sonnet 5 in Cline in 5 min: the right provider, the reasoning effort that controls cost, and when Sonnet 5 ($2/$10) beats Fable 5 ($10/$50).

Claude Sonnet 5 in Cline: Setup, Thinking, vs Fable 5 (2026)

Cline runs a lot of tokens. Every turn it resends your file tree, open buffers, and running task context, so the model you pick shows up on the bill fast. Claude Sonnet 5 is the model that makes that loop affordable without dropping to a weak model, and this guide sets it up in about five minutes.

Two things trip people up: which provider slot to use, and how the reasoning budget quietly controls both quality and cost. Both are covered below, along with the one decision that actually matters, when to pay 5x for Fable 5 instead.

What You Can Do After This Setup (And What You Can’t)

After this you will have Sonnet 5 driving Cline as a full agent: reading files, writing code, running commands, with prompt caching and extended thinking available. Here is the honest scope.

QuestionAnswer
Can Sonnet 5 act as a full Cline agent?Yes, with the Anthropic provider it gets native tool use.
Can I control reasoning depth?Yes, via the effort level (low/medium/high), not a token budget.
Can I switch to Fable 5 or Opus later?Yes, one Model ID field, no other change.
Does prompt caching apply?Yes on the Anthropic path; it cuts resent-context cost 10x.
Will this remove all rate limits?No. A gateway smooths provider limits but does not delete them.
Does OpenAI Compatible give the same features?Not fully; it can lose cache controls and native thinking.

Decision Frame: When to Run Sonnet 5 in Cline (and When Not)

Sonnet 5 is the default driver for Cline, not a compromise. But it is not the only option, and the wrong choice wastes money in both directions.

When to use Sonnet 5

  • Your Cline sessions are long and file-heavy, so token volume, not peak reasoning, sets the bill.
  • You want caching to blunt the cost of resending repo context every turn.
  • You need a capable agent for everyday edits, refactors, and boilerplate, which is most of the work.

When NOT to use it

  • The task reliably defeats Sonnet 5: deep multi-file refactors, gnarly concurrency bugs, or architecture calls where one wrong decision is expensive. That is Fable 5 or Opus 4.8 territory.
  • You are doing trivial file operations and simple edits at scale, where an even cheaper model would match the result.

The stop rule

If your goal is just to point Cline at a cheaper Claude endpoint, set the Anthropic provider, the base URL, and the Model ID, then stop. The reasoning and comparison sections are for people tuning cost against quality, not for the basic connect.

System Requirements

  • VS Code with the Cline extension installed from the marketplace, updated to a current release.
  • An API key for whichever backend serves the model. This guide uses ofox, an Anthropic-compatible gateway, so one key reaches Sonnet 5, Fable 5, and Opus 4.8.
  • Network reach to your endpoint. Behind a corporate TLS proxy, sort the certificate out first; the same rules as our Claude Code SSL certificate error guide apply to any Node-based tool.

Step-by-Step: Sonnet 5 in Cline

The whole setup is four fields and a test message. The only real decision is Step 1.

Step 1: Pick the provider slot

Cline offers two ways in. For Claude, the Anthropic provider is the right default.

Provider slotBase URLBest for
Anthropichttps://api.ofox.ai/anthropicClaude models, full native tool use, caching, thinking
OpenAI Compatiblehttps://api.ofox.ai/v1One slot serving Claude and non-Claude models together

The Anthropic provider speaks Claude’s native protocol, so Cline’s agent features work without a translation layer. Choose OpenAI Compatible only when you deliberately want one endpoint for mixed models and accept that cache controls and native thinking may not carry through.

Step 2: Open Cline settings and select the provider

Click the Cline icon in the VS Code Activity Bar, then the gear icon at the top of the panel. Under API Provider, select Anthropic (or OpenAI Compatible if that was your Step 1 choice).

Step 3: Set the base URL and key

Paste the base URL from the table into the Base URL field, and your API key into the API Key field.

Base URL: https://api.ofox.ai/anthropic
API Key:  sk-ofox-...

Expected result: the fields save and Cline stops warning about a missing key.

Step 4: Set the Model ID

Set the Model ID to the namespaced id, prefix included:

anthropic/claude-sonnet-5

A bare claude-sonnet-5 fails on a gateway, because the model catalog is namespaced by provider. To switch models later, change only this field; the base URL and key stay put. anthropic/claude-fable-5 and anthropic/claude-opus-4.8 are the two you will reach for most.

Step 5: Test the connection

Send a short message in the Cline chat, such as “list the files in this project.” If Cline reads the tree and replies, tool use is working and you are done with the basic setup.

Your first real task

A test message proves the wire is connected; a real task proves the agent loop works. Point Cline at something small and self-contained, for example “add input validation to the parseConfig function and a test for it.” Watch three things as it runs. It should read the relevant files on its own, propose a diff you approve before it writes, and run the test command when it finishes. If it reads and writes but never runs commands, the terminal integration is off, not the model; enable Cline’s command approval and retry. This first pass also tells you whether your default reasoning budget is right, which the next section covers.

Reasoning: The Effort Setting That Controls Cost

Sonnet 5 reasons through a problem in a separate pass before answering, and that pass is on by default (adaptive thinking). What you control is not a token count but a depth setting: Anthropic’s effort parameter, which takes low, medium, or high. The old budget_tokens knob is gone on Sonnet 5 — send it and the request returns a 400. In Cline you turn reasoning on in the model settings; if your Cline build still passes the legacy budget_tokens value, update Cline or switch it to the effort control, or Sonnet 5 will reject the call.

Effort is a cost dial, not a free upgrade. Reasoning tokens bill as output, and Sonnet 5 output is $10/M, so high effort on every trivial turn is money burned. Match the setting to the task.

Task typeSuggested effortWhy
Edits, boilerplate, file opsOff or lowLittle planning needed; keep turns cheap
Standard feature workMediumEnough planning without runaway cost
Hard refactors, tricky bugsHighDepth pays for itself when a wrong turn is expensive

The practical pattern is to keep effort low by default and raise it only for the turn that needs it. Cline lets you change it per session, so you are not locked into one setting for the whole project.

A concrete example: on a routine “rename this variable across the file” turn, high effort makes Sonnet 5 write a paragraph of reasoning nobody reads, and you pay output rates for it. On a “figure out why this async handler deadlocks” turn, that same high effort is what lets it trace the call graph instead of guessing. Same model, same price per token, wildly different value depending on whether the task needed the thinking. Watch the token counter Cline shows per turn for a day and you will calibrate the dial faster than any rule of thumb.

There is also a quality trap in the other direction. Turning thinking off entirely on a genuinely hard task does not save money, it just makes Sonnet 5 answer fast and wrong, and then you spend three correction turns cleaning up. Cheap-but-wrong is more expensive than the effort you skipped.

When Sonnet 5 Beats Fable 5 (and When It Doesn’t)

This is the decision that moves your bill. On ofox, the two models price like this:

ModelInputOutputCache readModel ID
Claude Sonnet 5$2/M$10/M$0.20/Manthropic/claude-sonnet-5
Claude Fable 5$10/M$50/M$1/Manthropic/claude-fable-5

Those Sonnet 5 rates are introductory, in effect through August 31, 2026; the standard rate afterward is $3/M input and $15/M output, which narrows the gap to about 3.3x. The current per-token prices match the ofox model pages; the introductory-versus-standard split and the August 31 cutoff come from Anthropic’s pricing docs.

During the introductory window Fable 5 costs 5x as much as Sonnet 5 on both input and output. Cline’s workload, resending context and generating diffs every turn, is exactly the high-token pattern where that rate gap decides the monthly total.

Do the math on a realistic session. Say a working session moves roughly 2M input and 200k output tokens across many turns. On Sonnet 5 that is about $4 input plus $2 output, near $6, and caching pulls the input side down further. On Fable 5 the same session is about $20 input plus $10 output, near $30 before caching. Run that daily and the difference is a rounding error for one dev and a real line item for a team. Scale it to five developers, twenty working days a month, and the default-model choice alone swings the monthly bill from roughly $600 on Sonnet 5 to roughly $3,000 on Fable 5, before caching pulls the Sonnet 5 number down further. That is the whole reason the default matters more than any single clever prompt.

So the rule is simple. Default to Sonnet 5. Escalate to Fable 5 only when Sonnet 5 actually fails the task: a large cross-file refactor it can’t hold in its head, a concurrency or type bug it keeps misreading, or an architecture decision where a wrong call costs more than the token premium ever will. For the everyday 80% of Cline work, Sonnet 5 matches the result at a fifth of the price. For a fuller head-to-head, see our Claude Fable 5 vs Sonnet 5 comparison, and for where Sonnet 5 sits against the older flagship, the Sonnet 5 vs Opus 4.8 breakdown.

Anthropic vs OpenAI-Compatible: The Full Difference

Step 1 said pick the Anthropic provider for Claude. Here is why, in detail, because the wrong slot silently drops features you paid for.

FeatureAnthropic providerOpenAI Compatible
Native tool use (file, terminal, edits)FullWorks, but through a translation layer
Prompt caching controlsExposedOften not surfaced
Extended thinking / effortNativeMay be flattened or ignored
Model ID formatanthropic/claude-sonnet-5anthropic/claude-sonnet-5
Base URL path/anthropic/v1
Best useClaude-only workflowsMixed Claude and non-Claude in one slot

The translation layer is the crux. The OpenAI Compatible slot maps Claude’s protocol onto the OpenAI shape, and anything without a clean equivalent, cache breakpoints and the effort/reasoning control in particular, may get lost in the mapping. For a Claude-only Cline setup that costs you the two features that most affect your bill and your hard-task quality. The only reason to accept that trade is a genuine need to run Claude and a non-Claude model through one identical slot without reconfiguring. If that is not you, take the native path. Which features survive the mapping also depends on your Cline version, so treat the two losses above as the likely case, not a fixed guarantee — if caching and reasoning matter to you, the Anthropic provider removes the question.

One nuance worth knowing: the Model ID is the same string on both slots, anthropic/claude-sonnet-5, because the gateway namespaces its catalog the same way regardless of protocol. What changes is only the base URL path and which features survive.

Watch Your Spend: Caching and the Token Math

Cline’s cost is not really about the model’s headline rate. It is about how many tokens you resend every turn, and whether they are cached. Cline rebuilds context each turn: the system prompt, your custom instructions, the file tree, and the open files. On a long session that same block goes out dozens of times.

This is what prompt caching is for. On the native Anthropic path, Sonnet 5 cache reads bill at $0.20/M against $2/M for fresh input, a 10x cut on the part of your context that does not change turn to turn. A stable system prompt and a fixed set of repo files, resent 40 times in a session, cost a tenth as much when they hit the cache.

Put rough numbers on a day of work:

ScenarioModelEst. session costNotes
Feature work, caching onSonnet 5~$4-6Cache absorbs most resent context
Same work, caching offSonnet 5~$8-10Full input rate every turn
Same workFable 5~$25-305x rate dominates the total
Hard task, high effortSonnet 5+$2-4Reasoning tokens billed as output

The takeaway is an ordering. First make sure caching is on, which means the Anthropic provider. Then keep Sonnet 5 as the default. Only then, for the genuinely hard turn, spend on thinking or step up to Fable 5. Get that order wrong, running Fable 5 by default with caching off, and you pay roughly five to eight times more for work Sonnet 5 would have done the same.

To confirm caching is actually working, watch Cline’s per-turn token readout: after the first turn of a session, the cached-input count should climb while fresh input stays small. If every turn shows full fresh input and zero cache, you are on a slot or path that dropped caching, back to the provider choice above.

Common Errors During Setup (and Fixes)

SymptomCauseFix
model not foundModel ID missing the anthropic/ prefixUse anthropic/claude-sonnet-5
401 UnauthorizedKey is for a different gateway, or blankPaste the key for the base URL you set
Tool use silently does nothingOpenAI-compatible slot dropped native toolsSwitch to the Anthropic provider
Cache never hitsWrong path, or caching unsupported on that slotUse /anthropic base URL and the Anthropic provider
SSL / self-signed cert errorCorporate TLS proxy re-signing trafficAdd the CA per the SSL guide linked above
Reasoning has no effectReasoning disabled, or effort set too lowEnable reasoning and raise the effort level

If a model ID resolves but responses feel truncated, check that Cline’s max-tokens setting is not clipping output before the reasoning pass and the answer both fit.

Switching Between Sonnet 5, Fable 5, and Opus

The escalation decision only pays off if switching is cheap, and in Cline it is. Because all three models live behind the same gateway and the same key, moving from Sonnet 5 to Fable 5 or Opus 4.8 is a single field: change the Model ID in provider settings and keep working.

anthropic/claude-sonnet-5   # default driver
anthropic/claude-fable-5    # escalate for the hard turn
anthropic/claude-opus-4.8   # the older flagship, if you want it

The workflow that keeps this cheap is to escalate a task, not a project. When Sonnet 5 stalls on a specific problem, switch to Fable 5 for that stretch, let it solve the thing, then switch back. Cline keeps the conversation and file context across the swap, so Fable 5 picks up where Sonnet 5 left off without re-reading the whole repo. Leaving the default on Fable 5 after the hard part is done is how a $6 session quietly becomes a $30 one.

A caveat on caching across a swap: the cache is per model, so the first turn after you switch pays full input rate to warm Fable 5’s cache, then cheapens again. That one warm-up turn is trivial next to solving a bug Sonnet 5 could not, but it is a reason not to flip models every other turn out of nervousness. Decide, escalate, finish, drop back.

If you find yourself escalating constantly, that is signal, not noise. Either your default reasoning budget on Sonnet 5 is too low and it is failing tasks it could handle with more thinking, or the work genuinely skews hard and Fable 5 should be the default for that project. Both are fixable once you notice the pattern in Cline’s per-turn readout.

Team / Multi-Developer Configuration

For a team, the win is one endpoint and one model policy instead of everyone wiring their own keys. Register a single gateway, hand each developer a key through your secret manager, and standardize the Cline provider settings so everyone routes Sonnet 5 through the same base URL. Billing lands in one place across Sonnet 5, Fable 5, and Opus 4.8, and switching a whole team’s default model is a one-line change to the shared Model ID rather than a fleet of individual reconfigurations.

The cost-control habit that pairs with this is model tiering: run the cheap default for the bulk of turns and escalate only the hard ones. The same logic behind our Claude Code hybrid routing pattern applies to Cline, and the mechanics of the endpoint swap are in the Cline API configuration guide and the broader custom API setup for Cursor, Claude Code, and Cline.

FAQ

How do I add Claude Sonnet 5 to Cline? Open Cline settings (gear icon), pick the Anthropic provider, set the Base URL to https://api.ofox.ai/anthropic, paste your key, and set the Model ID to anthropic/claude-sonnet-5. Send a test message.

What model ID does Cline use for Sonnet 5 through a gateway? anthropic/claude-sonnet-5, with the prefix. A bare name fails on a gateway; only Anthropic’s direct API takes it.

Should I use the Anthropic provider or OpenAI Compatible? Anthropic for Claude models, so you keep native tool use, caching, and thinking. OpenAI Compatible only when one slot must serve mixed models.

How do I turn on extended thinking for Sonnet 5? Enable reasoning in Cline. Depth is set by Anthropic’s effort parameter (low/medium/high), not a token budget; adaptive thinking is on by default and the old budget_tokens value returns a 400. Keep effort low for coding; reasoning tokens bill as output.

Is Sonnet 5 cheaper than Fable 5? Yes, 5x on both input and output ($2/$10 vs $10/$50 on ofox). For Cline’s high-token loops that gap sets the bill.

When is Fable 5 worth 5x? When Sonnet 5 fails the task outright: large refactors, subtle bugs, high-stakes architecture. For everyday work, Sonnet 5 matches it for a fifth of the cost.

Why do I get a 401 or model-not-found? Missing anthropic/ prefix, wrong base-URL path for the provider, or a key for a different gateway. Fix the prefix and match the base URL to the provider.

Does prompt caching work for Sonnet 5 in Cline? Yes on the Anthropic path, with cache reads at $0.20/M versus $2/M input. The OpenAI-compatible path may not surface cache controls.

Sources Checked for This Refresh

  • Cline VS Code API configuration guide, verified 2026-07-03. Source for the Anthropic vs OpenAI-compatible provider slots and the settings flow.
  • Anthropic extended thinking documentation, verified 2026-07-03. Source for adaptive thinking and the effort parameter, and for manual budget_tokens returning a 400 on Sonnet 5.
  • ofox model catalog snapshot, verified 2026-07-03. Source for the anthropic/claude-sonnet-5 and anthropic/claude-fable-5 model IDs and the current $2/$10 vs $10/$50 per-token pricing, including the $0.20/M vs $1/M cache-read rates.
  • Anthropic pricing documentation, verified 2026-07-03. Source for Sonnet 5’s introductory-versus-standard tiering: $2/$10 through August 31, 2026, then $3/$15.