How to Access GLM 5.2: Pricing, API Setup, and MIT Weights Plan (2026)
Zhipu just announced a frontier-class coding model with a 1M-token context window, MIT-licensed weights, and a $10/month entry price, all in one drop. The Z.ai Coding Plan API and the MIT weights both open the week of June 22, 2026. If you’ve been waiting for an open-weights Claude Code competitor that you can actually fork, the next seven days are the time to read this guide, pick a tier, and pre-stage your client config so you can wire it up on day one.
Why Now: The Reverse-Narrative Window Is Open
GLM 5.2 didn’t land in a vacuum. The 24 hours around its release are the reason this article exists — and the reason “should I switch?” is no longer hypothetical for some readers.
June 12, 2026 — Anthropic received an export-control directive from the US Department of Commerce restricting access to Claude Fable 5 and Mythos 5 for foreign nationals (inside or outside the US). The trigger was a security finding raised through Amazon: CEO Andy Jassy escalated jailbreak research to senior administration officials, including Treasury Secretary Scott Bessent (Fortune, Semafor). Rather than ship a US-only variant, Anthropic withdrew both models from public availability.
June 13, 2026 — same day the Anthropic withdrawal hit news cycles — Zhipu released GLM 5.2. Jie Tang (Tsinghua, GLM team lead) posted to X with the line “GLM-5.2 is Fully Open, Frontier Intelligence Belongs to Everyone”, framing the launch as a direct response: “the sudden restriction of certain frontier models is deeply regrettable… access to frontier models is abruptly cut off for non-technical reasons” (Jie Tang on X, June 13). The post traveled — roughly 898K views and Hacker News front page within 36 hours.
| Side | Move | Date |
|---|---|---|
| US (Commerce + Anthropic) | Export-control directive → Anthropic withdraws Fable 5 + Mythos 5 from public availability | Notice June 12, public June 13, 2026 |
| China (Zhipu) | Ships GLM 5.2 + announces MIT weights drop within 7 days | June 13, 2026 |
| Public signal | Jie Tang tweet — ~898K views, Hacker News front page | June 13–15, 2026 |
One nuance worth being precise about: Fable 5 was not deprecated, sunset, or retired by Anthropic. It was withdrawn after a US government export-control order, and Anthropic publicly disputed the severity of the underlying jailbreak finding that triggered the order (Tom’s Hardware). If you write or read about this elsewhere, “Anthropic shut down Fable” is the wrong framing.
For most readers this geopolitics is irrelevant — you pick a coding model on price and benchmarks. But three concrete things change in the next 30 days, and they decide whether the rest of this article is worth your time:
- Hedge value: if your team was on Claude Fable for coding workflows and you are outside the US, GLM 5.2 is the first frontier-class coding model that is licensed (MIT, weights drop next week) for you to fork and self-host. “Open weights as political insurance” is no longer abstract.
- Pricing pressure: open-weights frontier models put a ceiling on hosted subscription prices. Expect Anthropic, OpenAI, and Google to soften coding-plan tiers within ~60 days, regardless of whether GLM 5.2 benchmarks competitively.
- Tooling parity: Z.ai shipped Claude Code drop-in support on day one (the dedicated /api/anthropic endpoint, covered in the Drop-in section below). The standard 2026 coding-CLI workflow no longer locks you into a single model family.
If none of those three apply, skip to the setup section. If any of them do, the rest of this article is the operational path: the 10-minute access flow you wire up once the Z.ai API opens the week of June 22, the drop-in replacement for Claude Code, and the self-host plan for when MIT weights drop the same week.
A Note on Availability (Read This First)
Zhipu’s June 13, 2026 launch was an announcement plus documentation, not a dashboard you can sign into the same day. Two access surfaces unlock on the next Z.ai release wave:
- Z.ai Coding Plan API: opens the week of June 22, 2026. Account creation, Coding Plan tier selection, API key issuance, and the /api/anthropic + /api/coding/paas/v4 endpoints all light up in that window. Until then, the endpoint URLs in this guide are the ones published in the launch post; treat them as provisional until you can hit them.
- MIT-licensed open weights: drop the same week under huggingface.co/zai-org/GLM-5.2. The HF repo is currently a placeholder; the config.json that confirms architecture and the BF16 / FP8 shards land on that calendar.
This guide is structured so you can do the planning work this week (pick a tier, pre-stage env vars, decide drop-in vs clean install) and execute setup in ~10 minutes the day the API turns on. If you need something working today, jump to the Alternatives section — ofox already serves DeepSeek V4 Pro / Kimi K2.6 / Qwen3 Coder Next on a single endpoint.
What You Get with GLM 5.2 (The 30-Second Answer)
| Item | Value |
|---|---|
| What you can do today (June 13–21, 2026) | Read this guide, pick a Coding Plan tier, pre-stage ~/.claude/settings.json or OPENAI_BASE_URL env, and queue a waitlist on z.ai if available |
| What you can do once API opens (week of June 22, 2026) | Use GLM 5.2 inside Claude Code, Cline, OpenCode, OpenClaw, Goose, Crush, Roo Code, or Kilo Code via Z.ai’s hosted Coding Plan; self-host the MIT-licensed weights from huggingface.co/zai-org (MoE; parameter count not officially confirmed for 5.2 — likely inherits the GLM-5 line’s 744B-total / 40B-active envelope) |
| Time to first call once keys go live | ~10 minutes (sign up → API key → CLI config → smoke test) |
| Minimum cost | ~$10/mo Lite tier; ~$30/mo Pro for ~2,000 prompts/week |
| What you need | A Z.ai account, an OpenAI-compatible coding client (or any tool that accepts a custom base_url), and 8 GB of patience for the first long-context call |
| What you can’t do (yet) | Quote SWE-bench numbers (Zhipu didn’t publish any), get a 5-tier thinking preset (only High and Max), or pull the weights through ofox (DeepSeek V4 Pro is the closest managed analog) |
Decision Frame: When GLM 5.2 Is Worth Your Setup Time
Use this section to bail out before reading the rest of the article.
When to use GLM 5.2
- You’re running multi-file refactors on a monolith and have been bumping into the 200K context ceiling of competing coding agents
- Your compliance team requires open, auditable model weights — MIT is one of the friendliest open-source licenses in the LLM space
- You want a Chinese-origin coding model to hedge against US-side access restrictions — GLM 5.2 launched the day Anthropic withdrew Claude Fable 5 + Mythos 5 after a US Commerce export-control directive (Why Now above has the full timeline)
When NOT to use GLM 5.2
- You need a model with published benchmarks before you ship it to a production team. Zhipu has not released SWE-bench, LiveCodeBench, or Aider numbers as of June 14, 2026 — independent benchmarks are days away at minimum
- You already pay for Claude Code with Sonnet/Opus and don’t have a specific gap GLM fills. Switching costs (tool config, prompt re-tuning, eval re-runs) are not worth the ~$10/mo savings unless context-window is the actual bottleneck
- You want a single managed endpoint that hands you GLM, GPT-5.5, and Claude Opus 4.8 behind one API key. GLM 5.2 isn’t on ofox yet (verified June 15, 2026) — if endpoint consolidation matters more than this specific model, see the Alternatives section
Stop rule
If you haven’t run into a 200K-token context limit on a real task in the last 30 days, you do not need GLM 5.2. Stop reading and revisit when Zhipu publishes a benchmark or ofox lists the model, whichever comes first.
System Requirements
Before you start the setup, confirm you have:
- A Z.ai account with payment method on file (Coding Plan billed monthly, USD or RMB)
- A supported coding CLI, one of: Claude Code v0.x, Cline ≥ 3.x, OpenCode, Roo Code, Goose, Crush, OpenClaw, Kilo Code. Each supports a custom base_url and model-name override
- Network egress to api.z.ai: verify with curl -I https://api.z.ai/api/paas/v4/ (you should get an HTTP response, not a connection error)
- A side branch in your repo for the first run. Long-context coding agents are smart enough to delete unrelated files when given a vague prompt, so never point one at main on your first day
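The egress check above can also be scripted so it runs in CI or on a locked-down workstation without curl. This is a minimal sketch using only the Python standard library; the function name has_egress is hypothetical, and the URL is the one quoted in the text, which stays provisional until the API opens.

```python
import urllib.error
import urllib.request


def has_egress(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if any HTTP response comes back, False on a connection-level failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            return True  # 2xx/3xx: the host is reachable
    except urllib.error.HTTPError:
        return True  # 4xx/5xx is still an HTTP response, so egress works
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, timeout, or TLS error
```

Calling has_egress("https://api.z.ai/api/paas/v4/") mirrors the curl -I check: any status code counts as reachable, and only a transport failure returns False. The opener parameter exists so the function can be exercised without touching the network.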
If you want to self-host the weights when they drop the week of June 22, 2026, additional requirements:
- 8x H100 80GB GPUs or equivalent: this is an estimate based on the assumption that 5.2 inherits the GLM-5 line’s ~744B-total / ~40B-active MoE shape; Zhipu has not officially confirmed 5.2 parameter counts as of June 15, 2026. Re-size once huggingface.co/zai-org/GLM-5.2/config.json lands. Expect community-built lower-VRAM forks within ~30 days of weights drop
- vLLM or SGLang as the inference server (community examples will arrive on the HF repo; check huggingface.co/zai-org/GLM-5.2 once published)
- Disk for the weight shards: estimate ~1.5 TB BF16 / ~860 GB FP8 if the GLM-5 lineage shape holds; treat as a planning placeholder, not a procurement number, until the HF repo confirms
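The sizing figures above come from simple bytes-per-parameter arithmetic, which is worth rerunning yourself once the real config.json lands. The sketch below assumes the unconfirmed 744B total-parameter count from the GLM-5 lineage and counts raw weights only (no KV cache, activations, or file overhead); the constant and function names are illustrative.

```python
TOTAL_PARAMS = 744e9            # assumed: GLM-5 lineage total, unconfirmed for 5.2
GPUS, VRAM_PER_GPU_GB = 8, 80   # the 8x H100 80GB configuration from the list above


def weight_gb(params, bytes_per_param):
    """Raw weight size in decimal GB, ignoring KV cache, activations, and file overhead."""
    return params * bytes_per_param / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, 2)   # BF16 stores 2 bytes per parameter
fp8_gb = weight_gb(TOTAL_PARAMS, 1)    # FP8 stores 1 byte per parameter
cluster_gb = GPUS * VRAM_PER_GPU_GB

print(f"BF16 weights: {bf16_gb:.0f} GB")   # 1488 GB, i.e. the ~1.5 TB figure
print(f"FP8 weights:  {fp8_gb:.0f} GB")    # 744 GB as a naive floor
print(f"8x H100 VRAM: {cluster_gb} GB")    # 640 GB
print("FP8 weights fit in VRAM:", fp8_gb < cluster_gb)
```

Two things stand out under these assumptions. The ~860 GB FP8 disk estimate sits above the naive 744 GB floor, which would be plausible if some tensors stay in higher precision, though that is a guess. And 744 GB of FP8 weights does not fit in 640 GB of aggregate VRAM, so an 8x H100 node would likely need tighter quantization or a smaller confirmed parameter count; treat that as another reason to re-size once the repo is published.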
Step-by-Step Setup (Hosted, ~10 Minutes — Once API Opens)
The Z.ai Coding Plan API opens the week of June 22, 2026. Steps 1–4 below execute in ~10 minutes the day the dashboard goes live; until then, you can pre-stage your CLI config (Step 3) and queue a waitlist on z.ai if available.
flowchart LR
A[Sign up Z.ai] --> B[Pick Coding Plan tier]
B --> C[Generate API key]
C --> D[Configure CLI base_url + model]
D --> E[First smoke test]
E --> F[Wire repo, run a real task]
Step 1: Sign up for the Z.ai Coding Plan (once it opens)
Go to https://z.ai and create an account. Pick a Coding Plan tier:
| Tier | Approx. price | Quota | Best fit |
|---|---|---|---|
| Lite | ~$10/mo | ~400 prompts/week | Personal tinkering, light side projects |
| Pro | ~$30/mo | ~2,000 prompts/week | Solo dev, daily coding agent use |
| Max | ~$80/mo | ~8,000 prompts/week | Heavy agentic refactors, multi-hour autonomous runs |
| Team | Seat-based | Org-wide pooling | 3+ developers sharing quota |
Expected result: account dashboard with a “Coding Plan” entry showing your tier and remaining quota.
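If you want to sanity-check the tier choice against your own usage, the table reduces to a small lookup. This is a rough sketch: the prices and quotas are the approximate values from the table, the Team tier is left out because it has no published quota here, and the helper names and the 52/12 weeks-per-month conversion are assumptions for illustration.

```python
# Approximate tiers from the table above: (name, USD per month, prompts per week).
TIERS = [("Lite", 10, 400), ("Pro", 30, 2000), ("Max", 80, 8000)]


def cheapest_tier(prompts_per_week):
    """Return the first (cheapest) tier whose weekly quota covers the expected load."""
    for name, _price, quota in TIERS:
        if prompts_per_week <= quota:
            return name
    return None  # beyond Max; look at the seat-based Team tier instead


def usd_per_1k_prompts(price, quota_per_week, weeks_per_month=52 / 12):
    """Effective cost per 1,000 prompts if the whole weekly quota is used every week."""
    return price / (quota_per_week * weeks_per_month) * 1000


for name, price, quota in TIERS:
    print(f"{name}: ~${usd_per_1k_prompts(price, quota):.2f} per 1,000 prompts at full use")
```

At full utilization the per-prompt cost falls as you move up the tiers (roughly $5.77, $3.46, and $2.31 per 1,000 prompts), so the higher tiers only pay off if you actually consume the quota.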
Step 2: Generate an API key
Inside the Z.ai dashboard, open API Keys → Create new key. Scope it to “Coding Plan” only — Z.ai exposes other paid endpoints (general chat, vision) that share your wallet but should not share the same key.
export ZAI_API_KEY="zai-..."
Expected result: a key starting with zai-. Drop it into your shell’s secrets file or 1Password — Z.ai shows the full key exactly once.
Step 3: Configure your coding CLI
Z.ai exposes two compat endpoints, and you pick the one that matches your client. Claude Code talks the Anthropic protocol; the other seven launch-day clients (Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, Kilo Code) speak the OpenAI chat-completions shape.
For Claude Code (Anthropic-compatible endpoint) — the minimal config is the shell-env or ~/.claude/settings.json env block covered in the Drop-in Replacement for Claude Code section below. That section also lists what carries over (CLAUDE.md, slash commands, subagents) and what changes (thinking presets, tool-result bridging) before you commit — read it before pasting the block.
For OpenAI-compatible clients (Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, Kilo Code)
export OPENAI_BASE_URL="https://api.z.ai/api/coding/paas/v4"
export OPENAI_API_KEY="$ZAI_API_KEY"
export OPENAI_MODEL="glm-5.2" # or "glm-5.2[1m]" for the 1M context window
Re-launch your CLI in the same shell and the new endpoint takes over. For clients that don’t read the OpenAI env vars, open the tool’s settings panel, pick “OpenAI Compatible” provider, and paste the same three values. Note that the Coding Plan uses a dedicated endpoint (/api/coding/paas/v4) distinct from Z.ai’s general per-token API (/api/paas/v4).
Python SDK smoke test (paste into any one-off REPL)
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.z.ai/api/coding/paas/v4",
api_key=os.environ["ZAI_API_KEY"],
)
resp = client.chat.completions.create(
model="glm-5.2[1m]",
messages=[{"role": "user", "content": "Refactor this function to async:\n\n" + open("handler.py").read()}],
)
print(resp.choices[0].message.content)
Expected result: a non-empty diff or refactored snippet within ~5 seconds for short input. For 1M-context calls expect 30-90 seconds to first token.
Step 4: First smoke test
Before pointing GLM 5.2 at your repo, run a sanity check that confirms (a) the key works, (b) you’re hitting the right model, (c) thinking mode is wired.
curl -s https://api.z.ai/api/coding/paas/v4/chat/completions \
-H "Authorization: Bearer $ZAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"glm-5.2[1m]","messages":[{"role":"user","content":"Reply with only the string OK if you are GLM 5.2."}],"max_tokens":4096}' \
| jq -r '.choices[0].message.content'
Expected result: OK (or OK.). If you get a model-identity refusal or a different model name in the response, your config is wrong — see Common Errors below.
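To make the same check repeatable, you can save the raw JSON instead of piping it through jq and validate it in a few lines. The sketch below assumes the response follows the OpenAI chat-completions shape (a top-level model field and a choices list); whether Z.ai echoes glm-5.2 or glm-5.2[1m] in that field is not confirmed, so it accepts any ID with the glm-5.2 prefix. The function name is hypothetical.

```python
def check_smoke_response(resp, expected_prefix="glm-5.2"):
    """Return a list of problems found in a chat-completions style response dict."""
    problems = []

    # (b) Are we hitting the right model? Assumes the server echoes the model ID.
    model = resp.get("model") or ""
    if not model.startswith(expected_prefix):
        problems.append(f"unexpected model: {model!r}")

    # (a) Did the key work and did an answer come back?
    choices = resp.get("choices") or []
    content = (choices[0].get("message", {}).get("content") or "") if choices else ""
    if not content.strip():
        problems.append("empty content (check max_tokens against the thinking budget)")
    elif not content.strip().startswith("OK"):
        problems.append(f"unexpected reply: {content.strip()[:40]!r}")
    return problems
```

Load the saved body with json.load and treat an empty list as a pass; any entries point you at the matching row in Common Errors.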
Drop-in Replacement for Claude Code (One-Block Swap)
If you are reading this article because Fable 5 went away — or because you have been thinking about migrating from Claude Code without rewriting your project setup — this is the section that matters most. Z.ai shipped a dedicated /api/anthropic endpoint on day one specifically so a Claude Code workspace becomes a GLM 5.2 workspace with one block of environment variables.
The one-block swap
Drop this into ~/.zshrc (or ~/.bashrc, or ~/.claude/settings.json under "env"), open a new shell, and relaunch claude:
# Drop-in swap: Claude Code workspace → GLM 5.2, no project rewrite
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY"
export ANTHROPIC_MODEL="glm-5.2[1m]" # 1M context; drop [1m] for default
export API_TIMEOUT_MS="3000000" # long-context calls take 30–90s
Claude Code’s UI will keep showing “Sonnet” / “Opus” labels because the client is not model-aware — server-side mapping at Z.ai routes the request to GLM 5.2. Your CLAUDE.md, project memory, slash commands, subagents, and harness habits keep working unchanged.
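If you prefer the ~/.claude/settings.json route over shell exports, the same block can be merged into the file's "env" object without clobbering existing settings. This is a sketch under the assumption that the file is plain JSON with an optional "env" object, as the text describes; stage_env and GLM_ENV are hypothetical names, and note that this writes the API key to disk in plaintext.

```python
import json
from pathlib import Path

# Values copied from the one-block swap; provisional until the endpoint is live.
GLM_ENV = {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_MODEL": "glm-5.2[1m]",
    "API_TIMEOUT_MS": "3000000",
}


def stage_env(settings_path, api_key, env=GLM_ENV):
    """Merge the swap variables into the "env" object, keeping every other setting."""
    path = Path(settings_path).expanduser()
    settings = json.loads(path.read_text()) if path.exists() else {}
    merged = dict(settings.get("env", {}))
    merged.update(env)
    merged["ANTHROPIC_AUTH_TOKEN"] = api_key  # stored in plaintext: mind file permissions
    settings["env"] = merged
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Run it as stage_env("~/.claude/settings.json", os.environ["ZAI_API_KEY"]) after taking a backup copy of the file, so reverting is a single file restore.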
What carries over cleanly
- Project-level CLAUDE.md files and .claude/ directories (commands, subagents, settings)
- Slash commands and custom subagent definitions
- AGENTS.md files and Codex-style instruction layering (Claude Code reads these)
- The Plan / Edit / Bash tool dispatcher behavior and its prompts
- Multi-file refactor workflows (1M context covers most monorepos in one request)
What changes (read this before you commit)
- Thinking budget: GLM 5.2 ships only “High” and “Max” presets; there is no equivalent of Claude’s thinking_budget=auto heuristic. Pick one or accept High as the default.
- Tool-result formatting: Claude expects tool_result blocks in a specific shape. Z.ai’s bridge translates 95%+ of common patterns, but occasionally drops nested-content blocks on long agentic loops. If you see assistant turns repeating a tool call instead of acknowledging it, that’s the failure mode: fall back to the OpenAI-compatible endpoint (/api/coding/paas/v4) and use Cline or OpenCode for that workflow.
- Latency profile: first-token latency for 1M-context calls is 30–90 seconds, vs 5–15 seconds for Claude on equivalent-size prompts. The API_TIMEOUT_MS=3000000 line above is mandatory, not optional, because Claude Code defaults will kill the connection on long Plan-mode calls.
- Quota model: you are now spending Coding Plan quota, not Claude Plan quota. The bursty agent-loop pattern that drains a Claude weekly cap in a few hours also drains a Lite-tier GLM plan; budget Pro or Max for sustained work.
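The tool-result failure mode above (an assistant turn repeating a tool call instead of acknowledging its result) is easy to spot mechanically if you can export a session transcript. The sketch below assumes messages in the Anthropic Messages shape, where an assistant turn's content is a list of blocks and tool calls are blocks with type "tool_use", a name, and an input; how you export the transcript and the function names are left as assumptions.

```python
import json


def tool_calls(message):
    """Yield (name, canonical input JSON) for each tool_use block in an assistant message."""
    content = message.get("content")
    if message.get("role") != "assistant" or not isinstance(content, list):
        return
    for block in content:
        if block.get("type") == "tool_use":
            yield block.get("name"), json.dumps(block.get("input"), sort_keys=True)


def repeated_tool_calls(messages):
    """Return calls that an assistant turn repeats verbatim from the previous assistant turn."""
    repeats, previous = [], set()
    for message in messages:
        if message.get("role") != "assistant":
            continue
        current = set(tool_calls(message))
        repeats.extend(sorted(current & previous))
        previous = current
    return repeats
```

A non-empty result is a signal, not proof: an agent can legitimately re-read a file after editing it. Several identical calls in a row on a long loop is the pattern worth treating as the bridging issue.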
Pick this path vs. a clean Cline setup
| Pick the drop-in swap if | Pick a clean Cline / OpenCode install if |
|---|---|
| You have 3+ slash commands, tuned subagents, or a CLAUDE.md you’ve iterated on for weeks | You are starting a new project with no Claude Code investment |
| Your team standardized on Claude Code’s UI and switching tools means re-onboarding engineers | Your other tooling (lint, telemetry) speaks OpenAI-style requests |
| You want to A/B GLM 5.2 against your current Claude workflow without burning a sprint day | You hit the tool-result bridging issue above and the workaround is more friction than a tool switch |
Revert path (do this before you commit)
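Run unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL API_TIMEOUT_MS and restart Claude Code. If you added the block to ~/.zshrc, ~/.bashrc, or ~/.claude/settings.json, delete those lines there as well. The claude CLI picks up Anthropic’s defaults again. No state inside your project is touched by the swap; it lives entirely in your shell environment or user-level settings.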
Common Errors During Setup
| Error | Likely cause | Fix |
|---|---|---|
| 401 invalid_api_key | Key scoped to wrong product or pasted with whitespace | Regenerate with “Coding Plan” scope; copy-paste through a non-stripping clipboard |
| model not found for glm-5.2 or glm-5.2[1m] | Z.ai uses glm-5.2 for the standard context window and the [1m] suffix as a model alias that switches the request to the 1M-context configuration | Use glm-5.2[1m] when you need the full 1M window; plain glm-5.2 for default-context calls. Both are valid model IDs against the Coding Plan endpoint |
| 429 Too Many Requests after a few minutes of work | Lite tier quota (~400 prompts/week) burned by an agent loop | Upgrade to Pro, or reduce agent iteration loops via max_iterations |
| Empty response body, no error | Thinking budget exceeded max_tokens | Raise max_tokens to ≥ 4096; thinking models stream reasoning then answer |
| Tool-use call returned as raw JSON in the assistant text | Z.ai’s OpenAI compat doesn’t auto-parse tool_use unless tools field present in request | Pass tools array even on the first turn; or use Anthropic-compatible endpoint if your client supports it |
| 504 / timeout on multi-file refactor | Long-context (>500K tokens) first-token latency can exceed default client timeout | Raise the CLI’s requestTimeoutMs to 600000 (10 min) for 1M-context calls |
Team / Multi-Developer Configuration
If 3+ developers need to share a quota, the Team tier of the Coding Plan does seat-based pooling — but the setup pattern differs from solo:
- One API key per developer, billed against the same org wallet — never share a single key across machines (it’s the fastest way to burn quota on something you can’t trace)
- A shared .env.team checked into a private secrets repo, containing only OPENAI_BASE_URL=https://api.z.ai/api/coding/paas/v4 and OPENAI_MODEL=glm-5.2[1m]; keep API keys out of git
- A budget guard in CI: have your coding-agent CI step abort if the per-PR completion token count exceeds N (your call: start at 200K and adjust by Friday)
- Quota observability: Z.ai’s dashboard shows per-key usage; for programmatic polling, the Coding Plan exposes a quota endpoint at https://api.z.ai/api/monitor/usage/quota/limit covering the 5-hour token cycle, weekly quota, and monthly MCP usage. Pull it into your existing observability stack (Datadog, Honeycomb)
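The CI budget guard from the list above can be as small as a sum and a threshold. This sketch assumes your agent step collects the usage object from each chat-completions response (the standard completion_tokens field) into a list; the function names and the way usages are collected are hypothetical, and 200,000 is the starting value suggested in the text.

```python
def over_budget(usages, limit=200_000):
    """Sum completion tokens across responses; return (total, exceeded)."""
    total = sum(u.get("completion_tokens", 0) for u in usages)
    return total, total > limit


def budget_exit_code(usages, limit=200_000):
    """Print a one-line summary and return a process exit code for the CI step."""
    total, exceeded = over_budget(usages, limit)
    print(f"completion tokens this PR: {total} / {limit}")
    return 1 if exceeded else 0  # nonzero exit fails the CI step
```

In the pipeline, finish the agent step with sys.exit(budget_exit_code(usages)) so a runaway refactor fails the job instead of silently draining the shared quota.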
If your org cannot route through a Chinese API endpoint (egress control, compliance), the practical pattern is to mirror the same OpenAI-compatible config against a different upstream — see Alternatives.
Advanced: The MIT Open-Weights Plan
Zhipu’s launch announcement commits the MIT-licensed weights for “next week” — meaning the week of June 22, 2026, the same window the Z.ai Coding Plan API opens. The HF org is huggingface.co/zai-org; track the GLM-5.2 repo for the actual drop.
What MIT actually buys you:
- Commercial use, modification, redistribution — no usage caps, no per-token fees once you’re hosting it
- Fine-tuning rights — you can train LoRAs or full fine-tunes on your own codebase and ship the result
- Forks — if Zhipu disables a feature you depend on (or, more likely, raises prices), community forks remain operable
What MIT does not buy you:
- A free lunch on inference compute — if 5.2 inherits the GLM-5 line’s ~744B-total / ~40B-active MoE shape (Zhipu hasn’t officially confirmed for 5.2), production throughput is still in 8x-H100 territory, with strong dependence on quantization quality
- Future model updates — the MIT release is point-in-time; GLM 5.3 may or may not be open
- Anthropic-quality safety tuning — Z.ai’s RLHF is its own house style, expect different refusal boundaries
The realistic path for most teams: stay on the hosted Coding Plan for the next 30-60 days, watch the community quantize the weights into 4-bit and 2-bit variants, then re-evaluate self-hosting once a single-node config exists.
Alternatives: Managed Open-Weights Coding Models on ofox
If you want a single OpenAI-compatible endpoint that already covers managed Chinese coding models — without waiting for the GLM 5.2 weights drop or building your own H100 cluster — ofox lists three solid alternatives as of June 15, 2026:
| Model | ofox API ID | Strength | When to pick over GLM 5.2 |
|---|---|---|---|
| DeepSeek V4 Pro | deepseek/deepseek-v4-pro | Coding-tuned flagship, broad community track record | You want a model with published benchmarks (DeepSeek’s are public; GLM 5.2’s aren’t yet) |
| Qwen3 Coder Next | bailian/qwen3-coder-next | Latest Alibaba coding-specific tier, multilingual code | You’re shipping to a multilingual Chinese/Japanese codebase and want first-party Qwen support |
| Kimi K2.6 | moonshotai/kimi-k2.6 | Long-context with strong recall | You need long-context that’s verified, not “claimed but un-benchmarked” |
You wire any of these in with the same config shape used for GLM 5.2 — just swap the base URL and model ID:
# Same Cline / OpenCode config, different upstream
export OPENAI_BASE_URL="https://api.ofox.ai/v1"
export OPENAI_MODEL="deepseek/deepseek-v4-pro"
This is the single-endpoint pattern: one key, many models, no per-vendor signup. See the ofox model catalog for current pricing and capability flags. When GLM 5.2 lands on ofox (it isn’t yet — verified June 15, 2026), you’ll switch by changing one string.
Watching Z.ai Status and Quota
Two things to wire in week one:
- Z.ai status page — bookmark it the day you sign up; a new product’s first 30 days always include at least one rate-limit tuning or quota-counting bug
- Per-PR usage instrumentation: log usage.total_tokens from every API response into your existing PR-level telemetry (Datadog, Honeycomb, your call). Coding agents drift toward burning quota on rabbit-hole refactors, and the only way you’ll catch it is at the PR level
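One low-effort way to get the per-PR instrumentation above is to append a JSON line per response and let your existing log pipeline pick the file up. The sketch assumes chat-completions style responses with a usage.total_tokens field; the record layout, the function names, and the idea of keying by PR number are illustrative choices, not something Z.ai prescribes.

```python
import json
import time


def usage_record(pr_number, response, now=time.time):
    """Build one JSON-serializable telemetry record from a chat-completions style response."""
    usage = response.get("usage") or {}
    return {
        "ts": int(now()),
        "pr": pr_number,
        "model": response.get("model"),
        "total_tokens": usage.get("total_tokens", 0),
    }


def append_jsonl(path, record):
    """Append the record as a single line so the file can be tailed or batch-ingested."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Grouping these records by the pr field gives you the PR-level view the bullet asks for, without committing to a specific telemetry vendor up front.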
Sources Checked for This Refresh
- Codersera: “GLM 5.2 Just Launched: 1M Context, Coding-First, Open Weights Next Week (Day-One Brief)” — https://codersera.com/blog/glm-5-2-release-1m-context-coding-2026/ (verified June 15, 2026)
- AI Weekly: “Zhipu Deploys GLM 5.2 to All GLM Coding Plan Tiers With 1M-Token Context” — https://aiweekly.co/node/2946 (verified June 15, 2026)
- Agent-Wars: “Zhipu ships GLM 5.2 with a 1M-token context and no benchmarks” — https://agent-wars.com/news/2026-06-14-glm-5-2-million-token-context (verified June 15, 2026)
- ofox model catalog snapshot — https://ofox.io/en/models (verified June 15, 2026; GLM 5.2 not present, DeepSeek V4 Pro + Qwen3 Coder Next + Kimi K2.6 confirmed as managed alternatives)
- Hugging Face org for weights: https://huggingface.co/zai-org (GLM-5.2 repo pending publication as of June 15, 2026)
- Jie Tang on X: “GLM-5.2 is Fully Open, Frontier Intelligence Belongs to Everyone”, https://x.com/jietang/status/2065784751345287314 (June 13, 2026; ~898K views by June 15)
- Fortune: “A warning from Amazon led the White House to shut down Anthropic’s Mythos model” — https://fortune.com/2026/06/14/how-a-warning-from-amazon-led-the-white-house-to-shut-down-anthropics-mythos-model/ (verified June 15, 2026)
- Semafor: “White House move to limit Anthropic linked to concerns about Chinese access to Mythos” — https://www.semafor.com/article/06/13/2026/white-house-move-to-limit-anthropic-linked-to-concerns-about-chinese-access-to-mythos (verified June 15, 2026)
- Tom’s Hardware: US government warned Anthropic that Fable 5 had been jailbroken — https://www.tomshardware.com/tech-industry/artificial-intelligence/trump-adviser-david-sacks-says-anthropic-refused-to-fix-fable-5-jailbreak-before-us-export-controls (verified June 15, 2026)
The thing that makes this drop different isn’t the million-token context; Anthropic and Google were already there. It’s that GLM 5.2 is positioned as the first frontier-class coding model where you can read the weights, rely on an MIT license, and run a fork on your own metal, while staying on the hosted plan during the migration. The next 30 days will tell us whether the benchmarks back the marketing.


