How to Access GLM 5.2: Pricing, API Setup, and MIT Weights Plan (2026)

Zhipu just announced a frontier-class coding model with a 1M-token context window, MIT-licensed weights, and a $10/month entry price, all in one drop. The Z.ai Coding Plan API and the MIT weights both open the week of June 22, 2026. If you’ve been waiting for an open-weights Claude Code competitor that you can actually fork, use the days before launch to read this guide, pick a tier, and pre-stage your client config so you can wire it up on day one.

Why Now: The Reverse-Narrative Window Is Open

GLM 5.2 didn’t land in a vacuum. The 24 hours around its release are the reason this article exists — and the reason “should I switch?” is no longer hypothetical for some readers.

June 12, 2026 — Anthropic received an export-control directive from the US Department of Commerce restricting access to Claude Fable 5 and Mythos 5 for foreign nationals (inside or outside the US). The trigger was a security finding raised through Amazon: CEO Andy Jassy escalated jailbreak research to senior administration officials, including Treasury Secretary Scott Bessent (Fortune, Semafor). Rather than ship a US-only variant, Anthropic withdrew both models from public availability.

June 13, 2026 — same day the Anthropic withdrawal hit news cycles — Zhipu released GLM 5.2. Jie Tang (Tsinghua, GLM team lead) posted to X with the line “GLM-5.2 is Fully Open, Frontier Intelligence Belongs to Everyone”, framing the launch as a direct response: “the sudden restriction of certain frontier models is deeply regrettable… access to frontier models is abruptly cut off for non-technical reasons” (Jie Tang on X, June 13). The post traveled — roughly 898K views and Hacker News front page within 36 hours.

| Side | Move | Date |
| --- | --- | --- |
| US (Commerce + Anthropic) | Export-control directive → Anthropic withdraws Fable 5 + Mythos 5 from public availability | Notice June 12, public June 13, 2026 |
| China (Zhipu) | Ships GLM 5.2 + announces the MIT weights drop for the following week (week of June 22, 2026) | June 13, 2026 |
| Public signal | Jie Tang tweet: ~898K views, Hacker News front page | June 13–15, 2026 |

One nuance worth being precise about: Fable 5 was not deprecated, sunset, or retired by Anthropic. It was withdrawn after a US government export-control order, and Anthropic publicly disputed the severity of the underlying jailbreak finding that triggered the order (Tom’s Hardware). If you write or read about this elsewhere, “Anthropic shut down Fable” is the wrong framing.

For most readers this geopolitics is irrelevant — you pick a coding model on price and benchmarks. But three concrete things change in the next 30 days, and they decide whether the rest of this article is worth your time:

  • Hedge value: if your team was on Claude Fable for coding workflows and you are outside the US, GLM 5.2 is the first frontier-class coding model that is licensed (MIT, weights drop next week) for you to fork and self-host. “Open weights as political insurance” is no longer abstract.
  • Pricing pressure: open-weights frontier models put a ceiling on hosted subscription prices. Expect Anthropic, OpenAI, and Google to soften coding-plan tiers within ~60 days, regardless of whether GLM 5.2 benchmarks competitively.
  • Tooling parity: Z.ai shipped Claude Code drop-in support on day one (the dedicated /api/anthropic endpoint, covered in the Drop-in section below). The standard 2026 coding-CLI workflow no longer locks you into a single model family.

If none of those three apply, skip to the setup section. If any of them do, the rest of this article is the operational path: the 10-minute access flow you wire up once the Z.ai API opens the week of June 22, the drop-in replacement for Claude Code, and the self-host plan for when MIT weights drop the same week.

A Note on Availability (Read This First)

Zhipu’s June 13, 2026 launch was an announcement plus documentation, not a dashboard you can sign into the same day. Two access surfaces unlock on the next Z.ai release wave:

  • Z.ai Coding Plan API — opens the week of June 22, 2026. Account creation, Coding Plan tier selection, API key issuance, and the /api/anthropic + /api/coding/paas/v4 endpoints all light up in that window. Until then, the endpoint URLs in this guide are the ones published in the launch post; treat them as provisional until you can hit them.
  • MIT-licensed open weights — drop the same week under huggingface.co/zai-org/GLM-5.2. The HF repo is currently a placeholder; the config.json that confirms architecture and the BF16 / FP8 shards land on that calendar.

This guide is structured so you can do the planning work this week (pick a tier, pre-stage env vars, decide drop-in vs clean install) and execute setup in ~10 minutes the day the API turns on. If you need something working today, jump to the Alternatives section — ofox already serves DeepSeek V4 Pro / Kimi K2.6 / Qwen3 Coder Next on a single endpoint.

What You Get with GLM 5.2 (The 30-Second Answer)

| Item | Value |
| --- | --- |
| What you can do today (June 13–21, 2026) | Read this guide, pick a Coding Plan tier, pre-stage ~/.claude/settings.json or OPENAI_BASE_URL env, and queue a waitlist on z.ai if available |
| What you can do once API opens (week of June 22, 2026) | Use GLM 5.2 inside Claude Code, Cline, OpenCode, OpenClaw, Goose, Crush, Roo Code, or Kilo Code via Z.ai’s hosted Coding Plan; self-host the MIT-licensed weights from huggingface.co/zai-org (MoE; parameter count not officially confirmed for 5.2, likely inherits the GLM-5 line’s 744B-total / 40B-active envelope) |
| Time to first call once keys go live | ~10 minutes (sign up → API key → CLI config → smoke test) |
| Minimum cost | ~$10/mo Lite tier; ~$30/mo Pro for ~2,000 prompts/week |
| What you need | A Z.ai account, an OpenAI-compatible coding client (or any tool that accepts a custom base_url), and patience for the first long-context call (30–90 seconds to first token) |
| What you can’t do (yet) | Quote SWE-bench numbers (Zhipu didn’t publish any), get a 5-tier thinking preset (only High and Max), or pull the weights through ofox (DeepSeek V4 Pro is the closest managed analog) |

Decision Frame: When GLM 5.2 Is Worth Your Setup Time

Use this section to bail out before reading the rest of the article.

When to use GLM 5.2

  • You’re running multi-file refactors on a monolith and have been bumping into the 200K context ceiling of competing coding agents
  • Your compliance team requires open, auditable model weights — MIT is one of the friendliest open-source licenses in the LLM space
  • You want a Chinese-origin coding model to hedge against US-side access restrictions — GLM 5.2 launched the day Anthropic withdrew Claude Fable 5 + Mythos 5 after a US Commerce export-control directive (Why Now above has the full timeline)

When NOT to use GLM 5.2

  • You need a model with published benchmarks before you ship it to a production team. Zhipu has not released SWE-bench, LiveCodeBench, or Aider numbers as of June 14, 2026 — independent benchmarks are days away at minimum
  • You already pay for Claude Code with Sonnet/Opus and don’t have a specific gap GLM fills. Switching costs (tool config, prompt re-tuning, eval re-runs) are not worth the ~$10/mo savings unless context-window is the actual bottleneck
  • You want a single managed endpoint that hands you GLM, GPT-5.5, and Claude Opus 4.8 behind one API key. GLM 5.2 isn’t on ofox yet (verified June 15, 2026) — if endpoint consolidation matters more than this specific model, see the Alternatives section

Stop rule

If you’ve never run into a 200K-token context limit on a real task in the last 30 days, you do not need GLM 5.2. Stop reading and revisit when Zhipu publishes a benchmark or ofox lists the model — whichever comes first.

System Requirements

Before you start the setup, confirm you have:

  • A Z.ai account with payment method on file (Coding Plan billed monthly, USD or RMB)
  • A supported coding CLI: Claude Code (which talks to Z.ai’s Anthropic-compatible endpoint), or one of the OpenAI-compatible clients Cline ≥ 3.x, OpenCode, Roo Code, Goose, Crush, OpenClaw, or Kilo Code. Each supports a custom base URL and model-name override
  • Network egress to api.z.ai — verify with curl -I https://api.z.ai/api/paas/v4/ (you should get an HTTP response, not a connection error)
  • A side branch in your repo for the first run. Long-context coding agents are smart enough to delete unrelated files when given a vague prompt — never point one at main on your first day

If you want to self-host the weights when they drop the week of June 22, 2026, additional requirements:

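  • 8x H100 80GB GPUs or equivalent as a starting estimate, assuming 5.2 inherits the GLM-5 line’s ~744B-total / ~40B-active MoE shape; Zhipu has not officially confirmed 5.2 parameter counts as of June 15, 2026. Note that 8 × 80 GB is 640 GB of VRAM, less than the ~860 GB FP8 disk estimate in this list, so a single 8x H100 node only works with lower-bit quantization or offloading; FP8 or BF16 serving needs more or larger GPUs. Re-size once huggingface.co/zai-org/GLM-5.2/config.json lands. Expect community-built lower-VRAM forks within ~30 days of weights drop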
  • vLLM or SGLang as the inference server (community examples will arrive on the HF repo; check huggingface.co/zai-org/GLM-5.2 once published)
  • Disk for the weight shards — estimate ~1.5 TB BF16 / ~860 GB FP8 if the GLM-5 lineage shape holds; treat as a planning placeholder, not a procurement number, until the HF repo confirms
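The sizing in this list is plain arithmetic, so it is worth redoing the moment config.json confirms real numbers. The following is a minimal sketch, assuming the unconfirmed 744B-total parameter count from the GLM-5 line and counting raw weight bytes only (no KV cache, activations, or file overhead, which is why the results come out somewhat under the planning figures above). The function names are illustrative.

```python
# Rough weight-footprint estimate.
# ASSUMPTION: 744e9 total parameters is the GLM-5 line's figure, not a
# confirmed GLM 5.2 number. Replace it once config.json is published.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(total_params: float, dtype: str) -> float:
    """Raw weight size in decimal GB (1 GB = 1e9 bytes), weights only."""
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(total_params: float, dtype: str, gpus: int, gb_per_gpu: float) -> bool:
    """True if the raw weights fit in aggregate VRAM (ignores KV cache)."""
    return weight_gb(total_params, dtype) <= gpus * gb_per_gpu


PARAMS = 744e9
for dtype in BYTES_PER_PARAM:
    verdict = "fits" if fits(PARAMS, dtype, gpus=8, gb_per_gpu=80) else "does not fit"
    print(f"{dtype}: {weight_gb(PARAMS, dtype):.0f} GB, {verdict} in 8 x 80 GB")
```

Under these assumptions only the 4-bit row fits a single 8x H100 node, and that is before any KV cache for long contexts, so treat the node count as a floor rather than a target.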

Step-by-Step Setup (Hosted, ~10 Minutes — Once API Opens)

The Z.ai Coding Plan API opens the week of June 22, 2026. Steps 1–4 below execute in ~10 minutes the day the dashboard goes live; until then, you can pre-stage your CLI config (Step 3) and queue a waitlist on z.ai if available.

flowchart LR
  A[Sign up Z.ai] --> B[Pick Coding Plan tier]
  B --> C[Generate API key]
  C --> D[Configure CLI base_url + model]
  D --> E[First smoke test]
  E --> F[Wire repo, run a real task]

Step 1: Sign up for the Z.ai Coding Plan (once it opens)

Go to https://z.ai and create an account. Pick a Coding Plan tier:

| Tier | Approx. price | Quota | Best fit |
| --- | --- | --- | --- |
| Lite | ~$10/mo | ~400 prompts/week | Personal tinkering, light side projects |
| Pro | ~$30/mo | ~2,000 prompts/week | Solo dev, daily coding agent use |
| Max | ~$80/mo | ~8,000 prompts/week | Heavy agentic refactors, multi-hour autonomous runs |
| Team | Seat-based | Org-wide pooling | 3+ developers sharing quota |

Expected result: account dashboard with a “Coding Plan” entry showing your tier and remaining quota.
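If you already know roughly how many prompts you send per week, the tier choice reduces to a lookup. Here is a small sketch using the approximate prices and quotas from the tier table; the 25% headroom factor and the function name are assumptions for illustration, not Z.ai guidance.

```python
# Tier data copied from the table above; all values are approximate ("~").
TIERS = [("Lite", 10, 400), ("Pro", 30, 2000), ("Max", 80, 8000)]


def pick_tier(prompts_per_week: int, headroom: float = 1.25):
    """Return (name, usd_per_month) for the cheapest tier whose weekly quota
    covers usage * headroom, or None if even Max is too small."""
    needed = prompts_per_week * headroom
    for name, price, quota in TIERS:  # ordered cheapest first
        if quota >= needed:
            return name, price
    return None  # beyond Max: look at the seat-based Team tier


print(pick_tier(300))   # light personal use
print(pick_tier(1500))  # daily agent use
```

The headroom exists because agent loops are bursty; set it to 1.0 if you would rather size to your exact average.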

Step 2: Generate an API key

Inside the Z.ai dashboard, open API Keys → Create new key. Scope it to “Coding Plan” only — Z.ai exposes other paid endpoints (general chat, vision) that share your wallet but should not share the same key.

export ZAI_API_KEY="zai-..."

Expected result: a key starting with zai-. Drop it into your shell’s secrets file or 1Password — Z.ai shows the full key exactly once.

Step 3: Configure your coding CLI

Z.ai exposes two compat endpoints, and you pick the one that matches your client. Claude Code talks the Anthropic protocol; the other seven launch-day clients (Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, Kilo Code) speak the OpenAI chat-completions shape.

For Claude Code (Anthropic-compatible endpoint) — the minimal config is the shell-env or ~/.claude/settings.json env block covered in the Drop-in Replacement for Claude Code section below. That section also lists what carries over (CLAUDE.md, slash commands, subagents) and what changes (thinking presets, tool-result bridging) before you commit — read it before pasting the block.

For OpenAI-compatible clients (Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, Kilo Code)

export OPENAI_BASE_URL="https://api.z.ai/api/coding/paas/v4"
export OPENAI_API_KEY="$ZAI_API_KEY"
export OPENAI_MODEL="glm-5.2"   # or "glm-5.2[1m]" for the 1M context window

Re-launch your CLI in the same shell and the new endpoint takes over. For clients that don’t read the OpenAI env vars, open the tool’s settings panel, pick “OpenAI Compatible” provider, and paste the same three values. Note that the Coding Plan uses a dedicated endpoint (/api/coding/paas/v4) distinct from Z.ai’s general per-token API (/api/paas/v4).
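Because the Coding Plan and the per-token API differ only by a path segment, it is easy to paste the wrong one. A minimal sketch of a pre-flight check follows, assuming the three environment variables above and the provisional endpoint URLs from the launch post; `resolve_config` is a hypothetical helper, not part of any SDK.

```python
CODING_PLAN_URL = "https://api.z.ai/api/coding/paas/v4"  # Coding Plan (quota-billed)
GENERAL_URL = "https://api.z.ai/api/paas/v4"             # general per-token API


def resolve_config(env, long_context: bool = False) -> dict:
    """Build client settings from an environment mapping and reject the
    per-token endpoint, which does not draw on Coding Plan quota."""
    base_url = env.get("OPENAI_BASE_URL", CODING_PLAN_URL).rstrip("/")
    if base_url == GENERAL_URL:
        raise ValueError("OPENAI_BASE_URL points at the per-token API, not the Coding Plan")
    api_key = env.get("OPENAI_API_KEY") or env.get("ZAI_API_KEY")
    if not api_key:
        raise ValueError("set OPENAI_API_KEY (or ZAI_API_KEY) first")
    default_model = "glm-5.2[1m]" if long_context else "glm-5.2"
    return {
        "base_url": base_url,
        "api_key": api_key,
        "model": env.get("OPENAI_MODEL", default_model),
    }
```

In a real script you would call `resolve_config(os.environ)` and pass `base_url` and `api_key` to your client constructor.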

Python SDK smoke test (paste into any one-off REPL)

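import os
from pathlib import Path

from openai import OpenAI

# Coding Plan endpoint (provisional until the API opens)
client = OpenAI(
    base_url="https://api.z.ai/api/coding/paas/v4",
    api_key=os.environ["ZAI_API_KEY"],
)

# Read the file to refactor; read_text() closes the file handle for us
source = Path("handler.py").read_text(encoding="utf-8")

resp = client.chat.completions.create(
    model="glm-5.2[1m]",
    messages=[{"role": "user", "content": "Refactor this function to async:\n\n" + source}],
)
print(resp.choices[0].message.content)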

Expected result: a non-empty diff or refactored snippet within ~5 seconds for short input. For 1M-context calls expect 30-90 seconds to first token.
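To compare your own numbers against those latency expectations, you can time the first streamed chunk. The helper below is a sketch that works on any iterable; with the OpenAI SDK you would pass it the object returned by `client.chat.completions.create(..., stream=True)`. The helper name is an assumption, and nothing here is specific to Z.ai.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_chunk(stream: Iterable[T]) -> Tuple[float, Iterator[T]]:
    """Wait for the first chunk and return (seconds waited, iterator that
    still yields every chunk, including the first one)."""
    iterator = iter(stream)
    start = time.monotonic()
    try:
        first = next(iterator)
    except StopIteration:
        raise RuntimeError("stream ended before producing any chunk") from None
    waited = time.monotonic() - start

    def replay() -> Iterator[T]:
        yield first          # hand back the chunk we consumed for timing
        yield from iterator  # then the rest of the stream, untouched

    return waited, replay()
```

Log the measured value next to the prompt size; a single number without the token count tells you little about whether a slow call was expected.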

Step 4: First smoke test

Before pointing GLM 5.2 at your repo, run a sanity check that confirms (a) the key works, (b) you’re hitting the right model, and (c) the answer is not swallowed by the thinking budget. The request below sets max_tokens to 4096 because a thinking model spends tokens on reasoning before it answers; a very low cap can return an empty body (see Common Errors).

curl -s https://api.z.ai/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.2[1m]","messages":[{"role":"user","content":"Reply with only the string OK if you are GLM 5.2."}],"max_tokens":4096}' \
  | jq -r '.choices[0].message.content'

Expected result: OK (or OK.). If you get a model-identity refusal or a different model name in the response, your config is wrong — see Common Errors below.
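If you would rather script the check than eyeball it, the same three conditions can be tested against the raw JSON (drop the `jq` filter to get it). This sketch assumes the response follows the OpenAI chat-completions shape, including a top-level `model` field and a `finish_reason` per choice; whether Z.ai echoes the model ID exactly as requested is not confirmed.

```python
def smoke_problems(resp: dict, expected_model_prefix: str = "glm-5.2") -> list:
    """Return a list of human-readable problems; an empty list means pass."""
    problems = []
    model = resp.get("model") or ""
    if not model.startswith(expected_model_prefix):
        problems.append(f"unexpected model: {model!r}")
    choices = resp.get("choices") or []
    if not choices:
        problems.append("no choices in response")
        return problems
    content = ((choices[0].get("message") or {}).get("content") or "").strip()
    if not content:
        problems.append("empty content: reasoning may have used up max_tokens")
    elif not content.upper().startswith("OK"):
        problems.append(f"unexpected reply: {content[:40]!r}")
    if choices[0].get("finish_reason") == "length":
        problems.append("finish_reason=length: reply was truncated")
    return problems
```

A non-empty list is a reason to recheck the base URL, the model ID, and the max_tokens value before you touch a repository.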

Drop-in Replacement for Claude Code (One-Block Swap)

If you are reading this article because Fable 5 went away — or because you have been thinking about migrating from Claude Code without rewriting your project setup — this is the section that matters most. Z.ai shipped a dedicated /api/anthropic endpoint on day one specifically so a Claude Code workspace becomes a GLM 5.2 workspace with one block of environment variables.

The one-block swap

Drop this into ~/.zshrc (or ~/.bashrc, or ~/.claude/settings.json under "env"), open a new shell, and relaunch claude:

# Drop-in swap: Claude Code workspace → GLM 5.2, no project rewrite
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY"
export ANTHROPIC_MODEL="glm-5.2[1m]"      # 1M context; drop [1m] for default
export API_TIMEOUT_MS="3000000"           # 3,000,000 ms = 50 min ceiling; first token on long-context calls takes 30–90 s

Claude Code’s UI will keep showing “Sonnet” / “Opus” labels because the client is not model-aware — server-side mapping at Z.ai routes the request to GLM 5.2. Your CLAUDE.md, project memory, slash commands, subagents, and harness habits keep working unchanged.
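If you prefer the ~/.claude/settings.json route over shell exports, the swap is a merge into that file's "env" object, and the revert is the inverse. The following is a sketch under the assumption that your settings file is plain JSON with an optional top-level "env" mapping; the helper names are hypothetical, and the token must be pasted as a literal value because a JSON file does not expand `$ZAI_API_KEY`.

```python
import json
from pathlib import Path

# Values from the one-block swap above. The token is a placeholder:
# put your actual key here, JSON will not expand shell variables.
SWAP_ENV = {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "<paste your Z.ai key>",
    "ANTHROPIC_MODEL": "glm-5.2[1m]",
    "API_TIMEOUT_MS": "3000000",
}


def apply_swap(settings: dict, env: dict = SWAP_ENV) -> dict:
    """Return a copy of settings with the swap variables merged into 'env'."""
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **env}
    return merged


def revert_swap(settings: dict, keys=tuple(SWAP_ENV)) -> dict:
    """Return a copy of settings with the swap variables removed from 'env'."""
    reverted = dict(settings)
    reverted["env"] = {k: v for k, v in settings.get("env", {}).items() if k not in keys}
    return reverted


def rewrite_file(path: Path, transform) -> None:
    """Load JSON settings (or start empty), apply transform, write back."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(transform(current), indent=2) + "\n", encoding="utf-8")
```

Typical use would be `rewrite_file(Path.home() / ".claude" / "settings.json", apply_swap)`, and the same call with `revert_swap` to go back. Back up the file first if it holds settings you care about.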

What carries over cleanly

  • Project-level CLAUDE.md files and .claude/ directories (commands, subagents, settings)
  • Slash commands and custom subagent definitions
  • AGENTS.md files and Codex-style instruction layering, to whatever extent your Claude Code setup already reads them
  • The Plan / Edit / Bash tool dispatcher behavior and its prompts
  • Multi-file refactor workflows (1M context covers most monorepos in one request)

What changes (read this before you commit)

  • Thinking budget: GLM 5.2 ships only “High” and “Max” presets, with no equivalent of an automatic thinking-budget selection on the Claude side. Pick one or accept High as the default.
  • Tool-result formatting: Claude expects tool_result blocks in a specific shape. Z.ai’s bridge translates 95%+ of common patterns, but occasionally drops nested-content blocks on long agentic loops. If you see assistant turns repeating a tool call instead of acknowledging it, that’s the failure mode — fall back to the OpenAI-compatible endpoint (/api/coding/paas/v4) and use Cline or OpenCode for that workflow.
  • Latency profile: first-token latency for 1M-context calls is 30–90 seconds, vs 5–15 seconds for Claude on equivalent-size prompts. The API_TIMEOUT_MS=3000000 line above is mandatory, not optional: Claude Code defaults will kill the connection on long Plan-mode calls.
  • Quota model: you are now spending Coding Plan quota, not Claude Plan quota. The bursty agent-loop pattern that drains a Claude weekly cap in a few hours also drains a Lite-tier GLM plan; budget Pro or Max for sustained work.
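The tool-result failure mode in this list (the assistant repeating a tool call instead of acknowledging its result) can be spotted mechanically in a saved transcript. Below is a sketch that assumes Anthropic-style messages, where an assistant turn's `content` is a list of blocks and tool calls are blocks with `type` set to `tool_use` plus `name` and `input` fields. How you export the transcript from your client is up to you, and the function names are illustrative.

```python
import json


def tool_signatures(message: dict) -> list:
    """Extract (tool name, canonical JSON input) pairs from one message."""
    content = message.get("content")
    if not isinstance(content, list):
        return []  # plain-text turn, no tool calls
    return [
        (block.get("name"), json.dumps(block.get("input"), sort_keys=True))
        for block in content
        if isinstance(block, dict) and block.get("type") == "tool_use"
    ]


def repeated_tool_calls(messages: list, threshold: int = 2) -> list:
    """Return names of tools called with identical input in `threshold`
    consecutive assistant turns (user/tool_result turns are skipped)."""
    flagged = []
    previous, streak = None, 0
    for message in messages:
        if message.get("role") != "assistant":
            continue
        current = tool_signatures(message)
        if current and current == previous:
            streak += 1
        else:
            streak = 1 if current else 0
        previous = current
        if streak == threshold:  # flag once per run of repeats
            flagged.extend(name for name, _ in current)
    return flagged
```

A non-empty result is a hint, not proof: some agents legitimately re-read a file after editing it, so check the intervening tool results before switching endpoints.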

Pick this path vs. a clean Cline setup

| Pick the drop-in swap if | Pick a clean Cline / OpenCode install if |
| --- | --- |
| You have 3+ slash commands, tuned subagents, or a CLAUDE.md you’ve iterated on for weeks | You are starting a new project with no Claude Code investment |
| Your team standardized on Claude Code’s UI and switching tools means re-onboarding engineers | Your other tooling (lint, telemetry) speaks OpenAI-style requests |
| You want to A/B GLM 5.2 against your current Claude workflow without burning a sprint day | You hit the tool-result bridging issue above and the workaround is more friction than a tool switch |

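Revert path (know this before you commit)

Run unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL API_TIMEOUT_MS and restart Claude Code. If you added the block to ~/.zshrc, ~/.bashrc, or the "env" section of ~/.claude/settings.json, delete it there too, or it will come back with the next shell. The claude CLI then picks up Anthropic’s defaults again. No state inside your project is touched by the swap; it lives entirely in your shell environment or user-level settings.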

Common Errors During Setup

| Error | Likely cause | Fix |
| --- | --- | --- |
| 401 invalid_api_key | Key scoped to wrong product or pasted with whitespace | Regenerate with “Coding Plan” scope; re-paste the key and make sure no leading or trailing whitespace came along |
| model not found for glm-5.2 or glm-5.2[1m] | Z.ai uses glm-5.2 for the standard context window and the [1m] suffix as a model alias that switches the request to the 1M-context configuration | Use glm-5.2[1m] when you need the full 1M window; plain glm-5.2 for default-context calls. Both are valid model IDs against the Coding Plan endpoint |
| 429 Too Many Requests after a few minutes of work | Lite tier quota (~400 prompts/week) burned by an agent loop | Upgrade to Pro, or reduce agent iteration loops via max_iterations |
| Empty response body, no error | Thinking budget exceeded max_tokens | Raise max_tokens to ≥ 4096; thinking models stream reasoning then answer |
| Tool-use call returned as raw JSON in the assistant text | Z.ai’s OpenAI compat doesn’t auto-parse tool_use unless tools field present in request | Pass tools array even on the first turn; or use Anthropic-compatible endpoint if your client supports it |
| 504 / timeout on multi-file refactor | Long-context (>500K tokens) first-token latency can exceed default client timeout | Raise the client’s request timeout to 600000 ms (10 min) for 1M-context calls (requestTimeoutMs where the client exposes it; API_TIMEOUT_MS for Claude Code) |
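The table above can double as a lookup in a wrapper script, so a failed call prints the likely fix instead of a bare status code. This is a rough sketch: the mapping simply mirrors the rows above, the status codes and message strings Z.ai actually returns are not confirmed, and `diagnose` is a hypothetical helper.

```python
def diagnose(status: int, body: str = "") -> str:
    """Map an HTTP status and response body to the matching fix from the
    Common Errors table. Matching is deliberately loose."""
    text = body.lower()
    if status == 401 or "invalid_api_key" in text:
        return "key: regenerate with Coding Plan scope and re-paste without whitespace"
    if "model not found" in text:
        return "model id: use glm-5.2 or glm-5.2[1m] against the Coding Plan endpoint"
    if status == 429:
        return "quota: upgrade the tier or cut agent iterations"
    if status in (408, 504):
        return "timeout: raise the client timeout to 600000 ms for 1M-context calls"
    if status == 200 and not body.strip():
        return "empty body: raise max_tokens to at least 4096"
    return "not in the table: inspect the raw response"
```

Anything that falls through to the last branch is worth saving verbatim; early in a product's life the undocumented errors are the informative ones.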

Team / Multi-Developer Configuration

If 3+ developers need to share a quota, the Team tier of the Coding Plan does seat-based pooling — but the setup pattern differs from solo:

  • One API key per developer, billed against the same org wallet — never share a single key across machines (it’s the fastest way to burn quota on something you can’t trace)
  • A shared .env.team checked into a private secrets repo, containing only OPENAI_BASE_URL=https://api.z.ai/api/coding/paas/v4 and OPENAI_MODEL=glm-5.2[1m] — keep API keys out of git
  • A budget guard in CI: have your coding-agent CI step abort if the per-PR completion token count exceeds N (your call — start at 200K and adjust by Friday)
  • Quota observability: Z.ai’s dashboard shows per-key usage; for programmatic polling, the Coding Plan exposes a quota endpoint at https://api.z.ai/api/monitor/usage/quota/limit covering the 5-hour token cycle, weekly quota, and monthly MCP usage — pull it into your existing observability stack (Datadog, Honeycomb)
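The CI budget guard from this list can be a few lines if your agent step already writes one JSON object per API response to a log file. A minimal sketch follows, assuming each line carries an OpenAI-style `usage` object with a `completion_tokens` field; the log format, the file name, and the function names are assumptions, and the 200K default is the starting point suggested above.

```python
import json


def completion_tokens(lines) -> int:
    """Sum usage.completion_tokens over an iterable of JSON lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        usage = json.loads(line).get("usage") or {}
        total += int(usage.get("completion_tokens", 0))
    return total


def budget_exit_code(path: str, limit: int = 200_000) -> int:
    """Return 1 (fail the CI step) if the log exceeds the limit, else 0."""
    with open(path, encoding="utf-8") as handle:
        total = completion_tokens(handle)
    print(f"completion tokens: {total} (limit {limit})")
    return 1 if total > limit else 0
```

In the pipeline you would end the step with something like `sys.exit(budget_exit_code("agent_usage.jsonl"))`, so an over-budget PR fails loudly instead of silently draining the shared quota.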

If your org cannot route through a Chinese API endpoint (egress control, compliance), the practical pattern is to mirror the same OpenAI-compatible config against a different upstream — see Alternatives.

Advanced: The MIT Open-Weights Plan

Zhipu’s launch announcement commits the MIT-licensed weights for “next week” — meaning the week of June 22, 2026, the same window the Z.ai Coding Plan API opens. The HF org is huggingface.co/zai-org; track the GLM-5.2 repo for the actual drop.

What MIT actually buys you:

  • Commercial use, modification, redistribution — no usage caps, no per-token fees once you’re hosting it
  • Fine-tuning rights — you can train LoRAs or full fine-tunes on your own codebase and ship the result
  • Forks — if Zhipu disables a feature you depend on (or, more likely, raises prices), community forks remain operable

What MIT does not buy you:

  • A free lunch on inference compute — if 5.2 inherits the GLM-5 line’s ~744B-total / ~40B-active MoE shape (Zhipu hasn’t officially confirmed for 5.2), production throughput is still in 8x-H100 territory, with strong dependence on quantization quality
  • Future model updates — the MIT release is point-in-time; GLM 5.3 may or may not be open
  • Anthropic-quality safety tuning — Z.ai’s RLHF is its own house style, expect different refusal boundaries

The realistic path for most teams: stay on the hosted Coding Plan for the next 30-60 days, watch the community quantize the weights into 4-bit and 2-bit variants, then re-evaluate self-hosting once a single-node config exists.

Alternatives: Managed Open-Weights Coding Models on ofox

If you want a single OpenAI-compatible endpoint that already covers managed Chinese coding models — without waiting for the GLM 5.2 weights drop or building your own H100 cluster — ofox lists three solid alternatives as of June 15, 2026:

Modelofox API IDStrengthWhen to pick over GLM 5.2
DeepSeek V4 Prodeepseek/deepseek-v4-proCoding-tuned flagship, broad community track recordYou want a model with published benchmarks (DeepSeek’s are public; GLM 5.2’s aren’t yet)
Qwen3 Coder Nextbailian/qwen3-coder-nextLatest Alibaba coding-specific tier, multilingual codeYou’re shipping to a multilingual Chinese/Japanese codebase and want first-party Qwen support
Kimi K2.6moonshotai/kimi-k2.6Long-context with strong recallYou need long-context that’s verified, not “claimed but un-benchmarked”

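You wire any of these in with the same config shape used for GLM 5.2: swap the base URL, the model ID, and the API key (your ofox key, not the Z.ai one):

# Same Cline / OpenCode config, different upstream
export OPENAI_BASE_URL="https://api.ofox.ai/v1"
export OPENAI_API_KEY="$OFOX_API_KEY"   # assumed variable name holding your ofox key
export OPENAI_MODEL="deepseek/deepseek-v4-pro"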

This is the single-endpoint pattern: one key, many models, no per-vendor signup. See the ofox model catalog for current pricing and capability flags. When GLM 5.2 lands on ofox (it isn’t yet — verified June 15, 2026), you’ll switch by changing one string.

Watching Z.ai Status and Quota

Two things to wire in week one:

  • Z.ai status page — bookmark it the day you sign up; a new product’s first 30 days always include at least one rate-limit tuning or quota-counting bug
  • Per-PR usage instrumentation — log usage.total_tokens from every API response into your existing PR-level telemetry (Datadog, Honeycomb, your call). Coding agents drift toward burning quota on rabbit-hole refactors, and the only way you’ll catch it is at the PR level
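For the per-PR instrumentation, the core is turning each API response into one flat record you can ship anywhere. The sketch below assumes responses in the OpenAI chat-completions shape as plain dicts (with the OpenAI Python SDK, `resp.model_dump()` gives you one); the record fields and function names are illustrative, and forwarding to Datadog or Honeycomb is left to your existing tooling.

```python
import json
import time


def usage_record(response: dict, pr: str) -> dict:
    """Flatten one response's token usage into a PR-tagged record."""
    usage = response.get("usage") or {}
    return {
        "ts": time.time(),
        "pr": pr,
        "model": response.get("model"),
        "prompt_tokens": int(usage.get("prompt_tokens", 0)),
        "completion_tokens": int(usage.get("completion_tokens", 0)),
        "total_tokens": int(usage.get("total_tokens", 0)),
    }


def log_usage(response: dict, pr: str, sink) -> None:
    """Append the record as one JSON line to any writable text stream."""
    sink.write(json.dumps(usage_record(response, pr)) + "\n")
```

Writing JSON lines keeps the log greppable and lets you sum `total_tokens` per `pr` with whatever tool you already use for telemetry.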

The thing that makes this drop different isn’t the million-token context; Anthropic and Google were already there. It’s that GLM 5.2 is the first frontier-class coding model where you can read the weights, audit the MIT license terms, and run a fork on your own metal, while staying on the hosted plan until your migration is ready. The next 30 days will tell us whether the benchmarks back the marketing.