DeepSeek V4 API Pricing Guide 2026: Flash vs Pro Cost Comparison, Cache Hit Math & Setup
TL;DR: DeepSeek V4 Flash costs $0.14/M input tokens on the official API — roughly 18× cheaper than GPT-5.4 and 36× cheaper than Claude Opus 4.7 on input tokens. Prompt caching cuts that to $0.0028/M on repeated context — a 98% discount. V4 Pro costs more ($0.435/M input, currently 75% off) for the strongest reasoning performance. If you want to skip the Chinese phone number requirement and use a single API key for all your models, ofox.ai carries both models at competitive pricing.
What Is DeepSeek V4 Flash?
DeepSeek V4 Flash is DeepSeek’s current flagship model, offered alongside the more capable V4 Pro. Both models support a 1M token context window — the largest among mainstream API providers — and can generate up to 384K output tokens in a single response. The older model IDs deepseek-chat (standard mode) and deepseek-reasoner (thinking mode) currently map to V4 Flash, but DeepSeek has announced these legacy IDs will be deprecated in the future with no firm date.
V4 Pro adds configurable reasoning depth for complex multi-step problems, making it the choice for math, coding competitions, and research tasks where you’re willing to pay more per token for better results. V4 Flash handles the vast majority of everyday tasks — summarization, classification, chat, and routine coding — at a fraction of the cost.
Official DeepSeek API Pricing
The official DeepSeek API (platform.deepseek.com) offers two current models:
| Model | Mode | Input (cache miss) | Input (cache hit) | Output | Context | Max Output |
|---|---|---|---|---|---|---|
deepseek-v4-flash | Standard + Thinking | $0.14/M | $0.0028/M | $0.28/M | 1M | 384K |
deepseek-v4-pro | Advanced reasoning | $0.435/M | $0.003625/M | $0.87/M | 1M | 384K |
V4 Pro pricing reflects a 75% discount promotion extended until 2026/05/31 15:59 UTC. Without the discount, V4 Pro would cost $1.74/M input and $3.48/M output.
The 98% cache discount on V4 Flash is automatic: DeepSeek’s API applies it whenever a prefix of your prompt matches a previously cached version. You don’t need to configure anything. On V4 Pro, cache hits save even more in absolute dollars.
How to Get a DeepSeek API Key
Sign up at platform.deepseek.com with an email address. No Chinese phone number is required for the API platform (unlike the consumer app). Once logged in, go to API Keys and generate a key.
The base URL is https://api.deepseek.com and the API is OpenAI-compatible, so you can drop it into any OpenAI SDK call by changing base_url and api_key.
One practical issue: DeepSeek’s platform has had intermittent availability problems during high-demand periods. If you’re building something production-critical, it’s worth having a fallback.
Access DeepSeek via ofox
ofox.ai carries DeepSeek V4 Flash as deepseek/deepseek-v4-flash at $0.14/M input and $0.28/M output — matching the official API pricing. V4 Pro is also available as deepseek/deepseek-v4-pro at $1.74/M input and $3.48/M output. The practical advantage is a single API key that also covers Claude Opus 4.7, GPT-5.4, Gemini 3.1 Pro, Qwen3, and 50+ other models. No separate accounts, no separate billing.
This matters if you’re routing between models based on task type or cost — something covered in more depth in the AI API aggregation guide.
Setup is three lines:
from openai import OpenAI
client = OpenAI(api_key="sk-xxx", base_url="https://api.ofox.ai/v1")
response = client.chat.completions.create(
model="deepseek/deepseek-v4-flash",
messages=[{"role": "user", "content": "Explain the 1M token context window in one paragraph"}]
)
If you’re migrating from the OpenAI SDK, the migration guide covers the full swap in under 10 minutes.
How to Cut Your DeepSeek API Costs
The 98% cache discount on input tokens is the biggest lever available. Structure prompts so the system prompt and any static context come first — those are the parts most likely to be cached. A 2,000-token system prompt that gets cached on every call saves $0.2744 per million requests at V4 Flash pricing. You don’t configure anything; the API applies it automatically when a prefix matches.
Beyond caching: only use deepseek-v4-pro when you actually need advanced multi-step reasoning. For classification, summarization, or simple generation, V4 Flash is faster and significantly cheaper per token.
Context trimming matters too. The 1M window is massive, but you pay for every input token. RAG that pulls only the relevant chunks is cheaper than stuffing the full document — the embedding + RAG guide covers that pattern.
For offline jobs (document processing, dataset annotation, bulk summarization), parallelize with async calls during off-peak hours. The official API doesn’t have a formal batch endpoint, but async Python or Node handles this fine.
For a broader cost-reduction playbook across all models, see how to reduce AI API costs.
DeepSeek vs. Alternatives: Price Comparison
At $0.14/M input and $0.28/M output, DeepSeek V4 Flash is hard to beat for general-purpose tasks. Here’s how it stacks up against models in the same capability tier (all prices via ofox.ai for consistency):
| Model | Input | Output |
|---|---|---|
| DeepSeek V4 Flash | $0.14/M | $0.28/M |
| Qwen3 Max | $0.36/M | $1.43/M |
| GPT-5.4 | $2.50/M | $15/M |
| Claude Sonnet 4.6 | $3/M | $15/M |
| Claude Opus 4.7 | $5/M | $25/M |
DeepSeek V4 Flash is roughly 18× cheaper than GPT-5.4 on input and 54× cheaper on output. For high-volume workloads where you’re not specifically tied to OpenAI or Anthropic’s ecosystem, that gap is significant.
The tradeoff: DeepSeek’s API has had reliability issues during peak periods, and the model’s English instruction-following can lag behind Claude Sonnet 4.6 on nuanced tasks. For a full comparison, see the model comparison guide.
Practical Limits to Know
A few things that will bite you if you don’t know them upfront:
- Rate limits vary by account tier. New accounts start conservative — expect to hit them if you’re testing at volume.
deepseek-reasonersupports streaming, but thinking tokens come back separately from the final answer. If you’re parsing the stream, check the docs for the format.- Function calling / tool use works in both
deepseek-v4-flashanddeepseek-v4-pro. Check the official pricing page for the latest supported features. - The 1M token context is input-only. Max output is 384K tokens across both V4 Flash and V4 Pro.
- The legacy
deepseek-chatanddeepseek-reasonermodel IDs will be deprecated in the future (no firm date yet). Migrate todeepseek-v4-flashanddeepseek-v4-prodirectly.
The Bottom Line
At $0.14/M input with a 98% cache discount on repeated context, DeepSeek V4 Flash is the obvious pick for high-volume workloads where you don’t need Claude’s instruction-following or GPT’s ecosystem integrations. For a detailed real-world cost analysis using DeepSeek V4 as a Claude Code backend (including cache hit realism and quality trade-offs), see our DeepSeek V4 in Claude Code cost breakdown. Get started at platform.deepseek.com, or use ofox.ai if you want a single key that covers DeepSeek alongside Claude, GPT, and Gemini.


