How much does the DeepSeek API cost?

DeepSeek V4 Flash costs $0.14/M input tokens (cache miss) and $0.28/M output tokens on the official API. Cache hits drop input to $0.0028/M — a 98% reduction. DeepSeek V4 Pro is $0.435/M input (currently 75% off until 2026/05/31) and $0.87/M output. Via ofox.ai, V4 Flash runs at $0.14/M input and $0.28/M output with no Chinese phone number required.

How do I get a DeepSeek API key?

Sign up at platform.deepseek.com with an email address, then generate a key under API Keys. Alternatively, use ofox.ai to access DeepSeek V4 Flash with a single unified key that also covers Claude, GPT, Gemini, and 50+ other models.

What is the difference between deepseek-chat and deepseek-reasoner?

As of 2026, both model IDs map to DeepSeek V4 Flash (1M context). deepseek-chat is the standard mode; deepseek-reasoner activates thinking mode with extended reasoning. Note: these legacy IDs will be deprecated in the future — migrate to deepseek-v4-flash and deepseek-v4-pro directly.

Does DeepSeek have a free tier?

The official DeepSeek API does not have a persistent free tier, but new accounts receive a small credit on signup. Some third-party providers offer rate-limited free access to older DeepSeek models.

DeepSeek V4 API Pricing 2026: Flash vs Pro Cost Guide

TL;DR: DeepSeek V4 Flash costs $0.14/M input tokens on the official API — roughly 18× cheaper than GPT-5.4 and 36× cheaper than Claude Opus 4.7 on input tokens. Prompt caching cuts that to $0.0028/M on repeated context — a 98% discount. V4 Pro costs more ($0.435/M input, currently 75% off) for the strongest reasoning performance. If you want to skip the Chinese phone number requirement and use a single API key for all your models, ofox.ai carries both models at competitive pricing.

SALE — $5 free credit for new users — Try DeepSeek V4 via ofox.ai with one OpenAI-compatible key.

What Is DeepSeek V4 Flash?

DeepSeek V4 Flash is DeepSeek’s current flagship model, offered alongside the more capable V4 Pro. Both models support a 1M token context window — the largest among mainstream API providers — and can generate up to 384K output tokens in a single response. The older model IDs deepseek-chat (standard mode) and deepseek-reasoner (thinking mode) currently map to V4 Flash, but DeepSeek has announced these legacy IDs will be deprecated in the future with no firm date.

V4 Pro adds configurable reasoning depth for complex multi-step problems, making it the choice for math, coding competitions, and research tasks where you’re willing to pay more per token for better results. V4 Flash handles the vast majority of everyday tasks — summarization, classification, chat, and routine coding — at a fraction of the cost.

Official DeepSeek API Pricing

The official DeepSeek API (platform.deepseek.com) offers two current models:

Model	Mode	Input (cache miss)	Input (cache hit)	Output	Context	Max Output
`deepseek-v4-flash`	Standard + Thinking	$0.14/M	$0.0028/M	$0.28/M	1M	384K
`deepseek-v4-pro`	Advanced reasoning	$0.435/M	$0.003625/M	$0.87/M	1M	384K

V4 Pro pricing reflects a 75% discount promotion extended until 2026/05/31 15:59 UTC. Without the discount, V4 Pro would cost $1.74/M input and $3.48/M output.

The 98% cache discount on V4 Flash is automatic: DeepSeek’s API applies it whenever a prefix of your prompt matches a previously cached version. You don’t need to configure anything. On V4 Pro, cache hits save even more in absolute dollars.

How to Get a DeepSeek API Key

Sign up at platform.deepseek.com with an email address. No Chinese phone number is required for the API platform (unlike the consumer app). Once logged in, go to API Keys and generate a key.

The base URL is https://api.deepseek.com and the API is OpenAI-compatible, so you can drop it into any OpenAI SDK call by changing base_url and api_key.

One practical issue: DeepSeek’s platform has had intermittent availability problems during high-demand periods. If you’re building something production-critical, it’s worth having a fallback.

Access DeepSeek via ofox

ofox.ai carries DeepSeek V4 Flash as deepseek/deepseek-v4-flash at $0.14/M input and $0.28/M output — matching the official API pricing. V4 Pro is also available as deepseek/deepseek-v4-pro at $1.74/M input and $3.48/M output. The current model IDs and live prices are kept on the DeepSeek model page. The practical advantage is a single API key that also covers Claude Opus 4.7, GPT-5.4, Gemini 3.1 Pro, Qwen3, and 50+ other models. No separate accounts, no separate billing.

This matters if you’re routing between models based on task type or cost — something covered in more depth in the AI API aggregation guide.

Setup is three lines:

from openai import OpenAI

client = OpenAI(api_key="sk-xxx", base_url="https://api.ofox.ai/v1")
response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain the 1M token context window in one paragraph"}]
)

If you’re migrating from the OpenAI SDK, the migration guide covers the full swap in under 10 minutes.

How to Cut Your DeepSeek API Costs

The 98% cache discount on input tokens is the biggest lever available. Structure prompts so the system prompt and any static context come first — those are the parts most likely to be cached. A 2,000-token system prompt that gets cached on every call saves $0.2744 per million requests at V4 Flash pricing. You don’t configure anything; the API applies it automatically when a prefix matches.

Beyond caching: only use deepseek-v4-pro when you actually need advanced multi-step reasoning. For classification, summarization, or simple generation, V4 Flash is faster and significantly cheaper per token.

Context trimming matters too. The 1M window is massive, but you pay for every input token. RAG that pulls only the relevant chunks is cheaper than stuffing the full document — the embedding + RAG guide covers that pattern.

For offline jobs (document processing, dataset annotation, bulk summarization), parallelize with async calls during off-peak hours. The official API doesn’t have a formal batch endpoint, but async Python or Node handles this fine.

For a broader cost-reduction playbook across all models, see how to reduce AI API costs.

DeepSeek vs. Alternatives: Price Comparison

At $0.14/M input and $0.28/M output, DeepSeek V4 Flash is hard to beat for general-purpose tasks. Here’s how it stacks up against models in the same capability tier (all prices via ofox.ai for consistency):

Model	Input	Output
DeepSeek V4 Flash	$0.14/M	$0.28/M
Qwen3 Max	$0.36/M	$1.43/M
GPT-5.4	$2.50/M	$15/M
Claude Sonnet 4.6	$3/M	$15/M
Claude Opus 4.7	$5/M	$25/M

DeepSeek V4 Flash is roughly 18× cheaper than GPT-5.4 on input and 54× cheaper on output. For high-volume workloads where you’re not specifically tied to OpenAI or Anthropic’s ecosystem, that gap is significant.

The tradeoff: DeepSeek’s API has had reliability issues during peak periods, and the model’s English instruction-following can lag behind Claude Sonnet 4.6 on nuanced tasks. For a full comparison, see the model comparison guide.

Practical Limits to Know

A few things that will bite you if you don’t know them upfront:

Rate limits vary by account tier. New accounts start conservative — expect to hit them if you’re testing at volume.
deepseek-reasoner supports streaming, but thinking tokens come back separately from the final answer. If you’re parsing the stream, check the docs for the format.
Function calling / tool use works in both deepseek-v4-flash and deepseek-v4-pro. Check the official pricing page for the latest supported features.
The 1M token context is input-only. Max output is 384K tokens across both V4 Flash and V4 Pro.
The legacy deepseek-chat and deepseek-reasoner model IDs will be deprecated in the future (no firm date yet). Migrate to deepseek-v4-flash and deepseek-v4-pro directly.

The Bottom Line

At $0.14/M input with a 98% cache discount on repeated context, DeepSeek V4 Flash is the obvious pick for high-volume workloads where you don’t need Claude’s instruction-following or GPT’s ecosystem integrations. For a detailed real-world cost analysis using DeepSeek V4 as a Claude Code backend (including cache hit realism and quality trade-offs), see our DeepSeek V4 in Claude Code cost breakdown. Get started at platform.deepseek.com, or use ofox.ai if you want a single key that covers DeepSeek alongside Claude, GPT, and Gemini.

What Is DeepSeek V4 Flash?

Official DeepSeek API Pricing

How to Get a DeepSeek API Key

Access DeepSeek via ofox

How to Cut Your DeepSeek API Costs

DeepSeek vs. Alternatives: Price Comparison

Practical Limits to Know

The Bottom Line

Related Articles

Qwen3 API Guide: Access Qwen3-Max, Qwen3-Coder, and Qwen3.5 Models (2026)

Best AI Models June 2026 — Top 10 by SWE-Bench, Pricing & Context

DeepSeek V3.2 Prompt Caching on ofox: 10-Min Setup, 80% Savings (2026)