What the DeepSeek Roadmap Looks Like After the V4 Preview
If you maintain a production integration with `deepseek-chat` or `deepseek-reasoner`, the DeepSeek roadmap stopped being abstract on April 24, 2026 — the day V4 Preview shipped and started a 90-day countdown on the legacy IDs you’re probably still calling. This guide is for engineers and technical buyers who need to know what changed, what’s still coming, and what specific dates and prices belong on next sprint’s planning board. I run V4-Pro and V4-Flash in production today and ran V3, V3.2 and R1 before that; the timeline below is built from DeepSeek’s own announcements, the V4 technical report, and reporting from Reuters, Bloomberg, TechCrunch and CNBC. By the end you’ll have a clean migration plan, a working cost model for both V4 tiers, and a sober view of where DeepSeek goes next.
The short version of the DeepSeek roadmap
DeepSeek’s current generation is V4 Preview, released as two open-weight Mixture-of-Experts models on April 24, 2026: `deepseek-v4-pro` (1.6T total / 49B active params) and `deepseek-v4-flash` (284B total / 13B active params). Both ship under the MIT license, default to a 1,000,000-token context window with up to 384,000 output tokens, and are marketed as opening the “era of cost-effective 1M context length.”
The big roadmap items, in plain English:
- April 24, 2026 — V4 Preview ships; web chat switches to V4 by default.
- July 24, 2026, 15:59 UTC — legacy `deepseek-chat` and `deepseek-reasoner` IDs are fully retired. Until then they route to `deepseek-v4-flash` in non-thinking and thinking modes respectively; after that timestamp they become inaccessible.
- Off-peak discounts ended on September 5, 2025 and have not returned with V4.
- V4 stable + future tiers — Preview implies a stable release will follow; DeepSeek has not committed to a date.
What “V4 Preview” actually means for the roadmap
“Preview” is a deliberate label. It means DeepSeek considers V4 production-ready enough to publish weights and route paid API traffic, but reserves the right to revise pricing and behaviour before the stable release. The release is still labeled a preview, and DeepSeek has not published final public API prices alongside the preview release. Anchor every quoted price in your internal planning to a date and a link to the official pricing page.
The V4 family also closes a long architectural arc. DeepSeek’s previous release was V3.2 (and V3.2 Speciale) in December; V4 ships as two preview models, V4-Pro (1.6T total, 49B active) and V4-Flash (284B total, 13B active), both 1M-token-context Mixture-of-Experts under the MIT license. The lab’s framing is that million-token context processing is no longer a capability problem but an efficiency problem.
Migration: legacy IDs to V4 model IDs
Anyone with a live integration should treat the migration as a one-line change made deliberately, not under pressure on July 23. Chat requests hit `POST /chat/completions`, the OpenAI-compatible endpoint, at `https://api.deepseek.com`. Only the `model` field needs to change. DeepSeek’s own guidance: keep `base_url`, just update `model` to `deepseek-v4-pro` or `deepseek-v4-flash`; the API supports OpenAI ChatCompletions and Anthropic surfaces; both models support 1M context and dual modes (Thinking / Non-Thinking).
Mapping table
| Legacy ID | Routes to (until 2026-07-24) | Recommended replacement |
|---|---|---|
| `deepseek-chat` | `deepseek-v4-flash` (non-thinking) | `deepseek-v4-flash` with no `reasoning_effort` |
| `deepseek-reasoner` | `deepseek-v4-flash` (thinking) | `deepseek-v4-flash` + `reasoning_effort="high"` |
| — | — | `deepseek-v4-pro` for frontier-tier agentic / coding work |
A minimal Python migration using the OpenAI SDK, with thinking mode on V4-Pro:
```python
from openai import OpenAI

# Same OpenAI SDK, same base_url; only the model string changes.
client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Plan the migration."}],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},  # enable thinking mode
)
print(resp.choices[0].message.content)
```
Two things to remember on every request: the API is stateless, so your client must resend the conversation history every call (the web chat keeps state for you; the API does not), and thinking mode returns `reasoning_content` alongside the final `content`. If you need a step-by-step walkthrough, the DeepSeek API getting started guide covers SDK setup end to end.
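Both behaviours fit in a short sketch. This reuses the `client` from the snippet above; direct attribute access on `reasoning_content` follows the convention DeepSeek’s docs use with the OpenAI SDK, so verify it against your SDK version:

```python
# The API keeps no server-side state: hold the conversation locally
# and resend the whole history list on every call.
history = [{"role": "user", "content": "Summarise the migration plan."}]

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=history,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
msg = resp.choices[0].message

print(msg.reasoning_content)  # the model's reasoning trace
print(msg.content)            # the final answer

# Append only the final answer before the next turn; do not resend
# reasoning_content as part of the history.
history.append({"role": "assistant", "content": msg.content})
history.append({"role": "user", "content": "Now estimate the cost."})
```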
Pricing on the roadmap: where the rates go from here
V4-Flash undercut the V3.2 rates it replaced; V4-Pro introduces a higher tier for frontier agentic work. The rate cards as of April 2026 (verify on the DeepSeek API pricing page before committing budget):
| Model | Cache hit ($/1M in) | Cache miss ($/1M in) | Output ($/1M) |
|---|---|---|---|
| `deepseek-v4-flash` | $0.028 | $0.14 | $0.28 |
| `deepseek-v4-pro` | $0.145 | $1.74 | $3.48 |
TechCrunch noted that V4-Flash, at $0.14 per million input tokens and $0.28 per million output tokens, undercuts GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini and Claude Haiku 4.5, and that V4-Pro at $1.74 / $3.48 also undercuts Gemini 3.1 Pro, GPT-5.5, Claude Opus 4.7 and GPT-5.4. That places DeepSeek among the cheapest frontier-tier APIs in April 2026, though “cheapest” is a moving target; confirm against each provider’s current pricing page before signing a contract.
Worked cost example: V4-Flash, 1M calls
Workload: 1,000,000 API calls with a 2,000-token system prompt (cached after the first call), a 200-token user message (cache miss against the cached prefix on every call), and a 300-token response. Using deepseek-v4-flash:
- Cached input: 2,000 × 1,000,000 = 2,000,000,000 tokens × $0.028/M = $56.00
- Uncached input: 200 × 1,000,000 = 200,000,000 tokens × $0.14/M = $28.00
- Output: 300 × 1,000,000 = 300,000,000 tokens × $0.28/M = $84.00
- Total: $168.00
Same workload on deepseek-v4-pro: $290 + $348 + $1,044 = $1,682.00. Roughly 10× the spend; reserve V4-Pro for work where the benchmark lift justifies it. The DeepSeek pricing calculator handles the arithmetic if your shapes differ.
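If your token shapes differ, the same arithmetic is easy to script. Here is a minimal sketch using the April 2026 preview rates from the table above; re-pin the rates before trusting the output:

```python
# Per-million-token rates, April 2026 preview pricing (verify before use).
RATES = {
    "deepseek-v4-flash": {"cache_hit": 0.028, "cache_miss": 0.14, "output": 0.28},
    "deepseek-v4-pro":   {"cache_hit": 0.145, "cache_miss": 1.74, "output": 3.48},
}

def workload_cost(model, calls, cached_in, uncached_in, out_tokens):
    """Dollar cost for `calls` requests with the given per-call token shape."""
    r = RATES[model]
    return calls * (
        cached_in * r["cache_hit"]
        + uncached_in * r["cache_miss"]
        + out_tokens * r["output"]
    ) / 1_000_000

# The worked example: 1M calls, 2,000 cached-in, 200 uncached-in, 300 out.
print(workload_cost("deepseek-v4-flash", 1_000_000, 2_000, 200, 300))  # ~$168.00
print(workload_cost("deepseek-v4-pro",   1_000_000, 2_000, 200, 300))  # ~$1,682.00
```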
What V4 is good at — and where it isn’t
DeepSeek’s own positioning in the V4 announcement is unusually candid: V4-Pro-Max beats GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks but falls marginally short of GPT-5.4 and Gemini-3.1-Pro, a trajectory that trails the leading frontier models by roughly 3 to 6 months. That gap is the main thing to watch on the roadmap.
Where V4 already wins:
- Agentic coding — DeepSeek touted top-tier performance in coding benchmarks and big advancements in reasoning and agentic tasks.
- Open-model knowledge — “In world knowledge benchmarks, DeepSeek V4-Pro significantly leads other open source models and is only slightly outperformed by the top-tier closed-source model Gemini-3.1-Pro,” DeepSeek said.
- Long-context efficiency — V4-Pro reportedly uses ~27% of single-token FLOPs and ~10% of the KV cache of V3.2; V4-Flash drops those to ~10% and ~7%.
Where V4 lags:
- Multimodality — both V4-Flash and V4-Pro are text-only, unlike many of their closed-source peers, which understand and generate audio, video and images. Vision and audio remain in separate model families like DeepSeek VL2.
- Closed frontier knowledge tests — the 3–6-month gap to GPT-5.4 / Gemini 3.1 Pro is real.
Hardware and supply-chain track of the roadmap
V4 is the first DeepSeek generation visibly trained and served on Chinese silicon. DeepSeek partnered with Huawei, whose “Supernode” technology combines Ascend 950 chips for compute, and Counterpoint’s Wei Sun highlighted that V4 runs on domestic chips from Huawei and Cambricon, in contrast to R1, which was trained on Nvidia hardware. That changes the roadmap risk profile in two directions: less exposure to US export controls, but more dependence on a domestic supply chain that is still maturing.
Reasoning-effort modes: a parameter, not a model ID
One of the cleanest changes in V4 is that thinking mode is no longer a separate ID. Both deepseek-v4-pro and deepseek-v4-flash accept three settings:
- Non-thinking (default) — fastest and cheapest; supports FIM completion (Beta).
- Thinking — `reasoning_effort="high"` with `extra_body={"thinking": {"type": "enabled"}}`.
- Thinking-max — `reasoning_effort="max"`; needs `max_model_len >= 393216` to avoid truncation.
Other parameters worth pinning in your client wrappers:
- `temperature` — DeepSeek recommends 0.0 for code/math, 1.0 for data analysis, 1.3 for general chat and translation, 1.5 for creative writing.
- JSON mode — `response_format={"type": "json_object"}`; designed to return valid JSON but not guaranteed, so include the word “json” plus a small example schema in the prompt and set `max_tokens` high.
- The rest — `top_p`, `max_tokens`, tool calling, streaming, and context caching.
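Here is a minimal JSON-mode sketch following those recommendations; the schema and prompt are invented for illustration:

```python
import json

# JSON mode: response_format plus the word "json" and an example schema
# in the prompt. Low temperature keeps structured output stable
# (DeepSeek recommends 0.0 for code/math-style tasks).
resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "system",
            "content": (
                "Reply in json matching this schema: "
                '{"risk": "low|medium|high", "reason": "<string>"}'
            ),
        },
        {"role": "user", "content": "Assess migrating on July 23."},
    ],
    response_format={"type": "json_object"},
    temperature=0.0,
    max_tokens=2048,  # high enough that the JSON is not truncated
)

# Valid JSON is the design goal, not a guarantee: parse defensively.
try:
    data = json.loads(resp.choices[0].message.content)
except json.JSONDecodeError:
    data = None  # retry or fall back
```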
What’s next on the roadmap (and what isn’t confirmed)
DeepSeek has not published a dated plan for V4 stable, V4 Speciale, or V5. What we do know:
- V4 stable release — implied by the Preview label; no public date.
- Additional V4 variants — Reuters, citing The Information, reported that V4 has two additional variants in development. Past form (V3 → V3.1 → V3.2 → V3.2-Speciale) suggests incremental Speciale or coding-tuned tiers.
- Multimodality — text-only at launch; expect work to land in the VL line first.
- AGI framing — DeepSeek says it remains committed to longtermism, advancing steadily toward its ultimate goal of AGI. Treat as direction, not a deliverable.
Practical next steps for teams
- Audit your model strings. Grep your codebase for `deepseek-chat` and `deepseek-reasoner`. Anything still using them needs updating before July 24, 2026, 15:59 UTC.
- Pick a tier per workload. Default to V4-Flash; promote to V4-Pro only where benchmarks justify ~10× output cost.
- Re-test prompts. Reasoning-effort behaviour shifted in V4; prompts tuned for V3.2 thinking may need adjustment.
- Reset your cost model. Off-peak discounts are gone; the cache-hit/miss split is now where savings live. The DeepSeek context caching guide explains how to structure prompts to hit it; see the sketch after this list.
- Compare honestly. If you’re benchmarking against Anthropic or OpenAI, the DeepSeek vs Claude and DeepSeek vs ChatGPT comparisons cover the trade-offs.
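On the caching point, here is a minimal sketch of the prompt shape that maximises cache hits, assuming the documented stable-prefix matching (static content first, per-request content last):

```python
# Context caching matches on a stable prefix: keep the long, unchanging
# system prompt byte-identical and first; put volatile content last.
STATIC_SYSTEM_PROMPT = "You are a migration assistant. <2,000 stable tokens>"

def build_messages(user_msg: str) -> list[dict]:
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cache hit
        {"role": "user", "content": user_msg},                # cache miss
    ]

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=build_messages("Which repos still call deepseek-chat?"),
)

# The usage block reports how much of the prompt hit the cache; exact
# field names vary by API version, so inspect resp.usage directly.
print(resp.usage)
```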
For background reading on prior generations and how the lab got here, see DeepSeek history and the broader DeepSeek beginner guides.
Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.
When was DeepSeek V4 released and what’s on the roadmap next?
V4 Preview launched on April 24, 2026 with two open-weight MoE models, V4-Pro and V4-Flash. DeepSeek has not announced a stable release date or named V5 publicly, but the Preview label implies a stable version is queued, and Reuters has reported that additional V4 variants are in development. Track updates on the DeepSeek V4 release date page for confirmed milestones.
What happens to deepseek-chat and deepseek-reasoner after the migration window?
Both legacy IDs stop working at 15:59 UTC on July 24, 2026. Until then they route to deepseek-v4-flash in non-thinking and thinking modes respectively. Migration is a single-line change to the model field — base_url and api_key stay the same. The DeepSeek OpenAI SDK compatibility guide shows the exact swap.
How does V4 pricing compare to V3.2 on the same workload?
V4-Flash undercuts V3.2 across the board: $0.14 cache-miss input and $0.28 output per 1M tokens versus V3.2’s retired $0.28 / $0.42. V4-Pro introduces a frontier tier at $1.74 / $3.48. Off-peak discounts ended September 5, 2025 and were not reintroduced. Always cost out the cache-hit, cache-miss and output buckets separately — the DeepSeek cost estimator handles all three.
Is DeepSeek V4 fully open source?
Both V4-Pro and V4-Flash publish weights and code under MIT, the same license used for V3.2, V3.1 and R1. Earlier releases (V3 base, Coder-V2, VL2) split MIT code from a separate DeepSeek Model License for weights, so check the specific Hugging Face repo if licensing matters. The “Is DeepSeek open source?” guide breaks this down model by model.
Can DeepSeek V4 replace GPT-5 or Claude Opus for my use case?
It depends on the workload. DeepSeek’s own announcement says V4-Pro trails GPT-5.4 and Gemini-3.1-Pro by roughly 3–6 months on frontier knowledge benchmarks, while undercutting them on price and matching them on agentic coding. For text-only reasoning and code, V4 is competitive; for multimodal work it is not. The DeepSeek vs OpenAI o1 comparison runs the head-to-head.
