DeepSeek R1 explained: benchmarks, pricing, and the V4 migration

DeepSeek R1: open-weight reasoning model with 97.3% on MATH-500, MIT-licensed weights, and a migration path to V4.

Models·April 24, 2026·By DS Guide Editorial

If you landed here because a tutorial, a news clip, or a Reddit thread told you DeepSeek R1 is the open-weight reasoning model that rattled markets in January 2025, this page is the practitioner’s version of that story. I ran R1 in production for most of 2025 alongside ChatGPT and Claude, and I have since migrated those workloads to DeepSeek V4. What follows is what R1 actually is, what its benchmarks say (with the exact versions cited in DeepSeek’s own technical report), where it still holds up, and — critically — how its `deepseek-reasoner` API ID now behaves as the V4 retirement window closes on July 24, 2026. You will get architecture, numbers, code, pricing, and an honest verdict.

What DeepSeek R1 is, in one paragraph

DeepSeek R1 is a reasoning-focused large language model released by Hangzhou-based DeepSeek in January 2025. It was the first open-weight model to publicly match OpenAI’s o1-1217 on a cluster of math, coding, and knowledge benchmarks, and its release triggered the January 2025 tech-stock selloff. DeepSeek first built R1-Zero, trained with large-scale reinforcement learning and no supervised fine-tuning step; it showed remarkable reasoning performance but suffered from endless repetition, poor readability, and language mixing. R1 proper then added a cold-start supervised phase before RL, and it achieves performance comparable to OpenAI o1 across math, code, and reasoning tasks. In practical terms, R1 is the model that made a credible case for open-weight reasoning at a fraction of closed-source pricing.

Architecture and lineage

R1 is a 671B-parameter Mixture-of-Experts model with 37B active parameters per token, built on top of DeepSeek-V3-Base. Both DeepSeek-R1-Zero and DeepSeek-R1 inherit V3’s architecture, so see the DeepSeek-V3 repository for the structural details. Context length is 128,000 tokens, with generation capped at 32,768 tokens across the entire R1 series. For benchmarks requiring sampling, the report uses a temperature of 0.6 and a top-p of 0.95, generating 64 responses per query to estimate pass@1.
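
That pass@1 protocol is simply the mean correctness over the 64 samples. A minimal sketch of the estimator, where `is_correct` grading is task-specific and the numbers below are a toy illustration, not report data:

```python
from statistics import mean

K = 64  # samples per query in the R1 report (temperature 0.6, top-p 0.95)

def pass_at_1(sample_results):
    """Estimate pass@1 as the fraction of sampled responses graded correct.

    sample_results: list of booleans, one per generated response.
    """
    return mean(1.0 if ok else 0.0 for ok in sample_results)

# Toy example: 48 of 64 samples graded correct -> pass@1 = 0.75.
results = [True] * 48 + [False] * (K - 48)
print(pass_at_1(results))  # 0.75
```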

There are two R1 releases worth distinguishing:

  • DeepSeek R1 (January 2025) — the original release.
  • DeepSeek R1-0528 (May 2025) — a point upgrade that, among other usage changes, added system-prompt support. Scores moved sharply: AIME 2025 70.0 → 87.5, GPQA Diamond 71.5 → 81.0, LiveCodeBench v6 63.5 → 73.3, and Aider 57.0 → 71.6.

R1 sits upstream of a family of smaller distilled checkpoints trained on 800k reasoning samples generated by R1 itself — see the dedicated write-up for DeepSeek R1 Distill if you need a model that runs on a single GPU.

Benchmarks (from the R1 technical report)

Every number below is from DeepSeek’s R1 technical report (arXiv 2501.12948) or the follow-up 0528 changelog. I have not blended them with later-generation claims.

Benchmark | DeepSeek R1 | OpenAI o1-1217 | R1-0528 (May 2025)
MMLU | 90.8% | 91.8% | n/a
MMLU-Pro | 84.0% | n/a | n/a
GPQA Diamond | 71.5% | 75.7% | 81.0%
MATH-500 | 97.3% | 97.3% (par) | n/a
AIME 2024 (Pass@1) | 79.8% | 79.2% | ~91.4%
AIME 2025 | 70.0% | n/a | 87.5%
Codeforces Elo | 2,029 | ~1,930 | n/a

The headline claims are well-sourced. DeepSeek-R1 achieves 79.8% Pass@1 on AIME 2024, slightly surpassing OpenAI-o1-1217, and on MATH-500 it attains 97.3%, performing on par with OpenAI-o1-1217 and significantly outperforming other models. On MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek-R1 scores 90.8%, 84.0%, and 71.5%; performance sits slightly below OpenAI-o1-1217 on these benchmarks but surpasses other closed-source models. On coding, DeepSeek-R1 achieves a 2,029 Elo rating on Codeforces, outperforming 96.3% of human participants.

For a current-generation comparison, head to DeepSeek R1 vs OpenAI o1.

Training cost — what the numbers actually cover

The $294,000 figure that trended in September 2025 is real, but narrowly scoped. The Nature article, co-authored by Liang Wenfeng, put the cost of training the reasoning-focused R1 at $294,000 on 512 Nvidia H800 chips; after the preparatory phase, the final run took 80 hours on that 512-GPU cluster.

The number excludes the base-model work. The $294,000 price tag does not include the roughly $6 million (reported as $5.576 million in the V3 technical report) that DeepSeek spent building the general-purpose large language model R1 is based on. Reporting from The Register pushed back on the headline framing, arguing that since you cannot have R1 without first building V3, the actual cost of the model was closer to $5.87 million. The honest framing: R1’s incremental RL cost was low; the full stack, including V3 pre-training, was in the multi-million range.
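
As a back-of-envelope check on those figures (assuming the $5.576 million V3 pre-training cost that The Register’s $5.87 million total implies; that breakdown is a reading of the sources, not stated here):

```python
# Figures from the Nature paper and follow-up reporting.
r1_rl_cost_usd = 294_000          # R1 RL phase only
v3_pretrain_cost_usd = 5_576_000  # assumed DeepSeek-V3 base-model cost

gpu_hours = 512 * 80              # 512 H800s for the 80-hour final run
total_usd = r1_rl_cost_usd + v3_pretrain_cost_usd

print(gpu_hours)                  # 40960 GPU-hours for the final RL run
print(round(total_usd / 1e6, 2))  # 5.87 (million USD, full stack)
```

Note the $294,000 covers more than the 80-hour final run (it includes the preparatory phase), so dividing one by the other does not give a clean per-GPU-hour rate.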

Strengths — where R1 still earns its keep

  • Math and Olympiad-style reasoning — MATH-500 at 97.3% and AIME 2024 at 79.8% remain strong open-weight numbers.
  • Competitive coding — a 2,029 Codeforces Elo is above most non-reasoning models.
  • Licensing clarity — the R1 code repository is licensed under the MIT License, use of the DeepSeek-R1 models is also subject to the MIT License, and the DeepSeek-R1 series supports commercial use and distillation. That is cleaner than most 2024-era open releases.
  • Distillation seed — R1’s reasoning traces have been used to post-train smaller open models; see the R1 Distill page for details on the Qwen and Llama variants.

Weaknesses — where R1 falls short in 2026

  • Superseded on the API. As of April 24, 2026, DeepSeek V4 is the current generation. The legacy `deepseek-reasoner` ID still accepts traffic but now routes to `deepseek-v4-flash` in thinking mode, and it will be fully retired on July 24, 2026 at 15:59 UTC.
  • No native tool calling in the original R1. That capability landed with R1-0528 and is now standard on V4.
  • Chatty reasoning traces. R1 burns tokens on long deliberations; for short-turn workloads the non-thinking V4-Flash default is cheaper per useful answer.
  • GPQA Diamond lag vs o1-1217 (71.5% vs 75.7% in the original release).

How to access R1 today

Web chat and mobile app

The DeepSeek web chat and mobile app have already defaulted to V4. The DeepThink toggle — the control that originally switched to R1 — now switches V4 between non-thinking and thinking mode. So the old guidance still works mechanically: visit chat.deepseek.com and switch on DeepThink, but what you get now is V4 in thinking mode, not R1. DeepSeek also provides an OpenAI-compatible API at platform.deepseek.com. Remember that the web and app surfaces maintain session history for you; the API does not.

Open weights on Hugging Face

R1 weights are hosted at deepseek-ai/DeepSeek-R1 and deepseek-ai/DeepSeek-R1-0528. If you want to run R1 yourself, our local installation walkthrough covers the quantised builds that fit on a single H100 or a pair of consumer GPUs.

API access during the migration window

Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint on DeepSeek’s API. The base URL is https://api.deepseek.com, and the legacy `deepseek-reasoner` ID still resolves — but it now routes to `deepseek-v4-flash` in thinking mode until the retirement cutoff. A minimal Python example using the OpenAI SDK, shown in thinking mode on V4 (the recommended pattern for new code):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Prove the infinitude of primes."}
    ],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

print(resp.choices[0].message.reasoning_content)
print(resp.choices[0].message.content)

When thinking is enabled, V4 returns reasoning_content alongside the final content — the same response shape the legacy deepseek-reasoner ID used to produce. The API itself is stateless: your client must resend the full messages array on every turn. Relevant parameters: temperature, top_p, max_tokens (up to 384,000 on V4), reasoning_effort, plus JSON mode, streaming, tool calling, and context caching. Migration is a one-line model= swap; base_url does not change. See the DeepSeek API documentation for the full parameter reference.
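
Because the API is stateless, every turn must resend the whole conversation. A sketch of the history bookkeeping, appending only the assistant’s final content (DeepSeek’s docs warn against feeding `reasoning_content` back into the next request); the `extend_history` helper is illustrative, not part of any SDK:

```python
def extend_history(messages, assistant_content, next_user_content):
    """Build the full messages array for the next stateless API call.

    Only the assistant's final `content` goes back into history; the
    `reasoning_content` field from thinking mode must NOT be resent.
    """
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": next_user_content},
    ]

history = [{"role": "user", "content": "Prove the infinitude of primes."}]
# ...call the API with messages=history, read resp.choices[0].message.content...
history = extend_history(history,
                         "Suppose there were finitely many primes p1..pn ...",
                         "Now give the same proof in one sentence.")
print(len(history))  # 3 messages resent in full on the next call
```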

Pricing snapshot (as of April 2026)

R1’s original published rates — $0.55 per million input tokens and $2.19 per million output tokens — are no longer in effect. Traffic sent to `deepseek-reasoner` now bills at the `deepseek-v4-flash` rates. The current published V4-Flash rates, which apply to any request you route through the legacy R1 ID until July 24, 2026:

Metric | V4-Flash (USD per 1M tokens)
Input, cache hit | $0.028
Input, cache miss | $0.14
Output | $0.28

Off-peak discounts ended on September 5, 2025 and have not been reintroduced. Verify the live rates on the DeepSeek API pricing page before any capacity planning.

Worked cost example on V4-Flash

1,000,000 calls, 2,000-token cached system prompt, 200-token uncached user message, 300-token response:

  • Cached input: 2,000,000,000 tokens × $0.028/M = $56.00
  • Uncached input: 200,000,000 tokens × $0.14/M = $28.00
  • Output: 300,000,000 tokens × $0.28/M = $84.00
  • Total: $168.00
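
The arithmetic above can be scripted so you can swap in your own traffic shape (rates are the April 2026 snapshot quoted here; verify against the live pricing page before planning):

```python
# V4-Flash rates, USD per 1M tokens (April 2026 snapshot from this article).
RATE_CACHED_IN = 0.028
RATE_UNCACHED_IN = 0.14
RATE_OUT = 0.28

def monthly_cost(calls, cached_in, uncached_in, out):
    """Total USD for `calls` requests given per-call token counts."""
    millions = lambda per_call: calls * per_call / 1_000_000
    return round(
        millions(cached_in) * RATE_CACHED_IN
        + millions(uncached_in) * RATE_UNCACHED_IN
        + millions(out) * RATE_OUT,
        2,
    )

# Worked example: 1M calls, 2,000 cached + 200 uncached in, 300 out.
print(monthly_cost(1_000_000, 2_000, 200, 300))  # 168.0
```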

Thinking-mode responses tend to be longer, so budget 3–5× the output-token count you would expect from a non-thinking call. Use the DeepSeek pricing calculator for your own workload.

Comparable alternatives

If you are still evaluating whether R1-class reasoning is the right pick, two head-to-heads will save you time: DeepSeek vs OpenAI o1 and DeepSeek vs Claude. For the full field, the DeepSeek alternatives hub lists open and closed-source options by task. And for sibling models in the same family, the DeepSeek models hub has every release in one place, including DeepSeek V4.

Verdict

DeepSeek R1 is a landmark open-weight reasoning model and, through the R1-0528 upgrade, still a credible self-hosted choice. For API workloads, however, the right move in April 2026 is to migrate to deepseek-v4-flash or deepseek-v4-pro with reasoning_effort="high" and delete the legacy ID from your code before July 24, 2026. R1 earned its place in the history of this field; V4 is where the bills get paid.

Last verified: 2026-04-24. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Frequently asked questions

Is DeepSeek R1 still available in 2026?

Yes, with a caveat. R1 weights remain open and downloadable under MIT, so you can self-host indefinitely. On DeepSeek’s managed API, the legacy deepseek-reasoner ID still accepts requests but now routes to deepseek-v4-flash in thinking mode, and it is scheduled for full retirement on July 24, 2026 at 15:59 UTC. Migration is a one-line model= swap — see our API documentation guide.

How does DeepSeek R1 compare to OpenAI o1?

On DeepSeek’s own R1 technical report, R1 scored 90.8% on MMLU vs o1-1217’s 91.8%, matched o1-1217 at 97.3% on MATH-500, and edged it on AIME 2024 at 79.8% vs 79.2%. o1-1217 led on GPQA Diamond at 75.7% to R1’s 71.5%. The practical differentiator was pricing and open weights, not raw capability. Full breakdown on our R1 vs o1 comparison.

What did DeepSeek R1 actually cost to train?

DeepSeek’s September 2025 Nature paper reported $294,000 for the R1 reinforcement-learning phase, using 512 Nvidia H800 GPUs for 80 hours. That figure excludes the $5.576 million DeepSeek reported spending to train the DeepSeek-V3 base model R1 is built on, which is why The Register put the realistic total closer to $5.87 million. See the DeepSeek research papers roundup for the source documents.

Can I run DeepSeek R1 locally?

Yes, if you have the hardware. The full 671B MoE checkpoint needs a multi-GPU server, but quantised builds (4-bit GGUF, AWQ) run on a single H100 or a pair of consumer GPUs with reduced context. The weights are MIT-licensed for commercial use and distillation. Our local installation tutorial covers the practical recipe, including Ollama and vLLM paths.

Does DeepSeek R1 support tool calling and JSON mode?

The original January 2025 R1 did not; the May 2025 R1-0528 update added both. Since `deepseek-reasoner` now routes to V4-Flash, you get full tool calling plus JSON mode today. JSON mode is designed to return valid JSON, not guaranteed — include the word “json” and a small example schema in your prompt, and set max_tokens high enough to avoid truncation. Details on our JSON mode guide.
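
A sketch of the request shape following the practices above (parameter names per DeepSeek’s OpenAI-compatible API; confirm field names against the live docs):

```python
# JSON-mode request body: the prompt contains the word "json" and a small
# example schema, and max_tokens is generous to avoid truncated (invalid) JSON.
payload = {
    "model": "deepseek-v4-flash",
    "response_format": {"type": "json_object"},
    "max_tokens": 4096,
    "messages": [
        {"role": "system",
         "content": 'Reply in json matching: {"title": str, "year": int}'},
        {"role": "user", "content": "Summarise the R1 release as JSON."},
    ],
}
# Pass **payload to client.chat.completions.create(...) as in the earlier
# example; always json.loads() the reply and validate it before use.
```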
