The Full DeepSeek History: Every Model and Milestone, 2023-2026
If you have only met DeepSeek through the January 2025 R1 headlines, the rest of the story is missing. The full DeepSeek history starts not with a chatbot but with a hedge fund stockpiling Nvidia GPUs, runs through a string of open-weight model releases that quietly reset Chinese AI pricing, and lands — as of April 24, 2026 — on the V4 Preview, a two-tier Mixture-of-Experts family with a 1-million-token context window. This guide walks the timeline release by release, names the people and decisions behind each turn, and flags the moments that mattered for anyone using DeepSeek in production today. Expect dates, parameter counts, prices, and the receipts to back them up.
Before DeepSeek: a hedge fund that bought GPUs early
DeepSeek did not start as an AI company. It started as a side bet inside a quantitative hedge fund called High-Flyer. High-Flyer was co-founded in February 2016 by Liang Wenfeng, who had been trading since the 2008 financial crisis while attending Zhejiang University. The fund used machine learning for trading, and by the early 2020s it had built a war chest of GPUs that most Chinese tech companies could only envy.
In May 2023, Liang announced that High-Flyer would pursue the development of artificial general intelligence through a new, independent venture: DeepSeek. In an interview with 36Kr that month, Liang said High-Flyer had acquired 10,000 Nvidia A100 GPUs before the US government imposed AI chip export restrictions on China. That stockpile mattered: it meant DeepSeek could train frontier-scale models after the export controls hit, when most domestic peers could not.
On 14 April 2023, High-Flyer had announced the launch of an artificial general intelligence research lab, stating that the new lab would focus on developing AI tools unrelated to the firm's financial business. On 17 July 2023, that lab was spun off into an independent company, DeepSeek, with High-Flyer as its principal investor and backer. Venture capital firms were reluctant to invest, since an exit looked unlikely on any short timeline, so DeepSeek was funded almost entirely by the hedge fund, a structural fact that still shapes how the company behaves today.
2023: the first models — Coder and LLM
The first releases were modest by 2026 standards but established a pattern: ship open weights, publish a technical report, undercut on price.
- DeepSeek Coder (November 2, 2023) — the lab’s first public model, focused on code completion.
- DeepSeek LLM (November 29, 2023) — a general-purpose dense language model in 7B and 67B sizes.
Neither release made global news at the time. They did, however, signal that the company was serious about publishing weights rather than gating them behind an API.
2024: MoE, Math, V2, and the first price war
2024 was the year DeepSeek's research bets started compounding. In January 2024 it released two DeepSeek-MoE models (Base and Chat), and in April three DeepSeek-Math models (Base, Instruct, and RL). DeepSeek-V2 followed in May 2024, and the DeepSeek-Coder V2 series a month after that.
DeepSeek-V2 was the model that forced the rest of the Chinese market to pay attention. The team priced it aggressively, and the response was immediate: the release triggered an AI price war in China that surprised even DeepSeek, which had not expected the market to be so price-sensitive. Liang's aggressive pricing forced domestic tech giants including Alibaba and Baidu to cut their own rates by over 95%.
Internally, V2 also introduced the architectural idea that would carry through to every later release: Multi-head Latent Attention (MLA), which compresses the key-value cache into a low-rank latent representation so that long contexts cost far less memory at inference time. MLA played a core role in reducing the cost of training the DeepSeek-V3 model, released in December 2024.
DeepSeek-V3 (December 2024)
V3 was a 671-billion-parameter MoE model with roughly 37B active per token — large by any standard, and trained for a fraction of what frontier US labs were spending. The widely cited training compute figure of around $5.6 million traces to the V3 technical report; this is the number journalists later confused with R1’s training cost.
January 2025: R1 and the “Sputnik moment”
On 20 January 2025, DeepSeek released DeepSeek-R1, a 671-billion-parameter open-source reasoning AI model, alongside the publication of a detailed technical paper explaining its architecture and training. R1 matched OpenAI’s o1 on several reasoning benchmarks while shipping open weights and an API priced at a small fraction of o1’s.
The market reaction was extraordinary. Observers described the release as sending shock waves through the industry and triggering a "Sputnik moment" for the US in artificial intelligence, driven by the combination of open weights, low cost, and high performance. It also threatened established AI hardware leaders: Nvidia's share price dropped sharply, erasing roughly US$600 billion in market value, the largest single-day decline for any company in US stock market history.
One detail to keep straight: the famous "under $6 million" figure belongs to V3, not R1. Coverage at the time alarmed investors by reporting that the model had taken only two months and less than $6 million to build on lower-capacity Nvidia chips, conflating R1 with V3's compute budget. DeepSeek later disclosed a separate R1-specific training cost of roughly $294,000 to Reuters in September 2025. Both numbers are real; they belong to two different models.
For a deeper review of how R1 actually behaved in production, see our DeepSeek R1 review and the head-to-head DeepSeek R1 vs OpenAI o1 comparison.
Mid-to-late 2025: regulatory blowback and a quieter year of iteration
R1's success did not come without friction. Multiple US states, along with Australia, Taiwan, South Korea, Denmark and Italy, introduced bans or other restrictions on DeepSeek-R1 shortly after its release, citing privacy and national security concerns. Italy's Garante moved against the consumer app early in 2025, and several US states blocked DeepSeek on government devices. In the US the picture remains state-by-state rather than a single national ban; check our tracker on DeepSeek US restrictions for current status.
The company's planned successor to R1 had a bumpy year. Reports indicated that R2, the intended successor to R1, was originally planned for release in early May 2025. Instead, on 28 May 2025, R1 was updated to version R1-0528, and as of early July 2025 R2 remained unreleased because Liang Wenfeng was not yet satisfied with its performance.
Two other shifts mattered for developers in this window:
- September 5, 2025 — DeepSeek discontinued the off-peak / night-time API discount that V3-era users had relied on. It has not been reintroduced.
- August 2025 — V3.1 shipped with a “DeepThink” toggle that switched a single model between thinking and non-thinking modes, foreshadowing the V4 design.
- December 2025 — V3.2 shipped and became the default model, holding that spot until V4.
April 24, 2026: DeepSeek V4 Preview
DeepSeek published the DeepSeek-V4 Preview on April 24, 2026 — a new open-weight Mixture-of-Experts series that stakes the lab’s thesis on a single claim: million-token context processing is not a capability problem anymore, it’s an efficiency problem. Two models ship today. V4-Pro packs 1.6 trillion total parameters with 49 billion activated per token. V4-Flash is the efficient sibling at 284 billion total / 13 billion active. Both support native 1M context, and both are open weights.
Practical points every developer touching the API should know:
- The two model IDs are `deepseek-v4-pro` and `deepseek-v4-flash`. Both are MIT-licensed.
- Thinking mode is a request parameter, not a separate model ID. Send `reasoning_effort="high"` with `extra_body={"thinking": {"type": "enabled"}}` to enable it, or `reasoning_effort="max"` for the heaviest setting. Omit both to stay in non-thinking mode.
- The API is stateless. Chat requests hit `POST /chat/completions`, the OpenAI-compatible endpoint at `https://api.deepseek.com`; clients must resend the full conversation history on every call (see the multi-turn sketch below). Both the OpenAI ChatCompletions and Anthropic API formats are supported.
- The context window is 1,000,000 tokens by default; output runs up to 384,000 tokens.
- Legacy IDs `deepseek-chat` and `deepseek-reasoner` still work and route to `deepseek-v4-flash`, but they will be fully retired and inaccessible after 24 July 2026, 15:59 UTC.
A minimal Python call against V4-Pro using the OpenAI SDK looks like this:
from openai import OpenAI
client = OpenAI(base_url="https://api.deepseek.com", api_key="...")
resp = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[{"role": "user", "content": "Summarise this codebase."}],
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
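# With thinking enabled, the reasoning trace arrives separately from the
# final answer; reasoning_content is absent in non-thinking mode.
print(resp.choices[0].message.reasoning_content)
print(resp.choices[0].message.content)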
When thinking is enabled the response returns reasoning_content alongside the final content. For more on the request shape see the DeepSeek API documentation.
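Because the API is stateless, a follow-up turn has to resend everything said so far. A minimal two-turn sketch in non-thinking mode, with placeholder prompts:
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

# Turn 1 -- the server stores nothing, so we keep the history ourselves.
history = [{"role": "user", "content": "Summarise this codebase."}]
first = client.chat.completions.create(model="deepseek-v4-flash", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Turn 2 -- the follow-up works only because every prior message is resent.
history.append({"role": "user", "content": "Now list the three riskiest modules."})
second = client.chat.completions.create(model="deepseek-v4-flash", messages=history)
print(second.choices[0].message.content)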
The complete release timeline at a glance
| Date | Release | What changed |
|---|---|---|
| 2023-04-14 | High-Flyer AGI lab announced | Internal research lab |
| 2023-07-17 | DeepSeek incorporated | Spun out as independent company |
| 2023-11-02 | DeepSeek Coder | First public model |
| 2023-11-29 | DeepSeek LLM (7B / 67B) | General-purpose dense models |
| 2024-01 | DeepSeek MoE | First MoE release |
| 2024-04 | DeepSeek Math | Math-specialised models |
| 2024-05 | DeepSeek V2 | MLA architecture, triggers Chinese price war |
| 2024-06 | DeepSeek Coder V2 | Coder line revived on V2 backbone |
| 2024-12 | DeepSeek V3 | 671B MoE, ~$5.6M training compute |
| 2025-01-20 | DeepSeek R1 | Reasoning model; “Sputnik moment” |
| 2025-05-28 | R1-0528 update | R2 deferred |
| 2025-08 | V3.1 | Single model with DeepThink toggle |
| 2025-09-05 | Off-peak API discount discontinued | Pricing simplified |
| 2025-12 | V3.2 | Previous-generation default |
| 2026-04-24 | V4 Preview (Pro + Flash) | 1M context default, two-tier MoE |
| 2026-07-24 | Legacy ID retirement | deepseek-chat/deepseek-reasoner stop working at 15:59 UTC |
Pricing history — what each generation actually charged
Pricing is the cleanest way to see DeepSeek’s strategy. Each generation has come in at or below the last.
| Model (era) | Input (cache miss) $/1M | Output $/1M | Notes |
|---|---|---|---|
| V3 (Dec 2024) | $0.27 | $1.10 | Off-peak discount era |
| R1 (Jan 2025) | $0.55 | $2.19 | Reasoning premium |
| V3.2 (Dec 2025) | $0.28 | $0.42 | Retired April 2026 |
| V4-Flash (Apr 2026) | $0.14 | $0.28 | Default for chat workloads |
| V4-Pro (Apr 2026) | $1.74 | $3.48 | Frontier-tier |
Rates as of April 2026; verify on the DeepSeek API pricing page before you commit a workload.
Worked example — V4-Flash at scale
One million chat-style API calls with a 2,000-token cached system prompt, a 200-token user message (uncached), and a 300-token response on deepseek-v4-flash:
Cached input : 2,000,000,000 tokens × $0.028/M = $56.00
Uncached input : 200,000,000 tokens × $0.14/M = $28.00
Output : 300,000,000 tokens × $0.28/M = $84.00
-------
Total $168.00
The same workload on deepseek-v4-pro would cost $1,682.00 — roughly 10× the Flash bill. Pick the tier that matches the task, not the marketing.
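A quick way to sanity-check bills like this is to script the arithmetic. A minimal sketch using the April 2026 Flash rates from the table, plus the $0.028/M cache-hit input rate assumed in the worked example above; confirm all three rates against the pricing page before relying on the output:
# deepseek-v4-flash rates, $ per 1M tokens (April 2026 worked example).
RATE_CACHED_IN = 0.028  # cache-hit input
RATE_MISS_IN = 0.14     # cache-miss input
RATE_OUT = 0.28         # output

def workload_cost(calls: int, cached_in: int, uncached_in: int, out_tokens: int) -> float:
    """Dollar cost for `calls` requests with the given per-call token counts.

    tokens-per-call x (calls / 1e6) = millions of tokens, which the rates price.
    """
    millions_of_calls = calls / 1_000_000
    return (cached_in * RATE_CACHED_IN
            + uncached_in * RATE_MISS_IN
            + out_tokens * RATE_OUT) * millions_of_calls

# Reproduces the $168.00 total: 1M calls, 2,000 cached + 200 uncached input, 300 output.
print(f"${workload_cost(1_000_000, 2_000, 200, 300):,.2f}")  # -> $168.00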
The architecture story underneath
One reason the DeepSeek timeline reads as a series of price cuts is that the lab's research roadmap has been compounding. MLA arrived in V2, the V3 report introduced auxiliary-loss-free load balancing for MoE, R1 demonstrated pure reinforcement-learning-driven reasoning, and V4 ships what DeepSeek calls Hybrid Attention Architecture, which the company says improves a model's ability to remember queries across long conversations. V4 also pushed the context window to 1 million tokens, a leap that allows entire codebases or long documents to be sent as a single prompt.
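What that looks like in practice: concatenate a repository into one prompt and send it in a single request. A rough sketch, reusing the V4 request shape shown earlier; the directory name is a placeholder and the four-characters-per-token estimate is a crude heuristic, not DeepSeek's tokenizer:
from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

# Concatenate every Python file in the repo, tagged with its path.
corpus = "\n\n".join(
    f"### {p}\n{p.read_text(encoding='utf-8', errors='ignore')}"
    for p in sorted(Path("my_repo").rglob("*.py"))
)
# Crude size check: ~4 characters per token keeps headroom under the 1M window.
assert len(corpus) / 4 < 900_000, "repo likely exceeds the context window"

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": f"Map the architecture of this codebase:\n{corpus}"}],
)
print(resp.choices[0].message.content)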
The bottom-up culture matters here too. DeepSeek maintains a low-hierarchy corporate culture, with members working in project-based groups on competitive compensation. Liang emphasized that employees should bring their "unique experience and ideas" rather than wait to be directed, with an overall bottom-up division of labor, and he noted that MLA itself grew out of a young researcher's personal interest. The single biggest architectural win in DeepSeek's history came from a junior researcher's curiosity; that is the kind of fact worth knowing about an AI lab.
Where to go next
If this timeline is the first DeepSeek piece you have read, three follow-ups are worth your time depending on your goal:
- To use the current generation — see our overview of DeepSeek V4 and the dedicated pages for DeepSeek V4-Pro and DeepSeek V4-Flash.
- To compare against Western incumbents — read DeepSeek vs ChatGPT and DeepSeek vs Claude.
- For the broader catalogue of model write-ups, the DeepSeek models hub lists every release individually.
This article will be updated as V4 leaves preview and as the legacy model IDs retire on July 24, 2026.
Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.
When was DeepSeek founded and by whom?
DeepSeek was incorporated on July 17, 2023, in Hangzhou, China, after being spun out of the AGI research lab that hedge fund High-Flyer had announced in April 2023. The founder and CEO is Liang Wenfeng, who also co-founded High-Flyer in 2016. The hedge fund remains DeepSeek’s principal financial backer. For the broader corporate picture see our what is DeepSeek primer.
What was DeepSeek’s first model?
DeepSeek’s first public model was DeepSeek Coder, released on November 2, 2023, followed by the general-purpose DeepSeek LLM series on November 29, 2023. Both shipped with open weights and a published technical report — a pattern the company has kept ever since. The Coder line later evolved into Coder V2; you can read the dedicated write-up on DeepSeek Coder.
Why did DeepSeek R1 cause such a market reaction in January 2025?
R1 matched OpenAI's o1 on several reasoning benchmarks while shipping MIT-licensed weights and pricing its API at a small fraction of o1's. The combination of open weights, frontier reasoning, low cost, and a credibly low training-cost claim from a Chinese lab sent Nvidia's share price down sharply and was widely described as a "Sputnik moment". Our DeepSeek R1 review walks through what held up in production.
What is the latest DeepSeek model in 2026?
As of April 25, 2026, the current generation is the DeepSeek V4 Preview, released April 24, 2026. It ships as two open-weight MoE models: deepseek-v4-pro (1.6T total / 49B active) and deepseek-v4-flash (284B total / 13B active), both with a 1-million-token context window and MIT-licensed weights. Track ongoing changes on our DeepSeek latest updates feed.
How much did it cost to train DeepSeek’s models?
Two figures get conflated. DeepSeek-V3, released December 2024, reported around $5.6 million in compute cost in its technical report. DeepSeek-R1, the January 2025 reasoning model, was later disclosed as a $294,000 training run in a September 2025 Reuters report. The “$6 million” figure widely quoted in 2025 belongs to V3. For pricing-side context see DeepSeek API pricing.
