DeepSeek Free vs Paid in 2026: Web Chat, API Costs and the Honest Trade-offs
If you are weighing DeepSeek free vs paid access in 2026, the answer depends on which surface you mean. The web chat at chat.deepseek.com and the mobile app are free for individuals, running DeepSeek V4 in Expert or Instant mode. The API is a separate, pay-per-token product — there is no consumer subscription to “unlock” more in the chat. I run V4-Pro and V4-Flash in production, ran V3.2 and R1 before that, and have tested the app on three continents. This guide lays out exactly what each side includes, what the V4 API actually costs per million tokens, where the “free” story gets asterisks, and how to decide which path fits your workload.
The short answer: free chat, paid API, no consumer subscription
DeepSeek splits into three surfaces — a free web chat, a free mobile app, and a paid API. None of them is a “freemium” gateway to the others. The chat is free with no published paywall; the API bills per token with no monthly fee. That’s it. There is no “DeepSeek Plus” or “Pro subscription” on the consumer side as of April 24, 2026.
DeepSeek’s official site positions the consumer product as “Free access to DeepSeek. Experience the intelligent model.” The same homepage confirms the V4 Preview is now available on web, app, and API. On the developer side, the API is usage-based and invoiced per million tokens against a prepaid balance.
What the free tier actually includes
The free tier is the web chat (chat.deepseek.com) and the mobile apps for iOS and Android. Both run DeepSeek V4 by default. V4-Pro is accessible as “Expert Mode” on chat.deepseek.com, while V4-Flash is exposed as “Instant Mode”. The DeepThink toggle — familiar from the V3.x era — now switches V4 between non-thinking and thinking mode rather than picking a different model.
Features you get without paying:
- Access to both V4 tiers through Expert Mode (Pro) and Instant Mode (Flash)
- A DeepThink toggle that enables thinking mode on the selected tier
- Web search grounding from the chat UI
- File upload with text extraction (PDFs, docs, spreadsheets)
- Cross-device chat history sync once you sign in
- Session state preserved server-side, so multi-turn conversations “remember” prior turns in the UI
That last point matters when you later move to the API: the web chat maintains session history for you, but the API does not.
What DeepSeek does not publicly document on the free tier
DeepSeek has not published a fixed daily message cap for the free web chat. Articles claiming “unlimited messages with no cap” are overclaiming — capacity in practice depends on current traffic. DeepSeek occasionally imposes temporary freezes on new registrations and API top-ups during viral traffic surges, lasting hours to a day; these do not affect existing balances or chat access, though status pages are worth monitoring during critical windows. Treat the free tier as generous but best-effort, not as an SLA.
What the paid side looks like: the V4 API
The paid product is the DeepSeek API. Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint at https://api.deepseek.com. DeepSeek also ships an Anthropic-compatible surface against the same base URL, so you can bring either SDK.
V4 ships as two model IDs, both open-weight MoE models under the MIT license:
- deepseek-v4-pro — 1.6 trillion total parameters, 49 billion active per token, the flagship frontier tier
- deepseek-v4-flash — 284 billion total parameters, 13 billion active, the efficiency tier
Both models carry a 1,000,000-token context window by default and can produce up to 384,000 output tokens. Thinking mode is a request parameter, not a separate model ID — you pick the tier in the model field and add reasoning_effort="high" with extra_body={"thinking": {"type": "enabled"}} to turn thinking on.
A minimal Python call using the OpenAI SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="sk-...",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Summarise this report."}],
    max_tokens=1024,
    temperature=1.3,
)
print(resp.choices[0].message.content)
```
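To enable thinking mode on the same call, the request gains the two extra fields described above. A minimal sketch of the argument shape, assuming the parameter names quoted earlier (`reasoning_effort` and the `extra_body` thinking flag) are current — verify against the official docs before relying on them:

```python
def thinking_kwargs(model: str, user_message: str) -> dict:
    """Build chat.completions.create kwargs with thinking mode enabled.

    The model ID picks the tier (and the price); thinking is a flag on
    the request, not a different model.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "reasoning_effort": "high",
        "extra_body": {"thinking": {"type": "enabled"}},
    }

kwargs = thinking_kwargs("deepseek-v4-pro", "Prove that 17 is prime.")
# resp = client.chat.completions.create(**kwargs)
```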
The API is stateless. You must resend the full conversation history on every request to sustain a multi-turn exchange — the server does not remember prior turns for you.
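In practice that means the client owns the transcript. A minimal sketch of the bookkeeping, with the transport left as a stub callable standing in for the real API client:

```python
def chat_turn(history: list[dict], user_message: str, send) -> list[dict]:
    """Append the user turn, send the FULL history, append the reply.

    `send` is any callable taking the messages list and returning the
    assistant's text — in real code, a wrapper around the API client.
    """
    history = history + [{"role": "user", "content": user_message}]
    reply = send(history)  # every prior turn goes over the wire again
    return history + [{"role": "assistant", "content": reply}]

# Stub transport so the flow is visible without a network call.
echo = lambda msgs: f"({len(msgs)} messages received)"

history = [{"role": "system", "content": "You are terse."}]
history = chat_turn(history, "First question", echo)
history = chat_turn(history, "Follow-up", echo)
# history now holds 5 entries: system, user, assistant, user, assistant
```

The growing `history` list is also why cached-prefix pricing matters: each call resends everything before the new user message.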
Legacy IDs and the July retirement
If you already have code against deepseek-chat or deepseek-reasoner, it still works for now. DeepSeek’s release notes state that deepseek-chat and deepseek-reasoner will be fully retired and inaccessible after July 24, 2026 at 15:59 UTC, currently routing to deepseek-v4-flash in non-thinking and thinking mode respectively. Migration is a one-line change: swap model=, keep base_url.
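One wrinkle worth encoding: per the release notes quoted above, deepseek-reasoner currently routes to V4-Flash in thinking mode, so a faithful migration of reasoner traffic should also set the thinking flag. A hypothetical helper (the mapping table is mine, built from the routing described here):

```python
# Retired legacy IDs mapped to (replacement model, thinking mode needed).
RETIRED = {
    "deepseek-chat": ("deepseek-v4-flash", False),      # non-thinking routing
    "deepseek-reasoner": ("deepseek-v4-flash", True),   # thinking routing
}

def migrate_model(model: str) -> tuple[str, bool]:
    """Return (new_model_id, enable_thinking) for any model string."""
    return RETIRED.get(model, (model, False))
```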
V4 API pricing — the numbers that matter
Pricing, per 1 million tokens, as of April 24, 2026. Verify on the official DeepSeek pricing page before committing to a budget.
| Model | Input, cache hit | Input, cache miss | Output |
|---|---|---|---|
| deepseek-v4-flash | $0.028 | $0.14 | $0.28 |
| deepseek-v4-pro | $0.145 | $1.74 | $3.48 |
Simon Willison’s coverage on the release day confirms the same figures: “They’re charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro.”
Three things to notice. First, V4-Pro is roughly 12× the output rate of V4-Flash — the gap is real, and for chat or extraction workloads Flash is the sensible default. Second, prices are the same whether you are in thinking mode or non-thinking mode — the model ID sets the rate; reasoning mode just changes how many tokens you burn. Third, off-peak discounts ended on September 5, 2025 and have not returned with V4.
How cache-hit pricing works
Cache-hit pricing applies automatically when DeepSeek detects a repeated prefix in your messages array. Every request with a repeated prefix against the same account benefits; you do not need to opt in. Prefixes must be at least 1,024 tokens long and match byte-for-byte. You still pay the cache-miss rate on the new user message each call — caching does not cover the whole request.
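The billing consequence can be sketched as a small estimator. The 1,024-token minimum comes from the rule above; treat this as an illustration of the split, not a billing-grade tool:

```python
MIN_CACHEABLE_PREFIX = 1_024  # prefixes shorter than this never hit the cache

def split_input_tokens(prefix_tokens: int, total_input_tokens: int) -> tuple[int, int]:
    """Return (cache_hit_tokens, cache_miss_tokens) for one request,
    assuming `prefix_tokens` repeat byte-for-byte from an earlier call."""
    if prefix_tokens < MIN_CACHEABLE_PREFIX:
        return 0, total_input_tokens  # too short: everything bills as a miss
    return prefix_tokens, total_input_tokens - prefix_tokens

# A 2,000-token system prompt plus a fresh 200-token user message:
hit, miss = split_input_tokens(2_000, 2_200)  # → (2000, 200)
```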
Cost worked example: one million support-chat calls
A realistic workload: a customer-support bot with a 2,000-token system prompt (cached across calls), a 200-token user message each call, and a 300-token reply. One million calls per month. Tier: deepseek-v4-flash.
| Token bucket | Tokens | Rate ($/M) | Cost |
|---|---|---|---|
| Input, cache hit (system prompt) | 2,000,000,000 | $0.028 | $56.00 |
| Input, cache miss (user messages) | 200,000,000 | $0.14 | $28.00 |
| Output | 300,000,000 | $0.28 | $84.00 |
| Total | | | $168.00 |
The same shape on deepseek-v4-pro costs $290 + $348 + $1,044 = $1,682.00 — close to 10× more. Unless you need frontier coding or agent performance, Flash wins on price-per-useful-answer for this kind of work. For a more detailed breakdown, try the DeepSeek pricing calculator.
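The arithmetic generalises to a few lines. The rates mirror the pricing table; the only assumption is that the system prompt always hits the cache after the first call:

```python
# $ per million tokens, from the V4 pricing table.
RATES = {
    "deepseek-v4-flash": {"hit": 0.028, "miss": 0.14, "out": 0.28},
    "deepseek-v4-pro":   {"hit": 0.145, "miss": 1.74, "out": 3.48},
}

def monthly_cost(model, calls, cached_in, fresh_in, out):
    """Dollar cost for `calls` requests with per-call token counts."""
    r = RATES[model]
    per_call = cached_in * r["hit"] + fresh_in * r["miss"] + out * r["out"]
    return calls * per_call / 1_000_000

flash = monthly_cost("deepseek-v4-flash", 1_000_000, 2_000, 200, 300)
pro = monthly_cost("deepseek-v4-pro", 1_000_000, 2_000, 200, 300)
print(f"Flash: ${flash:,.2f}  Pro: ${pro:,.2f}")  # Flash: $168.00  Pro: $1,682.00
```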
Free web chat vs paid API: head-to-head
| Feature | Free web chat / app | Paid API |
|---|---|---|
| Cost | $0 | Per-token, prepaid balance |
| Models | V4-Pro (Expert) and V4-Flash (Instant) | deepseek-v4-pro, deepseek-v4-flash (+ legacy IDs until July 24, 2026) |
| Thinking mode | DeepThink toggle in the UI | reasoning_effort + thinking flag |
| Conversation state | Stored server-side per user | Stateless — client resends history |
| Web search | Built into the chat UI | Not provided — implement via tool calling |
| File upload | Yes (PDF, docs, images) | Text-only messages; manage attachments yourself |
| Rate limits | Best-effort, undocumented | Per-key rate limits — see DeepSeek API rate limits |
| SLA | None | None (commercial but no uptime contract) |
| Data handling | Processed on DeepSeek servers under Chinese jurisdiction | Same; self-host weights if this is a blocker |
The free chat also maintains history; the API does not. If you build a chatbot on the API, that state management is yours to own.
When to stay on free — and when the API pays off
I’ve moved clients both directions over the last year. The honest decision tree:
Stay free if:
- You’re a student, writer or researcher using DeepSeek a few times a day for drafts, explanations or translations
- You need file upload and web search out of the box, with no code
- Your workload is interactive, not programmatic, and you don’t need reproducible outputs
- You’re evaluating the model before committing — the free chat runs the same V4 weights the API serves
Pay for API access if:
- You’re building any product that calls the model from server code
- You need structured outputs (JSON mode), tool calling, streaming, or FIM completion — see DeepSeek API JSON mode
- You need deterministic temperature, reproducibility, or rate limits you can reason about
- Your cached-prefix pattern would compound savings via DeepSeek context caching
- You want to compare DeepSeek programmatically against alternatives (OpenAI, Anthropic, Google) — the OpenAI SDK compatibility means your existing code mostly works
Two traps worth calling out
First, “granted balance” is not the same as free credits. DeepSeek’s API billing docs describe a system where fees are deducted from your topped-up balance or granted balance, with a preference for using the granted balance first when both are available. Treat any granted balance as a promotional credit that may or may not be available to you, and check the billing console — do not rely on blog posts claiming “new accounts get $10 free.”
Second, the app ecosystem is crowded with lookalikes. DeepSeek-V4 Preview is available on the official web, app, and API. Third-party sites such as deep-seek.ai proxy the model and offer their own no-login experiences, but they are not DeepSeek. If privacy and provenance matter to you, use the verified official channels only — see our guide to verifying the official DeepSeek app.
Privacy and jurisdiction — the same for both tiers
Whether you use the free chat or the paid API, your prompts are processed on DeepSeek’s infrastructure, which is subject to Chinese law. That applies to both surfaces equally. If that is a deal-breaker — for regulated work, personal data, or source code you cannot ship — the alternative is to download the MIT-licensed V4 weights and self-host. V4-Pro is a Mixture of Experts model with 1.6 trillion total parameters but only 49 billion active during inference, released under the MIT license which allows commercial use. You’ll need serious hardware, but the licensing is clean. Our DeepSeek privacy guide covers the nuances in depth.
Verdict
For individual exploration, the free tier is excellent value — you get both V4 tiers, thinking mode, web search and file upload at no cost. For anything programmatic, the paid API is among the cheapest frontier-tier offerings in April 2026, with V4-Flash at $0.14 input miss / $0.28 output per million tokens, and V4-Pro at $1.74 / $3.48. Compare those against OpenAI’s and Anthropic’s current pricing pages before committing, but for most teams the “DeepSeek free vs paid” decision is not either/or: prototype on the free chat, move to the API when you ship.
Last verified: 2026-04-24. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.
Frequently asked questions
Is DeepSeek free to use?
Yes, for individuals. The web chat at chat.deepseek.com and the DeepSeek mobile apps are free and run V4-Pro (Expert Mode) and V4-Flash (Instant Mode) with DeepThink, web search and file upload. The API is the paid product, billed per million tokens with no monthly subscription. There is no consumer “Plus” plan. For a deeper breakdown see our guide on whether DeepSeek is free.
How much does the DeepSeek API cost in 2026?
V4-Flash is $0.028 cache-hit / $0.14 cache-miss / $0.28 output per 1M tokens. V4-Pro is $0.145 / $1.74 / $3.48. Prices are the same in thinking and non-thinking mode — the model ID sets the rate. Off-peak discounts ended September 5, 2025 and were not reintroduced with V4. See DeepSeek API pricing for the full current table.
Does DeepSeek have a daily message cap on the free chat?
DeepSeek does not publicly document a fixed daily message cap as of April 2026. In practice the free chat is generous but best-effort, and traffic surges can cause temporary capacity pressure. Treat the free tier as reliable for everyday use, not as an SLA for production workloads. See our DeepSeek limitations guide for current gotchas.
What’s the difference between the web chat and the API?
The web chat keeps conversation history for you and bundles web search and file upload into the UI. The API is stateless — you resend the full messages array every call — and does not provide search or uploads; you wire those up via tool calling. The API also gives you rate-limit controls, streaming, JSON mode and context caching. The DeepSeek API getting started tutorial walks through the first call.
Can I switch from deepseek-chat to V4 without rewriting my code?
Yes. The base URL does not change and the wire format is OpenAI-compatible. Update the model field to deepseek-v4-flash (or deepseek-v4-pro) and you’re on V4. Legacy IDs deepseek-chat and deepseek-reasoner still work and currently route to V4-Flash, but will be retired on July 24, 2026 at 15:59 UTC. Our DeepSeek V4 model page has migration detail.
