How to Run DeepSeek on Windows: Four Routes That Actually Work


Guides·April 25, 2026·By DS Guide Editorial

If you have searched for “DeepSeek on Windows” expecting a single official installer, the answer is more nuanced — and better. There are four legitimate routes, and which one fits depends on whether you want a polished chat window, an offline model running on your own GPU, or scripted access to the new V4 API from PowerShell. This guide walks through each option with hardware requirements, exact commands, and the trade-offs we have hit running these setups in production. By the end you will know which route matches your machine, your privacy stance, and your budget — and which to avoid.

The short answer: pick one of four routes

DeepSeek does not ship a traditional “DeepSeek.exe” desktop client maintained by the company itself. Instead, Windows users have four practical routes, and they cover almost every use case from casual chat to offline inference on a 128 GB workstation.

| Route | What you get | Best for | Cost |
| --- | --- | --- | --- |
| Browser at chat.deepseek.com | Full V4 web chat with DeepThink toggle | Casual users, students, writers | Free |
| Microsoft Store app | Official wrapper around the web chat | Users who prefer a Start-menu icon | Free |
| Ollama + Open-WebUI | Local distilled or quantized weights, offline | Privacy-sensitive workflows, no internet | Hardware only |
| V4 API from Python/PowerShell | Programmatic access to deepseek-v4-pro and deepseek-v4-flash | Developers, automation, agents | Pay per token |

The current generation behind all four routes is DeepSeek V4, released April 24, 2026. DeepSeek V4 ships as two open-weight Mixture-of-Experts models: deepseek-v4-pro (1.6T total / 49B active) and deepseek-v4-flash (284B total / 13B active), both under the MIT license.

Route 1: Use chat.deepseek.com in any Windows browser

This is the simplest route, and for most readers it is the right one. Open Edge, Chrome, or Firefox on Windows 10 or 11, go to chat.deepseek.com, sign in with an email or Google account, and you are talking to V4. The DeepThink toggle now switches V4 between non-thinking and thinking mode rather than swapping models.

If you want it to feel more like a desktop app without installing anything, Edge offers “Install this site as an app” from the three-dot menu — it strips the browser chrome and pins a launcher to your taskbar. That trick gets you 90% of what a native client would offer.

What the web chat does that the API does not

The web chat keeps your conversation history server-side, so you can return to past chats from any device after signing in. The API, covered in Route 4, is stateless — the client must resend the full message history with every request. Do not assume one behaves like the other; this catches out a lot of first-time integrators. For a deeper walkthrough see our DeepSeek chat guide.
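Concretely, here is a minimal sketch of the bookkeeping a stateless API client has to do. The `build_messages` helper is illustrative, not part of any SDK; the actual send step is the `create()` call shown later in Route 4.

```python
# Sketch of client-side history management for the stateless API.
# Every request must carry every prior turn; nothing is stored server-side.
def build_messages(history, user_text):
    """Return the full message list to send with the next request."""
    return history + [{"role": "user", "content": user_text}]

history = [{"role": "system", "content": "You are a helpful assistant."}]

# Turn 1: send the system prompt plus the first question.
messages = build_messages(history, "What is DeepSeek V4?")
# ...call client.chat.completions.create(model=..., messages=messages)...
history = messages + [{"role": "assistant", "content": "(model reply)"}]

# Turn 2: the request now carries the whole conversation so far.
messages = build_messages(history, "And what does DeepThink do?")
print(len(messages))  # 4 messages travel with the second request
```

Forget the append step and the model loses all memory of the conversation, which is the failure mode that surprises first-time integrators.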

Route 2: Install the official Microsoft Store app

DeepSeek publishes an official app in the Microsoft Store. It is, in practice, a wrapper around the same web chat, but it gives you a Start-menu icon, a separate window from your browser, and easier multi-account separation if you keep work and personal logins apart.

  1. Open the Microsoft Store on Windows 10 or Windows 11.
  2. Search for DeepSeek. The publisher should read as the official DeepSeek entity — verify before installing, since copycat listings appear regularly.
  3. Click Get, then Open.
  4. Sign in with the same credentials you use on the web.

Treat anything outside the Microsoft Store with caution. Search results for “DeepSeek Windows download” surface several third-party sites with names like “DeepSeek Unchained” or “DeepSeek Windows” that are not affiliated with the company — some bundle their own UI around the public API, others bundle questionable installers. If you want to verify what you are installing, our verify official DeepSeek app walkthrough lists the publisher fields and certificates to check.

Route 3: Run DeepSeek locally with Ollama on Windows 11

This is the route to pick if you want offline operation, no token billing, and conversations that never leave your machine. The catch is that you will not be running V4-Pro at home — its weights are 865 GB on Hugging Face, with V4-Flash at 160 GB. Even Simon Willison, on a 128 GB M5 MacBook Pro, was hopeful but uncertain about running a quantized V4-Flash. On most consumer Windows hardware you will instead run a smaller R1 distillation.

Hardware reality check

  • 8 GB VRAM (RTX 3060, 4060): comfortable with the 7B R1 distill.
  • 16–24 GB VRAM (RTX 4080, 4090): 14B and 32B distills run well.
  • 48 GB+ VRAM or unified memory: 70B distill territory.
  • Full 671B or V4-Flash 284B: workstation or server only — hundreds of gigabytes of disk and a high-end multi-GPU rig.
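You can sanity-check these tiers with back-of-the-envelope maths. The sketch below assumes 4-bit quantization (roughly 0.5 bytes per parameter) and about 20% runtime overhead for KV cache and buffers; both constants are ballpark assumptions, not official figures.

```python
def est_memory_gb(params_billion, bytes_per_param=0.5, overhead=1.2):
    """Rough weights-plus-runtime memory estimate in GB.
    0.5 bytes/param approximates 4-bit quantization; the 1.2 factor
    is a ballpark allowance for KV cache and runtime buffers."""
    return params_billion * bytes_per_param * overhead

# A 7B distill at 4-bit: about 4 GB, comfortable on an 8 GB card.
print(round(est_memory_gb(7), 1))
# A 32B distill at 4-bit: about 19 GB, hence the 24 GB tier above.
print(round(est_memory_gb(32), 1))
```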

For a deeper hardware breakdown see our DeepSeek system requirements guide.

Step-by-step install on Windows 11

  1. Install Ollama for Windows from ollama.com. The installer registers a background service.
  2. Open PowerShell and verify the install: ollama -v.
  3. Pull a model that fits your hardware. For a typical gaming PC: ollama pull deepseek-r1:7b.
  4. Run it: ollama run deepseek-r1:7b. You now have a working chat in the terminal.
  5. Optional — install Docker Desktop and enable WSL2, then run Open-WebUI in a container. It exposes a browser interface at http://localhost:3000 that looks similar to ChatGPT and lets you switch between any locally pulled models.

If you have an NVIDIA GPU, install the matching CUDA toolkit before pulling models — Ollama will detect it and offload layers to the GPU automatically, which makes a 5–10× speed difference at inference time. Our running DeepSeek on Ollama tutorial covers the WSL2 and Docker pieces in more detail, and the broader install DeepSeek locally tutorial covers non-Ollama options like LM Studio and llama.cpp.

Route 4: Call the DeepSeek V4 API from Windows

If you write code, this is where DeepSeek is most interesting. Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint, and DeepSeek also exposes an Anthropic-compatible surface at the same base URL. That means the official OpenAI Python or Node SDK works against DeepSeek by changing only base_url and api_key.

Get a key, then call the endpoint

  1. Sign up at platform.deepseek.com and create an API key. Our get a DeepSeek API key walkthrough has screenshots.
  2. Install Python 3.10+ for Windows from the Microsoft Store or python.org.
  3. Open PowerShell and install the OpenAI SDK: pip install openai.
  4. Set your key as an environment variable so it is not hard-coded: $env:DEEPSEEK_API_KEY = "sk-...".

A minimal Python call to V4-Flash in non-thinking mode:

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Summarise the V4 release."}],
    temperature=1.3,
    max_tokens=1024,
)
print(resp.choices[0].message.content)

To switch to thinking mode on either V4 tier, add reasoning_effort="high" and extra_body={"thinking": {"type": "enabled"}}. The response then returns reasoning_content alongside the final content. For maximum reasoning effort use reasoning_effort="max" and ensure your context window is set to at least 384K tokens.
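Combined with the earlier example, the extra arguments look like this. The parameter names follow the description above; double-check them against the official docs before relying on them in production.

```python
# Extra kwargs that flip the earlier V4-Flash call into thinking mode,
# per the parameter names described above.
thinking_kwargs = {
    "reasoning_effort": "high",
    "extra_body": {"thinking": {"type": "enabled"}},
}

# resp = client.chat.completions.create(
#     model="deepseek-v4-flash",
#     messages=[{"role": "user", "content": "Plan a migration to V4."}],
#     **thinking_kwargs,
# )
# print(resp.choices[0].message.reasoning_content)  # chain of thought
# print(resp.choices[0].message.content)            # final answer
```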

Legacy IDs and migration

If you have older code referencing deepseek-chat or deepseek-reasoner, those IDs still work — they currently route to deepseek-v4-flash in non-thinking and thinking mode respectively. They will be retired on 2026-07-24 at 15:59 UTC, after which requests using them will fail. Migration is a one-line model= swap; base_url does not change. See our DeepSeek OpenAI SDK compatibility notes for the full mapping.
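If you maintain a shared codebase, a small shim can centralise the swap before the retirement date. Note the mapping below only fixes the model ID: code that relied on deepseek-reasoner's behaviour also needs the thinking-mode arguments added explicitly.

```python
# Legacy-to-V4 routing described above, usable as a drop-in shim
# until the 2026-07-24 retirement. The thinking-mode kwargs that
# deepseek-reasoner implied must be added separately.
LEGACY_MODEL_MAP = {
    "deepseek-chat": "deepseek-v4-flash",      # non-thinking mode
    "deepseek-reasoner": "deepseek-v4-flash",  # thinking mode
}

def migrate_model_id(model):
    """Return the V4 ID for a legacy model name; pass others through."""
    return LEGACY_MODEL_MAP.get(model, model)

print(migrate_model_id("deepseek-chat"))  # deepseek-v4-flash
```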

What it costs to run on Windows

The current rates, per 1M tokens, taken from DeepSeek’s pricing page as of April 2026:

| Model | Cache-hit input | Cache-miss input | Output |
| --- | --- | --- | --- |
| deepseek-v4-flash | $0.028 | $0.14 | $0.28 |
| deepseek-v4-pro | $0.145 | $1.74 | $3.48 |

Worked example for a Windows-side automation that hits V4-Flash 1,000,000 times with a 2,000-token cached system prompt, a 200-token user message, and a 300-token answer:

  • Cached input: 2,000,000,000 tokens × $0.028/M = $56.00
  • Uncached input: 200,000,000 tokens × $0.14/M = $28.00
  • Output: 300,000,000 tokens × $0.28/M = $84.00
  • Total: $168.00
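The same arithmetic as a reusable helper, with the V4-Flash rates from the table above baked in as defaults:

```python
def api_cost_usd(calls, cached_in, uncached_in, out_tokens,
                 hit_rate=0.028, miss_rate=0.14, out_rate=0.28):
    """Total USD cost; rates are per 1M tokens (V4-Flash defaults)."""
    per_million = 1_000_000
    per_call = (cached_in * hit_rate + uncached_in * miss_rate
                + out_tokens * out_rate)
    return calls * per_call / per_million

# The worked example: 1M calls, 2,000 cached + 200 uncached in, 300 out.
print(round(api_cost_usd(1_000_000, 2000, 200, 300), 2))  # 168.0
```

Swap in the V4-Pro rates from the table to price the same workload on the larger tier before committing.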

Do not skip the uncached-input line — the user message on each call is still a miss against the cached prefix. Off-peak discounts ended on 2025-09-05 and have not returned with V4. Compare against current OpenAI and Anthropic pricing pages before assuming “DeepSeek is the cheapest”; it is among the cheapest frontier-tier APIs as of April 2026, but the comparison moves. Our DeepSeek pricing calculator handles the full math for your own workload.

Other Windows-side features worth knowing

  • Context window: 1,000,000 tokens by default on both V4 tiers, with output up to 384,000 tokens.
  • JSON mode: set response_format={"type": "json_object"}. It is designed to return valid JSON, not guaranteed — include the word “json” plus a small example schema in your prompt, and set max_tokens high enough to avoid truncation.
  • Streaming: set stream=true and you will receive server-sent events; when thinking is enabled the reasoning content streams alongside the final content.
  • Tool calling and FIM completion: tool calling works in both thinking and non-thinking modes; FIM (Fill-In-the-Middle) completion is Beta and non-thinking only — useful for IDE integrations on Windows.
  • Temperature presets: DeepSeek recommends 0.0 for code and maths, 1.0 for data analysis, 1.3 for general conversation, 1.5 for creative writing.
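For JSON mode in particular, the safest pattern is a schema hint in the prompt plus a defensive parse on the way out, since valid JSON is designed-for rather than guaranteed. A sketch follows; the system prompt and the `parse_or_none` helper are illustrative, not part of any SDK.

```python
import json

# JSON-mode hygiene: the word "json" plus a tiny example schema in the
# prompt, then a defensive parse so empty/truncated output is caught.
SYSTEM = (
    "Reply in json only, matching this example: "
    '{"title": "...", "year": 2026}'
)

def parse_or_none(text):
    """Return the parsed object, or None if output was empty/invalid."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None

# Request kwargs to pass alongside model/messages:
#   response_format={"type": "json_object"}, max_tokens=512
print(parse_or_none('{"title": "V4", "year": 2026}'))
```

A `None` return is your cue to retry, which matches the troubleshooting advice below for empty JSON-mode responses.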

Common Windows-specific issues

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Ollama “model not found” | Mistyped tag (e.g. deepseek-r1:7 instead of :7b) | Run ollama list to confirm exact tags |
| Ollama runs on CPU only | CUDA toolkit missing or wrong version | Install matching CUDA, restart the Ollama service |
| Open-WebUI cannot reach Ollama | WSL2 networking | Bind Ollama to 0.0.0.0 and use the host’s IP from WSL |
| API call returns 401 | Env var not set in the current PowerShell session | Re-set $env:DEEPSEEK_API_KEY or use setx for persistence |
| JSON mode returns empty content | Documented behaviour | Retry; ensure the prompt contains “json” and a schema |

If your problems are broader than this, our DeepSeek troubleshooting guide covers login, app, and API errors across platforms, and the DeepSeek beginner guides hub indexes every platform-specific guide we maintain.

Which route should you actually pick?

  • Pick the browser or Microsoft Store app if you want to chat, write, or research without managing infrastructure. This is the right answer for 80% of readers.
  • Pick Ollama if you need offline operation, your data cannot leave the machine, or you want to experiment with the open weights at a smaller scale.
  • Pick the API if you are automating, building an internal tool, or running an agent. Start with deepseek-v4-flash for most workloads; move to deepseek-v4-pro only when your own evaluation shows the lift is worth roughly 12× the per-token cost implied by the pricing table above.

Cross-platform users should also see DeepSeek on Mac for the equivalent setup on Apple Silicon, and Reuters’ V4 launch coverage for context on what changed in this generation.

Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Is there an official DeepSeek app for Windows?

Yes — DeepSeek publishes an official app in the Microsoft Store for Windows 10 and 11. It is essentially a wrapper around the web chat at chat.deepseek.com, but it adds a Start-menu launcher and a dedicated window. Avoid third-party “DeepSeek for Windows” installers found on download portals; they are not maintained by DeepSeek. See our DeepSeek download guide for verified sources.

How do I run DeepSeek offline on Windows 11?

Install Ollama from ollama.com, then run ollama pull deepseek-r1:7b in PowerShell to download a 7B distill that fits on an 8 GB GPU. For a graphical interface, layer Open-WebUI on top via Docker Desktop with WSL2. The full 671B or V4-Flash 284B weights need a workstation. Our DeepSeek offline setup tutorial has the complete walkthrough.

Can my PC run DeepSeek V4-Flash locally?

Probably not without serious hardware. V4-Flash is 284B parameters and around 160 GB on Hugging Face; even quantized, it pushes well past consumer GPU memory. Most Windows users with an RTX 4080 or 4090 should run a 14B–32B R1 distill instead, or use the API for V4 access. Check our DeepSeek hardware calculator to size your machine.

What does the DeepSeek API cost from Windows?

Pricing is per token, not per platform. As of April 2026, V4-Flash lists $0.14 input cache-miss and $0.28 output per 1M tokens; V4-Pro lists $1.74 and $3.48. A million calls with a 2,000-token cached prompt and short replies on V4-Flash works out to about $168 total. See our DeepSeek API pricing reference for current rates.

Does DeepSeek on Windows still support the old deepseek-chat model ID?

Yes, until 2026-07-24 at 15:59 UTC. The legacy IDs deepseek-chat and deepseek-reasoner currently route to deepseek-v4-flash in non-thinking and thinking mode. After that date, requests using those IDs will fail. Migrating is a one-line model= swap; base_url does not change. Our DeepSeek API documentation hub tracks the deprecation timeline.
