DeepSeek System Requirements for Web, App and Local Use

Guides · April 25, 2026 · By DS Guide Editorial

“Will my laptop run DeepSeek?” is the wrong question, because there are five different ways to use it and each has its own answer. The DeepSeek system requirements you need depend entirely on whether you are opening chat.deepseek.com in a browser, installing the mobile app, calling the API from a server, running an R1 distill in Ollama, or self-hosting the full DeepSeek-V4 weights on a GPU cluster. The web and mobile clients ask very little of your hardware. Self-hosting V4-Pro asks for hundreds of gigabytes of GPU memory. This guide gives you the specs for each path, based on the official V4 release on April 24, 2026 and current community deployments.

The five ways to use DeepSeek — and what each needs

Before listing specs, decide which surface you actually need. The hardware bill goes up by roughly four orders of magnitude from the lightest to the heaviest option.

Access method | Where compute happens | Your hardware burden | Best for
Web chat (chat.deepseek.com) | DeepSeek servers | Any modern browser | Casual use, trying V4
Mobile app (iOS / Android) | DeepSeek servers | Recent phone, ~150 MB free | On-the-go chat
API (POST /chat/completions) | DeepSeek servers | Any machine that can make HTTPS requests | Apps, automation
Local distilled model (Ollama) | Your GPU/CPU | 8–48 GB VRAM | Privacy, offline coding
Self-hosted V4 weights (vLLM) | Your GPU cluster | 141 GB+ VRAM (Flash) to 1 TB+ (Pro) | Enterprise deployments

The first three paths offload compute to DeepSeek’s infrastructure. Only the last two are real “system requirements” questions in the hardware sense. We cover all five below.

Web chat system requirements

The web client at chat.deepseek.com is a thin browser app. The model — DeepSeek-V4 since April 24, 2026 — runs on DeepSeek’s servers, not yours.

Browser

  • Chrome, Edge, Firefox, Safari or Brave — current major version
  • JavaScript enabled, cookies enabled (login uses session cookies)
  • Stable internet (a few hundred kbps is enough for text)

Operating system

  • Windows 10/11, macOS 12 or later, any modern Linux distribution, ChromeOS
  • No GPU, no minimum RAM beyond what your browser already consumes

The DeepThink toggle on the web interface now switches V4 between non-thinking and thinking mode rather than swapping models, so you do not need different “model” settings to access reasoning. For a deeper walkthrough of the interface, see our DeepSeek chat guide.

Mobile app system requirements

The official iOS and Android apps are also thin clients. Any phone from the last four or five years will run them.

Platform | Minimum OS | Approx. install size | Notes
iPhone / iPad | iOS 14 or later | ~120 MB | App Store listing — see DeepSeek on iPhone
Android | Android 8.0 (Oreo) or later | ~150 MB | Google Play — see DeepSeek on Android

If you are checking that you have the genuine app rather than a clone, our guide to verifying the official DeepSeek app walks through the publisher fields.

API system requirements

The DeepSeek API is OpenAI-compatible (and now Anthropic-compatible at the same base URL). Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint. The API is stateless — your client must resend the conversation history on every request. This is the opposite of the web app, which maintains session history server-side.

System requirements for calling the API are negligible: any machine that can run Python 3.8+ or Node.js 18+ and reach https://api.deepseek.com over HTTPS. A minimal Python example using the OpenAI SDK:

from openai import OpenAI

# Point the standard OpenAI client at DeepSeek's endpoint.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=1.3,  # recommended setting for general chat
    max_tokens=512,
)
print(resp.choices[0].message.content)
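
Because the endpoint is stateless, a follow-up turn has to resend the conversation so far. A minimal continuation of the example above (the second user message is purely illustrative):

# Append the assistant's reply and the next user turn, then resend everything.
history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": resp.choices[0].message.content},
    {"role": "user", "content": "Summarise that in one sentence."},
]

followup = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=history,
)
print(followup.choices[0].message.content)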

Current model IDs are deepseek-v4-pro (1.6T total / 49B active parameters, frontier tier) and deepseek-v4-flash (284B / 13B active, cost-efficient tier). Both are open-weight Mixture-of-Experts models under the MIT license. Both support a 1,000,000-token context window with output up to 384,000 tokens, and both expose thinking mode as a request parameter (reasoning_effort="high" with extra_body={"thinking": {"type": "enabled"}}, or reasoning_effort="max") rather than as a separate model ID. The response then returns reasoning_content alongside the final content.
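
As a sketch of what a thinking-mode request looks like, using the parameter names described above (field names follow this article's description of the V4 API; confirm against the official API reference before relying on them):

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    reasoning_effort="high",  # or "max" for the deepest setting, per the description above
    extra_body={"thinking": {"type": "enabled"}},
)

# The chain of thought arrives in reasoning_content, the final answer in content.
message = resp.choices[0].message
print(message.reasoning_content)
print(message.content)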

Useful API parameters to know: temperature (DeepSeek recommends 0.0 for code/math, 1.0 for data analysis, 1.3 for general chat, 1.5 for creative writing), top_p, max_tokens, reasoning_effort, plus JSON mode, tool calling, streaming, context caching, FIM completion (Beta, non-thinking only) and Chat Prefix Completion (Beta).
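
Streaming works the same way it does with the OpenAI SDK. A short sketch (model ID and temperature taken from the recommendations above):

# Stream tokens as they arrive instead of waiting for the full response.
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about GPUs"}],
    temperature=1.5,  # the creative-writing setting recommended above
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)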

If you maintain an older integration using legacy IDs deepseek-chat or deepseek-reasoner, both currently route to deepseek-v4-flash and will be retired on 2026-07-24 at 15:59 UTC. Migrating is a one-line change to the model= field; base_url stays the same. For detailed migration steps, see our DeepSeek API documentation.

Local deployment: distilled R1 models on Ollama

If you want a model on your own hardware, the realistic starting point is one of the DeepSeek-R1 distilled checkpoints — smaller dense models trained to mimic R1’s reasoning. The distilled variants are available at 7B, 14B, 32B and 70B sizes, all under the MIT license, all installable in one command via Ollama.

VRAM requirements by distill size (Q4 quantization)

Model | Approx. VRAM (Q4) | Recommended GPU | Typical speed
R1-Distill-Qwen-1.5B | < 1 GB | Almost any GPU, or CPU only | 5–30 tok/s
R1-Distill-Qwen-7B | ~5 GB | RTX 3060 12 GB or better | 40–80 tok/s
R1-Distill-Qwen-14B | ~8 GB | RTX 4060 Ti 16 GB / RTX 3080 | 30–60 tok/s
R1-Distill-Qwen-32B | ~18–20 GB | RTX 3090 / 4090 (24 GB) | 28–45 tok/s
R1-Distill-Llama-70B | ~40 GB | Dual 3090/4090 or Mac Studio M4 Max 128 GB | 8–15 tok/s

The numbers above are consolidated from independent community tests. The 32B distill at Q4_K_M uses around 20 GB of VRAM — a comfortable fit for an RTX 3090 or RTX 4090 with about 4 GB of headroom for the KV cache — and delivers roughly 28–35 tokens per second on a 3090 and 38–45 on a 4090. The 70B distill at Q4 needs about 40 GB, beyond any single consumer GPU; a pair of used RTX 3090s (around $1,700) runs it at roughly 12–15 tokens per second.
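
Those figures line up with a simple back-of-envelope estimate: Q4_K_M stores roughly 4.8 bits per parameter, and you add a few gigabytes on top for KV cache and runtime overhead. A rough sketch (the constants are approximations, not measurements):

def q4_weights_gb(params_billions, bits_per_param=4.8):
    """Rough size of Q4_K_M-quantized weights; add 2-4 GB for KV cache and runtime."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for size in (7, 14, 32, 70):
    print(f"{size}B ≈ {q4_weights_gb(size):.0f} GB of weights")
# Roughly 4, 8, 19 and 42 GB — in line with the table above, before KV-cache headroom.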

For Apple Silicon, unified memory is the trick: the GPU shares the system RAM pool, so an M3 Max or M4 Max with 64 GB or 128 GB can load models that would crash a typical gaming PC, at lower throughput. Inference is meaningfully slower than on a 4090 but the capacity advantage is real. See our DeepSeek on Mac guide for the specifics.

Minimum CPU and RAM

  • Modern x86-64 CPU (Intel 10th gen / AMD Ryzen 3000 or newer) or Apple Silicon
  • 16 GB system RAM is the practical floor; 32 GB recommended once you go past 7B
  • NVMe SSD with 50–200 GB free for model weights and KV cache scratch
  • Linux, Windows 10/11 with WSL2, or macOS 13+

The fastest path is Ollama. Once installed, a single line pulls and runs the model:

ollama run deepseek-r1:32b
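
Once the model is running, Ollama also exposes an OpenAI-compatible endpoint on localhost, so the same client code from the API section works against the local model. A sketch assuming a default Ollama install (port 11434; the API key is a placeholder, as the local server does not check it):

from openai import OpenAI

# Point the OpenAI SDK at the local Ollama server instead of api.deepseek.com.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = local.chat.completions.create(
    model="deepseek-r1:32b",
    messages=[{"role": "user", "content": "Explain KV cache in two sentences."}],
)
print(resp.choices[0].message.content)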

Step-by-step setup, including GPU offloading and Open-WebUI, is in our running DeepSeek on Ollama tutorial. For a fuller installation walkthrough across operating systems, see install DeepSeek locally.

Local deployment: full DeepSeek-V4 weights

This is the heavy end. V4 ships in two open-weight tiers, and the size difference matters more than any other spec.

DeepSeek-V4-Flash

V4-Flash in FP4+FP8 mixed precision is approximately 158 GB. It fits on a single H200 (141 GB HBM3e) or 2× A100 80 GB, and with INT4 quantization it can potentially fit on 4× RTX 4090, with quality trade-offs. Lushbinary’s self-hosting guide confirms the single-H200 / dual-A100 fit, and notes that benchmarks on 1M-token tasks often need at least four A100 80 GB GPUs or a multi-node setup just so the KV cache fits.

Realistic minimums for V4-Flash production serving:

  • 1× NVIDIA H200 141 GB or 2× A100/H100 80 GB
  • 500 GB+ NVMe for the checkpoint and runtime files
  • vLLM 0.7+ or SGLang for inference; vLLM exposes an OpenAI-compatible API by default
  • CUDA 12.4+, NCCL configured for multi-GPU, recent NVIDIA driver
  • Linux (Ubuntu 22.04 LTS is the most common target)
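
Under those assumptions, a minimal launch could look like the sketch below. The Hugging Face repo name is hypothetical — check the official V4 model card — and the flags will vary with your GPU count and target context length:

# Serve V4-Flash across two 80 GB GPUs with an OpenAI-compatible API on port 8000.
vllm serve deepseek-ai/DeepSeek-V4-Flash \
    --tensor-parallel-size 2 \
    --max-model-len 131072 \
    --trust-remote-code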

DeepSeek-V4-Pro

The V4-Pro FP4+FP8 mixed-precision instruct checkpoint needs around 862 GB of GPU memory. That puts Pro out of reach for almost everyone except labs with multi-node H100/H200 fleets or AWS p5/p5e equivalents. A p5.48xlarge runs around $98/hour on-demand, dropping to roughly $1,400/day with reserved instances; the break-even point against the API arrives at roughly 200M+ tokens per day with reserved capacity, or earlier when data sovereignty rules out the API entirely.

BF16 / full-precision is a different problem

If you intend to run the model at full precision rather than DeepSeek’s native FP4+FP8 mix, plan for vastly more memory: with all experts resident, expect 16–24× 80 GB GPUs (H100 or A100) to keep headroom for KV cache, activation buffers and real batch sizes. For most teams this is the moment to stop and price out the API instead. Our DeepSeek API pricing page lays out the per-token rates.

Worked example: API vs self-host break-even

A worked example for deepseek-v4-flash at 1,000,000 calls per day with a 2,000-token cached system prompt, 200-token user message and 300-token response:

Bucket | Tokens | Rate (per 1M) | Cost
Cached input | 2,000,000,000 | $0.028 | $56.00
Uncached input | 200,000,000 | $0.14 | $28.00
Output | 300,000,000 | $0.28 | $84.00
Total | | | $168.00

Same workload at deepseek-v4-pro rates ($0.145 cache hit / $1.74 miss / $3.48 output) totals $1,682.00 per day. Notice that even a “cached” workload still pays the uncached miss rate on each new user message — context caching is a prefix optimisation, not a free pass.
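
If you want to reproduce or tweak those numbers, the arithmetic is a few lines of Python (rates copied from the table and text above; check the pricing page before relying on them):

def daily_cost(calls, cached_in, uncached_in, out, rates):
    """rates = (cache-hit, cache-miss, output) in $ per 1M tokens."""
    hit, miss, output = rates
    return (calls * cached_in * hit
            + calls * uncached_in * miss
            + calls * out * output) / 1e6

flash = daily_cost(1_000_000, 2000, 200, 300, (0.028, 0.14, 0.28))
pro = daily_cost(1_000_000, 2000, 200, 300, (0.145, 1.74, 3.48))
print(f"V4-Flash: ${flash:,.2f}/day, V4-Pro: ${pro:,.2f}/day")
# V4-Flash: $168.00/day, V4-Pro: $1,682.00/day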

For a full breakdown by tier, see our DeepSeek API pricing reference, or run your own numbers in the DeepSeek pricing calculator.

Recommendation by user profile

  • Casual user / researcher. Web or app. Zero hardware investment. Switch DeepThink on for harder questions.
  • Indie developer. API. A $5 prepay covers weeks of testing on V4-Flash. No GPUs to babysit.
  • Privacy-sensitive solo developer. R1-Distill-Qwen-32B on a single RTX 4090 or used 3090 via Ollama. Good reasoning, fully offline, no per-token bill.
  • Small team, regulated industry. V4-Flash on a single H200 (or 2× A100 80 GB) under vLLM. Real upfront cost, but you own the stack.
  • Frontier-tier production. Stay on the API for V4-Pro until your daily volume justifies a multi-node cluster.

For deeper guidance on choosing between hosted and local, see our DeepSeek free vs paid comparison and the broader DeepSeek beginner guides hub.

Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Frequently asked questions

What are the minimum specs to run DeepSeek locally?

The smallest practical local option is the DeepSeek-R1-Distill-Qwen-1.5B model, which fits in under 1 GB of VRAM and can even run CPU-only on a modern 8-core processor with 16 GB of RAM. For a usable coding assistant experience, target 16 GB of VRAM (the 14B distill) or 24 GB (the 32B distill, which most users prefer). See our Ollama setup guide for the install steps.

Can I run DeepSeek V4 on a single RTX 4090?

Not the full V4-Flash or V4-Pro weights at production quality. V4-Flash in FP4+FP8 is about 158 GB and fits on a single H200 or 2× A100 80 GB; with INT4 quantization it can potentially fit on 4× RTX 4090s with quality trade-offs. A single 4090 (24 GB) is the right card for the R1-Distill 32B model instead. Compare options in our DeepSeek R1 Distill overview.

Does the DeepSeek app need a powerful phone?

No. The official mobile apps are thin clients — model inference happens on DeepSeek’s servers. Any phone running iOS 14 or Android 8.0 and later, with around 150 MB of free storage and a stable internet connection, will run them comfortably. For platform-specific notes, see DeepSeek on iPhone and DeepSeek on Android.

How much VRAM does the full DeepSeek-R1 671B model need?

The full 671B model needs about 376 GB of VRAM even at Q4 quantization — typically a cluster of 4× or 8× A100/H100 GPUs, or a high-memory Mac with aggressive CPU offloading. Most users get better value from the 32B distill on a single 24 GB card. Our DeepSeek R1 page covers the trade-offs.

Is the DeepSeek API stateful or stateless?

The API is stateless. The POST /chat/completions endpoint does not remember prior turns; your client must resend the full messages array on every request. This contrasts with the web chat and mobile app, which keep session history for the user. The API is OpenAI-compatible (and now Anthropic-compatible) at https://api.deepseek.com. Walk through the basics in our API getting-started tutorial.
