Running DeepSeek on Linux: Web, API and Local Options
You have a Linux box — Ubuntu, Fedora, Arch, Debian, take your pick — and you want DeepSeek working on it tonight. The good news: running DeepSeek on Linux is the path the project is best tuned for, whether you want the hosted chat in a browser, the V4 API from a Python script, or open weights running locally through Ollama or SGLang. The bad news: most tutorials online still point at DeepSeek-R1 from January 2025 and skip the V4 generation that landed on April 24, 2026.
This guide covers all three paths, with the exact commands, the current model IDs, honest hardware requirements, and a worked cost example so you can pick the route that fits your machine and budget.
The three ways to run DeepSeek on Linux
Before any commands, decide which surface you actually need. The three options behave very differently in terms of cost, latency, privacy and the hardware they demand.
| Path | Best for | Hardware | Cost | Privacy |
|---|---|---|---|---|
| Web chat (browser) | Quick questions, no setup | Any Linux with a modern browser | Free tier | Data processed by DeepSeek |
| V4 API (Python/curl) | Apps, scripts, automation | Any Linux with Python 3.9+ | Pay-per-token | Data sent to DeepSeek servers |
| Local (Ollama, SGLang) | Offline use, full data control | 16 GB RAM minimum; GPU strongly recommended | Electricity + hardware | Stays on your machine |
If you only need a chat window, skip to “Path 1.” If you are building software, “Path 2” is what you want. If you need offline operation or refuse to send prompts off-box, jump to “Path 3.”
Path 1: Use DeepSeek’s web app from any Linux desktop
The simplest option costs nothing and works on any distro. Open Firefox, Chromium, Brave or your preferred browser and go to chat.deepseek.com. Sign in with email or Google and you get the V4 model by default — it has been the consumer chat's default since its release on April 24, 2026.
The DeepThink toggle in the chat UI now switches V4 between non-thinking and thinking mode, rather than swapping models. There is no Linux desktop app from DeepSeek — the web client is the official surface, and any “DeepSeek for Linux” .deb or .AppImage you find on third-party sites is unofficial. If a packaged client matters to you, see our notes on verifying the official DeepSeek app.
What about Electron wrappers?
Several community projects wrap the web chat in an Electron or Tauri shell so it sits in your launcher. They work, but they are unaffiliated with DeepSeek, and the underlying model and rate limits are identical to the browser. There is no published daily message cap; assume usage is moderated dynamically and have an API fallback ready for heavier work.
Path 2: Call the DeepSeek V4 API from Linux
For any kind of automation — a CLI tool, a Slack bot, an internal RAG service — the API is the right answer. The DeepSeek API exposes OpenAI- and Anthropic-compatible formats, so any SDK or tool built for those APIs can reach it with a configuration change. On Linux, that means a one-line change to any existing OpenAI client.
Step 1: Get an API key
Create an account at platform.deepseek.com, then go to “API Keys” and generate one. Export it in your shell so you do not paste secrets into scripts:
export DEEPSEEK_API_KEY="sk-..."
echo 'export DEEPSEEK_API_KEY="sk-..."' >> ~/.bashrc
For a deeper walk-through, see our guide to getting a DeepSeek API key.
Step 2: First request with curl
Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint. Test from any Linux terminal:
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "Summarise the Linux kernel scheduler in 3 sentences."}
    ]
  }'
Step 3: Python with the OpenAI SDK
Install the OpenAI SDK in a venv (the standard Linux pattern that avoids polluting the system Python):
python3 -m venv ~/ds-env
source ~/ds-env/bin/activate
pip install openai
Then a minimal Python script. Note the base_url override — that is the entire migration from OpenAI to DeepSeek:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Plan a Postgres migration."}],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)
That request enables thinking mode, which returns reasoning_content alongside the final content. Drop both reasoning_effort and extra_body to stay in non-thinking mode (the default, and what you want for chat-style apps). For more patterns, see DeepSeek Python integration.
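If you want to inspect the reasoning trace as well as the final answer, a minimal sketch (assuming the extra field is surfaced on the message object, as it is for DeepSeek's earlier reasoner responses) looks like this:

# Sketch: read the reasoning trace and the final answer from a thinking-mode response.
reasoning = getattr(response.choices[0].message, "reasoning_content", None)
if reasoning:
    print("--- reasoning ---")
    print(reasoning)
print("--- answer ---")
print(response.choices[0].message.content)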
Model IDs and the legacy migration window
V4 ships as two open-weight MoE tiers under MIT, both with a 1,000,000-token default context window and output up to 384,000 tokens:
- deepseek-v4-pro — 1.6T total / 49B active. Frontier tier for agents, complex coding, long-horizon reasoning.
- deepseek-v4-flash — 284B total / 13B active. Cost-efficient default for chat and standard workloads.
If your code still uses the legacy IDs, they continue to work for now. For compatibility, deepseek-chat and deepseek-reasoner map to the non-thinking and thinking modes of deepseek-v4-flash respectively, and both names will be retired on 2026-07-24 at 15:59 UTC. Migrating is a one-line model= swap; base_url does not change. See DeepSeek OpenAI SDK compatibility for the full mapping.
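In code, the migration is just the model argument; a before/after sketch reusing the client from Step 3:

# Before (legacy, retires 2026-07-24):
#   model="deepseek-chat"       # non-thinking
#   model="deepseek-reasoner"   # thinking
# After (V4):
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
    # add extra_body={"thinking": {"type": "enabled"}} to replace deepseek-reasoner
)
print(response.choices[0].message.content)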
Parameters worth knowing
- temperature — DeepSeek's official guidance: 0.0 for code and maths, 1.0 for data analysis, 1.3 for general chat and translation, 1.5 for creative writing.
- top_p — nucleus sampling; an alternative to temperature.
- max_tokens — output cap. With V4 you can set this up to 384,000.
- reasoning_effort — V4-only. "high" or "max"; pair with extra_body={"thinking": {"type": "enabled"}}.
- stream=true — server-sent events for token-by-token output. Streaming and tool calling work in both thinking and non-thinking modes.
- JSON mode — response_format={"type": "json_object"} is designed to return valid JSON, not guaranteed. Always include the word "json" plus a small example schema in the prompt, and set max_tokens high enough that the response cannot truncate.
The API is stateless — your Linux client must resend the conversation history with every request to maintain a multi-turn chat. The web app keeps that state for you; the API does not.
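Here is a minimal multi-turn sketch against deepseek-v4-flash; note that the full history list goes back over the wire on the second call:

# Sketch: the client owns the conversation state, not the API.
history = [{"role": "user", "content": "Name one Linux init system."}]
first = client.chat.completions.create(model="deepseek-v4-flash", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})
history.append({"role": "user", "content": "Name another one."})
second = client.chat.completions.create(model="deepseek-v4-flash", messages=history)
print(second.choices[0].message.content)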
Cost example: a Linux-hosted support bot on V4-Flash
Say you are running a knowledge-base assistant from a small Ubuntu VM. One million calls per month, with a 2,000-token system prompt that benefits from context caching, a 200-token user message per call, and a 300-token reply. Using deepseek-v4-flash rates ($0.028 cache-hit / $0.14 cache-miss / $0.28 output per 1M tokens, as of April 2026):
Input, cache hit : 2,000 × 1,000,000 = 2,000,000,000 × $0.028/M = $56.00
Input, cache miss : 200 × 1,000,000 = 200,000,000 × $0.14/M = $28.00
Output : 300 × 1,000,000 = 300,000,000 × $0.28/M = $84.00
-------
Total : $168.00
Each new user message is a fresh cache miss against the cached system prefix — do not skip the uncached-input line. The same workload on deepseek-v4-pro ($0.145 / $1.74 / $3.48 per 1M tokens) costs $1,682. Pick Flash unless a benchmark lift on a specific task justifies roughly 10× the spend. Verify both rates against the DeepSeek API pricing page before committing — Preview pricing can change.
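To re-run the arithmetic against your own traffic profile, a small sketch with the April 2026 Flash rates hard-coded:

# Sketch: monthly cost for the support-bot workload above.
CALLS = 1_000_000
RATES = {"cache_hit": 0.028, "cache_miss": 0.14, "output": 0.28}   # USD per 1M tokens
tokens = {"cache_hit": 2_000, "cache_miss": 200, "output": 300}    # tokens per call
total = sum(tokens[k] * CALLS / 1_000_000 * RATES[k] for k in RATES)
print(f"${total:,.2f}")  # $168.00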
Path 3: Run DeepSeek locally on Linux
For air-gapped work, regulated data, or simply learning how MoE models behave, you can run DeepSeek’s open weights on your own Linux box. Two routes dominate: Ollama for ease, SGLang for production.
Honest hardware reality check
The full V4 models are not consumer-runnable. deepseek-v4-pro is 1.6T parameters; even at FP4 it needs a GPU server, not a workstation. What most people actually run locally are the DeepSeek R1 Distill variants — fine-tuned versions of open-source models like LLaMA and Qwen, trained on data generated by DeepSeek-R1, that inherit DeepSeek’s reasoning capabilities while being far more efficient to self-host.
| Model size | Min RAM | GPU VRAM (recommended) | Disk | Use |
|---|---|---|---|---|
| 1.5B distill | 8 GB | None / 4 GB | ~2 GB | Quick tests, low-end laptops |
| 7B / 8B distill | 16 GB | 8 GB | ~5 GB | General-purpose local chat |
| 14B distill | 32 GB | 12 GB | ~9 GB | Solid reasoning on a single GPU |
| 32B distill | 64 GB | 24 GB | ~20 GB | RTX 4090 / 3090 territory |
| 70B distill | 128 GB | 48 GB+ (or two GPUs) | ~40 GB | Workstation / small server |
| V4 / V4-Pro full | — | Multi-GPU H100/H200/Blackwell | ~700 GB+ | Dedicated inference server |
For a more granular view, see our DeepSeek hardware calculator.
Step-by-step: Ollama on Ubuntu, Fedora or Arch
- Install Ollama. One curl command, distro-agnostic:

  curl -fsSL https://ollama.com/install.sh | sh

  This script downloads and installs the Ollama binary, sets up the necessary services, and adds Ollama to your system's PATH.
- Verify the service.

  ollama --version
  systemctl status ollama

  If the service is not active, start it with sudo systemctl start ollama.
- Pull a model. Pick the largest size your VRAM can hold:

  ollama pull deepseek-r1:7b
  # or for stronger reasoning on a 24 GB GPU:
  ollama pull deepseek-r1:32b
- Run an interactive session.

  ollama run deepseek-r1:7b

  You get a REPL — type a prompt, hit Enter. Exit with /bye.
- Hit it from code. Ollama exposes an OpenAI-compatible server at http://localhost:11434, so the same Python snippet from Path 2 works against your local model with base_url="http://localhost:11434/v1" (see the sketch below).
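Concretely, the local variant of the Path 2 snippet is a sketch like this (Ollama ignores the key value, but the SDK insists one is set):

from openai import OpenAI

# Same client pattern as Path 2, pointed at the local Ollama server.
client = OpenAI(
    api_key="ollama",  # any non-empty string; Ollama does not check it
    base_url="http://localhost:11434/v1",
)
response = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Explain cgroups in two sentences."}],
)
print(response.choices[0].message.content)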
For the deeper version of this walk-through with troubleshooting, see running DeepSeek on Ollama and the broader install DeepSeek locally guide.
GPU acceleration on Linux
Ollama uses your GPU automatically once the drivers are present. On NVIDIA hardware, install the proprietary driver and CUDA toolkit — Ollama requires a compute capability of 5.0 or higher to enable GPU inference. On AMD, install ROCm and use a recent kernel. Verify with nvidia-smi (NVIDIA) or rocm-smi (AMD) that the model is actually on the GPU during inference; if you see CPU pegged at 100% and GPU idle, the driver chain is broken.
Production-grade serving with SGLang or vLLM
Ollama is for one user. For multi-tenant inference — a team behind an internal API — SGLang and vLLM are the right tools. DeepSeek-V4, released 2026-04-24 under the MIT License, ships as two Instruct repos plus matching Base repos; the Instruct checkpoints are mixed-precision (FP4 MoE experts, FP8 attention and dense layers) and target any GPU with FP4 support. SGLang publishes Docker images and recipe-based launch commands for V4 on Blackwell and H200 hardware. The Docker pattern is straightforward on Linux:
docker run --gpus all --shm-size 32g -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<your-hf-token>" --ipc=host \
  lmsysorg/sglang:deepseek-v4-blackwell sglang serve [args]
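Once the container is up it serves an OpenAI-compatible API on the published port, so, assuming the default /v1 route, the same client pattern from Path 2 applies:

from openai import OpenAI

# Sketch: talking to the SGLang container from the host.
client = OpenAI(api_key="not-needed", base_url="http://localhost:30000/v1")
response = client.chat.completions.create(
    model="deepseek-v4-flash",  # whatever model name the server registered
    messages=[{"role": "user", "content": "Health check: reply with OK."}],
)
print(response.choices[0].message.content)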
For containerised setups generally, see DeepSeek Docker deployment.
Adding a web UI: Open WebUI on Linux
The Ollama CLI is fine for testing, but most teams want a chat UI. Open WebUI is the most common pairing. In a Python venv:
sudo apt install python3-venv # Debian/Ubuntu
python3 -m venv ~/open-webui-venv
source ~/open-webui-venv/bin/activate
pip install open-webui
open-webui serve
Then visit http://localhost:8080, create the local admin account on first launch, and your installed Ollama models appear in the model dropdown. To run it as a background service, create a small systemd unit that starts open-webui serve after Ollama. To use the same UI against the hosted API instead of local models, add a connection pointing at https://api.deepseek.com with your API key.
Choosing the right path
I run all three. Web chat for stray questions on a laptop. The V4 API from Python on Ubuntu servers for production work — the math at $168/month for a million calls is hard to argue with. Local R1 distill on a 4090 workstation when I am offline on a train or working with documents I will not send to a remote provider.
If you are picking one to start with, the V4 API is the highest-leverage option on Linux: zero hardware investment, a single base_url change for any existing OpenAI code, and the same model that powers the chat. Add Ollama later when you have a specific reason to keep data local.
Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.
Is there an official DeepSeek desktop app for Linux?
No. DeepSeek does not publish a native Linux desktop client. The official surfaces on Linux are the web chat at chat.deepseek.com and the API at https://api.deepseek.com. Third-party Electron wrappers exist but are unaffiliated; treat any unofficial .deb, .rpm or AppImage with caution. See our notes on how to verify the official DeepSeek app before installing anything packaged.
How do I install DeepSeek on Ubuntu without a GPU?
You have two CPU-only options. Use the hosted API from any Python script — that needs no local model at all. Or install Ollama with curl -fsSL https://ollama.com/install.sh | sh and pull the smallest distill, ollama pull deepseek-r1:1.5b. The 1.5B model runs on CPU with 8 GB of RAM, slowly but functionally. Larger sizes are not realistic without a GPU; full DeepSeek system requirements are in our hardware guide.
What hardware do I need to run DeepSeek V4 locally on Linux?
The full deepseek-v4-pro (1.6T parameters) and deepseek-v4-flash (284B) ship as FP4/FP8 mixed-precision checkpoints intended for multi-GPU inference servers — H100, H200 or Blackwell-class hardware with hundreds of GB of VRAM. On a single workstation, run an R1 distill instead. Use our DeepSeek hardware calculator to size a specific model against your machine.
Can I use the OpenAI Python SDK to call DeepSeek from Linux?
Yes — that is the recommended pattern. The DeepSeek API is OpenAI-compatible, so the official openai package works by setting base_url="https://api.deepseek.com" and your DeepSeek API key. An Anthropic-compatible endpoint is also available against the same base URL. Full details of the wire-level mapping live in our DeepSeek API documentation notes.
Why does my DeepSeek API call lose the conversation history on Linux?
Because the API is stateless — DeepSeek does not store prior turns server-side. Every POST /chat/completions request must include the full messages array with all previous user and assistant turns. The web chat maintains session history for you; the API delegates that to your client. See our DeepSeek API getting started guide for a working multi-turn example.
