Running DeepSeek on Linux: Web, API and Local Options


Guides · April 25, 2026 · By DS Guide Editorial

You have a Linux box — Ubuntu, Fedora, Arch, Debian, take your pick — and you want DeepSeek working on it tonight. The good news: running DeepSeek on Linux is the path the project is best tuned for, whether you want the hosted chat in a browser, the V4 API from a Python script, or open weights running locally through Ollama or SGLang. The bad news: most tutorials online still point at DeepSeek-R1 from January 2025 and skip the V4 generation that landed on April 24, 2026.

This guide covers all three paths, with the exact commands, the current model IDs, honest hardware requirements, and a worked cost example so you can pick the route that fits your machine and budget.

The three ways to run DeepSeek on Linux

Before any commands, decide which surface you actually need. The three options behave very differently in terms of cost, latency, privacy and the hardware they demand.

| Path | Best for | Hardware | Cost | Privacy |
|---|---|---|---|---|
| Web chat (browser) | Quick questions, no setup | Any Linux with a modern browser | Free tier | Data processed by DeepSeek |
| V4 API (Python/curl) | Apps, scripts, automation | Any Linux with Python 3.9+ | Pay-per-token | Data sent to DeepSeek servers |
| Local (Ollama, SGLang) | Offline use, full data control | 16 GB RAM minimum; GPU strongly recommended | Electricity + hardware | Stays on your machine |

If you only need a chat window, skip to “Path 1.” If you are building software, “Path 2” is what you want. If you need offline operation or refuse to send prompts off-box, jump to “Path 3.”

Path 1: Use DeepSeek’s web app from any Linux desktop

The simplest option costs nothing and works on any distro. Open Firefox, Chromium, Brave or your preferred browser and go to chat.deepseek.com. Sign in with email or Google and you get the V4 model by default — the V4 release made it the consumer chat's default on April 24, 2026.

The DeepThink toggle in the chat UI now switches V4 between non-thinking and thinking mode, rather than swapping models. There is no Linux desktop app from DeepSeek — the web client is the official surface, and any “DeepSeek for Linux” .deb or .AppImage you find on third-party sites is unofficial. If a packaged client matters to you, see our notes on verifying the official DeepSeek app.

What about Electron wrappers?

Several community projects wrap the web chat in an Electron or Tauri shell so it sits in your launcher. They work, but they are unaffiliated with DeepSeek, and the underlying model and rate limits are identical to the browser. There is no published daily message cap; assume usage is moderated dynamically and have an API fallback ready for heavier work.

Path 2: Call the DeepSeek V4 API from Linux

For any kind of automation — a CLI tool, a Slack bot, an internal RAG service — the API is the right answer. The DeepSeek API uses a request format compatible with the OpenAI and Anthropic APIs, so any OpenAI/Anthropic SDK, or tooling built against those APIs, can talk to it after a configuration tweak. On Linux, that means a one-line change to any existing OpenAI client.

Step 1: Get an API key

Create an account at platform.deepseek.com, then go to “API Keys” and generate one. Export it in your shell so you do not paste secrets into scripts:

export DEEPSEEK_API_KEY="sk-..."
echo 'export DEEPSEEK_API_KEY="sk-..."' >> ~/.bashrc

For a deeper walk-through, see our guide to getting a DeepSeek API key.

Step 2: First request with curl

Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint. Test from any Linux terminal:

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "Summarise the Linux kernel scheduler in 3 sentences."}
    ]
  }'
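The response comes back in the standard OpenAI chat-completions shape. A quick look at how to pull the text out — the payload below is an illustrative sample, not a captured response:

```python
import json

# Illustrative sample of an OpenAI-compatible /chat/completions response body
sample = """{
  "choices": [
    {"message": {"role": "assistant", "content": "The scheduler picks the next runnable task."}}
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 12}
}"""

reply = json.loads(sample)
text = reply["choices"][0]["message"]["content"]
print(text)
```

From a shell pipeline, `jq -r '.choices[0].message.content'` does the same extraction.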

Step 3: Python with the OpenAI SDK

Install the OpenAI SDK in a venv (the standard Linux pattern that avoids polluting the system Python):

python3 -m venv ~/ds-env
source ~/ds-env/bin/activate
pip install openai

Then a minimal Python script. Note the base_url override — that is the entire migration from OpenAI to DeepSeek:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Plan a Postgres migration."}],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)

That request enables thinking mode, which returns reasoning_content alongside the final content. Drop both reasoning_effort and extra_body to stay in non-thinking mode (the default, and what you want for chat-style apps). For more patterns, see DeepSeek Python integration.

Model IDs and the legacy migration window

V4 ships as two open-weight MoE tiers under MIT, both with a 1,000,000-token default context window and output up to 384,000 tokens:

  • deepseek-v4-pro — 1.6T total / 49B active. Frontier tier for agents, complex coding, long-horizon reasoning.
  • deepseek-v4-flash — 284B total / 13B active. Cost-efficient default for chat and standard workloads.

If your code still uses the legacy IDs, they continue to work for now. For compatibility, deepseek-chat and deepseek-reasoner map to the non-thinking and thinking modes of deepseek-v4-flash, respectively, and will be retired on 2026-07-24 at 15:59 UTC. Migrating is a one-line model= swap; base_url does not change. See DeepSeek OpenAI SDK compatibility for the full mapping.

Parameters worth knowing

  • temperature — DeepSeek’s official guidance: 0.0 for code and maths, 1.0 for data analysis, 1.3 for general chat and translation, 1.5 for creative writing.
  • top_p — nucleus sampling; an alternative to temperature.
  • max_tokens — output cap. With V4 you can set this up to 384,000.
  • reasoning_effort — V4-only. "high" or "max"; pair with extra_body={"thinking": {"type": "enabled"}}.
  • stream=true — server-sent events for token-by-token output. Streaming and tool calling work in both thinking and non-thinking modes.
  • JSON mode — response_format={"type": "json_object"} is designed to return valid JSON, not guaranteed. Always include the word “json” plus a small example schema in the prompt, and set max_tokens high enough that the response cannot truncate.
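Because JSON mode is designed-but-not-guaranteed, validate every reply before using it. A minimal guard (the helper name is mine):

```python
import json

def parse_json_reply(text):
    """Return the parsed object, or None if the model's output is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None  # log and retry, or fall back to a re-prompt
```

On None, retry with a stronger schema hint in the prompt rather than trusting a regex repair.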

The API is stateless — your Linux client must resend the conversation history with every request to maintain a multi-turn chat. The web app keeps that state for you; the API does not.
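Concretely, client-side state management is just an appended list. A minimal sketch (the helper is illustrative; the commented lines show where the real API call goes):

```python
# Client-side history for a stateless chat API: the whole list is resent each turn.
history = [{"role": "system", "content": "You are a concise Linux assistant."}]

def add_turn(history, role, content):
    """Append one turn to the conversation history."""
    history.append({"role": role, "content": content})
    return history

add_turn(history, "user", "What does systemctl daemon-reload do?")
# response = client.chat.completions.create(model="deepseek-v4-flash", messages=history)
# add_turn(history, "assistant", response.choices[0].message.content)
```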

Cost example: a Linux-hosted support bot on V4-Flash

Say you are running a knowledge-base assistant from a small Ubuntu VM. One million calls per month, with a 2,000-token system prompt that benefits from context caching, a 200-token user message per call, and a 300-token reply. Using deepseek-v4-flash rates ($0.028 cache-hit / $0.14 cache-miss / $0.28 output per 1M tokens, as of April 2026):

Input, cache hit  : 2,000 × 1,000,000 = 2,000,000,000 × $0.028/M = $56.00
Input, cache miss :   200 × 1,000,000 =   200,000,000 × $0.14/M  = $28.00
Output            :   300 × 1,000,000 =   300,000,000 × $0.28/M  = $84.00
                                                                  -------
Total             :                                                $168.00

Each new user message is a fresh cache miss against the cached system prefix — do not skip the uncached-input line. The same workload on deepseek-v4-pro ($0.145 / $1.74 / $3.48 per 1M tokens) costs $1,682. Pick Flash unless a benchmark lift on a specific task justifies roughly 10× the spend. Verify both rates against the DeepSeek API pricing page before committing — Preview pricing can change.
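The arithmetic above is easy to reproduce. A small script like this (rates hard-coded from the April 2026 table) keeps the estimate honest when prices or token counts change:

```python
def monthly_cost(calls, cached_in, uncached_in, out_tokens,
                 hit_rate, miss_rate, out_rate):
    """Rates are USD per 1M tokens; token arguments are per-call counts."""
    per_m = 1_000_000
    return calls * (cached_in * hit_rate
                    + uncached_in * miss_rate
                    + out_tokens * out_rate) / per_m

flash = monthly_cost(1_000_000, 2_000, 200, 300, 0.028, 0.14, 0.28)
pro = monthly_cost(1_000_000, 2_000, 200, 300, 0.145, 1.74, 3.48)
print(f"Flash: ${flash:,.2f}  Pro: ${pro:,.2f}")
```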

Path 3: Run DeepSeek locally on Linux

For air-gapped work, regulated data, or simply learning how MoE models behave, you can run DeepSeek’s open weights on your own Linux box. Two routes dominate: Ollama for ease, SGLang for production.

Honest hardware reality check

The full V4 models are not consumer-runnable. deepseek-v4-pro is 1.6T parameters; even at FP4 it needs a GPU server, not a workstation. What most people actually run locally are the DeepSeek R1 Distill variants — fine-tuned versions of open-source models like LLaMA and Qwen, trained on data generated by DeepSeek-R1, that inherit DeepSeek’s reasoning capabilities while being far more efficient to self-host.

| Model size | Min RAM | GPU VRAM (recommended) | Disk | Use |
|---|---|---|---|---|
| 1.5B distill | 8 GB | None / 4 GB | ~2 GB | Quick tests, low-end laptops |
| 7B / 8B distill | 16 GB | 8 GB | ~5 GB | General-purpose local chat |
| 14B distill | 32 GB | 12 GB | ~9 GB | Solid reasoning on a single GPU |
| 32B distill | 64 GB | 24 GB | ~20 GB | RTX 4090 / 3090 territory |
| 70B distill | 128 GB | 48 GB+ (or two GPUs) | ~40 GB | Workstation / small server |
| V4 / V4-Pro full | Multi-GPU | H100/H200/Blackwell | ~700 GB+ | Dedicated inference server |

For a more granular view, see our DeepSeek hardware calculator.
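If you want a rough number without the calculator, the rule of thumb behind tables like this is bytes-per-parameter plus overhead. A back-of-envelope sketch — the 20% overhead factor is my assumption for KV cache and runtime buffers, and real usage grows with context length:

```python
def approx_vram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Weights-only estimate plus ~20% for KV cache and runtime buffers."""
    return params_billion * bits_per_weight / 8 * overhead

for size in (1.5, 7, 14, 32, 70):
    print(f"{size:>5}B @ 4-bit ≈ {approx_vram_gb(size):.1f} GB VRAM")
```

A 7B model at 4-bit lands around 4.2 GB, which is why the table recommends an 8 GB card — headroom for longer contexts.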

Step-by-step: Ollama on Ubuntu, Fedora or Arch

  1. Install Ollama. One curl command, distro-agnostic:
    curl -fsSL https://ollama.com/install.sh | sh

    This script downloads and installs the Ollama binary, sets up the necessary services, and adds Ollama to your system’s PATH.

  2. Verify the service.
    ollama --version
    systemctl status ollama

    If the service is not active, start it with sudo systemctl start ollama.

  3. Pull a model. Pick the largest size your VRAM can hold:
    ollama pull deepseek-r1:7b
    # or for stronger reasoning on a 24 GB GPU:
    ollama pull deepseek-r1:32b
  4. Run an interactive session.
    ollama run deepseek-r1:7b

    You get a REPL — type a prompt, hit Enter. Exit with /bye.

  5. Hit it from code. Ollama exposes an OpenAI-compatible server at http://localhost:11434, so the same Python snippet from Path 2 works against your local model with base_url="http://localhost:11434/v1".
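If you'd rather not install the openai package just to poke a local model, the same endpoint answers a stdlib request. A sketch, assuming Ollama is running on its default port (the helper name is mine):

```python
import json
import urllib.request

def build_request(prompt, model="deepseek-r1:7b", host="http://localhost:11434"):
    """Build a POST for Ollama's OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# With Ollama running:
# with urllib.request.urlopen(build_request("Explain cgroups briefly.")) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
```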

For the deeper version of this walk-through with troubleshooting, see running DeepSeek on Ollama and the broader install DeepSeek locally guide.

GPU acceleration on Linux

Ollama uses your GPU automatically once the drivers are present. On NVIDIA hardware, install the proprietary driver and CUDA toolkit — Ollama requires a compute capability of 5.0 or higher to enable GPU inference. On AMD, install ROCm and use a recent kernel. Verify with nvidia-smi (NVIDIA) or rocm-smi (AMD) that the model is actually on the GPU during inference; if you see CPU pegged at 100% and GPU idle, the driver chain is broken.

Production-grade serving with SGLang or vLLM

Ollama is for one user. For multi-tenant inference — a team behind an internal API — SGLang and vLLM are the right tools. DeepSeek-V4 is DeepSeek's next-generation Mixture-of-Experts model, released 2026-04-24 under the MIT License. It ships as two Instruct repos plus matching Base repos; the Instruct checkpoints combine FP4 MoE experts with FP8 attention/dense layers in a single mixed-precision checkpoint that runs on any GPU with FP4 support. SGLang publishes Docker images and recipe-based launch commands for V4 on Blackwell and H200 hardware. The Docker pattern is straightforward on Linux:

docker run --gpus all --shm-size 32g -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<your-hf-token>" --ipc=host \
  lmsysorg/sglang:deepseek-v4-blackwell sglang serve [args]

For containerised setups generally, see DeepSeek Docker deployment.

Adding a web UI: Open WebUI on Linux

The Ollama CLI is fine for testing, but most teams want a chat UI. Open WebUI is the most common pairing. In a Python venv:

sudo apt install python3-venv     # Debian/Ubuntu
python3 -m venv ~/open-webui-venv
source ~/open-webui-venv/bin/activate
pip install open-webui
open-webui serve

Then visit http://localhost:8080, sign in to create the local admin account, and your installed Ollama models appear in the model dropdown. To run it as a background service, create a small systemd unit that starts open-webui serve after Ollama. To use the same UI against the hosted API instead of local models, add a connection pointing at https://api.deepseek.com with your API key.
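That systemd unit can be as small as the sketch below — the venv path matches the commands above; the unit name and User= value are placeholders to adjust for your machine:

```ini
# /etc/systemd/system/open-webui.service — adjust User= and the venv path
[Unit]
Description=Open WebUI frontend for Ollama
After=network-online.target ollama.service
Wants=ollama.service

[Service]
User=YOUR_USER
ExecStart=/home/YOUR_USER/open-webui-venv/bin/open-webui serve
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Install it with sudo systemctl daemon-reload && sudo systemctl enable --now open-webui.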

Choosing the right path

I run all three. Web chat for stray questions on a laptop. The V4 API from Python on Ubuntu servers for production work — the math at $168/month for a million calls is hard to argue with. Local R1 distill on a 4090 workstation when I am offline on a train or working with documents I will not send to a remote provider.

If you are picking one to start with, the V4 API is the highest-leverage option on Linux: zero hardware investment, a single base_url change for any existing OpenAI code, and the same model that powers the chat. Add Ollama later when you have a specific reason to keep data local.

Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Is there an official DeepSeek desktop app for Linux?

No. DeepSeek does not publish a native Linux desktop client. The official surfaces on Linux are the web chat at chat.deepseek.com and the API at https://api.deepseek.com. Third-party Electron wrappers exist but are unaffiliated; treat any unofficial .deb, .rpm or AppImage with caution. See our notes on how to verify the official DeepSeek app before installing anything packaged.

How do I install DeepSeek on Ubuntu without a GPU?

You have two CPU-only options. Use the hosted API from any Python script — that needs no local model at all. Or install Ollama with curl -fsSL https://ollama.com/install.sh | sh and pull the smallest distill, ollama pull deepseek-r1:1.5b. The 1.5B model runs on CPU with 8 GB of RAM, slowly but functionally. Larger sizes are not realistic without a GPU; full DeepSeek system requirements are in our hardware guide.

What hardware do I need to run DeepSeek V4 locally on Linux?

The full deepseek-v4-pro (1.6T parameters) and deepseek-v4-flash (284B) ship as FP4/FP8 mixed-precision checkpoints intended for multi-GPU inference servers — H100, H200 or Blackwell-class hardware with hundreds of GB of VRAM. On a single workstation, run an R1 distill instead. Use our DeepSeek hardware calculator to size a specific model against your machine.

Can I use the OpenAI Python SDK to call DeepSeek from Linux?

Yes — that is the recommended pattern. The DeepSeek API is OpenAI-compatible, so the official openai package works by setting base_url="https://api.deepseek.com" and your DeepSeek API key. An Anthropic-compatible endpoint is also available against the same base URL. Full details of the wire-level mapping live in our DeepSeek API documentation notes.

Why does my DeepSeek API call lose the conversation history on Linux?

Because the API is stateless — DeepSeek does not store prior turns server-side. Every POST /chat/completions request must include the full messages array with all previous user and assistant turns. The web chat maintains session history for you; the API delegates that to your client. See our DeepSeek API getting started guide for a working multi-turn example.
