How to Build a DeepSeek Streamlit App in Python

Build a DeepSeek Streamlit app with streaming chat, thinking mode and cost controls. Follow the step-by-step Python tutorial now.

How to Build a DeepSeek Streamlit App in Python

Tutorials·April 25, 2026·By DS Guide Editorial

You want a working chat UI in front of DeepSeek without spending a weekend wiring up React, websockets and a backend. A DeepSeek Streamlit app gets you there in roughly 80 lines of Python: a text input, streamed responses, a model picker for the V4 tiers, and a running token counter. That is the project this tutorial walks through end to end.

By the end you will have a single `app.py` that talks to `deepseek-v4-flash` for cheap default chat, switches to `deepseek-v4-pro` when you need frontier reasoning, toggles thinking mode, streams tokens as they arrive, and shows the live cost per conversation. The code runs locally and deploys to Streamlit Community Cloud unchanged.

What you’ll build

The finished app is a single-file Streamlit chat that talks to DeepSeek’s OpenAI-compatible API. It supports both V4 tiers, a thinking-mode toggle, server-sent-events streaming, conversation persistence within the session, and a sidebar that totals tokens and dollars as you chat. Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint at https://api.deepseek.com, so the official OpenAI Python SDK works unchanged once you swap base_url and api_key.

Two design choices matter up front. First, the DeepSeek API is stateless — the server does not remember prior turns, so the client has to resend the full messages array every request. Streamlit’s st.session_state is exactly the right place to keep that history. Second, V4 ships as a family of two open-weight MoE models with shared features — deepseek-v4-pro (1.6T total / 49B active) for frontier work and deepseek-v4-flash (284B / 13B active) for cost-efficient chat — and thinking is a request parameter, not a separate model ID. We will expose both choices in the sidebar.

Screenshot placeholder: a teal-and-amber Streamlit chat panel showing a user message, a streamed assistant reply, and a sidebar with model selector, thinking toggle and live cost meter.

Prerequisites

  • Python 3.10 or newer with pip available on your PATH.
  • A DeepSeek API key. If you do not have one yet, follow the get a DeepSeek API key walkthrough before continuing.
  • A virtual environment tool — venv, uv or poetry. We will use venv.
  • About 5 minutes and roughly $0.01 of API credit to test end to end on V4-Flash.
  • Optional but recommended: a free Streamlit Community Cloud account for deployment.

If this is your first time hitting the API at all, run through DeepSeek API getting started first — it covers authentication and a minimal curl call before you wrap anything in a UI.

Step 1: Create the project and install dependencies

Open a terminal and create a clean project folder. The commands below are bash; PowerShell users can swap the activation line.

mkdir deepseek-streamlit && cd deepseek-streamlit
python -m venv .venv
source .venv/bin/activate   # Windows: .venvScriptsactivate
pip install streamlit openai tiktoken

Three packages, nothing else. streamlit is the UI framework, openai is the official SDK we point at DeepSeek’s base URL, and tiktoken gives us a reasonable client-side token estimate for the cost meter. (DeepSeek’s tokenizer is not byte-identical to OpenAI’s, so treat the count as an estimate within a few percent — or use the DeepSeek token counter for a sanity check before you commit pricing to a customer.)

Step 2: Store the API key safely

Streamlit reads secrets from .streamlit/secrets.toml locally and from the dashboard in production. Create the file:

mkdir -p .streamlit
cat > .streamlit/secrets.toml <<'EOF'
DEEPSEEK_API_KEY = "sk-your-key-here"
EOF
echo ".venv/" > .gitignore
echo ".streamlit/secrets.toml" >> .gitignore

The .gitignore entry matters. Committing secrets.toml to GitHub is the single most common way readers leak a key. If you prefer environment variables, export DEEPSEEK_API_KEY and the code below will read either source.

Step 3: Write the chat app

Create app.py with the following Python. It is roughly 90 lines and covers everything: model selection, thinking toggle, streaming, history, and live cost.

import os
import streamlit as st
from openai import OpenAI

# --- Pricing (USD per 1M tokens) — update from the official pricing page ---
PRICING = {
    "deepseek-v4-flash": {"in_miss": 0.14, "in_hit": 0.028, "out": 0.28},
    "deepseek-v4-pro":   {"in_miss": 1.74, "in_hit": 0.145, "out": 3.48},
}

st.set_page_config(page_title="DeepSeek Chat", page_icon="·", layout="centered")
st.title("DeepSeek Streamlit App")

# --- Sidebar: model, thinking, temperature, cost ---
with st.sidebar:
    model = st.selectbox("Model", list(PRICING.keys()), index=0)
    thinking = st.toggle("Thinking mode", value=False,
                         help="Enables reasoning_effort=high")
    temperature = st.slider("Temperature", 0.0, 2.0, 1.3, 0.1,
                            help="0.0 code/math, 1.0 analysis, 1.3 chat, 1.5 creative")
    max_tokens = st.number_input("Max output tokens", 256, 8192, 1024, 256)
    if st.button("Clear conversation"):
        st.session_state.messages = []
        st.session_state.cost = 0.0
        st.rerun()
    st.metric("Session cost (USD)", f"${st.session_state.get('cost', 0.0):.4f}")

# --- Client (OpenAI SDK pointed at DeepSeek) ---
api_key = st.secrets.get("DEEPSEEK_API_KEY") or os.getenv("DEEPSEEK_API_KEY")
if not api_key:
    st.error("Set DEEPSEEK_API_KEY in .streamlit/secrets.toml or your environment.")
    st.stop()
client = OpenAI(base_url="https://api.deepseek.com", api_key=api_key)

# --- State ---
if "messages" not in st.session_state:
    st.session_state.messages = []
if "cost" not in st.session_state:
    st.session_state.cost = 0.0

# --- Replay prior turns ---
for m in st.session_state.messages:
    with st.chat_message(m["role"]):
        st.markdown(m["content"])

# --- Handle new input ---
if prompt := st.chat_input("Ask DeepSeek anything"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Build the request — stateless, so we resend full history every time.
    kwargs = {
        "model": model,
        "messages": st.session_state.messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": True,
        "stream_options": {"include_usage": True},
    }
    if thinking:
        kwargs["reasoning_effort"] = "high"
        kwargs["extra_body"] = {"thinking": {"type": "enabled"}}

    with st.chat_message("assistant"):
        placeholder = st.empty()
        thinking_box = st.expander("Reasoning", expanded=False) if thinking else None
        full_text, reasoning_text, usage = "", "", None
        stream = client.chat.completions.create(**kwargs)
        for chunk in stream:
            if chunk.usage:
                usage = chunk.usage
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta
            if getattr(delta, "reasoning_content", None) and thinking_box:
                reasoning_text += delta.reasoning_content
                thinking_box.markdown(reasoning_text)
            if delta.content:
                full_text += delta.content
                placeholder.markdown(full_text + " ")
        placeholder.markdown(full_text)

    st.session_state.messages.append({"role": "assistant", "content": full_text})

    # --- Cost calculation: enumerate all three buckets ---
    if usage:
        rates = PRICING[model]
        cached = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
        miss = usage.prompt_tokens - cached
        cost = (cached * rates["in_hit"]
                + miss * rates["in_miss"]
                + usage.completion_tokens * rates["out"]) / 1_000_000
        st.session_state.cost += cost
        st.rerun()

A few things worth flagging in that code. We pass stream_options={"include_usage": True} so DeepSeek returns a final usage chunk with prompt and completion token counts — that is what makes the live cost meter accurate rather than an estimate. The reasoning_content handling is the V4 thinking-mode contract: when thinking is enabled, the stream returns reasoning_content alongside the final content, and we render the trace in a collapsible expander so it does not dominate the UI.

Step 4: Run it locally

From the project folder:

streamlit run app.py

Streamlit prints a local URL (usually http://localhost:8501) and opens it in your browser. Type a question, watch the response stream in token by token, and check that the sidebar cost ticks up. With V4-Flash, a short conversation should cost a fraction of a cent.

Verify it worked

  1. Send “Reply with the single word: pong.” — you should see pong stream in.
  2. Switch the sidebar model to deepseek-v4-pro and ask a multi-step reasoning question. Confirm the response arrives and the session cost jumps by roughly a factor of ten compared to Flash for similar token counts.
  3. Toggle Thinking mode on and ask a planning question. The “Reasoning” expander should populate with the trace, and the final answer should follow in the main message body.
  4. Click Clear conversation and confirm both history and the cost meter reset.

Step 5: Cost worked example

Suppose your Streamlit app handles 1,000,000 chat calls per month against deepseek-v4-flash, with a 2,000-token system prompt that DeepSeek’s context cache picks up after the first call, a 200-token user message each turn, and a 300-token reply. The math, with all three buckets enumerated:

Bucket Tokens Rate (per 1M) Cost
Input, cache hit 2,000,000,000 $0.028 $56.00
Input, cache miss 200,000,000 $0.14 $28.00
Output 300,000,000 $0.28 $84.00
Total $168.00

Run the same workload on deepseek-v4-pro and the total climbs to roughly $1,682 — about ten times more — because Pro lists $0.145 cache-hit, $1.74 cache-miss and $3.48 output per 1M tokens. Pricing rates are as of April 2026; verify against the DeepSeek API pricing page before quoting them in production. For larger sweeps the DeepSeek pricing calculator is faster than rebuilding the table by hand.

Two things to note. Each new user message is a cache miss against the cached prefix, even when the system prompt is fully cached — that is why the uncached-input line is non-zero. And the off-peak discount that older tutorials reference ended on September 5, 2025; do not budget around it.

Step 6: Add useful extras

JSON mode for structured output

If you want the assistant to return structured data — a form extraction, a classification result — set response_format={"type": "json_object"} in the request kwargs. DeepSeek’s JSON mode is designed to return valid JSON, not guaranteed: it can occasionally return empty content, the prompt must include the word “json” plus a small example schema, and you should set max_tokens high enough to avoid truncation. Always wrap json.loads in a try/except and fall back gracefully.

Tool calling

The app above is text-only, but DeepSeek supports OpenAI-format function calling on both V4 tiers in either thinking or non-thinking mode. Pass a tools=[...] list and handle tool_calls deltas in the stream. For a fuller pattern, the DeepSeek Python integration tutorial covers tool dispatch end to end.

RAG over your own documents

To let users chat with their own files, layer a retrieval step before the chat call: embed documents, query a vector store, and inject the top chunks into the system message. The DeepSeek RAG tutorial walks through that pattern with the same OpenAI-SDK pointer at DeepSeek’s base URL.

Common errors and fixes

Symptom Likely cause Fix
AuthenticationError: 401 Missing or wrong API key Re-check .streamlit/secrets.toml; confirm the key starts with sk- and was copied without trailing whitespace.
Model not found for deepseek-chat Using a legacy ID after migration Switch to deepseek-v4-flash. Legacy IDs deepseek-chat and deepseek-reasoner currently route to V4-Flash but retire on 2026-07-24 at 15:59 UTC.
Streaming hangs, no tokens render Streamlit placeholder.markdown not flushing inside the loop Confirm you are calling placeholder.markdown(full_text + " ") inside the for-loop, not after it.
JSON mode returns empty content Prompt lacks the word “json” or max_tokens too low Add an example schema in the prompt; raise max_tokens; wrap parsing in try/except.
Cost meter stays at $0.00 Usage chunk not requested Pass stream_options={"include_usage": True}; check the final chunk has chunk.usage populated.
429 rate-limit errors under load Concurrent users on a low quota Add exponential backoff; review DeepSeek API rate limits.

Deploying to Streamlit Community Cloud

Push the project to a public or private GitHub repository (without secrets.toml), connect the repo on share.streamlit.io, point it at app.py, and paste your DEEPSEEK_API_KEY into the app’s Secrets panel in the same TOML format. The deploy takes about 90 seconds. Because the API is stateless and history lives in st.session_state, each user gets their own conversation without any backend work on your side — but note that session state evaporates on container restart. For persistent history you would add a database; for most internal tools that is overkill.

Why these defaults

The temperature slider defaults to 1.3 because that is DeepSeek’s documented setting for general conversation and translation. Drop it to 0.0 for code generation or maths, leave it at 1.0 for data analysis, and push to 1.5 for creative writing. The default model is V4-Flash because the cost gap between the tiers is roughly 7× on output, and Flash is plenty for the bulk of chat workloads — reach for DeepSeek V4-Pro when a benchmark or task quality genuinely demands the lift.

The 1,000,000-token context window is the same on both tiers, with output up to 384,000 tokens, so you can paste a whole codebase into a turn without architecture changes. Most chat UIs never come close to that ceiling; if yours does, lean on context caching and check token counts before sending.

Next steps

Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Frequently asked questions

How do I connect Streamlit to the DeepSeek API?

Use the official OpenAI Python SDK and point it at DeepSeek’s base URL: OpenAI(base_url="https://api.deepseek.com", api_key=...). Chat requests hit POST /chat/completions with the same wire format as OpenAI, so no call sites change. Store the key in .streamlit/secrets.toml and read it with st.secrets. The full setup is covered in the DeepSeek OpenAI SDK compatibility guide.

What does a DeepSeek Streamlit app cost to run?

On deepseek-v4-flash, expect roughly $0.14 per million input miss tokens, $0.028 cache-hit, and $0.28 output per million as of April 2026. A typical short chat costs a fraction of a cent. The V4-Pro tier is around 6–7× more on output. Always enumerate all three token buckets when budgeting; the DeepSeek API pricing page is the source of truth.

Can I enable thinking mode in a Streamlit chat?

Yes. In V4, thinking is a request parameter on either tier, not a separate model. Pass reasoning_effort="high" with extra_body={"thinking": {"type": "enabled"}}. The stream then returns reasoning_content alongside the final content, which you can render in a Streamlit expander. See DeepSeek V4 for the full parameter contract.

Why does my app forget previous messages between requests?

The DeepSeek API is stateless — the server does not remember prior turns, so the client must resend the full conversation in the messages array on every request. Streamlit’s st.session_state is the right place to hold history during a session. The web chat behaves differently because its UI persists state for you. More on the underlying API in the DeepSeek API documentation.

Is it safe to deploy a DeepSeek Streamlit app publicly?

Yes, with the usual caveats: keep the API key in Streamlit secrets and never in the repo, add per-user rate limiting if the URL is public, and review what data your users will send before deploying. DeepSeek processes requests on servers subject to Chinese law, so route sensitive data accordingly. The DeepSeek privacy guide covers the trade-offs.

Leave a Reply

Your email address will not be published. Required fields are marked *