How to Use DeepSeek in 2026: Chat, App and API Walkthrough

Learn how to use DeepSeek V4 on the web, app and API with working code, pricing math and troubleshooting tips. Start building in minutes.

Tutorials·April 24, 2026·By DS Guide Editorial

So you have heard about DeepSeek and want to actually try it — not read another press release. This guide walks through how to use DeepSeek from three angles: the free web chat at chat.deepseek.com, the iOS and Android apps, and the developer API that now powers both. I’ve run DeepSeek V4 in production since the Preview dropped, and before that R1, V3 and V3.2, so the steps below reflect what actually works on April 24, 2026 — not what the marketing pages claim. By the end you will have signed in to the chatbot, switched between thinking and non-thinking modes, sent your first API request with the OpenAI SDK, and costed out a realistic workload against both V4 tiers.

What DeepSeek is, in one paragraph

DeepSeek is a Hangzhou-based AI lab that ships open-weight mixture-of-experts language models plus a consumer chatbot and developer API on top of them. The current generation, released on April 24, 2026, is DeepSeek V4 (Preview), shipped as two model IDs: deepseek-v4-pro (1.6T total parameters, 49B active per token) and deepseek-v4-flash (284B total, 13B active). Both are open-weight MoE models under the MIT license, both default to a 1,000,000-token context window with output up to 384,000 tokens, and both are reachable from a single OpenAI-compatible endpoint. If you want the background first, the what is DeepSeek primer covers the lab, the funding and the model family.

Three ways to use DeepSeek

DeepSeek gives you three entry points. Pick the one that matches your goal before you create an account — the sign-up flow is the same, but the interfaces behave very differently.

  • Web chat at chat.deepseek.com — browser, keeps conversation history, best for one-off research, drafting and reasoning.
  • Mobile app on iOS and Android — same feature set as the web, with voice input and push sync.
  • Developer API at https://api.deepseek.com — stateless, pay-per-token, for automation and products.

Web chat vs API: the one difference that trips people up

The web chat and mobile app keep your conversation history server-side for your account; you can close the tab, come back tomorrow, and pick up where you left off. The API does not. The POST /chat/completions endpoint is stateless — your client has to resend the full message array on every request to sustain a multi-turn conversation. Confusing the two is the single most common beginner mistake.
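To make "stateless" concrete, here is a minimal sketch of sustaining a multi-turn conversation client-side. The chat_turn helper is my own illustration, not an SDK feature; client is an OpenAI-SDK client configured as in Step 4 below.

```python
def chat_turn(client, history, user_message, model="deepseek-v4-flash"):
    """Append the user turn, send the FULL history, append the reply."""
    history.append({"role": "user", "content": user_message})
    resp = client.chat.completions.create(model=model, messages=history)
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

# The history list IS the conversation; the server never stores it for API calls.
history = [{"role": "system", "content": "You are a concise assistant."}]
# chat_turn(client, history, "What is a mixture-of-experts model?")
# chat_turn(client, history, "And how does V4 use it?")  # resends turn one automatically
```

If you drop the history between calls, the model "forgets" the previous turn, which is the symptom listed in the troubleshooting table later in this guide.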

Step 1: Create an account

Go to chat.deepseek.com and sign up with an email address, Google account or phone number. If you are on mobile, install the official app from the App Store or Play Store first — third-party impostors exist, and our verify official DeepSeek app checklist shows how to confirm the publisher. For a complete walkthrough with screenshots, see the DeepSeek account setup guide.

Step 2: Use the chatbot (with thinking mode)

Once signed in, you land in a chat window. DeepSeek V4 is now the default model on both the web and the app — there is no model picker. What you do have is the DeepThink toggle above the input box. Leave it off for quick answers; switch it on when the task benefits from deliberate reasoning (multi-step math, code debugging, planning). Behind the scenes, DeepThink flips V4 between non-thinking and thinking mode; the same switch is exposed as a parameter in the API.

[Image: DeepSeek web chat interface with the DeepThink toggle enabled. The web chat keeps session history; the API does not.]

When to toggle DeepThink on

  • Math, proofs, Olympiad-style problems.
  • Debugging code where the first guess is usually wrong.
  • Multi-document comparisons or long-context reasoning.
  • Any task where you would rather wait 20 seconds for a better answer than get a fast, shallow one.

Step 3: Get an API key

For anything beyond casual chat, the API is where DeepSeek earns its reputation. Open platform.deepseek.com, sign in with the same credentials, go to API Keys, and click Create new key. Copy the value immediately — DeepSeek shows it once. Load a small balance ($5 is plenty for experimentation) from the Billing tab. DeepSeek may offer a granted balance as a promotional credit that can expire, so check the billing console for current offers rather than assuming any free tier. Full setup details live in the get a DeepSeek API key walkthrough.
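Since the key is shown only once, store it in an environment variable rather than in source. A small sketch of the pattern (the helper name is mine, not an official utility):

```python
import os

def load_deepseek_key(env_var="DEEPSEEK_API_KEY"):
    """Read the key from the environment so it never lands in source control."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} first, e.g. export {env_var}=sk-...")
    return key
```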

Step 4: Make your first API call

The DeepSeek API is wire-compatible with OpenAI’s Chat Completions format, so the official OpenAI Python SDK works unchanged — you just swap base_url and api_key. Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint. DeepSeek also exposes an Anthropic-compatible surface at the same base URL, if you prefer the Anthropic SDK.

Here is a minimal Python example using deepseek-v4-flash, the cost-efficient tier most production workloads should start on:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="sk-...",  # your key
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise the capital of France in one line."},
    ],
    temperature=1.3,
    max_tokens=200,
)

print(resp.choices[0].message.content)

The same request as a curl one-liner:

curl https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

If you are coming from Node, the pattern is identical — see the DeepSeek Node.js integration tutorial for the JavaScript version.

Enabling thinking mode in the API

Thinking mode in V4 is a parameter, not a separate model ID. You stay on deepseek-v4-pro or deepseek-v4-flash and add two fields:

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Plan a migration from V3.2 to V4."}],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
# resp.choices[0].message.reasoning_content  -> the reasoning trace
# resp.choices[0].message.content            -> the final answer

With thinking enabled the response returns reasoning_content alongside the final content. For the heaviest tasks, set reasoning_effort="max" and make sure max_tokens is high enough (the cap is 384,000) that the trace cannot be truncated.

Legacy model IDs and the migration window

If you are maintaining an older integration, the legacy IDs deepseek-chat and deepseek-reasoner still work today. They now route to deepseek-v4-flash in non-thinking and thinking mode respectively. Both will be retired on 2026-07-24 at 15:59 UTC, after which requests using those IDs will fail. Migrating is a one-line change: update the model= field. Your base_url does not change.
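Before the cutoff, you can stage the migration with a small lookup so old call sites keep working. The mapping helper below is my own sketch, based on the routing described above:

```python
# Legacy IDs retire on 2026-07-24 at 15:59 UTC; both currently route to V4-Flash.
LEGACY_TO_V4 = {
    "deepseek-chat": "deepseek-v4-flash",      # non-thinking mode
    "deepseek-reasoner": "deepseek-v4-flash",  # thinking mode (enable via the thinking parameter)
}

def migrate_model_id(model):
    """Return the V4 replacement for a legacy ID; pass V4 IDs through unchanged."""
    return LEGACY_TO_V4.get(model, model)
```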

Step 5: Choose the right parameters

DeepSeek documents recommended temperature settings by task type. Matching them saves you a lot of trial and error.

  • Code generation, mathematics: 0.0. Deterministic; same prompt, same answer.
  • Data analysis, data cleaning: 1.0. Balanced; follows structured prompts well.
  • General conversation, translation: 1.3. Default for most chatbots.
  • Creative writing, poetry: 1.5. More varied word choice.

Other parameters worth knowing early:

  • top_p — nucleus sampling; use instead of, not alongside, aggressive temperature tuning.
  • max_tokens — cap output length. With V4 you can go up to 384,000.
  • stream=true — server-sent events for token-by-token UIs.
  • response_format={"type": "json_object"} — JSON mode (see caveats below).
  • tools — OpenAI-compatible function calling; works in both thinking and non-thinking modes.
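Of these, stream=true is the one that changes your client code the most. A sketch of consuming the stream with the OpenAI SDK (the stream_reply helper is illustrative; client setup is as in Step 4):

```python
def stream_reply(client, messages, model="deepseek-v4-flash"):
    """Print tokens as they arrive and return the assembled reply."""
    pieces = []
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""  # delta can be None on the final chunk
        pieces.append(delta)
        print(delta, end="", flush=True)
    return "".join(pieces)
```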

A note on JSON mode

JSON mode is designed to return valid JSON, not guaranteed. Three rules keep it reliable: include the word “json” and a small example schema in the prompt, set max_tokens high enough that output cannot be truncated, and handle occasional empty content gracefully. Full patterns are in the DeepSeek API JSON mode reference.
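Because valid JSON is likely but not guaranteed, wrap the parse defensively. A minimal sketch of the third rule (the helper name is mine):

```python
import json

def parse_json_reply(content, fallback=None):
    """Parse a JSON-mode reply, returning fallback on empty or malformed output."""
    if not content:                   # occasional empty content
        return fallback
    try:
        return json.loads(content)
    except json.JSONDecodeError:      # truncated or malformed output
        return fallback
```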

Step 6: Work out what it will cost

DeepSeek’s pricing has three token buckets, not two, and skipping one makes estimates meaningless. As of April 2026, the rates per 1M tokens are:

  • deepseek-v4-flash: $0.028 cache hit (input), $0.14 cache miss (input), $0.28 output
  • deepseek-v4-pro: $0.145 cache hit (input), $1.74 cache miss (input), $3.48 output

V4-Pro is roughly 12× the output price of V4-Flash. Unless a benchmark specifically justifies the frontier tier, start on Flash. The off-peak discounts DeepSeek used to run ended on 2025-09-05 and have not returned.

Worked example: 1,000,000 chat calls on V4-Flash

Assume each call has a cached 2,000-token system prompt, a fresh 200-token user message, and a 300-token response.

  • Cached input: 2,000 × 1,000,000 × $0.028/M = $56.00
  • Uncached input: 200 × 1,000,000 × $0.14/M = $28.00
  • Output: 300 × 1,000,000 × $0.28/M = $84.00
  • Total: $168.00

The same workload on deepseek-v4-pro would be $290 + $348 + $1,044 = $1,682.00. Note that the user message on each call is an uncached miss against the cached prefix — a common mistake is to assume the whole input is cached once you see any cache hit. It isn’t. For repeated cost modelling, the DeepSeek pricing calculator lets you plug in your own token counts.
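The arithmetic above generalises to any workload. A small helper for the three-bucket estimate (the function and its signature are my own, not an official tool; prices are per 1M tokens as listed above):

```python
def cost_usd(calls, cached_in, uncached_in, out, price_hit, price_miss, price_out):
    """Total USD for a workload: per-call token counts x per-1M-token prices."""
    per_call = (cached_in * price_hit + uncached_in * price_miss + out * price_out) / 1_000_000
    return calls * per_call

# The worked example: 1M calls, 2,000 cached + 200 uncached input, 300 output tokens.
flash = cost_usd(1_000_000, 2_000, 200, 300, 0.028, 0.14, 0.28)  # about $168
pro   = cost_usd(1_000_000, 2_000, 200, 300, 0.145, 1.74, 3.48)  # about $1,682
```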

Step 7: Verify it worked

Three checks tell you the integration is healthy:

  1. HTTP 200 response with non-empty choices[0].message.content.
  2. Token counts in the usage object match what you expected (roughly 1 token per 0.75 English words).
  3. If thinking mode is on, reasoning_content is populated and is noticeably longer than the final answer on hard prompts.
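The three checks can be automated against any response object. A sketch (check_response is my own helper, written against the response shape the OpenAI SDK returns):

```python
def check_response(resp, expect_thinking=False):
    """Return a dict of health checks; every value should be True."""
    msg = resp.choices[0].message
    checks = {
        "non_empty_content": bool(msg.content),
        "usage_reported": getattr(resp, "usage", None) is not None,
    }
    if expect_thinking:
        # In thinking mode, the reasoning trace should be present alongside the answer.
        checks["reasoning_present"] = bool(getattr(msg, "reasoning_content", None))
    return checks
```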

Common errors and fixes

  • 401 Unauthorized: wrong or revoked key. Regenerate from the API Keys page and set it via an env var, not in source.
  • 402 Insufficient Balance: account balance hit zero. Top up in Billing; granted balances can expire.
  • Model “forgets” the previous turn: the API is stateless. Resend the full messages array on each request.
  • Empty JSON in JSON mode: prompt is missing “json” or a schema example. Add both and raise max_tokens.
  • Truncated thinking output: max_tokens too low for reasoning_effort="max". Raise max_tokens toward the 384,000 cap.
  • Legacy ID starts failing after July 2026: the IDs retire on 2026-07-24 at 15:59 UTC. Swap deepseek-chat for deepseek-v4-flash.

A longer list of fixes lives in DeepSeek troubleshooting and DeepSeek API error codes.

Where DeepSeek fits, where it doesn’t

DeepSeek V4 is strongest at coding, math, long-context reasoning and structured output — areas where the V4 Preview announcement reports deepseek-v4-pro at 80.6% on SWE-Bench Verified. It is weaker than the big US labs at some consumer polish: voice modes, native image generation, deep third-party plugin ecosystems. For frontier chat apps with a huge integration catalogue, DeepSeek vs ChatGPT lays out where each wins. For raw cost per token on an API workload, DeepSeek-V4-Flash is among the cheapest frontier-tier options as of April 2026 — but always compare against each provider’s current pricing page before committing.

Next steps

Once the basics work, the natural progressions are the guides linked throughout this article: the Node.js integration tutorial, the JSON mode reference, the pricing calculator and the API best practices reference.

Last verified: 2026-04-24. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Frequently asked questions

Is DeepSeek free to use?

The web chat at chat.deepseek.com and the mobile apps are free, with no publicly documented daily message cap as of April 2026. The API is pay-per-token with no free tier by default — DeepSeek may offer a granted promotional balance on new accounts, but it can expire, so check the billing console. For a breakdown of what is and isn’t free, see is DeepSeek free.

How do I switch between thinking and non-thinking mode?

In the web chat or app, use the DeepThink toggle above the input box — on for thinking, off for fast responses. In the API, stay on the same V4 model ID and set reasoning_effort="high" plus extra_body={"thinking": {"type": "enabled"}}; omit both to stay in non-thinking mode. Full parameter reference is in the DeepSeek API documentation.

What is the difference between DeepSeek V4-Pro and V4-Flash?

V4-Pro has 1.6T total parameters (49B active) and targets frontier coding and agentic work. V4-Flash has 284B total (13B active) and costs roughly a twelfth of Pro on output tokens while handling most chat and assistant workloads comfortably. Both share the 1M-token context window and the same feature set. Detailed specs are on the DeepSeek V4 page.

Does the DeepSeek API remember previous messages?

No. The POST /chat/completions endpoint is stateless — DeepSeek does not retain conversation history server-side for API requests. Your client must resend the full messages array on every call. Only the web chat and mobile app keep session history for you. More detail is in the DeepSeek browser vs app guide.

Can I still use deepseek-chat and deepseek-reasoner?

Yes, until 2026-07-24 at 15:59 UTC. Both legacy IDs currently route to deepseek-v4-flash: deepseek-chat in non-thinking mode, deepseek-reasoner in thinking mode. After that timestamp the IDs will be retired and requests using them will fail. Migration is a one-line change to the model field; base_url stays the same. See the DeepSeek API best practices reference for migration patterns.
