Working with the DeepSeek API SDK: OpenAI and Anthropic Clients

Use the DeepSeek API SDK via OpenAI or Anthropic clients with one base_url change. See V4 models, code, costs. Start in minutes.


API · April 25, 2026 · By DS Guide Editorial

You want to call DeepSeek V4 from your own code, and you want to know whether you need a bespoke client library or whether the SDKs you already trust will work. Short answer: there is no first-party "DeepSeek SDK." Instead, DeepSeek deliberately mirrors two existing wire formats, OpenAI's Chat Completions and Anthropic's Messages, so the official `openai` and `anthropic` packages, plus most third-party wrappers (LangChain, LlamaIndex, the Vercel AI SDK), work after changing only `base_url` and `api_key`. This guide walks through the two compatible surfaces, the V4 model IDs you should be using as of April 2026, a working Python quickstart, the parameters that matter, two costed worked examples, and the error patterns that will bite you in production.

What “DeepSeek API SDK” actually means

DeepSeek does not publish a branded SDK with its own package name. Instead, the platform exposes two HTTP surfaces that match formats developers already use: OpenAI Chat Completions and Anthropic Messages. Because the API is wire-compatible with both, any OpenAI or Anthropic SDK, and any software built against those APIs, can reach DeepSeek after a configuration change. The "SDK" you install is simply the upstream client from OpenAI or Anthropic, pointed at DeepSeek's base URL with a DeepSeek API key.

Two practical consequences:

  • If your codebase already calls OpenAI or Anthropic, switching to DeepSeek is a two-line change — base_url and api_key.
  • Tooling built on those SDKs — LangChain, LlamaIndex, DSPy, the Vercel AI SDK, evaluation harnesses such as Promptfoo and Inspect — works without code changes beyond the same base-URL override.

The current generation is DeepSeek V4, released April 24, 2026, shipped as two open-weight Mixture-of-Experts models under the MIT license: DeepSeek V4-Pro (1.6T total / 49B active parameters, frontier tier) and DeepSeek V4-Flash (284B / 13B active, cost-efficient). Both ship with a default 1,000,000-token context window and up to 384,000 tokens of output. Thinking mode is a request parameter, not a separate model ID.

The two base URLs and which SDK to install

Two endpoints, two request shapes — pick the one that matches the SDK ecosystem you already use.

Surface | Base URL | SDK to install | Endpoint path
OpenAI-compatible | https://api.deepseek.com (or /v1) | pip install openai / npm install openai | POST /chat/completions
Anthropic-compatible | https://api.deepseek.com/anthropic | pip install anthropic | POST /v1/messages
Beta (FIM, prefix completion) | https://api.deepseek.com/beta | OpenAI SDK | POST /chat/completions

Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint. For the Anthropic format, point base_url at https://api.deepseek.com/anthropic. Both surfaces accept the same API key. Use whichever matches your existing code; do not write a wrapper layer that hits both unless you have a specific portability reason.

Quickstart: Python with the OpenAI SDK

Below is the minimum viable call against deepseek-v4-flash in non-thinking mode. Save your key as DEEPSEEK_API_KEY, then run the following Python snippet:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarise the OSI model in three bullets."},
    ],
    temperature=1.0,
    max_tokens=400,
)
print(resp.choices[0].message.content)

To enable thinking mode on either V4 model, add reasoning_effort="high" and the thinking flag. This is the biggest cost lever on the API, and the DeepSeek documentation describes three settings:

  • Non-thinking skips the reasoning trace entirely and returns tokens at roughly V3.2 speed.
  • Thinking (reasoning_effort="high") emits a reasoning block that costs extra tokens but improves accuracy on code and math.
  • Thinking-max (reasoning_effort="max") is the most intensive setting and produces the scores in DeepSeek's headline table; it also burns the most tokens and is the only mode that requires a 384K-token output budget.

The call below enables thinking on V4-Pro:

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Plan a 5-step migration from V3.2 to V4."}],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
print(resp.choices[0].message.reasoning_content)  # the reasoning trace
print(resp.choices[0].message.content)            # the final answer

When thinking is enabled the API returns reasoning_content alongside the final content. Do not feed reasoning_content back into messages on the next turn — DeepSeek’s API rejects it.
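
A minimal sketch of the correct follow-up turn under that constraint (the prompts are illustrative): keep a running messages list and append only message.content, never reasoning_content.

messages = [{"role": "user", "content": "Plan a 5-step migration from V3.2 to V4."}]
resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
# Append only the final answer; resending the reasoning trace is rejected.
messages.append({"role": "assistant", "content": resp.choices[0].message.content})
messages.append({"role": "user", "content": "Expand step 3."})
resp2 = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)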

The same call in curl

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",
    "stream": false
  }'

Quickstart: Python with the Anthropic SDK

If your codebase is already on Anthropic's messages.create shape, install the anthropic package and point it at DeepSeek, either in code or via the environment variables ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic and ANTHROPIC_API_KEY. Then call client.messages.create with model="deepseek-v4-pro".

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.deepseek.com/anthropic",
    api_key="...your DeepSeek key...",
)

message = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}],
)
print(message.content[0].text)  # content is a list of blocks; take the first block's text

Two caveats specific to the Anthropic surface. First, parity is not 1:1: DeepSeek says the endpoint ignores the anthropic-beta and anthropic-version headers, the container, mcp_servers, metadata, service_tier, and top_k fields, several cache_control fields, budget_tokens (thinking itself is supported), and disable_parallel_tool_use in tool_choice. Second, an unsupported model name is silently mapped to deepseek-v4-flash. If you want V4-Pro, set the model ID explicitly.

Legacy IDs and the migration window

Older code that hard-codes deepseek-chat or deepseek-reasoner still runs, but on a clock. The DeepSeek API supports V4-Pro and V4-Flash via both the OpenAI Chat Completions interface and the Anthropic interface; the base_url is unchanged, and you set the model parameter to deepseek-v4-pro or deepseek-v4-flash. The two legacy model names currently point to the non-thinking and thinking modes of deepseek-v4-flash respectively. The hard cutover is 2026-07-24 at 15:59 UTC; after that, requests using legacy IDs will fail.

Migration is a one-line model= change. The base URL does not move. If you maintain examples, dashboards, or pricing calculators that still reference legacy IDs, swap them now — see the DeepSeek API documentation for the canonical mapping.
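
In code, the swap looks like this (a sketch; note that deepseek-reasoner maps to the thinking mode of V4-Flash, so a faithful migration of reasoner traffic also adds the thinking parameters):

# Before (legacy, fails after 2026-07-24):
resp = client.chat.completions.create(model="deepseek-chat", messages=messages)

# After:
resp = client.chat.completions.create(model="deepseek-v4-flash", messages=messages)

# deepseek-reasoner traffic maps to V4-Flash with thinking enabled:
resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)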

Parameters that matter

The OpenAI surface accepts the parameters you already know. A short tour of the ones you will set most often:

  • temperature — 0 deterministic, 2 most varied. DeepSeek’s recommended defaults: 0.0 for code and maths, 1.0 for data analysis, 1.3 for general conversation and translation, 1.5 for creative writing.
  • top_p — nucleus sampling. Set this or temperature, not both.
  • max_tokens — output cap. V4 supports up to 384,000. Always set it explicitly when using JSON mode.
  • reasoning_effort — V4-only; accepts "high" or "max". Pair with extra_body={"thinking": {"type": "enabled"}}.
  • stream — when true, you receive server-sent events. With thinking enabled, reasoning_content streams alongside the final answer; see the sketch after this list, and DeepSeek API streaming for full chunk handling.
  • response_format — {"type": "json_object"} turns on JSON mode (covered below).
  • tools — function declarations in OpenAI format. Supported in both thinking and non-thinking modes.
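
A sketch of stream handling with the OpenAI SDK, assuming the streamed deltas carry a reasoning_content field alongside content, mirroring the non-streaming response shape:

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
    stream=True,
)
for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. usage) carry no choices
        continue
    delta = chunk.choices[0].delta
    # Reasoning tokens arrive first; getattr guards clients that drop unknown fields.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)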

The DeepSeek API function calling guide has the full schema; for cache-key behaviour and the prefix-detection rules see DeepSeek context caching.

JSON mode — designed, not guaranteed

Set response_format={"type": "json_object"} and the model is designed to return valid JSON. Three caveats every production caller must handle:

  1. The API may occasionally return empty content. Retry with a slightly varied prompt.
  2. The prompt itself must include the literal word “json” and a small example schema.
  3. Set max_tokens high enough that the model cannot truncate mid-object.

“Guaranteed valid JSON” is not the right mental model. Treat the response as needing a json.loads inside a try/except. The longer-form treatment is in DeepSeek API JSON mode.
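
A defensive pattern under those three caveats (the retry prompt tweak and token budget are illustrative):

import json

def ask_json(client, prompt, retries=3):
    for attempt in range(retries):
        resp = client.chat.completions.create(
            model="deepseek-v4-flash",
            # The prompt must contain the word "json" and show the target shape.
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
            max_tokens=800,  # high enough that the object cannot truncate mid-field
        )
        content = resp.choices[0].message.content
        if content:  # the API may occasionally return empty content
            try:
                return json.loads(content)
            except json.JSONDecodeError:
                pass
        prompt = prompt + " Respond with valid json only."  # vary slightly, retry
    raise ValueError("no valid JSON after retries")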

Statelessness — the most common surprise

The web chat at chat.deepseek.com keeps your conversation history; the API does not. POST /chat/completions is stateless — the server does not remember prior turns. Your client must resend the full messages array on every request to maintain a multi-turn conversation. New developers coming from a chat-app mental model lose hours to this. If you need persistence, build it on your side (a database keyed by conversation ID is fine).
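
A minimal sketch of client-side persistence (in-memory here; swap the dict for a database keyed by conversation ID):

histories = {}  # conversation_id -> full messages array

def chat_turn(client, conversation_id, user_text):
    history = histories.setdefault(
        conversation_id,
        [{"role": "system", "content": "You are a concise technical assistant."}],
    )
    history.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=history,  # the full array is resent on every call
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer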

Worked example: cost per million calls

Two scenarios, each run one million times: a 2,000-token system prompt cached across calls, a 200-token user message that lands as a cache miss each time, and a 300-token response. Pricing is current as of April 2026; verify against the DeepSeek API pricing page before committing budget.

Example A — deepseek-v4-flash

Bucket | Tokens | Rate (per 1M) | Cost
Input, cache hit | 2,000,000,000 | $0.028 | $56.00
Input, cache miss | 200,000,000 | $0.140 | $28.00
Output | 300,000,000 | $0.280 | $84.00
Total | | | $168.00

Example B — deepseek-v4-pro

Bucket | Tokens | Rate (per 1M) | Cost
Input, cache hit | 2,000,000,000 | $0.145 | $290.00
Input, cache miss | 200,000,000 | $1.740 | $348.00
Output | 300,000,000 | $3.480 | $1,044.00
Total | | | $1,682.00

Pro costs roughly ten times Flash on this workload. The user message is a cache miss on every call even when the system prompt hits — do not omit that line from your spreadsheets. For interactive estimates use the DeepSeek pricing calculator.
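
The arithmetic, for checking against your own traffic mix (rates as in the tables above; verify them before budgeting):

def bucket_cost(per_call_tokens, calls, rate_per_million):
    # Dollars for one billing bucket: tokens per call, number of calls, $ per 1M tokens.
    return per_call_tokens * calls / 1_000_000 * rate_per_million

flash = (bucket_cost(2000, 1_000_000, 0.028)    # system prompt, cache hit
         + bucket_cost(200, 1_000_000, 0.140)   # user message, cache miss
         + bucket_cost(300, 1_000_000, 0.280))  # output
pro = (bucket_cost(2000, 1_000_000, 0.145)
       + bucket_cost(200, 1_000_000, 1.740)
       + bucket_cost(300, 1_000_000, 3.480))
print(f"Flash: ${flash:,.2f}  Pro: ${pro:,.2f}")  # Flash: $168.00  Pro: $1,682.00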

Error handling patterns

The OpenAI SDK raises typed exceptions you can catch. The codes you will see in production:

Status | Meaning | Action
401 | Invalid API key | Verify the key, scope, and that you are hitting the right base URL
402 | Insufficient balance | Top up; calls fail until the account is funded
422 | Invalid parameters | Check the request body — usually a malformed messages array or an unsupported field
429 | Rate limited | Exponential backoff; DeepSeek tunes limits dynamically by traffic
500 / 503 | Server / overload | Retry with jitter; check the status page during incidents

A minimal retry pattern in Python:

import random
import time
from openai import OpenAI, APIStatusError

def call_with_retry(client, **kwargs):
    for attempt in range(5):
        try:
            return client.chat.completions.create(**kwargs)
        except APIStatusError as e:
            # Retry only transient statuses; re-raise everything else immediately.
            if e.status_code in (429, 500, 503) and attempt < 4:
                # Exponential backoff with jitter, per the table above.
                time.sleep(2 ** attempt + random.random())
                continue
            raise

Log the request ID returned in the response headers — it is the fastest way to file a useful support report. The full taxonomy is in DeepSeek API error codes.
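
One way to capture those headers with the OpenAI Python SDK is with_raw_response; the header name below is an assumption, so inspect a live response for the real key.

raw = client.chat.completions.with_raw_response.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "ping"}],
)
# "x-request-id" is illustrative; dump raw.headers once to find DeepSeek's key.
print(raw.headers.get("x-request-id"))
completion = raw.parse()  # the usual ChatCompletion object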

Third-party SDK options worth knowing

If you want a layer above the raw OpenAI client, two are in active maintenance:

  • Vercel AI SDK — the DeepSeek provider is available via the @ai-sdk/deepseek module, and the default base URL is https://api.deepseek.com. Idiomatic for Next.js and Edge runtimes.
  • LangChain / LlamaIndex — point their OpenAI-compatible adapters at DeepSeek’s base URL. No DeepSeek-specific package needed.

For local hacking, the community-maintained deepseek-cli Python package wraps the API for terminal use — useful for one-off prompts but not a production dependency.

Authentication and key hygiene

Authentication is a bearer token on the standard Authorization header. Treat the key like a password: never commit it, scope keys per project, and rotate on departure. The DeepSeek API authentication guide covers header format and rotation strategies; if you are starting from zero, the get a DeepSeek API key walkthrough is the shortest path. For the broader catalogue of API-side topics, the DeepSeek API docs and guides hub indexes every reference and tutorial.

What about the legacy off-peak discount?

The 50%/75% off-peak API discount was discontinued on September 5, 2025 and was not reintroduced with V4. If a tutorial cites it as active, the tutorial is out of date.

Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Does DeepSeek have an official SDK?

Not as a standalone branded package. DeepSeek publishes an OpenAI-compatible and an Anthropic-compatible API surface, so you install the upstream openai or anthropic SDK and change base_url plus api_key. That is the recommended path for both Python and Node. Third-party wrappers like LangChain and the Vercel AI SDK also work without modification. See DeepSeek OpenAI SDK compatibility.

How do I migrate from deepseek-chat to deepseek-v4-flash?

Change one line: the model parameter. The base URL, endpoint path, authentication header, and request shape are unchanged. Legacy IDs deepseek-chat and deepseek-reasoner currently route to deepseek-v4-flash and will be retired on 2026-07-24 at 15:59 UTC, so do the swap before then. Full reference in the DeepSeek API documentation.

Can I use the Anthropic Python SDK with DeepSeek?

Yes. Install anthropic, set ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic, and use a DeepSeek API key as ANTHROPIC_API_KEY. Pass model="deepseek-v4-pro" or "deepseek-v4-flash". Some Anthropic-specific fields (top_k, mcp_servers, budget_tokens) are ignored on this surface, so test feature parity before porting. More on the DeepSeek API code examples page.

What is the difference between the API and the DeepSeek chat app?

The API is stateless — your client resends conversation history on every call. The web chat and mobile app keep session history server-side, so users see past conversations across turns. They share the same underlying models, but the surfaces are independent. If you want a hosted UX without writing a frontend, the DeepSeek chat guide covers the app side.

How do I count tokens before I send a request?

Use a standard OpenAI-compatible tokenizer for an estimate, then trust the usage block in the response for the exact count. For budgeting in advance, the DeepSeek token counter gives you a fast pre-flight check, and the DeepSeek cost estimator projects monthly spend across V4-Flash and V4-Pro at your expected traffic.
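
For a pre-flight estimate, any OpenAI-style tokenizer is close enough; a sketch with tiktoken follows (an approximation, since DeepSeek's tokenizer differs, so treat counts as ballpark and the +4 per-message overhead as a rough heuristic):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # proxy tokenizer, not DeepSeek's own

def estimate_tokens(messages):
    # Content tokens plus a small per-message overhead for role framing.
    return sum(len(enc.encode(m["content"])) + 4 for m in messages)

print(estimate_tokens([{"role": "user", "content": "Summarise the OSI model."}]))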
