Building DeepSeek API Webhooks: The Practical Guide
“How do I get DeepSeek to call my server when a generation finishes?” comes up the moment a developer moves past one-off scripts. The honest answer is the one most write-ups skip: DeepSeek API webhooks are not a documented, first-party feature of the DeepSeek platform as of April 25, 2026. The official surface is a synchronous POST /chat/completions endpoint with optional Server-Sent Events streaming — there’s no “register a callback URL and we’ll POST to you” service. That doesn’t mean you can’t build webhook-style behaviour; it means you build it yourself with a thin relay. This guide shows the patterns that actually work in production, with code, costs, and the trade-offs.
What “DeepSeek API webhooks” really means
A webhook, in the usual sense, is a URL you register with a service so it can POST back to you when something happens — a payment clears, a job finishes, a message arrives. Stripe, GitHub and Twilio all ship this pattern. DeepSeek does not. The DeepSeek API uses a format compatible with OpenAI/Anthropic — by modifying the configuration, you can use the OpenAI or Anthropic SDK or compatible software to access the DeepSeek API. Both of those upstream APIs are request/response with optional streaming, and DeepSeek mirrors that surface. There is no /webhooks endpoint, no callback registration, and no event bus you can subscribe to.
So when people search for “deepseek api webhooks”, they’re usually asking one of three different questions:
- Push delivery — “Can DeepSeek POST results to my server?” Not directly. You build a relay.
- Async jobs — “Can I fire-and-forget a long generation?” Yes, by combining streaming with your own queue.
- Inbound triggers — “Can a Stripe/GitHub/Slack webhook call DeepSeek?” Yes, and this is the easiest case.
This article covers all three. If you want the underlying surface first, the DeepSeek API documentation is a good prerequisite, and DeepSeek OpenAI SDK compatibility covers the wire format we’ll be using.
The DeepSeek API surface, briefly
Before we wire anything up, the baseline. DeepSeek’s current generation is DeepSeek V4, released April 24, 2026, shipped as two open-weight Mixture-of-Experts models under the MIT license: deepseek-v4-pro (1.6T total / 49B active parameters, frontier tier) and deepseek-v4-flash (284B / 13B active, cost-efficient tier). Both default to a 1,000,000-token context window with up to 384,000 output tokens, and both expose thinking mode through a request parameter rather than a separate model ID.
Chat requests hit POST /chat/completions, the OpenAI-compatible endpoint at https://api.deepseek.com; an Anthropic-compatible surface is available at https://api.deepseek.com/anthropic. If you’re maintaining an older integration, note that the legacy model names deepseek-chat and deepseek-reasoner correspond to the non-thinking and thinking modes of deepseek-v4-flash respectively, and retire on 2026-07-24 at 15:59 UTC; migrating is a one-line model= swap, with no base_url change.
One property matters more than any other for webhook-style work: the API is stateless. The server keeps no session — every request must include the full conversation history. That’s the opposite of how the web/app behaves, where DeepSeek tracks history across turns for the user.
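A two-turn sketch makes the statelessness concrete, reusing the client constructed in the minimal request below; note that the second call must resend the first turn in full:

history = [{"role": "user", "content": "Name three webhook retry strategies."}]
first = client.chat.completions.create(model="deepseek-v4-flash", messages=history)

# The API kept nothing: append the reply and the follow-up, then resend everything.
history.append({"role": "assistant", "content": first.choices[0].message.content})
history.append({"role": "user", "content": "Expand on the second one."})
second = client.chat.completions.create(model="deepseek-v4-flash", messages=history)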
Minimal request
Here’s a minimal Python call using the OpenAI SDK pattern:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Summarise this ticket: ..."}],
)
print(resp.choices[0].message.content)
If you’ve never made a call before, DeepSeek API getting started walks through the first request end-to-end, and how to get a DeepSeek API key covers the console.
Pattern 1: Inbound webhooks that trigger DeepSeek
This is the common case people misname. You already have a webhook from another service — Stripe, GitHub, Slack, Linear, Intercom — and you want DeepSeek to do something with the payload. DeepSeek isn’t the webhook; your handler is.
The shape:
- Third-party service POSTs to https://yourapp.com/hooks/stripe.
- Your handler verifies the signature.
- You call DeepSeek synchronously and act on the result (write to a database, post to Slack, etc.).
- You return 200 OK to the third-party service.
A FastAPI sketch in Python:
from fastapi import FastAPI, Request
from openai import OpenAI

app = FastAPI()
client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

@app.post("/hooks/support")
async def support_hook(req: Request):
    payload = await req.json()
    # verify signature here
    summary = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "Summarise support tickets in 2 lines."},
            {"role": "user", "content": payload["ticket"]["body"]},
        ],
        max_tokens=200,
    )
    # write summary to your DB / Slack / Linear
    return {"ok": True}
Two warnings. First, most webhook senders expect a response within seconds. If your DeepSeek call might take 30 seconds or more (long thinking mode, large outputs), do not block the webhook handler — return 200 immediately, queue the job, and process it in a worker. Second, when you reach the concurrency limit the API immediately returns HTTP 429, and the exposed limit on each account is adjusted dynamically according to real-time traffic pressure and short-term historical usage. A burst of inbound webhooks can therefore produce a burst of 429s; queue and retry rather than failing the originating webhook.
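If the generation is slow, a minimal sketch of the acknowledge-first variant uses FastAPI’s BackgroundTasks as a stand-in for a real queue (in production, enqueue to Redis or SQS so jobs survive a process restart; summarise_ticket is the slow DeepSeek call moved out of the request path):

from fastapi import BackgroundTasks, FastAPI, Request

app = FastAPI()

def summarise_ticket(body: str) -> None:
    # The slow DeepSeek call from the handler above, moved out of the request path.
    ...

@app.post("/hooks/support")
async def support_hook(req: Request, tasks: BackgroundTasks):
    payload = await req.json()
    # verify signature here, then acknowledge before the model runs
    tasks.add_task(summarise_ticket, payload["ticket"]["body"])
    return {"ok": True}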
Pattern 2: Outbound delivery via a streaming relay
This is what people usually mean when they ask for “DeepSeek API webhooks”. You want DeepSeek to POST partial or final results to a URL you control. Since DeepSeek doesn’t do this natively, you put a tiny relay between your client and the API.
The relay:
- Receives an async job from your application (with a callback_url).
- Calls DeepSeek with stream=true and consumes the SSE stream.
- POSTs each delta — or just the final result — to the callback_url.
The streaming primitive comes from DeepSeek itself. Set “stream”: true to receive data-only Server-Sent Events (SSE) as the model generates output. You consume it server-side, then forward whatever shape your downstream caller expects.
Minimal relay worker
This Python worker treats the queue message as the “trigger” and the callback_url as the webhook target:
import httpx
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

def run_job(job):
    stream = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=job["messages"],
        reasoning_effort="high",
        extra_body={"thinking": {"type": "enabled"}},
        stream=True,
    )
    final_text = []
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            final_text.append(delta.content)
    # Deliver the finished result to the caller's webhook target.
    # sign() is your HMAC helper; see "Designing the callback payload" below.
    httpx.post(
        job["callback_url"],
        json={"job_id": job["id"], "content": "".join(final_text)},
        headers={"X-Signature": sign(job["id"])},
        timeout=10,
    )
When thinking mode is enabled, the response returns reasoning_content alongside the final content. If your downstream consumer wants the trace, capture delta.reasoning_content in the loop alongside delta.content and ship both.
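If you do want the trace, the loop change is small. A sketch, using the stream from the worker above (the getattr guards against chunks where the field is absent):

final_text, reasoning = [], []
for chunk in stream:
    delta = chunk.choices[0].delta
    # Thinking tokens arrive on a separate field from the final answer.
    if getattr(delta, "reasoning_content", None):
        reasoning.append(delta.reasoning_content)
    if delta.content:
        final_text.append(delta.content)
# Ship both in the callback payload, e.g.
# {"content": "".join(final_text), "reasoning": "".join(reasoning)}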
Designing the callback payload
Webhooks people consume well share a few traits. Match them:
- Stable schema. Versioned envelope: {"version": "1", "event": "deepseek.completion", "data": {...}}.
- Signed. HMAC-SHA256 over the raw body with a per-recipient secret, in an X-Signature header (see the signing sketch below).
- Idempotent. Include a stable job_id so receivers can dedupe on retry.
- Retried with backoff. If the receiver responds non-2xx, retry with jitter for at least an hour.
- Correlated. Forward the DeepSeek x-request-id response header so support tickets can be traced.
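A minimal signing sketch for both sides, assuming a shared per-recipient secret; the X-Signature header and envelope shape are the ones from the list above, everything else is illustrative:

import hashlib
import hmac
import json

SECRET = b"per-recipient-secret"  # issued when the receiver registers

def sign_body(raw_body: bytes) -> str:
    # HMAC-SHA256 over the exact bytes sent on the wire, hex-encoded.
    return hmac.new(SECRET, raw_body, hashlib.sha256).hexdigest()

def verify(raw_body: bytes, header_value: str) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(sign_body(raw_body), header_value)

# Sender side: serialise once and sign the same bytes you POST.
envelope = {"version": "1", "event": "deepseek.completion", "data": {"job_id": "j_123"}}
raw = json.dumps(envelope, separators=(",", ":")).encode()
headers = {"X-Signature": sign_body(raw), "Content-Type": "application/json"}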
Pattern 3: Long-running jobs without a relay
If you can’t run a relay (serverless cold-start limits, no queue infrastructure), there is a thinner option: poll. Your client submits the job, your worker streams DeepSeek to completion and writes the result to a key-value store, and the client polls a /jobs/{id} endpoint until status=done. This is not webhooks — but it solves the same async-result problem and is what most teams without a relay end up with.
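A sketch of the polling surface, assuming FastAPI and an in-process dict standing in for the key-value store; the /jobs/{id} path and status field are illustrative names of your own service, not a DeepSeek API:

from fastapi import FastAPI, HTTPException

app = FastAPI()
jobs: dict[str, dict] = {}  # stand-in for Redis or similar

# The worker writes here when the DeepSeek stream completes:
# jobs[job_id] = {"status": "done", "content": final_text}

@app.get("/jobs/{job_id}")
def get_job(job_id: str):
    job = jobs.get(job_id)
    if job is None:
        raise HTTPException(status_code=404)
    return {"job_id": job_id, "status": job["status"], "content": job.get("content")}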
One operational note matters here. Under scheduling pressure, non-streaming requests may return empty lines while waiting, streaming requests may return keep-alive comments while waiting, and if inference has not started after 10 minutes the server closes the connection. A naive HTTP client will time out on a cold queue; your worker needs to handle reconnects and treat early closes as a retry, not a job failure.
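One way to build that in, assuming the OpenAI SDK’s exported exception types; the read timeout sits above the ten-minute scheduling window, and a closed connection is retried rather than surfaced as a job failure:

import time
from openai import OpenAI, APIConnectionError, APITimeoutError

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="...",
    timeout=660.0,  # comfortably above the 10-minute scheduling window
)

def stream_with_retry(messages, attempts=3):
    for attempt in range(attempts):
        try:
            stream = client.chat.completions.create(
                model="deepseek-v4-flash", messages=messages, stream=True,
            )
            return "".join(chunk.choices[0].delta.content or "" for chunk in stream)
        except (APIConnectionError, APITimeoutError):
            # Early close before inference started: retry, don't fail the job.
            time.sleep(2 ** attempt)
    raise RuntimeError("stream did not complete after retries")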
Reference: parameters that matter for async work
| Parameter | Purpose | Notes for webhook-style use |
|---|---|---|
| model | Model tier | deepseek-v4-flash for chat/triage, deepseek-v4-pro for agentic or coding work. |
| stream | Enable SSE | Set true in the relay so partial deltas can be forwarded. |
| reasoning_effort | Thinking mode | "high" with extra_body={"thinking": {"type": "enabled"}}; "max" for max effort. |
| temperature | Randomness | 0.0 for code/math, 1.0 for data analysis, 1.3 for general chat, 1.5 for creative writing. |
| max_tokens | Output cap | Up to 384,000 on V4. Set generously when using JSON mode to prevent truncation. |
| response_format | JSON mode | Designed to return valid JSON, not guaranteed; prompt with the word “json” and an example schema, and handle occasional empty content. |
| tools | Function calling | Standard OpenAI shape; works in both thinking and non-thinking modes. |
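As a concrete example of the response_format row, a sketch of JSON mode on a triage call; the schema is illustrative, ticket_body is assumed to hold the inbound payload, and the parse fallback handles the occasional empty content the table warns about:

import json

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": 'Return json: {"priority": "low|high", "summary": "..."}'},
        {"role": "user", "content": ticket_body},
    ],
    response_format={"type": "json_object"},
    max_tokens=1000,  # generous cap so the JSON is not truncated
)
try:
    data = json.loads(resp.choices[0].message.content or "")
except json.JSONDecodeError:
    data = None  # empty or invalid JSON: retry or fall back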
For a fuller treatment of streams, see DeepSeek API streaming; for tool-call payload shape, DeepSeek API function calling.
Rate limits, retries, and 429 handling
Outbound webhook delivery is bursty by nature — one Stripe event can fan into ten DeepSeek calls. DeepSeek currently describes API rate limiting as a dynamic concurrency limit based on server load. The implication for relay design:
- Bound concurrency in the worker, not in the upstream caller. Cap your workers at a number you’ve measured, not a number you’ve guessed.
- Exponential backoff with jitter on 429 and 5xx — start at 1s, cap at 60s, give up after 10 attempts.
- Honour Retry-After if present (see the backoff sketch below).
- Surface the DeepSeek request id in your dead-letter queue so support requests can quote it.
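A minimal backoff helper matching that policy, assuming the OpenAI SDK’s exception types, which expose the underlying HTTP response and its headers:

import random
import time
import openai

def call_with_backoff(make_request, max_attempts=10):
    delay = 1.0
    for attempt in range(max_attempts):
        try:
            return make_request()
        except (openai.RateLimitError, openai.InternalServerError) as e:
            # Honour Retry-After when the server provides it.
            retry_after = e.response.headers.get("retry-after")
            wait = float(retry_after) if retry_after else delay
            time.sleep(wait + random.uniform(0, wait / 2))  # jitter
            delay = min(delay * 2, 60.0)  # start at 1s, cap at 60s
    raise RuntimeError("gave up after max_attempts")

Usage: call_with_backoff(lambda: client.chat.completions.create(...)).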
The DeepSeek API rate limits page covers the details, and DeepSeek API error codes maps the status codes you’ll see most.
Cost worked example: webhook fan-out at scale
Suppose you run inbound support webhooks at a million events a month. Each event triggers one deepseek-v4-flash call with a 2,000-token system prompt (cached across calls), a 200-token user message (uncached on each call), and a 300-token response. As of April 2026, V4-Flash lists $0.028 cache-hit / $0.14 cache-miss / $0.28 output per 1M tokens — verify on the DeepSeek API pricing page before committing.
Cached input : 2,000 × 1,000,000 = 2,000,000,000 tokens × $0.028/M = $56.00
Uncached input : 200 × 1,000,000 = 200,000,000 tokens × $0.14/M = $28.00
Output : 300 × 1,000,000 = 300,000,000 tokens × $0.28/M = $84.00
-------
Total $168.00
The same workload on deepseek-v4-pro at $0.145 / $1.74 / $3.48 per 1M tokens lands at $1,682.00 — roughly 10× the spend. For triage-shaped webhook fan-out, V4-Flash is almost always the right tier; reserve V4-Pro for the cases where the benchmark lift justifies the bill. Do not skip the uncached-input line: only a repeated prefix can hit the cache, so each unique user message is billed at the cache-miss rate. DeepSeek Context Caching on Disk is enabled by default for all users: when later requests share an overlapping prefix (the same system prompt, repeated few-shot examples), the repeated portion counts as a cache hit and reduces effective input cost. You can verify the split via usage.prompt_cache_hit_tokens and usage.prompt_cache_miss_tokens in the response.
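To confirm the split on a live response, inspect the usage fields named above; a sketch assuming the client from earlier and a SYSTEM_PROMPT constant holding the 2,000-token cached prefix (the getattr guards are there because these are DeepSeek-specific extras on the usage object):

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "New ticket body..."},
    ],
)
usage = resp.usage
print("cache hit tokens :", getattr(usage, "prompt_cache_hit_tokens", None))
print("cache miss tokens:", getattr(usage, "prompt_cache_miss_tokens", None))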
Security checklist
- Verify inbound signatures on every third-party webhook before calling DeepSeek. Don’t burn tokens on forged events.
- Sign your outbound callbacks with HMAC-SHA256 over the raw body and document the verification recipe.
- Rotate API keys on any key that touched a public log or screenshot. Use the console’s separate keys for staging and production.
- Strip PII from the prompt where you can — the API ships requests to DeepSeek-operated infrastructure, and your privacy posture should match.
- Never expose your DeepSeek key in browser code. All calls go through your relay, which holds the key.
Where this fits
If you’re integrating DeepSeek into a Node.js backend, DeepSeek Node.js integration covers the same pattern in JavaScript. For Python-first stacks, DeepSeek Python integration is the companion guide. And if you’re still deciding which model tier to target, the DeepSeek API docs and guides hub gathers the rest of the reference.
Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.
Does DeepSeek support webhooks natively?
No. As of April 25, 2026, the official DeepSeek API exposes a synchronous POST /chat/completions endpoint with optional Server-Sent Events streaming, plus an Anthropic-compatible surface at the same base URL. There is no documented webhook registration, callback URL, or event subscription system. To get push delivery, you build a small relay that consumes the stream and POSTs to your own URL. See the DeepSeek API documentation for the full surface.
How can I receive an event when a DeepSeek generation finishes?
Run a worker that calls DeepSeek with stream=true, accumulates the deltas, and POSTs the final result to a callback URL of yours when the stream closes. This is the relay pattern from the body of this article. If polling is acceptable, you can also write the result to a key-value store and let the client poll a status endpoint. The DeepSeek API streaming guide covers the SSE consumer side.
Can a Stripe or GitHub webhook trigger DeepSeek?
Yes — this is the easiest case and what most people actually need. Receive the third-party webhook in your handler, verify its signature, then call POST /chat/completions on DeepSeek with the payload as input. Return 200 to the sender quickly; if the DeepSeek call is slow, queue it and process in a worker so you don’t time out the originating webhook. DeepSeek API getting started shows the basic call.
What is the rate limit for DeepSeek webhook fan-out?
DeepSeek does not publish a fixed per-key request-per-second cap. The platform applies a dynamic concurrency limit based on server load and per-account history; once you exceed it the API returns HTTP 429. For webhook fan-out, bound your worker concurrency, retry 429s with exponential backoff and jitter, and honour Retry-After. See DeepSeek API rate limits for current behaviour and DeepSeek API best practices for retry patterns.
Why does my long DeepSeek request hang or close early?
Under scheduling pressure, DeepSeek may emit empty lines (non-streaming) or SSE keep-alive comments (streaming) while waiting, and the server will close the connection if inference has not started after ten minutes. Configure explicit connect and read timeouts, treat early closes as retryable, and ensure your reverse proxy or serverless runtime does not kill long-running streamed responses. The DeepSeek API error codes reference covers the matching status semantics.
