How DeepSeek API JSON Mode Works on V4 (With Code)
If you have ever shipped an LLM feature to production, you know the pain: the model returns prose where you needed a parseable object, your downstream code throws, and the on-call engineer is paged at 2 a.m. The DeepSeek API JSON mode is the fix — it forces `deepseek-v4-pro` and `deepseek-v4-flash` to emit a valid JSON string instead of free text, so you can pipe outputs straight into a typed parser.
This guide covers exactly how the feature behaves on V4 (released April 24, 2026), the three prompt rules that prevent empty responses, a worked cost example for both V4 tiers, error handling, and how it compares to tool calling for structured extraction. Every claim is backed by the official documentation or a tested code path.
What DeepSeek API JSON mode actually does
JSON mode is a request-level flag that constrains the model's output to a valid JSON string. You enable it by setting `response_format={"type": "json_object"}` on a `POST /chat/completions` request. The official DeepSeek docs describe the feature as designed to ensure valid JSON output, and the parameter reference says setting it to `{"type": "json_object"}` enables JSON Output, which "guarantees the message the model generates is valid JSON".
That word "guarantees" needs a caveat. In practice, two failure modes still happen: the model can return empty content if the prompt does not steer it correctly, and JSON can still be truncated if you set `max_tokens` too low. The official guide acknowledges the first failure mode directly: "when using the JSON Output feature, the API may occasionally return empty content. We are actively working on optimizing this issue. You can try modifying the prompt to mitigate such problems." Treat JSON mode as a strong constraint, not a contract.
JSON mode works on both V4 model IDs, `deepseek-v4-flash` and `deepseek-v4-pro`, in either non-thinking (default) or thinking mode. It also works on the legacy `deepseek-chat` and `deepseek-reasoner` IDs, which currently route to `deepseek-v4-flash` and will be retired on 2026-07-24 at 15:59 UTC. If you have an older integration, swap the `model` value to a V4 ID before that date; `base_url` does not change. For a refresher on the migration mechanics, see our DeepSeek OpenAI SDK compatibility notes.
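The swap itself is one line in an existing OpenAI-SDK integration. A minimal sketch, assuming the standard client setup shown in the quickstart below:

```python
response = client.chat.completions.create(
    model="deepseek-v4-flash",  # was "deepseek-chat"; base_url and everything else stay the same
    messages=messages,
    response_format={"type": "json_object"},
    max_tokens=1024,
)
```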
Quickstart: minimal Python and curl
The cheapest way to test JSON mode is with the official OpenAI SDK pointed at DeepSeek's base URL. Chat requests hit `POST /chat/completions`, the OpenAI-compatible endpoint. Here is the canonical Python snippet, lifted with light edits from DeepSeek's own JSON Output guide:
```python
import json

from openai import OpenAI

client = OpenAI(
    api_key="<your api key>",
    base_url="https://api.deepseek.com",
)

system_prompt = """
The user will provide some exam text.
Please parse the "question" and "answer" and output them in JSON format.

EXAMPLE INPUT:
Which is the highest mountain in the world? Mount Everest.

EXAMPLE JSON OUTPUT:
{
    "question": "Which is the highest mountain in the world?",
    "answer": "Mount Everest"
}
"""

user_prompt = "Which is the longest river in the world? The Nile River."

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    response_format={"type": "json_object"},
    max_tokens=1024,
)

print(json.loads(response.choices[0].message.content))
```
The equivalent curl call against the same endpoint:
```bash
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You return JSON only. Schema: {\"city\":string,\"temp_c\":number}"},
      {"role": "user", "content": "Hangzhou, 24 degrees."}
    ],
    "response_format": {"type": "json_object"},
    "max_tokens": 256
  }'
```
If you have not provisioned credentials yet, work through the API key setup walk-through first; the rest of this guide assumes a working key in `DEEPSEEK_API_KEY`.
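One hygiene tweak on the quickstart snippet: load the key from the environment rather than hardcoding it. A minimal sketch:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # raises KeyError early if the key is unset
    base_url="https://api.deepseek.com",
)
```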
The three prompt rules that prevent empty responses
Most "JSON mode is broken" reports trace back to one of three preventable mistakes. The official docs are explicit about all three: to enable JSON Output, set the `response_format` parameter to `{"type": "json_object"}`; include the word "json" in the system or user prompt, and provide an example of the desired JSON format to guide the model in outputting valid JSON; and set the `max_tokens` parameter reasonably to prevent the JSON string from being truncated midway.
- The literal word "json" must appear in the prompt. Without it, DeepSeek rejects the request before the model runs. A real bug filed against a Node integration captures the exact error: when using the DeepSeek API, if the prompt doesn't explicitly contain the word "json", the API returns `Prompt must contain the word 'json' in some form to use 'response_format' of type 'json_object'`. The fix is one line of system prompt: "Reply in JSON only." does it.
- Provide an example schema in the prompt. Without one, the model invents its own keys. The Stagehand catch-22 illustrates this: the project added "json" to satisfy the gate, then found DeepSeek returns a format like `{"action": "fill", "element": {"id": "...", "role": "...", "name": "..."}, "argument": "Cota", "twoStep": false}`, which did not match the integration's expected schema at all. Always paste an example object showing exactly the keys, types, and nesting you want.
- Set `max_tokens` high enough that the JSON cannot truncate. If the response stops mid-string, you get invalid JSON and a parser exception. The Chat Completions reference is blunt: the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. Estimate the largest reasonable response and double it; on V4 you can go up to 384,000 output tokens.
If you skip these and dump `response_format` on a stock chat prompt, expect failures. Putting the three rules together looks like the sketch below. Our prompt-engineering reference covers the broader pattern of seeding examples into system prompts.
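Here is one request that satisfies all three rules; the weather schema and values are illustrative, and `client` is the one constructed in the quickstart:

```python
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        # Rule 1: the literal word "json" appears in the prompt.
        # Rule 2: an example object pins down the keys, types, and nesting.
        {
            "role": "system",
            "content": 'Reply in JSON only, exactly this shape: {"city": "Hangzhou", "temp_c": 24}',
        },
        {"role": "user", "content": "Paris, 18 degrees."},
    ],
    response_format={"type": "json_object"},
    # Rule 3: size max_tokens well above the largest plausible response.
    max_tokens=256,
)
```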
JSON mode vs tool calling: which to use when
Both features produce structured output, but they solve different problems.
| Feature | JSON mode | Tool calling (strict) |
|---|---|---|
| Activation | `response_format={"type":"json_object"}` | `tools=[...]` with `strict: true` |
| Schema enforcement | Prompt-level only | Server-side against a JSON Schema |
| Output shape | Free-form JSON in `content` | Structured args in `tool_calls[].function.arguments` |
| Best for | One-off extraction, classification, simple objects | Multi-step agents, validated function arguments, enums |
| Failure mode to handle | Empty content, schema drift | Hallucinated parameters (parser still required) |
Tool calling's strict mode is genuinely stricter: DeepSeek validates the output against your JSON Schema before returning it. The constraint is real: all properties of every object must be listed as `required`, and the `additionalProperties` attribute of the object must be set to `false`. JSON mode has no such server-side check, which is why an example schema in the prompt matters so much. For agent loops or anything where output keys map to function signatures, prefer tool calling; see the DeepSeek API function calling guide. For ad-hoc data extraction from documents, JSON mode is simpler and cheaper.
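For contrast, a sketch of what a strict tool definition looks like under those constraints; `record_weather` and its parameters are hypothetical, and the exact strict-mode syntax is covered in the function calling guide:

```python
tools = [{
    "type": "function",
    "function": {
        "name": "record_weather",  # hypothetical function, for illustration only
        "description": "Record a city temperature reading.",
        "strict": True,  # ask the server to validate against the schema below
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],  # every property must be listed as required
            "additionalProperties": False,   # mandatory in strict mode
        },
    },
}]
```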
Detailed reference: parameters and response shape
The full request shape for JSON mode looks like this:
| Parameter | Value for JSON mode | Notes |
|---|---|---|
| `model` | `deepseek-v4-flash` or `deepseek-v4-pro` | Legacy IDs accepted until 2026-07-24 15:59 UTC. |
| `messages` | Array including the word "json" + an example | Stateless; resend full history every call. |
| `response_format` | `{"type": "json_object"}` | Required to activate the constraint. |
| `max_tokens` | Generous; 1024 minimum, up to 384,000 | Prevents truncated JSON. |
| `temperature` | 0.0 for deterministic extraction | DeepSeek recommends 0.0 for code/math; raise for creative work. |
| `reasoning_effort` | Optional `"high"` or `"max"` | Pair with `extra_body={"thinking": {"type": "enabled"}}`. |
The response object follows the OpenAI shape. The JSON string lives in `choices[0].message.content`. If you enable thinking mode, the API returns `reasoning_content` alongside the final `content`, and your JSON parsing should target only `content`. Token usage breaks down into `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`, and `completion_tokens`, which is useful for accurate cost accounting (see the next section).
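Assuming the SDK surfaces those usage fields as attributes on `response.usage` (DeepSeek's responses follow the OpenAI shape), a per-call cost estimate is a one-liner; the rates here are the April 2026 Flash numbers from the pricing section below:

```python
u = response.usage
# Dollars for this call, split across cache hits, cache misses, and output tokens.
cost_usd = (
    u.prompt_cache_hit_tokens * 0.028 / 1_000_000
    + u.prompt_cache_miss_tokens * 0.14 / 1_000_000
    + u.completion_tokens * 0.28 / 1_000_000
)
```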
The API is stateless. The server keeps no memory of previous calls; you must resend the full `messages` array each time. This differs from the web app, which maintains session history on the server. The full API documentation hub has the complete request/response schema if you want to dig deeper.
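Statelessness in practice, as a sketch (the invoice strings are placeholders): you append the assistant's reply to your own history and resend everything on the next call.

```python
history = [{"role": "system", "content": system_prompt}]

history.append({"role": "user", "content": "Invoice text for document 1..."})
first = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=history,
    response_format={"type": "json_object"},
    max_tokens=1024,
)
# The server forgets this exchange immediately; keep the reply locally...
history.append({"role": "assistant", "content": first.choices[0].message.content})
# ...and resend the full array with the next user turn.
history.append({"role": "user", "content": "Invoice text for document 2..."})
```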
Pricing worked example
JSON mode does not change the rate card — you pay the same per-token rates as a regular chat completion. The wrinkle is that example schemas in the system prompt are tokens you pay for on every call, so context caching matters more than usual. Pricing below is from DeepSeek’s published rate card as of April 2026; verify against the current DeepSeek API pricing page before committing.
| Model tier | Input cache hit ($/1M) | Input cache miss ($/1M) | Output ($/1M) |
|---|---|---|---|
| `deepseek-v4-flash` | $0.028 | $0.14 | $0.28 |
| `deepseek-v4-pro` | $0.145 | $1.74 | $3.48 |
Imagine a document-extraction job: 1,000,000 invoices through deepseek-v4-flash, with a 2,000-token system prompt (containing the JSON schema example) cached across calls, a 200-token user message per invoice (uncached miss), and a 300-token JSON response.
- Cached input: 2,000 × 1,000,000 = 2,000,000,000 tokens × $0.028/M = $56.00
- Uncached input: 200 × 1,000,000 = 200,000,000 tokens × $0.14/M = $28.00
- Output: 300 × 1,000,000 = 300,000,000 tokens × $0.28/M = $84.00
- Total: $168.00
Same workload on deepseek-v4-pro for comparison — useful only if Flash’s quality is not enough on your specific schema:
- Cached input: 2,000,000,000 × $0.145/M = $290.00
- Uncached input: 200,000,000 × $1.74/M = $348.00
- Output: 300,000,000 × $3.48/M = $1,044.00
- Total: $1,682.00 — roughly 10× the Flash bill.
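To sanity-check these figures, or re-run them with your own token counts, a small helper reproduces the arithmetic; the rates are the April 2026 numbers from the table above:

```python
def job_cost_usd(calls, cached_in, uncached_in, out_tokens, rates):
    """rates = (cache_hit, cache_miss, output), each in $ per 1M tokens."""
    hit, miss, out = rates
    per_call = cached_in * hit + uncached_in * miss + out_tokens * out
    return calls * per_call / 1_000_000

print(job_cost_usd(1_000_000, 2_000, 200, 300, (0.028, 0.14, 0.28)))  # ~168.0 (Flash)
print(job_cost_usd(1_000_000, 2_000, 200, 300, (0.145, 1.74, 3.48)))  # ~1682.0 (Pro)
```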
Two practical notes. First, the cache only applies to repeated prefixes — your per-invoice user message is always a miss, so the uncached-input line never goes to zero. Second, do not mix tier rates in one calculation. For more cost-modelling patterns, see our deep-dive on DeepSeek context caching.
Error handling patterns
JSON mode introduces two errors you need to handle on top of normal API failures.
Empty content
Per the official docs, the API may occasionally return empty content. Detect this on the client and retry with a slightly tweaked prompt or a temperature adjustment. A defensive pattern in Python:
```python
import json

def parse_json_response(response):
    raw = response.choices[0].message.content
    if not raw or not raw.strip():
        # Empty content: retry once with a stronger instruction
        return retry_with_stronger_prompt(...)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Truncation or malformed output - check finish_reason
        if response.choices[0].finish_reason == "length":
            return retry_with_higher_max_tokens(...)
        raise
```
Truncation
Always inspect `finish_reason`. If it equals `"length"`, the response was cut off and the JSON is almost certainly invalid. Bump `max_tokens` and retry. For the broader catalogue of HTTP-level errors (401, 402 Insufficient Balance, 429 rate limit, 503), see the DeepSeek API error codes reference.
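A retry wrapper along these lines works, as a sketch; the doubling ceiling is an arbitrary choice, not an API limit:

```python
def complete_json_with_retry(client, max_tokens=1024, ceiling=32768, **kwargs):
    """Double max_tokens on truncation until the JSON comes back complete."""
    while max_tokens <= ceiling:
        response = client.chat.completions.create(
            response_format={"type": "json_object"},
            max_tokens=max_tokens,
            **kwargs,
        )
        if response.choices[0].finish_reason != "length":
            return response  # complete; still validate the JSON before use
        max_tokens *= 2  # truncated: retry with more headroom
    raise RuntimeError("JSON still truncated at the max_tokens ceiling")
```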
Schema drift
Even with an example schema in the prompt, the model may add or rename keys on edge-case inputs. Validate every parsed payload against a Pydantic model (Python) or Zod schema (TypeScript) before it touches business logic. If validation fails, log the raw output and retry with a strict reminder. This is the pattern that separates demoware from production code; our best-practices write-up goes deeper on it.
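A minimal Pydantic v2 sketch of that validation gate, reusing the question/answer shape from the quickstart; `log_raw_output` stands in for whatever logging you use:

```python
from pydantic import BaseModel, ValidationError

class ExamItem(BaseModel):
    question: str
    answer: str

try:
    item = ExamItem.model_validate_json(raw)  # raw is the content string from the API
except ValidationError as exc:
    log_raw_output(raw, exc)  # hypothetical logger; then retry with a strict reminder
    raise
```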
JSON mode in thinking mode
You can combine JSON mode with V4's reasoning capability. Set `reasoning_effort="high"` and `extra_body={"thinking": {"type": "enabled"}}` alongside `response_format`. The API returns `reasoning_content` alongside the final `content`; the JSON lives in `content`, and the reasoning trace in `reasoning_content` is informational only.
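As a sketch, the combined request looks like this; parameter names follow the section above, and `client` and `messages` come from the quickstart:

```python
import json

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,  # must still contain the word "json" plus a schema example
    response_format={"type": "json_object"},
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
    max_tokens=8192,  # reasoning tokens bill as completion tokens, so leave headroom
)

data = json.loads(response.choices[0].message.content)  # the JSON lives in content
trace = getattr(response.choices[0].message, "reasoning_content", None)  # informational only
```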
This is useful for extraction tasks that need a chain of inference (legal clauses, multi-step financial reasoning, complex schema mapping). It is overkill for flat extraction. The output token bill goes up because reasoning tokens are billed as completion tokens. Stay in non-thinking mode for high-volume, structurally simple jobs.
Related reading
If you are wiring DeepSeek’s structured output into a wider stack:
- For agent-style workflows where the model picks a tool, see function calling.
- For long-running structured streams (e.g. live transcription cleanup), combine JSON mode with DeepSeek API streaming; note that JSON-mode streams are usable but you must buffer until the close brace before parsing (see the sketch after this list).
- For end-to-end runnable scripts, browse the DeepSeek API code examples collection.
- For the broader API surface, the DeepSeek API docs and guides hub indexes every reference page on this site.
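On the streaming point above, a minimal buffering sketch: accumulate the deltas and parse only after the stream finishes.

```python
import json

chunks = []
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    response_format={"type": "json_object"},
    stream=True,
    max_tokens=1024,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        chunks.append(delta)  # partial JSON: never parse mid-stream

data = json.loads("".join(chunks))  # safe only once the stream has closed
```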
Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.
Frequently asked questions
Does DeepSeek API JSON mode guarantee valid JSON output?
The official parameter description says JSON mode guarantees valid JSON, but the JSON Output guide also notes that the API may occasionally return empty content. Treat it as a strong constraint, not a contract: always wrap parsing in a try/except, check `finish_reason` for truncation, and validate against a schema. The DeepSeek API best practices guide covers the full retry pattern.
How do I fix the "Prompt must contain the word 'json'" error?
DeepSeek requires the literal string "json" in your system or user prompt before it will honour `response_format={"type": "json_object"}`. Add a sentence like "Respond in JSON only" to your system message. While you are there, paste an example of the exact schema you want; the model otherwise invents its own keys. Our prompt engineering guide has more patterns.
What is the difference between JSON mode and tool calling on DeepSeek?
JSON mode constrains the assistant's `content` field to a JSON string, with schema enforcement coming from your prompt. Tool calling returns structured arguments inside `tool_calls`, validated server-side against a JSON Schema when you set `strict: true`. Use JSON mode for one-shot extraction; use function calling for agents, enums, or anywhere a typed signature matters.
Can I use JSON mode with the legacy deepseek-chat model ID?
Yes, until 2026-07-24 at 15:59 UTC. `deepseek-chat` and `deepseek-reasoner` currently route to `deepseek-v4-flash` in non-thinking and thinking modes respectively, and JSON mode works on both routes. After that retirement date, those IDs will fail. Migrate by changing the `model` string to `deepseek-v4-flash` or `deepseek-v4-pro`; the OpenAI compatibility notes cover the one-line swap.
Why is my DeepSeek JSON response getting truncated?
You hit the `max_tokens` ceiling. The API stops generating mid-string, which produces invalid JSON. Inspect `finish_reason`; if it equals `"length"`, raise `max_tokens` and retry. V4 supports up to 384,000 output tokens, so you have headroom. Use the DeepSeek token counter to size prompts accurately before sending.
