DeepSeek Prompt Templates: A Practitioner’s Library for V4

Copy-paste DeepSeek prompt templates for V4-Pro and V4-Flash — coding, JSON, reasoning, RAG. Tune temperature, save tokens. Start building today.

Tools·April 25, 2026·By DS Guide Editorial

You opened the DeepSeek console, pasted “write me a marketing email,” and got something generic. The model is not the problem. The prompt is. After six months running DeepSeek V4-Pro and V4-Flash through production workloads — extraction pipelines, code review, RAG over legal docs, weekly reports — I’ve collected a small library of DeepSeek prompt templates that consistently outperform ad-hoc requests. They are short, structured, and tuned to the parameters DeepSeek’s docs actually recommend.

This article gives you twelve copy-paste templates organised by task — extraction, code, reasoning, writing, agentic tool use — plus the temperature settings, system prompt structure, and JSON-mode caveats that make them work. Every template has been tested against `deepseek-v4-pro` and `deepseek-v4-flash` since the V4 Preview shipped on April 24, 2026.

Why prompt templates matter more on DeepSeek V4

DeepSeek Prompt Templates Library

23 tested-shape prompt skeletons that work reliably on V4. Search, filter by category, copy, fill the {placeholders}, send.

Code review (file-level)
Code
<role>
Senior engineer reviewing a pull request for production code.
</role>
<task>
Review the file below and surface defects: bugs, race conditions, security issues, perf problems, unclear naming.
</task>
<file>
{paste your file here}
</file>
<output_format>
Numbered list. For each item: file:line - issue - recommended fix. Skip nits.
</output_format>
review pr quality
Bug repro from a stack trace
Code
<role>
Debugger.
</role>
<context>
Language / framework: {language}
Observed error: {paste stack trace}
Relevant code: {paste suspect file or function}
</context>
<task>
1. Identify the root cause in one sentence.
2. Provide a minimal repro (5–10 lines).
3. Show the fix as a unified diff.
</task>
debug stack-trace repro
Refactor for readability
Code
<task>
Refactor the function below for readability without changing behaviour.
</task>
<rules>
- Keep public signatures identical.
- Prefer early returns over deep nesting.
- Add inline comments only where the WHY is non-obvious.
- Output the refactored function only, no explanation.
</rules>
<code>
{paste function}
</code>
refactor readability
Test generation (table-driven)
Code
<task>
Write table-driven tests for the function below.
</task>
<requirements>
- Cover happy path, edge cases (empty, null, max, min, unicode), and 1–2 failure modes.
- Use the project's existing test framework: {framework}.
- Each case has: name, input, expected, expected_error.
</requirements>
<function>
{paste function}
</function>
tests table-driven
Generate a clean commit message
Code
<task>
Write a commit message for the diff below.
</task>
<rules>
- Subject line ≤ 50 chars, imperative mood.
- Body lines wrap at 72 chars.
- Explain the WHY, not the WHAT; the diff already shows the WHAT.
- No trailers (no "Signed-off-by", no "Co-authored").
</rules>
<diff>
{paste diff}
</diff>
commit git changelog
Explain a SQL query
Code
<task>
Explain what the SQL query below does, in plain English, and identify any performance concerns.
</task>
<query>
{paste SQL}
</query>
<output_format>
1. One-sentence summary.
2. Step-by-step walkthrough of the joins and filters.
3. Performance notes: full scans, missing indexes, cartesian risk.
</output_format>
sql explain db
Article outline → draft
Writing
<role>
Writer for a software engineering audience.
</role>
<task>
Expand the outline below into a draft article. Conversational but technical. ~1200 words.
</task>
<outline>
{paste outline}
</outline>
<voice>
- No marketing language.
- Concrete examples over abstract claims.
- Short paragraphs. Subheads every 200–300 words.
</voice>
outline draft article
Summarise a long doc (3 layers)
Writing
<task>
Summarise the document below.
</task>
<requirements>
- Three layers: (1) one-sentence TL;DR, (2) 5-bullet executive summary, (3) section-by-section summary.
- Preserve specific numbers and named entities.
- Flag anything ambiguous as [unclear].
</requirements>
<document>
{paste document}
</document>
summary tldr document
Translate (matching register)
Writing
<task>
Translate the source below into {target_language}.
</task>
<rules>
- Preserve register (technical, formal, casual; match what the source is).
- Keep proper nouns and code untouched.
- Where a phrase doesn't translate cleanly, render it naturally and footnote with [literal: …].
</rules>
<source>
{paste source}
</source>
translate localise
Draft a professional email reply
Writing
<task>
Draft a reply to the email below.
</task>
<original>
{paste email}
</original>
<my_position>
{what I want to convey}
</my_position>
<rules>
- Match the tone of the original.
- Address every direct question.
- Keep ≤ 150 words.
- No emojis, no exclamation marks unless the original used them.
</rules>
email reply professional
Polish without changing meaning
Writing
<task>
Rewrite the text below to be clearer, more direct, and shorter, without changing any meaning, claim, or fact.
</task>
<rules>
- Cut filler. Combine sentences only when clarity holds.
- Active voice unless passive is required.
- Don't paraphrase technical terms.
- Output the rewrite only, no explanation.
</rules>
<text>
{paste text}
</text>
rewrite tone polish
Step-by-step problem solving
Reasoning
<task>
Solve the problem below.
</task>
<problem>
{paste problem}
</problem>
<format>
1. Restate the problem in your own words.
2. List unknowns and givens.
3. Sketch an approach.
4. Work it through, step by step.
5. State the final answer plainly.
6. Sanity-check by plugging the answer back into the problem.
</format>
cot problem-solving
Decision tree (pick one)
Reasoning
<task>
Help me decide between options below. Don't restate; recommend.
</task>
<options>
- Option A: {describe}
- Option B: {describe}
- Option C: {describe}
</options>
<criteria_in_priority_order>
1. {first}
2. {second}
3. {third}
</criteria_in_priority_order>
<output_format>
One paragraph: which option, then why in terms of the criteria. End with one risk and how to mitigate it.
</output_format>
decision pick recommend
Steel-man and red-team an idea
Reasoning
<task>
For the idea below, write the strongest possible case FOR it (steel-man), then the strongest case AGAINST it (red-team).
</task>
<idea>
{paste idea or proposal}
</idea>
<rules>
- Each side gets ~5 distinct arguments.
- Don't hedge. Make each argument as forcefully as a true believer would.
- End with: which side I'd believe more, and why.
</rules>
critique red-team steelman
Structured extraction from text
Data
<task>
Extract structured information from the input.
</task>
<schema>
{
  "people": [{"name": str, "role": str, "company": str | null}],
  "dates": [{"date": str (ISO), "event": str}],
  "amounts": [{"value": number, "unit": str, "context": str}]
}
</schema>
<rules>
- Output strict JSON only: no prose, no code fences.
- Use null for missing fields.
- ISO 8601 for dates.
</rules>
<input>
{paste text}
</input>
extract json structured
Classification with explanations
Data
<task>
Classify each item below into one of: {label_a}, {label_b}, {label_c}, other.
</task>
<rules>
- Output JSON: [{"id": str, "label": str, "confidence": float 0-1, "reason": str}].
- "reason" must be ≤ 15 words.
- Use "other" rather than guessing.
</rules>
<items>
{paste items}
</items>
classify labels confidence
SQL from English
Data
<task>
Write a single SQL query that answers the question.
</task>
<schema>
{paste CREATE TABLE statements or DBML}
</schema>
<question>
{your question}
</question>
<rules>
- Use only the tables/columns in <schema>.
- Comment any non-obvious join.
- No explanation outside the query.
</rules>
sql nl2sql query
Clean and normalise messy CSV
Data
<task>
Clean the CSV below: normalise dates to ISO 8601, trim whitespace, fix obvious typos, deduplicate rows by {key_column}, and flag suspicious values.
</task>
<rules>
- Output the cleaned CSV.
- Append a separate ## Issues section listing every change you made and any rows you flagged.
</rules>
<csv>
{paste CSV}
</csv>
csv clean transform
Tool-using agent system prompt
Agents
<role>
A careful agent that uses tools when needed and reasons aloud before acting.
</role>
<tools_available>
{list each tool name + one-line description}
</tools_available>
<process>
1. Read the user's goal carefully.
2. Think out loud (1–3 sentences) about what information you need.
3. Call a tool when you don't already know the answer. Never invent tool results.
4. After each tool result, decide: another tool, or final answer.
5. Give a clear, sourced final answer.
</process>
<rules>
- One tool call per turn.
- Cite tool results in the final answer ("according to the {tool_name} call …").
</rules>
tool-call system planner
Function-calling system prompt
Agents
<role>
You call functions when the user's request needs external data or actions.
</role>
<rules>
- Always prefer calling a function over guessing.
- Match arguments exactly to the function schema.
- After receiving a tool_result, reason briefly, then either call another function or produce the final answer.
- If no function fits, ask the user a clarifying question; don't hallucinate one.
</rules>
function-call schema
Describe a UI screenshot
Vision
<task>
Describe the UI in the attached image as if to someone who can't see it.
</task>
<format>
1. One-sentence overview (what kind of screen).
2. Layout: regions / panels / their relationship.
3. Notable controls and their state (enabled / disabled / selected).
4. Visible text content (preserve exactly).
5. Anything that looks broken or inconsistent.
</format>
screenshot ui describe
Extract data from a chart image
Vision
<task>
Read the chart in the attached image and return its data as a table.
</task>
<rules>
- Output Markdown table with one row per data point.
- Note the chart type, axis labels, and units.
- For unlabelled values, estimate to within ±5% and flag as [estimated].
- If the chart is unreadable, say so plainly.
</rules>
chart extract data
Grounded answer with citations
RAG
<role>
Answer questions only using the provided source documents. Cite by [doc_id].
</role>
<rules>
- If the answer isn't in the sources, say "I don't have that information in the provided sources."
- Every factual claim ends with one or more [doc_id] citations.
- Don't mix in outside knowledge.
</rules>
<sources>
{paste retrieved chunks, each prefixed with [doc_id]}
</sources>
<question>
{user question}
</question>
rag grounded citations

Each template uses XML tags — DeepSeek V4 parses these reliably. Replace every {placeholder} before sending.
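
A forgotten placeholder is easy to miss in a long template. The sketch below is one way to catch it before sending. The helper names `find_placeholders` and `fill` are hypothetical, and the regex assumes placeholders never contain quotes or colons, which is how it tells them apart from the JSON braces that appear in some templates.

```python
import re

# Matches {placeholder} slots; JSON-style braces contain quotes or colons and are skipped.
PLACEHOLDER = re.compile(r'\{([^{}":\n]+)\}')


def find_placeholders(template: str) -> list[str]:
    """Return the name of every {placeholder} slot in the template, in order."""
    return [match.group(1) for match in PLACEHOLDER.finditer(template)]


def fill(template: str, values: dict[str, str]) -> str:
    """Fill every slot in a single pass, refusing to run if any slot has no value."""
    missing = [name for name in find_placeholders(template) if name not in values]
    if missing:
        raise ValueError(f"unfilled placeholders: {missing}")
    # A single regex pass means braces inside pasted values are never re-substituted.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


template = "<task> Translate the source below into {target_language}. </task> <source> {paste source} </source>"
prompt = fill(template, {"target_language": "German", "paste source": "Hello, world."})
print(prompt)
```

Plain `str.format` is avoided on purpose: templates that embed a JSON schema contain literal braces that `format` would try to interpret.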

V4 changed the rules. The legacy model names deepseek-chat and deepseek-reasoner will be deprecated on 2026-07-24, and they now correspond to the non-thinking and thinking modes of deepseek-v4-flash respectively. That migration matters for templates because thinking is no longer a separate model — it is a request parameter on either deepseek-v4-pro or deepseek-v4-flash. The same prompt produces measurably different outputs depending on whether you set reasoning_effort="high" with extra_body={"thinking": {"type": "enabled"}}, or leave it off.

The other thing V4 changed: context. Both tiers default to a 1,000,000-token window with output up to 384,000 tokens. That means templates can be longer, carry more examples, and embed more reference material than they could on V3.2. But long prompts are also wasted spend if you do not structure them so DeepSeek’s context cache can hit. We will cover the cache-friendly layout below.

For background on the underlying models, see DeepSeek V4-Pro and DeepSeek V4-Flash. If you want a deeper dive on prompt theory rather than ready-made templates, the companion piece on DeepSeek prompt engineering is the better starting point.

The four-block template structure

Every template in this article follows the same four-block layout. It is the structure I find easiest to maintain and the easiest for context caching to fingerprint.

  1. System block — role, constraints, output format. Stable across calls.
  2. Reference block — schemas, examples, retrieved documents. Mostly stable.
  3. Task block — what to do this turn. Changes per call.
  4. User input — the volatile content (a question, a file, a row).

Blocks 1 and 2 belong in the system message or at the front of the first user message. Blocks 3 and 4 go in subsequent user messages. The default temperature is 1.0, and DeepSeek recommends adjusting it per use case: 0.0 for code and math, 1.0 for data extraction and analysis, 1.3 for general conversation and translation, and 1.5 for creative work. Every template below therefore pairs the prompt with a recommended temperature.
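
As a rough illustration of the layout, the sketch below assembles the four blocks into a messages list. The function name `build_messages` is hypothetical, and folding blocks 3 and 4 into a single user message is a simplifying assumption rather than a requirement.

```python
def build_messages(system: str, reference: str, task: str, user_input: str) -> list[dict]:
    """Assemble the four blocks so the stable parts always come first."""
    return [
        # Blocks 1 and 2: identical on every call, so they form the reusable prefix.
        {"role": "system", "content": f"{system}\n\n{reference}"},
        # Blocks 3 and 4: change per call, so they go last.
        {"role": "user", "content": f"{task}\n\n{user_input}"},
    ]


SYSTEM = "You are a structured-data extractor. Return only valid JSON."
REFERENCE = 'Schema (example): {"company": "Acme Inc", "amount_usd": 12500.00}'

first = build_messages(SYSTEM, REFERENCE, "Extract a json record.", "Invoice email A")
second = build_messages(SYSTEM, REFERENCE, "Extract a json record.", "Invoice email B")

# The first message is byte-for-byte the same across calls; only the tail differs.
print(first[0] == second[0], first[1] == second[1])
```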

Temperature settings to pair with each template

| Task type | Temperature | Notes |
| --- | --- | --- |
| Code generation, math | 0.0 | Deterministic, fewer hallucinated APIs |
| Data extraction, cleaning | 1.0 | DeepSeek’s recommended setting for analytical work |
| General Q&A, translation | 1.3 | Slightly above the 1.0 default; balanced |
| Creative writing, poetry | 1.5 | More variety, more revisions needed |

These match DeepSeek’s official guidance on the API parameter docs page. Pick one and stick to it for that template — do not also push top_p; DeepSeek and the OpenAI-compatible spec generally recommend altering temperature or top_p but not both.
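
One way to keep these settings consistent is to look them up rather than type them per call. The sketch below encodes the table as a dictionary; the task-type keys and the `sampling_params` helper are hypothetical names chosen for illustration.

```python
# Values copied from the table above; the keys are illustrative labels.
RECOMMENDED_TEMPERATURE = {
    "code": 0.0,
    "math": 0.0,
    "extraction": 1.0,
    "cleaning": 1.0,
    "qa": 1.3,
    "translation": 1.3,
    "creative": 1.5,
}


def sampling_params(task_type: str) -> dict:
    """Return the temperature for a task type, deliberately leaving top_p unset."""
    try:
        temperature = RECOMMENDED_TEMPERATURE[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
    return {"temperature": temperature}


print(sampling_params("code"))
print(sampling_params("translation"))
```

The returned dictionary can be unpacked into a request with `**sampling_params("code")`, so no template silently falls back to the 1.0 default.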

Template 1 — Structured extraction (JSON mode)

Use this when you need typed output. The DeepSeek API uses an OpenAI/Anthropic-compatible format, so you can use the OpenAI SDK against the DeepSeek base URL by changing only the configuration. Set response_format={"type": "json_object"}, set max_tokens high enough to avoid truncation, and include the word “json” plus a small example schema in the prompt.

System:
You are a structured-data extractor. Return only valid JSON matching the schema.
Do not include prose. If a field is unknown, return null.

Schema (example):
{
  "company": "Acme Inc",
  "amount_usd": 12500.00,
  "invoice_date": "2026-04-12",
  "line_items": [{"sku": "A-1", "qty": 2, "price": 19.99}]
}

User:
Extract a json record from the email below. Use the schema above.

[paste email]

JSON mode is designed to return valid JSON, but valid output is not guaranteed. The model may occasionally return empty content, so wrap parsing in a retry. Full caveats are documented in the DeepSeek API JSON mode guide. Recommended model: deepseek-v4-flash, non-thinking, temperature 1.0.
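
A minimal sketch of that retry wrapper follows. It takes the model call as a plain callable so the parsing logic stays independent of any SDK; `parse_json_with_retry` and `call_model` are hypothetical names, and three attempts is an arbitrary default.

```python
from __future__ import annotations

import json
from typing import Callable


def parse_json_with_retry(call_model: Callable[[], str | None], max_attempts: int = 3) -> dict:
    """Call the model until it returns a parseable JSON object, or give up."""
    last_error: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        content = call_model()
        # Empty content is one of the failure cases mentioned above.
        if not content or not content.strip():
            last_error = ValueError(f"attempt {attempt}: empty content")
            continue
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError as exc:
            # Usually a truncated object; a fresh call is the simplest recovery.
            last_error = exc
            continue
        if isinstance(parsed, dict):
            return parsed
        last_error = ValueError(f"attempt {attempt}: expected a JSON object")
    raise RuntimeError(f"no valid JSON object after {max_attempts} attempts") from last_error


# Stub standing in for the API: empty, then truncated, then valid.
replies = iter(["", '{"company": "Acme Inc"', '{"company": "Acme Inc", "amount_usd": 12500.0}'])
print(parse_json_with_retry(lambda: next(replies)))
```

In real use, `call_model` would wrap the chat completion request with `response_format={"type": "json_object"}` and return the message content.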

Template 2 — Code generation with constraints

Code is where the temperature-0 setting earns its keep. Tell the model the language, the runtime, the libraries you allow, and the failure modes you do not want.

System:
You are a senior Python engineer. Write Python 3.12 code only.
Allowed libs: stdlib, httpx, pydantic. No requests, no global state.
Return one fenced code block plus a 3-line explanation.

User:
Write a function fetch_invoices(api_key: str, since: date) -> list[Invoice]
that pages /v1/invoices?since=... and returns a list of Pydantic Invoice
models with id, amount_cents, currency, paid_at fields.

For multi-file refactors, switch to deepseek-v4-pro with thinking enabled. Pair this template with the workflow described in DeepSeek for coding.

Template 3 — Reasoning with thinking mode

Thinking mode is a parameter, not a model. Set reasoning_effort="high" and extra_body={"thinking": {"type": "enabled"}}; the response returns reasoning_content alongside the final content. Use "max" for the hardest problems; the docs note this requires max_model_len >= 393216 to avoid truncation.
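
Because thinking is a per-request switch, it helps to build those arguments in one place. The sketch below does only that: the parameter names and values are taken from this article as written and should be checked against the current API docs, and `thinking_kwargs` is a hypothetical helper.

```python
from __future__ import annotations


def thinking_kwargs(effort: str | None) -> dict:
    """Build the extra request arguments for thinking mode; None leaves it off."""
    if effort is None:
        return {}
    if effort not in ("high", "max"):
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "reasoning_effort": effort,
        # Passed through by the OpenAI SDK as additional JSON body fields.
        "extra_body": {"thinking": {"type": "enabled"}},
    }


print(thinking_kwargs(None))
print(thinking_kwargs("high"))
```

With the OpenAI SDK the result can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **thinking_kwargs("high"))`, so the same template runs in either mode.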

System:
You are a careful analyst. Think through the problem step by step in
reasoning_content. In content, return only the final answer in the
format requested.

User:
A SaaS charges $0.14 per 1M input tokens and $0.28 per 1M output.
We make 1M calls/day, each with 2,000 cached input + 200 fresh input
+ 300 output tokens. Cache-hit input is $0.0028/M. What is the daily
spend, broken down by bucket? Return a markdown table.

Worked by hand at the V4-Flash prices quoted in the prompt, the daily total is $117.60: cached input $5.60, cache-miss input $28.00, output $84.00. Showing the math beats trusting any single model output.
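
The same arithmetic is short enough to script, which makes it easy to check a model's table against it. This is a sketch using only the figures from the prompt above; the bucket names and the `daily_spend` helper are illustrative.

```python
# Prices in USD per million tokens and per-call token counts, from the prompt above.
PRICE_PER_M = {"cached_input": 0.0028, "fresh_input": 0.14, "output": 0.28}
TOKENS_PER_CALL = {"cached_input": 2_000, "fresh_input": 200, "output": 300}
CALLS_PER_DAY = 1_000_000


def daily_spend(calls: int, tokens_per_call: dict, price_per_m: dict) -> dict:
    """Cost per bucket: calls x tokens per call, converted to millions, times price."""
    return {
        bucket: calls * tokens / 1_000_000 * price_per_m[bucket]
        for bucket, tokens in tokens_per_call.items()
    }


breakdown = daily_spend(CALLS_PER_DAY, TOKENS_PER_CALL, PRICE_PER_M)
for bucket, usd in breakdown.items():
    print(f"{bucket:>13}: ${usd:,.2f}")
print(f"{'total':>13}: ${sum(breakdown.values()):,.2f}")
```

Output dominates the bill even though it is the smallest bucket by token count, which is why capping response length matters more than trimming the cached prefix.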

Template 4 — Cache-friendly system prompt

Context caching cuts input cost from $0.14 to $0.0028 per million tokens on V4-Flash. On V4-Pro it cuts input cost from $1.74 to $0.0145 at list price, or from $0.435 to $0.003625 at the promotional rate that runs through 2026-05-31. The cache-hit price applies automatically when DeepSeek detects a repeated prefix. To trigger it, keep the prefix bit-identical across calls.

  • Put the entire static instruction set, schemas, and few-shot examples in messages[0] (system) or the very front of messages[1] (user).
  • Never interpolate timestamps, request IDs, or per-user names into the prefix.
  • If the user-specific bit must come early, push it into a later message and keep the system block stable.

For the math on what cache hits actually save, the DeepSeek context caching reference walks through a worked example.
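
A cheap local guard against prefix drift is to hash the stable part of each request and compare across calls. The sketch below is a client-side diagnostic only; it does not model how DeepSeek detects a repeated prefix, and `prefix_fingerprint` is a hypothetical helper.

```python
import hashlib
import json


def prefix_fingerprint(messages: list[dict], prefix_len: int = 1) -> str:
    """Hash the first prefix_len messages; a changed hash means a changed prefix."""
    prefix = json.dumps(messages[:prefix_len], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


STABLE_SYSTEM = "You are a structured-data extractor. Return only valid JSON."

call_a = [{"role": "system", "content": STABLE_SYSTEM}, {"role": "user", "content": "Email A"}]
call_b = [{"role": "system", "content": STABLE_SYSTEM}, {"role": "user", "content": "Email B"}]
# Anti-pattern: a timestamp interpolated into the system prompt.
call_c = [
    {"role": "system", "content": STABLE_SYSTEM + " Now: 2026-04-25T10:00:00Z"},
    {"role": "user", "content": "Email A"},
]

print(prefix_fingerprint(call_a) == prefix_fingerprint(call_b))
print(prefix_fingerprint(call_a) == prefix_fingerprint(call_c))
```

Logging the fingerprint next to each request makes a drifting prefix visible in the logs before it shows up on the invoice.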

Template 5 — RAG with cited sources

System:
You answer questions only from the provided sources. Cite each claim
with the source id in square brackets, e.g. [S2]. If the sources do
not contain the answer, reply: "Not in sources." Do not invent facts.

Sources:
[S1] {chunk 1 text}
[S2] {chunk 2 text}
[S3] {chunk 3 text}

User:
{question}

Pair with retrieval as covered in the DeepSeek RAG tutorial. Temperature 1.0; deepseek-v4-flash is fine for most retrievals, escalate to V4-Pro when answers must be exhaustive.
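
Two small helpers cover the mechanical parts of this template: numbering the retrieved chunks and checking that the answer cites only ids that were supplied. Both function names are hypothetical, and the check assumes the [S1], [S2] id scheme shown above.

```python
import re


def format_sources(chunks: list[str]) -> str:
    """Prefix each retrieved chunk with a sequential id: [S1], [S2], and so on."""
    return "\n".join(f"[S{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))


def unknown_citations(answer: str, n_sources: int) -> list[str]:
    """Return cited ids that do not correspond to any supplied source."""
    cited = re.findall(r"\[S(\d+)\]", answer)
    return sorted({f"S{n}" for n in cited if not 1 <= int(n) <= n_sources})


chunks = ["Refunds are issued within 14 days.", "Shipping is free over 50 EUR."]
print(format_sources(chunks))

answer = "Refunds take up to 14 days [S1]. Returns are free [S5]."
print(unknown_citations(answer, n_sources=len(chunks)))
```

A non-empty result from `unknown_citations` is a signal to retry or flag the answer, since the model cited something it was never given.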

Template 6 — Long-document summarisation

With a 1M-token context, you rarely need chunking just to fit. You still need to direct what gets summarised.

System:
Summarise the document in three layers:
1. One-sentence verdict
2. Five bullet key findings, each <= 20 words
3. Risks and unknowns, with section references like (p. 12)

Do not editorialise. Do not add information that is not in the document.

Template 7 — Role-based critique

Role prompts work because they constrain vocabulary and what the model treats as in-scope. Be specific.

You are a database administrator with 15 years of Postgres experience.
Review the SQL below. List, in order of severity:
1. Correctness bugs
2. Performance issues at > 10M rows
3. Style problems
For each, quote the offending line and propose a fix.

Template 8 — Tool calling skeleton

Function calling works in the OpenAI format. Declare tools, let the model emit a tool call, run the call client-side, append the result to messages, and call again. The API is stateless — chat requests hit POST /chat/completions, the OpenAI-compatible endpoint, and you must resend the conversation history on every turn.

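from openai import OpenAI

# DeepSeek's endpoint is OpenAI-compatible, so only the base URL and key change.
client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

# Declare each tool as a function with a JSON Schema for its arguments.
tools = [{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Current weather for a city",
    "parameters": {
      "type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"]
    }
  }
}]

# First turn: the model either answers directly or emits a tool call.
resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Weather in Dublin?"}],
    tools=tools,
)

# Any requested calls arrive on the assistant message; run them client-side.
print(resp.choices[0].message.tool_calls)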

For the full pattern see DeepSeek API function calling.
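
The skeleton above stops at the first request. The client-side step that follows, running the requested function and packaging its result, might look like the sketch below. The `get_weather` body, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical; only the shape of the returned tool message follows the OpenAI chat format.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Maps the tool names declared to the model onto local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one tool call and return the message to append to the history."""
    if name not in TOOL_REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            result = TOOL_REGISTRY[name](**json.loads(arguments_json))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed or mismatched arguments are reported back, not raised.
            result = {"error": str(exc)}
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


print(run_tool_call("call_1", "get_weather", '{"city": "Dublin"}'))
```

With the SDK response, the three arguments come from `call.id`, `call.function.name`, and `call.function.arguments` for each entry in `resp.choices[0].message.tool_calls`. Append the assistant message first, then each tool message, and send the full history again.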

Template 9 — Translation with register control

Translate the text below from English to {target}.
Register: {formal|conversational|technical}.
Preserve: code blocks, URLs, numbers, proper nouns.
Output only the translation, no commentary.

Temperature 1.3 produces less wooden output than 0; the official docs flag 1.3 as the recommended setting for translation.

Template 10 — Creative writing brief

Write a {format} of {length} words about {topic}.
Audience: {who}. Tone: {adjective + adjective}.
Constraints: {must include}. Avoid: {must not}.
Provide two distinct drafts, then a one-line note on the difference.

Temperature 1.5. If both drafts feel the same, raise to 1.7 or rerun — V4 at low temperature can lock onto a single voice.

Template 11 — Test-case generator

For the function signature below, generate pytest tests covering:
- Happy path (3 cases)
- Edge cases (empty, single element, max size)
- Error cases (invalid type, out-of-range)
Return one fenced code block. No commentary.

{paste signature + docstring}

Template 12 — Self-critique pass

Cheap on V4-Flash, often worth running. The first call drafts; the second critiques and rewrites.

Pass 1 (temperature 1.3): Draft the response.
Pass 2 (temperature 0.0): "Critique the draft below for factual errors,
weak claims, and structural problems. Rewrite to fix them. Output only
the rewritten version."
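
The two passes chain naturally in code. In the sketch below the model call is injected as a callable taking a prompt and a temperature, so the flow can be exercised without network access; `draft_then_critique` and `complete` are hypothetical names.

```python
from typing import Callable

CRITIQUE_PROMPT = (
    "Critique the draft below for factual errors, weak claims, and structural "
    "problems. Rewrite to fix them. Output only the rewritten version."
)


def draft_then_critique(task: str, complete: Callable[[str, float], str]) -> str:
    """Run a draft pass at temperature 1.3, then a critique-and-rewrite pass at 0.0."""
    draft = complete(task, 1.3)
    return complete(f"{CRITIQUE_PROMPT}\n\n{draft}", 0.0)


# Stub standing in for the API call; it records what each pass was asked to do.
calls = []


def fake_complete(prompt: str, temperature: float) -> str:
    calls.append((prompt, temperature))
    return f"response {len(calls)}"


print(draft_then_critique("Write a product update email.", fake_complete))
print([temperature for _, temperature in calls])
```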

Endpoint and parameter reference

Every template above goes to POST /chat/completions at https://api.deepseek.com. DeepSeek’s API is OpenAI- and Anthropic-compatible, so you can swap base URL and key in either SDK. The parameters that show up in the templates above:

  • model: deepseek-v4-pro or deepseek-v4-flash
  • temperature: see the table above
  • max_tokens: set high (e.g. 8000+) when using JSON mode
  • reasoning_effort: "high" or "max" with extra_body={"thinking": {"type": "enabled"}}
  • response_format: {"type": "json_object"} for Template 1
  • tools: for Template 8
  • stream: true for chat UIs

For setup, see how to get a DeepSeek API key. For the broader collection of utilities including the prompt generator, see the DeepSeek tools and utilities hub.

What I would not do

  • Inline timestamps in the system prompt — kills cache hits.
  • Set both temperature and top_p aggressively — pick one.
  • Use thinking mode for short factual questions — you pay for output tokens you do not need.
  • Trust JSON mode without a parser fallback — it is “designed to” return JSON, not guaranteed.
  • Quote a single SWE-Bench number from a third-party blog without checking the split (Verified vs Full vs Lite).

DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.

Frequently asked questions

How do I write a system prompt for DeepSeek V4?

Put the role, constraints, and output format in the system message and keep it bit-identical across calls so context caching can hit. Add schemas or few-shot examples in the same block. Save volatile content (timestamps, user names, today’s question) for later user messages. The four-block structure used in our DeepSeek prompt engineering guide works for both V4-Pro and V4-Flash.

What temperature should I use with DeepSeek prompt templates?

DeepSeek’s docs recommend 0.0 for code and math, 1.0 for data analysis, 1.3 for general conversation and translation, and 1.5 for creative writing. The default is 1.0. Match the temperature to the template you are running — code templates at 0.0, creative templates at 1.5. More guidance lives in the DeepSeek API best practices reference.

Can I use the same prompt for V4-Pro and V4-Flash?

Yes — both tiers share the same API surface and prompt format. The difference is cost and capability ceiling: V4-Pro is roughly six to seven times more expensive on output and worth it for frontier coding or agentic work. For chat, extraction, and most RAG, V4-Flash is the default. Compare both on the DeepSeek pricing calculator.

Does DeepSeek JSON mode guarantee valid JSON output?

No. DeepSeek’s docs describe JSON mode as designed to return valid JSON, not guaranteed. The model may occasionally return empty content, so always wrap parsing in a try/except with a retry. Include the word “json” plus an example schema in the prompt, and set max_tokens high enough that the response cannot be truncated mid-object. The DeepSeek API JSON mode doc covers the failure cases in detail.

Why is my DeepSeek prompt template more expensive than expected?

Almost always one of three things: the system prompt is changing across calls (no cache hits), max_tokens is uncapped so thinking mode produces enormous traces, or you are on V4-Pro when V4-Flash would do. Check token counts with the DeepSeek token counter and review the cost breakdown by bucket — cached input, uncached input, and output — separately.

Sources & Methodology

Last source check: April 30, 2026

Methodology

Information in this article was checked against the official DeepSeek documentation, the article's cited primary sources, and the deepseekai.guide pricing dataset on the date shown above. Where third-party numbers are quoted, the originating source is linked inline so the reader can verify and re-check.

Data confidence

High for facts cited inline with primary-source links; medium for context narrative.

Editorial note

AI tooling moves quickly. If you spot an outdated number or a broken link, the contact link in the footer is the fastest way to flag it for the editorial team.
