How to Use DeepSeek with VS Code: A Working V4 Setup
You want a coding assistant inside VS Code that doesn’t bill you like a SaaS subscription, doesn’t lock you into one vendor, and handles a real codebase without hallucinating imports. Wiring DeepSeek into VS Code gets you there in under ten minutes — chat, autocomplete, and agentic refactors against `deepseek-v4-pro` or `deepseek-v4-flash`, both released April 24, 2026 and billed per token.
This guide walks through two extensions I run daily — Continue and Cline — with copy-paste config blocks, a worked cost example for a typical week of coding, and the gotchas that broke my setup the first time. By the end you’ll have a tested integration and know which tier to point at which task.
What you’ll build
Two parallel setups inside VS Code, both pointing at DeepSeek’s OpenAI-compatible API:
- Continue — chat sidebar, inline edits, and tab autocomplete. The lighter-weight option; great default for most developers.
- Cline — agentic mode that can read your workspace, run commands, and edit files across a task. Better for multi-file refactors and bug hunts.
Both extensions talk to the same endpoint: POST /chat/completions at https://api.deepseek.com. You pick the model per request — deepseek-v4-pro for frontier-tier coding work, deepseek-v4-flash for everyday autocomplete and chat. The API is stateless, so each request resends history; the extensions handle that for you.
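To make the statelessness concrete, here is a minimal Python sketch of what the extensions do on every turn, using the openai package against DeepSeek's OpenAI-compatible endpoint. The variable names are illustrative, not taken from either extension's source:

```python
from openai import OpenAI

# DeepSeek speaks the OpenAI wire format, so the stock client works as-is.
client = OpenAI(api_key="sk-your-key-here", base_url="https://api.deepseek.com")

history = [{"role": "user", "content": "Explain: def f(x): return x * 2"}]
resp = client.chat.completions.create(model="deepseek-v4-flash", messages=history)
history.append({"role": "assistant", "content": resp.choices[0].message.content})

# Stateless API: the next turn must resend everything said so far.
history.append({"role": "user", "content": "Now add type hints."})
resp = client.chat.completions.create(model="deepseek-v4-flash", messages=history)
print(resp.choices[0].message.content)
```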
Prerequisites
- VS Code 1.85 or newer (Cursor and VSCodium also work — same extension marketplace).
- A DeepSeek account with billing enabled and a generated API key. If you don’t have one, follow our walkthrough to get a DeepSeek API key.
- Node.js 18+ on your machine (Continue’s autocomplete worker needs it).
- About $5 of API credit to start. A week of moderate use on V4-Flash rarely tops $2 in my experience.
Step 1: Generate and store your API key
Sign in at platform.deepseek.com, open API Keys, and create a key. Copy it once — you cannot view it again. Store it in your OS keychain or a .env file outside the repo. Never commit it. The full mechanics — header format, key rotation, scoping — are covered in our DeepSeek API authentication guide.
Quick smoke test from the terminal using curl. This confirms the key works before you wire it into VS Code:
```bash
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Reply with OK."}],
    "max_tokens": 16
  }'
```
You should see a JSON response with "OK" in the content field. If you get a 401, the key is wrong or malformed; a 402 means your balance is empty.
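If you prefer to verify from Python (the same client you'd use later for custom commands), this sketch mirrors the curl call; it assumes the key is exported as DEEPSEEK_API_KEY:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # never hard-code the key
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Reply with OK."}],
    max_tokens=16,
)
print(resp.choices[0].message.content)  # expect: OK
```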
Step 2: Install Continue and point it at DeepSeek
In VS Code, open the Extensions panel and install Continue (publisher: Continue). After install, click the Continue icon in the activity bar; it will ask you to add a model.
Skip the wizard and edit the config directly — it’s faster and reproducible. Open the command palette (Cmd/Ctrl+Shift+P) and run Continue: Open config.yaml. Replace the contents with this YAML:
```yaml
name: DeepSeek V4
version: 1.0.0
schema: v1
models:
  - name: DeepSeek V4 Pro
    provider: openai
    model: deepseek-v4-pro
    apiBase: https://api.deepseek.com/v1
    apiKey: sk-your-key-here
    roles: [chat, edit]
  - name: DeepSeek V4 Flash
    provider: openai
    model: deepseek-v4-flash
    apiBase: https://api.deepseek.com/v1
    apiKey: sk-your-key-here
    roles: [chat, edit, autocomplete]
  - name: DeepSeek V4 Flash (Thinking)
    provider: openai
    model: deepseek-v4-flash
    apiBase: https://api.deepseek.com/v1
    apiKey: sk-your-key-here
    defaultCompletionOptions:
      reasoningEffort: high
    roles: [chat, edit]
```
Three model entries give you the spread I find useful: Flash for autocomplete (fast, cheap), Pro for hard chat questions and edits, and a Flash thinking variant for tricky bugs where you want reasoning_content alongside the final content.
Save the file. Continue hot-reloads. Open the chat sidebar, pick a model from the dropdown, and ask it to explain a function in your repo. If you see a streamed answer, you’re connected.
Tab autocomplete settings
Continue calls the autocomplete model on every cursor pause. V4-Flash in non-thinking mode is the right pick — fast and cheap. Tune the temperature low: per DeepSeek’s official guidance, use 0.0 for code generation. Add this to config.yaml:
```yaml
tabAutocompleteOptions:
  temperature: 0.0
  maxPromptTokens: 4096
  debounceDelay: 350
```
The 350 ms debounce stops Continue from firing on every keystroke. Without it, an hour of typing can quietly burn through a dollar.
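To see where that dollar comes from, here is the back-of-envelope arithmetic as a Python sketch. The keystroke rate and cache-hit ratio are assumptions picked for illustration; substitute your own:

```python
# Rough cost of undebounced autocomplete for one hour of typing (V4-Flash rates).
keystrokes_per_hour = 18_000    # assumption: ~60 WPM of code and edits
prompt_tokens = 1_500           # context sent per completion request
cache_hit_ratio = 0.8           # assumption: open-file prefix caches well

hit_rate, miss_rate = 0.028, 0.14  # $ per 1M input tokens (hit / miss)
total_tokens = keystrokes_per_hour * prompt_tokens
cost = (total_tokens * cache_hit_ratio * hit_rate
        + total_tokens * (1 - cache_hit_ratio) * miss_rate) / 1_000_000
print(f"~${cost:.2f}/hour")  # ~$1.36 with these assumptions
```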
Step 3: Install Cline for agentic tasks
Continue is great for chat and completions. Cline is what you want when the task is “find every call site of parseConfig, migrate it to the new signature, and update the tests.” Install Cline from the Extensions panel.
Open Cline’s settings (gear icon in the Cline panel). Choose OpenAI Compatible as the provider and fill in:
- Base URL: https://api.deepseek.com/v1
- API Key: your DeepSeek key
- Model ID: deepseek-v4-pro
- Context window: 1000000
- Max output: 65536 (raise to 384000 for very long generations)
For agentic loops, V4-Pro earns its 12× output-cost premium. It plans more carefully, calls tools more accurately, and recovers from errors better than Flash on multi-step tasks. For trivial tasks (“add a JSDoc comment to this function”) switch to Flash.
Step 4: Verify it actually works
Three quick tests confirm the wiring end-to-end:
- Chat round-trip. Open a source file, highlight a function, and ask Continue: “What does this do, and what’s one edge case it misses?” A streamed answer means chat works.
- Autocomplete. In a JavaScript or Python file, type function fibonacci( and pause. A grey ghost-text suggestion should appear within a second.
- Agentic edit. In Cline, type: “Read package.json, list outdated dependencies that have major version bumps available.” Cline should request file-read permission, read the file, and return a list. Approve actions one at a time for the first few runs.
Choosing a tier: V4-Flash vs V4-Pro
The single biggest cost decision in this setup is which model handles which task. Here’s how I split work:
| Task | Recommended model | Why |
|---|---|---|
| Tab autocomplete | deepseek-v4-flash | Latency matters more than reasoning depth; ~12× cheaper output |
| Chat: explain code, write a function | deepseek-v4-flash | Flash handles 80% of dev questions cleanly |
| Multi-file refactor in Cline | deepseek-v4-pro | Tool-calling reliability across a long agent loop |
| Tricky debugging, “why does this race?” | Flash with reasoning_effort=high | Cheap thinking mode often catches what default Flash misses |
| Architecture / design discussion | deepseek-v4-pro | Worth the spend for high-leverage decisions |
| Generating boilerplate (DTOs, tests) | deepseek-v4-flash | Repetitive, low-risk, high-volume — Flash wins on price |
For the broader picture on which DeepSeek model fits which job, see the DeepSeek models hub. The model-specific pages for DeepSeek V4-Flash and DeepSeek V4-Pro have benchmark detail.
What a week actually costs
Here’s a representative week of solo dev work — me, building a Next.js app, with autocomplete on:
- ~300 chat turns in Continue, mostly Flash, average 3K tokens in / 800 out per turn.
- ~2,000 autocomplete completions on Flash, ~1.5K tokens in / 80 out each, with heavy prefix caching from the open file.
- ~15 Cline agent runs on Pro, ~40K tokens in (mixed cache hits + misses) / 4K out each.
Costed at deepseek-v4-flash rates ($0.028 cache hit / $0.14 cache miss / $0.28 output per 1M tokens) and deepseek-v4-pro rates ($0.145 / $1.74 / $3.48):
Continue chat (V4-Flash):

```
Input cache hit : 540K tokens × $0.028/M = $0.015
Input cache miss: 360K tokens × $0.14/M  = $0.050
Output          : 240K tokens × $0.28/M  = $0.067
                                           -------
                                            $0.13
```

Continue autocomplete (V4-Flash):

```
Input cache hit : 2.4M tokens × $0.028/M = $0.067
Input cache miss: 0.6M tokens × $0.14/M  = $0.084
Output          : 160K tokens × $0.28/M  = $0.045
                                           -------
                                            $0.20
```

Cline agent runs (V4-Pro):

```
Input cache hit : 360K tokens × $0.145/M = $0.052
Input cache miss: 240K tokens × $1.74/M  = $0.418
Output          : 60K tokens  × $3.48/M  = $0.209
                                           -------
                                            $0.68
```

Weekly total: ~$1.01
That’s a week of paid AI assistance for the price of a vending-machine coffee. The numbers swing if you put autocomplete on Pro (don’t) or run dozens of agent loops daily, but the ceiling stays low. For a deeper read on these numbers, see the DeepSeek API pricing reference and the DeepSeek cost estimator.
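To rerun these numbers against your own usage, here is the same arithmetic as a small Python function; the rates are the ones quoted above, everything else is a knob:

```python
def weekly_cost(hit_tokens, miss_tokens, out_tokens, rates):
    """Dollar cost given token counts and (hit, miss, output) $ per 1M tokens."""
    hit, miss, out = rates
    return (hit_tokens * hit + miss_tokens * miss + out_tokens * out) / 1_000_000

FLASH = (0.028, 0.14, 0.28)
PRO = (0.145, 1.74, 3.48)

total = (weekly_cost(540_000, 360_000, 240_000, FLASH)      # Continue chat
         + weekly_cost(2_400_000, 600_000, 160_000, FLASH)  # autocomplete
         + weekly_cost(360_000, 240_000, 60_000, PRO))      # Cline agent runs
print(f"${total:.2f}")  # ~$1.01
```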
Useful API parameters and features
Both Continue and Cline pass through the standard OpenAI-compatible parameters. The ones worth knowing:
- temperature — DeepSeek’s official guidance: 0.0 for code, 1.0 for data analysis, 1.3 for general chat, 1.5 for creative writing. Continue and Cline both expose this.
- max_tokens — cap on output. V4 supports up to 384,000 tokens of output against a 1,000,000-token context. Most coding tasks fit comfortably under 8,000.
- reasoning_effort — V4-only. Set to "high" with extra_body={"thinking": {"type": "enabled"}} for thinking mode; there’s a worked sketch after this list. Use "max" for the deepest reasoning (requires output room of at least 384K tokens).
- Streaming — both extensions stream by default. When thinking is on, the reasoning content streams alongside the final answer.
- Context caching — applied automatically when DeepSeek detects repeated prefixes (system prompt, open files). Drives the cost example above.
- Tool calling — Cline relies on this for file ops and shell commands. Works on both V4 tiers.
- FIM completion (Beta) — non-thinking mode only. Continue’s autocomplete uses a similar prefix/suffix pattern internally.
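As a sketch of the thinking-mode call for custom scripts: this combines reasoning_effort and the extra_body flag exactly as listed above, via the openai Python client. Reading reasoning_content off the message object matches the behaviour described in this article, but treat the field access as an assumption to verify against the current API docs:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user",
               "content": "Why might two threads holding one lock each deadlock?"}],
    reasoning_effort="high",                       # V4-only knob
    extra_body={"thinking": {"type": "enabled"}},  # turns on thinking mode
)

msg = resp.choices[0].message
print(getattr(msg, "reasoning_content", None))  # reasoning trace, if present
print(msg.content)                              # final answer
```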
JSON mode is worth flagging if you build custom commands: it’s designed to return valid JSON, not guaranteed. Include the word “json” in your prompt with a small example schema, and set max_tokens high enough that the response can’t truncate mid-object. For more, see DeepSeek API JSON mode.
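Here is a minimal sketch of that pattern, assuming DeepSeek’s JSON mode uses the OpenAI-style response_format switch (verify against the JSON mode page linked above):

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    response_format={"type": "json_object"},  # ask for JSON output
    max_tokens=2048,                          # roomy enough not to truncate
    messages=[{
        "role": "user",
        # The word "json" plus a tiny example schema, per the guidance above.
        "content": 'List two HTTP verbs as json like {"verbs": ["GET"]}.',
    }],
)
print(json.loads(resp.choices[0].message.content))
```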
Common errors and fixes
| Symptom | Likely cause | Fix |
|---|---|---|
| 401 Unauthorized | Wrong key or trailing whitespace | Regenerate the key, paste with no spaces. Confirm with the curl test above. |
| 402 Insufficient Balance | Empty wallet | Top up at platform.deepseek.com. The granted balance, if any, expires; check the billing console. |
| 429 Rate Limit | Burst from autocomplete | Increase debounceDelay in Continue, or add client-side backoff (sketch below the table). See DeepSeek API rate limits. |
| Empty content in response | JSON mode without proper prompt | Add the word “json” + example schema; raise max_tokens. |
| Cline hangs mid-task | Output token cap hit | Raise max output to 65,536 or higher in Cline settings. |
| Slow first token in thinking mode | Reasoning trace generates before answer | Expected. Use non-thinking mode for chat where latency matters. |
| Old deepseek-coder ID rejected | Deprecated | Use deepseek-v4-flash or deepseek-v4-pro. |
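If you script against the API directly, a simple exponential backoff absorbs 429 bursts. This is a generic retry sketch around the openai client, not a DeepSeek-specific feature:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(api_key="sk-your-key-here", base_url="https://api.deepseek.com")

def chat_with_backoff(messages, model="deepseek-v4-flash", retries=5):
    """Retry on 429 with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("rate-limited after all retries")
```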
A note on legacy model IDs
If you maintain an older integration, the IDs deepseek-chat and deepseek-reasoner still work — they currently route to deepseek-v4-flash in non-thinking and thinking mode respectively. Both retire on 2026-07-24 at 15:59 UTC. Migration is a one-line model= swap; the base_url does not change. Update your VS Code config before that date.
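The swap really is one line; a Python sketch of before and after:

```python
from openai import OpenAI

client = OpenAI(api_key="sk-your-key-here", base_url="https://api.deepseek.com")
messages = [{"role": "user", "content": "ping"}]

# Before: legacy alias, routed to V4-Flash until 2026-07-24.
# resp = client.chat.completions.create(model="deepseek-chat", messages=messages)

# After: explicit V4 ID; the base_url does not change.
resp = client.chat.completions.create(model="deepseek-v4-flash", messages=messages)
```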
Privacy and where your code goes
Every request from VS Code travels to DeepSeek’s servers in China and is processed there. That includes the contents of your open files when autocomplete or chat fires. If you’re working under an NDA, with regulated data, or under enterprise policy that restricts cross-border data flow, this is the wrong setup — run a local model via Ollama instead. Our walkthrough on running DeepSeek on Ollama covers the offline path. The trade-offs are spelled out in DeepSeek privacy.
For DeepSeek’s own technical disclosures and the V4 model cards, see DeepSeek’s official API documentation.
Next steps
Two directions from here, depending on what you’re building:
- If you want to write your own VS Code commands or build a coding agent of your own, start with DeepSeek Python integration or DeepSeek Node.js integration — both use the same OpenAI-compatible client.
- If you want to compare what you’ve built against alternatives, our DeepSeek Coder vs Copilot writeup is the honest head-to-head.
- For the wider catalogue of walkthroughs, the DeepSeek tutorials hub indexes everything from RAG pipelines to Discord bots.
Last verified: 2026-04-25. DeepSeek AI Guide is an independent resource and is not affiliated with DeepSeek or its parent company. Model IDs, pricing and API behaviour change; check the official DeepSeek documentation and pricing page before committing to a production decision.
How do I use DeepSeek with VS Code without writing my own extension?
Install Continue or Cline from the VS Code marketplace and configure them with https://api.deepseek.com/v1 as the base URL, your DeepSeek API key, and either deepseek-v4-pro or deepseek-v4-flash as the model. Both extensions speak the OpenAI Chat Completions wire format, which DeepSeek implements natively. The DeepSeek OpenAI SDK compatibility page covers the underlying contract.
Is DeepSeek free to use inside VS Code?
The extensions (Continue, Cline) are free and open source, but DeepSeek’s API is paid per token. There’s no perpetual free tier; DeepSeek may offer a granted balance — a small promotional credit that can expire — so check the billing console for current offers. A typical week of solo coding on V4-Flash costs around $1–2. See is DeepSeek free for the breakdown of free versus paid surfaces.
What’s the difference between deepseek-v4-pro and deepseek-v4-flash for coding?
V4-Flash (284B total / 13B active params) handles autocomplete, boilerplate, and 80% of chat well at $0.14 input miss / $0.28 output per 1M tokens. V4-Pro (1.6T / 49B active) is for multi-file agent runs and architecture work where reliability beats raw cost — about 12× the output price. Detail on each model: DeepSeek V4.
Can I use DeepSeek’s thinking mode in VS Code?
Yes. Set reasoningEffort: high in Continue’s defaultCompletionOptions, or in Cline’s model settings, on either V4 model. The API returns reasoning_content alongside the final content. Use it for hard debugging; skip it for autocomplete where latency matters. The DeepSeek API best practices reference covers when thinking mode is worth the wait.
Why does my autocomplete burn through credit faster than expected?
Autocomplete fires on every cursor pause and resends a chunk of context each time — the API is stateless, so the client packs history into every request. Set debounceDelay to 350 ms or higher, cap maxPromptTokens at 4096, and keep autocomplete on V4-Flash. Track real spend with the DeepSeek pricing calculator.
