DeepSeek API: Complete Integration Guide
DeepSeek is a Chinese AI company offering powerful open-source and API-accessible LLMs. DeepSeek-R1 (the reasoning model) rivals OpenAI’s o1 at a fraction of the cost, while DeepSeek-V3 provides excellent general-purpose performance. This guide covers API integration, reasoning features, and self-hosting options.
Learning Path
flowchart LR
A["LangChain<br/>LLM Applications"] --> B["DeepSeek API<br/>Integration Guide"]
B --> C["Mistral AI<br/>Models & API"]
C --> D["Self-Hosting LLMs<br/>Ollama & vLLM"]
style B fill:#f90,color:#fff,stroke-width:2px
API Setup
DeepSeek’s API is OpenAI-compatible — use the same client libraries:
from openai import OpenAI
client = OpenAI(
    api_key="<your-deepseek-api-key>",
    base_url="https://api.deepseek.com"
)
Or use the DeepSeek SDK:
pip install deepseek-sdk
import deepseek
client = deepseek.Client(api_key="<your-deepseek-api-key>")
Chat Completions
from openai import OpenAI
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.deepseek.com"
)
response = client.chat.completions.create(
    model="deepseek-chat",  # or "deepseek-reasoner" for R1
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "Write a function to check if a string is a palindrome."}
    ],
    temperature=0.0,
    max_tokens=500
)
print(response.choices[0].message.content)
Expected output:
def is_palindrome(s: str) -> bool:
    """Check if a string is a palindrome (case-insensitive)."""
    cleaned = s.lower().replace(" ", "")
    return cleaned == cleaned[::-1]

# Examples
print(is_palindrome("racecar"))  # True
print(is_palindrome("A man a plan a canal Panama"))  # True
print(is_palindrome("hello"))  # False
DeepSeek-R1: The Reasoning Model
DeepSeek-R1 uses chain-of-thought reasoning before generating answers — similar to OpenAI’s o1:
from openai import OpenAI

def deepseek_reason(problem):
    """Use DeepSeek-R1 for complex reasoning tasks."""
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
    response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[
            {"role": "user", "content": problem}
        ],
        # 0.6 is the usual recommendation for R1; the hosted API may
        # ignore sampling parameters for this model
        temperature=0.6,
        max_tokens=2000
    )
    message = response.choices[0].message
    # The chain of thought is returned separately from the final answer
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning:
        print(f"Reasoning process:\n{reasoning[:300]}...\n")
    return message.content

problem = """
A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.
How much does the ball cost? Think step by step.
"""
result = deepseek_reason(problem)
print(f"Answer: {result}")
Expected output:
Reasoning process:
Let's solve this step by step.
Let the ball cost x dollars.
Then the bat costs x + 1.00 dollars.
Total: x + (x + 1.00) = 1.10
2x + 1.00 = 1.10
2x = 0.10
x = 0.05
The ball costs $0.05 and the bat costs $1.05...
Answer: The ball costs $0.05.
Code Generation
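DeepSeek models are strong at code generation. The DeepSeek Coder family, for example, was trained on 2 trillion tokens of code and natural-language text. The function below uses deepseek-chat: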
from openai import OpenAI

def generate_code(prompt, language="python"):
    """Generate code using DeepSeek."""
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": f"You are an expert {language} developer. Generate clean, well-documented code."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.1,  # low temperature keeps generated code deterministic
        max_tokens=1000
    )
    return response.choices[0].message.content

code = generate_code("Create a FastAPI endpoint for file upload with virus scanning", "python")
print(code)
API Parameters
DeepSeek’s API parameters mirror OpenAI’s:
| Parameter | Type | Default | Notes |
|---|---|---|---|
| model | string | required | deepseek-chat or deepseek-reasoner |
| messages | array | required | Standard chat format |
| temperature | float | 1.0 | 0.0-2.0 (0.6 is the usual recommendation for R1) |
| top_p | float | 1.0 | Nucleus sampling |
| max_tokens | integer | 4096 | Max output tokens |
| stream | boolean | false | Enable streaming |
| stop | string/array | null | Stop sequences |
| frequency_penalty | float | 0.0 | -2.0 to 2.0 |
| presence_penalty | float | 0.0 | -2.0 to 2.0 |
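As a minimal sketch of how these parameters fit together, the helper below assembles the keyword arguments for `client.chat.completions.create()`. The name `build_chat_request` and its default values are hypothetical, chosen only for illustration; the parameter names come from the table above.

```python
def build_chat_request(prompt, model="deepseek-chat", **overrides):
    """Assemble keyword arguments for client.chat.completions.create().

    Hypothetical helper: it is not part of any SDK.
    """
    request = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 500,
        "stream": False,
    }
    # Accept only the optional parameters listed in the table, so that a
    # misspelled name fails here instead of being sent to the API.
    allowed = {"temperature", "top_p", "max_tokens", "stream", "stop",
               "frequency_penalty", "presence_penalty"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    request.update(overrides)
    return request


request = build_chat_request(
    "List three sorting algorithms.",
    temperature=0.2,
    stop=["\n\n"],
)
print(request)
# Send it with: client.chat.completions.create(**request)
```

Building the request as a plain dictionary keeps it easy to log or unit-test before any network call is made.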
Streaming
from openai import OpenAI

def stream_deepseek(prompt):
    """Stream DeepSeek response."""
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    for chunk in stream:
        # Skip chunks that carry no choices or no new text
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

stream_deepseek("Write a haiku about programming")
Expected output:
Bugs crawl through the code,
Semicolons mark the path,
Runtime silence grows.
Cost Comparison
def compare_costs(input_tokens, output_tokens):
    """Compare DeepSeek vs OpenAI costs."""
    # Prices in USD per 1M tokens
    pricing = {
        "deepseek-chat": {"input": 0.14, "output": 0.28},
        "deepseek-reasoner": {"input": 0.55, "output": 2.19},
        "deepseek-chat-cached": {"input": 0.07, "output": 0.28},
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    }
    costs = {}
    for model_name, price in pricing.items():
        total = (input_tokens / 1_000_000 * price["input"] +
                 output_tokens / 1_000_000 * price["output"])
        costs[model_name] = total
    return costs

tokens_input, tokens_output = 10000, 2000
costs = compare_costs(tokens_input, tokens_output)
print(f"For {tokens_input:,} input + {tokens_output:,} output tokens:")
# Print cheapest first
for model, cost in sorted(costs.items(), key=lambda x: x[1]):
    print(f" {model:25} ${cost:.4f}")
Expected output:
For 10,000 input + 2,000 output tokens:
 deepseek-chat-cached      $0.0013
 deepseek-chat             $0.0020
 gpt-4o-mini               $0.0027
 deepseek-reasoner         $0.0099
 gpt-4o                    $0.0450
 claude-3-5-sonnet         $0.0600
Self-Hosting Options
With Ollama (easiest)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull DeepSeek model
ollama pull deepseek-coder:6.7b
ollama pull deepseek-r1:7b
# Run
ollama run deepseek-r1:7b
With vLLM (production)
# vLLM serving (uses OpenAI-compatible endpoint)
# Start server:
# python -m vllm.entrypoints.openai.api_server \
# --model deepseek-ai/deepseek-coder-6.7b-instruct \
# --port 8000
# Then use any OpenAI client:
from openai import OpenAI
local_client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:8000/v1"
)
response = local_client.chat.completions.create(
    model="deepseek-ai/deepseek-coder-6.7b-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Common Errors
- Wrong base_url: DeepSeek uses https://api.deepseek.com (not api.openai.com). Forgetting to change the base URL sends requests to OpenAI instead.
- Model name mismatch: deepseek-chat for general chat, deepseek-reasoner for R1 reasoning. Using the wrong model name returns a 404 error.
- Temperature too high for code: Code generation needs low temperature (0.0-0.2). High temperature produces creative but incorrect code.
- R1 without reasoning extraction: The reasoning content is in message.reasoning_content, not in the regular content. Forgetting to extract it loses valuable chain-of-thought.
- Context window exceeded: DeepSeek models have 128K context. Long conversations hit this limit. Implement message truncation or summarization.
- Rate limiting: DeepSeek free tier has lower rate limits. Check remaining in response headers. Implement retry with backoff for production.
- Self-hosted model quantization errors: Running 67B models requires significant VRAM. Use 4-bit quantization or the smaller 7B/14B versions for local testing.
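Two of the fixes above, message truncation and retry with backoff, can be sketched with the standard library alone. Both helper names are hypothetical. The sketch measures length in characters as a rough stand-in for tokens, and it retries on any exception; in production you would likely narrow that to rate-limit errors such as `openai.RateLimitError`.

```python
import random
import time


def truncate_messages(messages, max_chars=8000):
    """Keep system messages plus the most recent turns that fit the budget.

    Character count is only a rough proxy for tokens (an assumption of
    this sketch); use a real tokenizer for accurate limits.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_chars - sum(len(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        budget -= len(message["content"])
        if budget < 0:
            break
        kept.append(message)
    return system + kept[::-1]  # restore chronological order


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)


history = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "x" * 50},
    {"role": "assistant", "content": "y" * 50},
    {"role": "user", "content": "latest question"},
]
trimmed = truncate_messages(history, max_chars=100)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

A call would then be wrapped as `retry_with_backoff(lambda: client.chat.completions.create(model="deepseek-chat", messages=trimmed))`. The `sleep` argument is injectable so the retry logic can be tested without real delays.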
Practice Questions
1. What’s the main difference between deepseek-chat and deepseek-reasoner?
deepseek-reasoner (DeepSeek-R1) shows its chain-of-thought reasoning process before answering. deepseek-chat (DeepSeek-V3) directly generates responses without visible reasoning.
2. How much cheaper is DeepSeek compared to GPT-4o?
deepseek-chat costs $0.14/M input tokens vs GPT-4o’s $2.50/M, approximately 18x cheaper. deepseek-reasoner costs $0.55/M input, still 4.5x cheaper than GPT-4o.
3. Can I use DeepSeek with OpenAI client libraries?
Yes. DeepSeek’s API is fully OpenAI-compatible. Just change the base_url to https://api.deepseek.com and use your DeepSeek API key.
4. How do I self-host DeepSeek models?
Use Ollama (easiest, for 7B-14B models), vLLM (production, supports all sizes), or llama.cpp (for CPU/quantized inference).
5. Challenge: Build a DeepSeek-powered code reviewer
Create a Python script that takes a file path, reads the code, and sends it to DeepSeek for review. Use the reasoning model to get step-by-step analysis of potential bugs and security issues.
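One possible starting point for the challenge is sketched below. It is not a reference solution: the function names, the prompt wording, and the 20,000-character cap are all assumptions made for illustration. `client` is expected to be an `OpenAI` instance configured with the DeepSeek base URL, as in the earlier examples.

```python
from pathlib import Path


def build_review_messages(file_path, max_chars=20000):
    """Read a source file and wrap it in a code-review prompt."""
    code = Path(file_path).read_text(encoding="utf-8")
    if len(code) > max_chars:
        # Crude guard against very large files (assumed limit)
        code = code[:max_chars] + "\n# [truncated]"
    prompt = (
        "Review the following code. List potential bugs and security "
        "issues, each with a line reference and a suggested fix.\n\n"
        f"File: {Path(file_path).name}\n\n{code}"
    )
    return [{"role": "user", "content": prompt}]


def review_file(client, file_path):
    """Send the file to the reasoning model and return its review text."""
    response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=build_review_messages(file_path),
    )
    return response.choices[0].message.content


# Usage (requires a configured client and a real file):
# print(review_file(client, "app.py"))
```

Keeping prompt construction separate from the API call means the prompt can be inspected and tested without spending tokens.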
Mini Project: Multi-Provider LLM Benchmark
import os
import time

from openai import OpenAI

def benchmark_providers(prompt):
    """Compare responses from different LLM providers."""
    # Add more OpenAI-compatible providers here to extend the benchmark
    providers = {
        "DeepSeek Chat": {
            "base_url": "https://api.deepseek.com",
            "api_key_env": "DEEPSEEK_API_KEY",
            "model": "deepseek-chat"
        },
    }
    results = []
    for name, config in providers.items():
        try:
            client = OpenAI(
                api_key=os.getenv(config["api_key_env"]),
                base_url=config["base_url"]
            )
            start = time.time()
            response = client.chat.completions.create(
                model=config["model"],
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200
            )
            elapsed = time.time() - start
            results.append({
                "provider": name,
                "response": response.choices[0].message.content,
                "latency": f"{elapsed:.2f}s",
                "tokens": response.usage.total_tokens
            })
        except Exception as e:
            # Record the failure so one bad provider does not stop the run
            results.append({"provider": name, "error": str(e)})
    return results
# results = benchmark_providers("Explain microservices in 3 sentences.")
# for r in results:
#     print(f"{r['provider']}: {r.get('latency', 'ERROR')}")
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.