DeepSeek API: Complete Integration Guide

DodaTech Updated Jun 20, 2026 7 min read

DeepSeek is a Chinese AI company offering powerful open-source and API-accessible LLMs. DeepSeek-R1 (the reasoning model) rivals OpenAI’s o1 at a fraction of the cost, while DeepSeek-V3 provides excellent general-purpose performance. This guide covers API integration, reasoning features, and self-hosting options.

Learning Path

    flowchart LR
  A["LangChain<br/>LLM Applications"] --> B["DeepSeek API<br/>Integration Guide"]
  B --> C["Mistral AI<br/>Models & API"]
  C --> D["Self-Hosting LLMs<br/>Ollama & vLLM"]
  style B fill:#f90,color:#fff,stroke-width:2px

What you’ll learn: DeepSeek API setup, chat completions, DeepSeek-R1 reasoning, code generation, streaming, cost optimization, and self-hosting with Ollama and vLLM.

Why it matters: DeepSeek offers GPT-4-class performance at 90% lower cost, with completely open-source models that you can self-host.

Real-world use: DodaZIP benchmarks DeepSeek for compression algorithm optimization. Durga Antivirus Pro uses DeepSeek’s code generation for signature pattern development.

API Setup

DeepSeek’s API is OpenAI-compatible — use the same client libraries:

from openai import OpenAI

client = OpenAI(
    api_key="<your-deepseek-api-key>",
    base_url="https://api.deepseek.com"
)
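
Hard-coding the key in source files makes it easy to leak. The following is a minimal sketch that reads it from the environment instead. It assumes the key is stored in a variable named DEEPSEEK_API_KEY (the name this guide’s mini project uses), and deepseek_client_kwargs is a hypothetical helper, not part of any SDK:

```python
import os

DEEPSEEK_BASE_URL = "https://api.deepseek.com"

def deepseek_client_kwargs(env_var="DEEPSEEK_API_KEY"):
    """Return keyword arguments for OpenAI(...) with the key read from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {"api_key": api_key, "base_url": DEEPSEEK_BASE_URL}

# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**deepseek_client_kwargs())
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the API.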

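DeepSeek’s official documentation uses the OpenAI SDK shown above. A dedicated deepseek-sdk package is sketched below; confirm that it exists and is maintained before depending on it: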

pip install deepseek-sdk

import deepseek

client = deepseek.Client(api_key="<your-deepseek-api-key>")

Chat Completions

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # or "deepseek-reasoner" for R1
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "Write a function to check if a string is a palindrome."}
    ],
    temperature=0.0,
    max_tokens=500
)

print(response.choices[0].message.content)

Expected output:

def is_palindrome(s: str) -> bool:
    """Check if a string is a palindrome (case-insensitive)."""
    cleaned = s.lower().replace(" ", "")
    return cleaned == cleaned[::-1]

# Examples
print(is_palindrome("racecar"))  # True
print(is_palindrome("A man a plan a canal Panama"))  # True
print(is_palindrome("hello"))  # False

DeepSeek-R1: The Reasoning Model

DeepSeek-R1 uses chain-of-thought reasoning before generating answers — similar to OpenAI’s o1:

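from openai import OpenAI

def deepseek_reason(problem):
    """Use DeepSeek-R1 for complex reasoning tasks."""
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
    
    response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[
            {"role": "user", "content": problem}
        ],
        # 0.6 is the recommended temperature for self-hosted R1. DeepSeek's docs say
        # the hosted deepseek-reasoner endpoint accepts this parameter but ignores it.
        temperature=0.6,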
        max_tokens=2000
    )
    
    message = response.choices[0].message
    reasoning = getattr(message, "reasoning_content", None)
    
    if reasoning:
        print(f"Reasoning process:\n{reasoning[:300]}...\n")
    
    return message.content

problem = """
A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.
How much does the ball cost? Think step by step.
"""

result = deepseek_reason(problem)
print(f"Answer: {result}")

Expected output:

Reasoning process:
Let's solve this step by step.
Let the ball cost x dollars.
Then the bat costs x + 1.00 dollars.
Total: x + (x + 1.00) = 1.10
2x + 1.00 = 1.10
2x = 0.10
x = 0.05
The ball costs $0.05 and the bat costs $1.05...

Answer: The ball costs $0.05.
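
For multi-turn conversations, my understanding of DeepSeek’s documentation is that the reasoning text from earlier turns should not be sent back to the API; only the final answer goes into the history. A minimal sketch of that bookkeeping follows. The helper name append_assistant_turn is hypothetical, and a SimpleNamespace stands in for the real message object so the example runs without an API call:

```python
from types import SimpleNamespace

def append_assistant_turn(history, message):
    """Append the assistant's final answer to the history, leaving out its reasoning."""
    history.append({"role": "assistant", "content": message.content})
    return history

# Stand-in for response.choices[0].message so the sketch runs offline.
fake_message = SimpleNamespace(
    content="The ball costs $0.05.",
    reasoning_content="Let the ball cost x dollars...",
)

history = [{"role": "user", "content": "How much does the ball cost?"}]
append_assistant_turn(history, fake_message)
history.append({"role": "user", "content": "And the bat?"})
# history can now be passed as messages= in the next request.
print(history)
```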

Code Generation

DeepSeek excels at code generation. DeepSeek-Coder, for example, was trained from scratch on 2 trillion tokens, mostly code with some natural-language text:

from openai import OpenAI

def generate_code(prompt, language="python"):
    """Generate code using DeepSeek."""
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
    
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": f"You are an expert {language} developer. Generate clean, well-documented code."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.1,
        max_tokens=1000
    )
    
    return response.choices[0].message.content

code = generate_code("Create a FastAPI endpoint for file upload with virus scanning", "python")
print(code)
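
Chat models often wrap generated code in Markdown fences with commentary around it. If you want to save or run the code, you have to pull it out first. The sketch below shows one way to do that with a regular expression. It assumes the reply uses triple-backtick fences, and extract_code_blocks is a hypothetical helper:

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

def extract_code_blocks(text):
    """Return the contents of all fenced code blocks found in a model reply."""
    pattern = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)
    return [block.strip() for block in pattern.findall(text)]

# A made-up reply in the shape chat models typically produce.
reply = (
    "Here is the endpoint:\n"
    + FENCE + "python\n"
    + "print('hello')\n"
    + FENCE + "\n"
    + "Let me know if you need tests."
)
print(extract_code_blocks(reply))
```

An empty list means the reply contained no fenced block, in which case the whole reply may already be plain code.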

API Parameters

DeepSeek’s API parameters mirror OpenAI’s:

Parameter	Type	Default	Notes
`model`	string	required	`deepseek-chat` or `deepseek-reasoner`
`messages`	array	required	Standard chat format
`temperature`	float	1.0	0.0-2.0 (use low values for code; has no effect on `deepseek-reasoner`)
`top_p`	float	1.0	Nucleus sampling
`max_tokens`	integer	4096	Max output tokens
`stream`	boolean	false	Enable streaming
`stop`	string/array	null	Stop sequences
`frequency_penalty`	float	0.0	-2.0 to 2.0
`presence_penalty`	float	0.0	-2.0 to 2.0

Streaming

from openai import OpenAI

def stream_deepseek(prompt):
    """Stream DeepSeek response."""
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
    
    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

stream_deepseek("Write a haiku about programming")

Expected output:

Bugs crawl through the code,
Semicolons mark the path,
Runtime silence grows.
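
Printing chunks as they arrive works for a terminal, but most applications also need the full text once the stream ends. The sketch below accumulates the deltas. It assumes that deepseek-reasoner streams its chain of thought in delta.reasoning_content, mirroring message.reasoning_content in the non-streaming example. The helpers collect_stream and fake_chunk are hypothetical, and the fake chunks let it run offline:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Accumulate streamed deltas into (reasoning_text, answer_text)."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if getattr(delta, "content", None):
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)

def fake_chunk(content=None, reasoning=None):
    """Build a stand-in for a streamed chunk so the sketch runs offline."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

chunks = [
    fake_chunk(reasoning="Count syllables. "),
    fake_chunk(content="Bugs crawl "),
    fake_chunk(content="through the code,"),
    SimpleNamespace(choices=[]),
]
reasoning, answer = collect_stream(chunks)
print(answer)
```

With a real client, pass the object returned by client.chat.completions.create(..., stream=True) as chunks.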

Cost Comparison

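def compare_costs(input_tokens, output_tokens):
    """Estimate the cost of one request for each model in the pricing table."""
    # Prices are in USD per 1M tokens as quoted in this guide; check current rates.
    pricing = {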
        "deepseek-chat": {"input": 0.14, "output": 0.28},
        "deepseek-reasoner": {"input": 0.55, "output": 2.19},
        "deepseek-chat-cached": {"input": 0.07, "output": 0.28},
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    }
    
    costs = {}
    for model_name, price in pricing.items():
        total = (input_tokens / 1_000_000 * price["input"] +
                 output_tokens / 1_000_000 * price["output"])
        costs[model_name] = total
    
    return costs

tokens_input, tokens_output = 10000, 2000
costs = compare_costs(tokens_input, tokens_output)

print(f"For {tokens_input:,} input + {tokens_output:,} output tokens:")
for model, cost in sorted(costs.items(), key=lambda x: x[1]):
    print(f"  {model:25} ${cost:.4f}")

Expected output:

For 10,000 input + 2,000 output tokens:
  deepseek-chat-cached      $0.0013
  deepseek-chat             $0.0020
  gpt-4o-mini               $0.0027
  deepseek-reasoner         $0.0099
  gpt-4o                    $0.0450
  claude-3-5-sonnet         $0.0600
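
The table above estimates cost from assumed token counts. For real requests you can compute it from the usage object on each response. The sketch below assumes DeepSeek reports cached and uncached prompt tokens as prompt_cache_hit_tokens and prompt_cache_miss_tokens, and it reuses the deepseek-chat prices from this guide, which may be outdated. The helper request_cost is hypothetical, and a SimpleNamespace stands in for response.usage:

```python
from types import SimpleNamespace

# USD per 1M tokens, taken from the pricing table in this guide (may be outdated).
PRICE = {"cache_hit": 0.07, "cache_miss": 0.14, "output": 0.28}

def request_cost(usage, price=PRICE):
    """Estimate the cost of one deepseek-chat request from its usage object."""
    hit = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
    # If the miss count is absent, treat every non-hit prompt token as a miss.
    miss = getattr(usage, "prompt_cache_miss_tokens", None)
    if miss is None:
        miss = usage.prompt_tokens - hit
    total = (hit * price["cache_hit"]
             + miss * price["cache_miss"]
             + usage.completion_tokens * price["output"])
    return total / 1_000_000

# Stand-in for response.usage so the sketch runs offline.
usage = SimpleNamespace(
    prompt_tokens=10_000,
    completion_tokens=2_000,
    prompt_cache_hit_tokens=6_000,
    prompt_cache_miss_tokens=4_000,
)
print(f"${request_cost(usage):.5f}")
```

Summing this value across requests gives a running bill that reflects how often the prompt cache is actually hit.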

Self-Hosting Options

With Ollama (easiest)

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull DeepSeek model
ollama pull deepseek-coder:6.7b
ollama pull deepseek-r1:7b

# Run
ollama run deepseek-r1:7b
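
Once the model is pulled, you can also call it from code through Ollama’s local HTTP API. The following is a minimal sketch using only the standard library. It assumes Ollama is listening on its default address (localhost:11434), and the helper names build_ollama_request and ask_ollama are hypothetical:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address

def build_ollama_request(prompt, model="deepseek-r1:7b"):
    """Build (but do not send) a non-streaming chat request for a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_ollama(prompt, model="deepseek-r1:7b"):
    """Send the request and return the reply text; needs a running Ollama server."""
    with urllib.request.urlopen(build_ollama_request(prompt, model)) as resp:
        return json.loads(resp.read())["message"]["content"]

# print(ask_ollama("Hello!"))  # uncomment once the model has been pulled
```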

With vLLM (production)

# vLLM serving (uses OpenAI-compatible endpoint)
# Start server:
# python -m vllm.entrypoints.openai.api_server \
#     --model deepseek-ai/deepseek-coder-6.7b-instruct \
#     --port 8000

# Then use any OpenAI client:
from openai import OpenAI

local_client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:8000/v1"
)

response = local_client.chat.completions.create(
    model="deepseek-ai/deepseek-coder-6.7b-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
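
Because the hosted API and the local servers speak the same protocol, switching between them can be reduced to choosing a base URL and a model name. The sketch below shows one way to organize that. The registry, the LLM_ENDPOINT variable, and resolve_endpoint are hypothetical, and the Ollama /v1 address assumes Ollama’s default port:

```python
import os

# URLs and model names are the ones used in this guide, except the Ollama entry,
# which assumes Ollama's OpenAI-compatible endpoint on its default port.
ENDPOINTS = {
    "hosted": {"base_url": "https://api.deepseek.com", "model": "deepseek-chat"},
    "vllm": {"base_url": "http://localhost:8000/v1",
             "model": "deepseek-ai/deepseek-coder-6.7b-instruct"},
    "ollama": {"base_url": "http://localhost:11434/v1", "model": "deepseek-r1:7b"},
}

def resolve_endpoint(name=None):
    """Pick an endpoint by name, defaulting to the LLM_ENDPOINT environment variable."""
    name = name or os.environ.get("LLM_ENDPOINT", "hosted")
    if name not in ENDPOINTS:
        raise ValueError(f"Unknown endpoint {name!r}; expected one of {sorted(ENDPOINTS)}")
    return ENDPOINTS[name]

cfg = resolve_endpoint("vllm")
print(cfg["base_url"], cfg["model"])
# With the openai package installed:
#   client = OpenAI(api_key="not-needed", base_url=cfg["base_url"])
#   client.chat.completions.create(model=cfg["model"], messages=[...])
```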

Common Errors

Wrong base_url: DeepSeek uses https://api.deepseek.com (not api.openai.com). If you forget to change the base URL, requests go to OpenAI and fail authentication there.
Model name mismatch: Use deepseek-chat for general chat and deepseek-reasoner for R1 reasoning. An unrecognized model name makes the API reject the request.
Temperature too high for code: Code generation needs a low temperature (0.0-0.2). High temperature produces creative but often incorrect code.
R1 without reasoning extraction: The chain of thought is in message.reasoning_content, not in message.content. If you read only content, you lose the reasoning.
Context window exceeded: DeepSeek models have a 128K-token context window. Long conversations can hit this limit, so implement message truncation or summarization.
Rate limiting: The free tier has lower rate limits. Check the response headers for remaining quota where they are provided, and implement retry with backoff for production.
Self-hosted out-of-memory errors: Running 67B models requires significant VRAM. Use 4-bit quantization or the smaller 7B/14B versions for local testing.
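
Two of the fixes in the list above, message truncation and retry with backoff, can be sketched in a few lines. This is a rough illustration rather than a production implementation. It uses a character budget as a crude stand-in for a token count, catches every exception where real code would catch only rate-limit or timeout errors, and the helper names truncate_history and with_backoff are hypothetical:

```python
import random
import time

def truncate_history(messages, max_chars=8000):
    """Keep system messages plus the most recent turns that fit a rough character budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) for m in system)
    for m in reversed(rest):  # walk from newest to oldest
        used += len(m["content"])
        if used > max_chars and kept:  # always keep at least the newest message
            break
        kept.append(m)
    return system + kept[::-1]

def with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry call() with exponential backoff and jitter; re-raise after the last attempt."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:  # in real code, catch only rate-limit/timeout errors
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Usage with a client from the earlier examples:
#   response = with_backoff(lambda: client.chat.completions.create(
#       model="deepseek-chat", messages=truncate_history(messages)))
```

The sleep parameter exists so that the delay can be replaced in tests; by default the function really waits.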

Practice Questions

1. What’s the main difference between deepseek-chat and deepseek-reasoner? deepseek-reasoner (DeepSeek-R1) shows its chain-of-thought reasoning process before answering. deepseek-chat (DeepSeek-V3) directly generates responses without visible reasoning.

2. How much cheaper is DeepSeek compared to GPT-4o? DeepSeek-chat costs $0.14/M input tokens vs GPT-4o’s $2.50/M — approximately 18x cheaper. DeepSeek-reasoner costs $0.55/M input, still 4.5x cheaper than GPT-4o.

3. Can I use DeepSeek with OpenAI client libraries? Yes. DeepSeek’s API is fully OpenAI-compatible. Just change the base_url to https://api.deepseek.com and use your DeepSeek API key.

4. How do I self-host DeepSeek models? Use Ollama (easiest, for 7B-14B models), vLLM (production, supports all sizes), or llama.cpp (for CPU/quantized inference).

5. Challenge: Build a DeepSeek-powered code reviewer Create a Python script that takes a file path, reads the code, and sends it to DeepSeek for review. Use the reasoning model to get step-by-step analysis of potential bugs and security issues.

Mini Project: Multi-Provider LLM Benchmark

import os
import time

from openai import OpenAI

def benchmark_providers(prompt):
    """Compare responses from different LLM providers."""
    # Add more OpenAI-compatible providers here; each entry names its own model.
    providers = {
        "DeepSeek Chat": {
            "base_url": "https://api.deepseek.com",
            "api_key_env": "DEEPSEEK_API_KEY",
            "model": "deepseek-chat"
        },
    }
    
    results = []
    for name, config in providers.items():
        try:
            client = OpenAI(
                api_key=os.getenv(config["api_key_env"]),
                base_url=config["base_url"]
            )
            
            start = time.time()
            response = client.chat.completions.create(
                model=config["model"],
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200
            )
            elapsed = time.time() - start
            
            results.append({
                "provider": name,
                "response": response.choices[0].message.content,
                "latency": f"{elapsed:.2f}s",
                "tokens": response.usage.total_tokens
            })
        except Exception as e:
            results.append({"provider": name, "error": str(e)})
    
    return results

# results = benchmark_providers("Explain microservices in 3 sentences.")
# for r in results:
#     print(f"{r['provider']}: {r.get('latency', 'ERROR')}")

FAQ

Is DeepSeek truly open-source?

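Mostly, in the open-weight sense. DeepSeek-R1’s code and weights are released under the MIT license. Earlier releases such as DeepSeek-V2 and DeepSeek-Coder pair MIT-licensed code with a separate DeepSeek model license for the weights, so check the license file in each model repository. Model weights and technical papers are publicly available; the full training pipeline and data are not.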

How does DeepSeek-R1 compare to OpenAI o1?

DeepSeek-R1 matches or exceeds o1 on math reasoning (AIME, MATH benchmarks) and coding (LiveCodeBench) while being significantly cheaper. OpenAI o1 has stronger general knowledge and broader safety alignment.

Can I use DeepSeek for commercial applications?

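Generally yes. DeepSeek-R1 is MIT-licensed, and the DeepSeek model license used for some other releases also permits commercial use, subject to its use restrictions. Confirm the license of the specific model you deploy. The API service has its own terms of service; review them for your use case.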

Is DeepSeek available through other providers?

Yes. DeepSeek models are available on Together AI, Fireworks AI, Groq, and other inference providers. Some offer lower latency or different pricing than the official API.

What hardware do I need to run DeepSeek locally?

The 7B model runs on consumer GPUs (8GB+ VRAM). The 67B model requires 80GB+ VRAM or quantized versions on multi-GPU setups. Use Ollama for easy local setup.
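
The arithmetic behind these figures is simple: the weights alone need roughly the number of parameters times the bytes per parameter. The sketch below works through it. It counts 1 GB as 10^9 bytes and ignores the KV cache and activations, so real usage is higher. The function weight_memory_gb is a hypothetical helper:

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone: parameters x bytes per parameter."""
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

for size in (7, 67):
    for bits in (16, 4):
        print(f"{size}B @ {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

By this estimate a 7B model needs about 14 GB at 16-bit precision and about 3.5 GB at 4-bit, which is why quantized builds fit on 8 GB consumer GPUs. A 67B model needs about 134 GB at 16-bit and about 33.5 GB at 4-bit.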
