OpenAI API Guide — GPT-4, DALL-E, and Whisper Integration

DodaTech Updated Jun 7, 2026 7 min read

The OpenAI API provides access to GPT-4, DALL-E, and Whisper models for text generation, image creation, and speech-to-text in a single unified interface.

What You’ll Learn

  • How to obtain and secure API keys for OpenAI services
  • How to build chat completions with GPT-4, including streaming and function calling
  • How to generate images with DALL-E 3 and transcribe speech with Whisper
  • How to manage tokens, understand pricing, and handle rate limits

Why the OpenAI API Matters

OpenAI’s models power a wide range of applications, from customer support chatbots to code assistants and content generation tools, which makes the API a common integration point for AI features. DodaTech’s Doda Browser uses GPT-4 for inline page summarization, and Durga Antivirus Pro uses embeddings for semantic malware signature matching, so familiarity with the API is a practical skill for developers building similar products.

    flowchart LR
    A["API Key\n& Authentication"] --> B["Chat Completions\nGPT-4"]
    A --> C["Images\nDALL-E 3"]
    A --> D["Audio\nWhisper"]
    A --> E["Embeddings\ntext-embedding-3"]
    B --> F["Streaming &\nFunction Calling"]
    B --> G["Token Counting\n& Pricing"]
    style B fill:#dbeafe,stroke:#2563eb
  

Getting Started with API Keys

Every OpenAI API call requires an API key. Sign up at platform.openai.com, navigate to API keys, and create a new secret key. Store it as an environment variable — never hardcode keys in source code.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

Expected output: No output — the client initializes silently. If OPENAI_API_KEY is missing, Python raises KeyError.
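
A bare KeyError at startup can be hard to interpret. The following is a minimal sketch of a friendlier check; load_api_key is a hypothetical helper name used for illustration and is not part of the OpenAI SDK.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment or fail with a clear message.

    load_api_key is an illustrative helper, not an OpenAI SDK function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running this script.")
    return key
```

With this in place, the client could be created as OpenAI(api_key=load_api_key()).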

Chat Completions with GPT-4

The chat completions endpoint is the core of OpenAI’s text generation. You send a list of messages and receive a model response.

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful Python tutor."},
        {"role": "user", "content": "Explain list comprehensions in one sentence."}
    ]
)
print(response.choices[0].message.content)

Expected output:

A list comprehension is a concise way to create lists by applying an expression to each item in an iterable, optionally filtering with a condition.

Streaming Responses

For real-time applications, stream tokens as they arrive instead of waiting for the full response.

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Expected output:

1, 2, 3, 4, 5

The stream yields delta objects containing partial content. This pattern is used in Doda Browser’s live page summarization feature where text appears progressively as the model generates it.
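
Applications usually need the full text as well as the progressive display. The sketch below joins the deltas into one string; collect_stream and fake_chunk are hypothetical names, and the fake chunks only mimic the attribute path (choices[0].delta.content) used in the loop above so the example runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(stream, on_text=None):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # defensively skip chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # some chunks carry no text, for example the final one
            parts.append(text)
            if on_text is not None:
                on_text(text)  # e.g. print the fragment as it arrives
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a real chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk(None), fake_chunk("1, 2"), fake_chunk(", 3"), fake_chunk(None)]
print(collect_stream(demo))  # prints "1, 2, 3"
```

Passing on_text=lambda t: print(t, end="") would reproduce the live display while still returning the complete reply.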

Function Calling

Function calling lets GPT-4 request structured data from your application. Define a function schema and the model will output a JSON object when it needs to call that function.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current temperature for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in London?"}],
    tools=tools,
    tool_choice="auto"
)
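# The model may answer in plain text, in which case tool_calls is None.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function)
else:
    print(response.choices[0].message.content)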

Expected output (representative; the model may omit the optional unit argument):

Function(name='get_weather', arguments='{"city":"London","unit":"celsius"}')

The model decides when to call the function and returns structured arguments you can execute against your own data sources. Durga Antivirus Pro uses this pattern to let GPT-4 query internal threat databases when analyzing security incidents.
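
Executing the requested call is left to your application. Here is a minimal sketch of that step, assuming a local get_weather implementation with made-up return data; TOOL_REGISTRY and run_tool_call are hypothetical names. The fake tool call only mirrors the id, function.name, and function.arguments attributes of a real one.

```python
import json
from types import SimpleNamespace


def get_weather(city, unit="celsius"):
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"city": city, "unit": unit, "temperature": 18}


# Maps the function names advertised in the tools schema to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one tool call and build the message that reports its result."""
    func = TOOL_REGISTRY[tool_call.function.name]
    # The model returns the arguments as a JSON string, so parse before calling.
    kwargs = json.loads(tool_call.function.arguments)
    result = func(**kwargs)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }


# Stand-in for response.choices[0].message.tool_calls[0]
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "London"}'),
)
print(run_tool_call(fake_call))
```

To get a final answer, append the assistant message that contained the tool call and then this tool message to the conversation, and send the conversation to the chat completions endpoint again.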

DALL-E 3 Image Generation

Generate images from text descriptions using DALL-E 3.

image = client.images.generate(
    model="dall-e-3",
    prompt="A futuristic city skyline at sunset with flying cars",
    size="1024x1024",
    quality="standard",
    n=1
)
print(image.data[0].url)

Expected output: A URL string pointing to the generated image, valid for approximately one hour.
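
Because the URL expires, you may want to persist the image yourself. One option is to request response_format="b64_json" so the image data arrives inline in image.data[0].b64_json instead of as a URL. The sketch below shows only the decoding step; save_b64_image is a hypothetical helper name.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data, path):
    """Decode a base64 image payload, write it to disk, and return the byte count."""
    raw = base64.b64decode(b64_data)
    Path(path).write_bytes(raw)
    return len(raw)
```

Assuming the request was made with response_format="b64_json", the call would look like save_b64_image(image.data[0].b64_json, "skyline.png").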

Whisper Speech-to-Text

Transcribe audio files into text using the Whisper model.

# Open the audio in binary mode; the with-block closes the file after the upload.
with open("meeting_recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print(transcript.text)

Expected output: The transcribed text from the audio file. Whisper supports MP3, WAV, M4A, and other common formats.

Embeddings for Semantic Search

Embeddings convert text into vector representations for semantic search and clustering.

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Doda Browser is a fast and private web browser"
)
vector = response.data[0].embedding
print(f"Vector dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")

Expected output:

Vector dimension: 1536
First 5 values: [-0.008327245, 0.02189445, -0.001234567, 0.03456789, -0.01234567]

Durga Antivirus Pro uses embeddings to compare malware signatures semantically — finding threats that match the intent of known patterns rather than exact byte sequences.
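
Semantic comparison of embeddings is commonly done with cosine similarity. The sketch below is a generic illustration, not Durga Antivirus Pro's actual implementation; the function names are hypothetical and the 3-dimensional vectors are toy stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate labels ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in candidates]
    return [label for _, label in sorted(scored, reverse=True)]


# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.
candidates = [
    ("browser", [0.9, 0.1, 0.0]),
    ("archive", [0.0, 0.2, 0.9]),
]
print(rank_by_similarity([1.0, 0.0, 0.0], candidates))  # prints ['browser', 'archive']
```

In a real search, query and each candidate vector would come from response.data[0].embedding for the corresponding text.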

Token Counting and Pricing

OpenAI charges per token. Use tiktoken to count tokens before sending requests.

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")
tokens = encoding.encode("DodaZIP compresses files efficiently.")
print(f"Token count: {len(tokens)}")
print(f"Tokens: {[encoding.decode([t]) for t in tokens]}")

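Expected output: the token count, followed by the list of decoded token strings. The exact split depends on the cl100k_base encoding that tiktoken selects for gpt-4. Each token appears as its own list entry, so the length of the printed list always equals the printed count.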
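
Once you have token counts, a cost estimate is simple arithmetic. The sketch below assumes separate per-1,000-token prices for input and output; estimate_cost is a hypothetical helper and the prices in the example are placeholders, not current OpenAI rates.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate the request cost in dollars from token counts and per-1K prices."""
    input_cost = prompt_tokens / 1000 * input_price_per_1k
    output_cost = completion_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Placeholder prices for illustration only; look up the current rates for your model.
cost = estimate_cost(1000, 500, input_price_per_1k=0.03, output_price_per_1k=0.06)
print(f"Estimated cost: ${cost:.4f}")  # prints "Estimated cost: $0.0600"
```

After a request completes, response.usage.prompt_tokens and response.usage.completion_tokens give the actual counts to plug in.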

Common Errors

1. InsufficientQuota Error

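You’ve exhausted your credit or reached your billing limit. The API reports this with the error code insufficient_quota, which the Python SDK raises as a RateLimitError, so retrying alone will not help. Check your usage at platform.openai.com/account/usage and add credit or raise limits.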

2. RateLimitError

You are sending requests or tokens faster than your rate limit allows. OpenAI imposes tiered limits that vary by model and usage tier (for example, 500 RPM for some Tier 1 models). Implement exponential backoff with tenacity or a similar retry library.
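
If you prefer not to add a dependency, exponential backoff with jitter can be hand-rolled in a few lines. This is a minimal sketch; retry_with_backoff and its parameters are hypothetical names, and the sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time


def retry_with_backoff(func, retry_on, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on the given exception type with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # The delay ceiling doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
```

With the OpenAI client, the call might look like retry_with_backoff(lambda: client.chat.completions.create(model="gpt-4", messages=messages), retry_on=openai.RateLimitError), assuming import openai.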

3. AuthenticationError

The API key is invalid, missing, or revoked. Verify OPENAI_API_KEY is set correctly and hasn’t been rotated.

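4. BadRequestError (Context Length Exceeded)

Your prompt plus the requested response exceeds the model’s context window. The v1 Python SDK used throughout this guide raises BadRequestError in this case; older SDK versions called it InvalidRequestError. The GPT-4 family includes 8K, 32K, and 128K context variants. Truncate messages or switch to a larger-context model.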
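
One simple truncation strategy is to drop the oldest turns first while keeping system messages. The sketch below assumes system messages come first in the list and ignores the small per-message overhead the API adds; trim_messages is a hypothetical helper, and count_tokens is any callable that maps a string to a token count.

```python
def trim_messages(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the total fits within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    while rest and total(system + rest) > max_tokens:
        rest.pop(0)  # discard the oldest turn first
    return system + rest


# Demo with a crude word counter; in practice you might pass
# lambda s: len(encoding.encode(s)) using a tiktoken encoding.
history = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
trimmed = trim_messages(history, max_tokens=5, count_tokens=lambda s: len(s.split()))
print([m["content"] for m in trimmed])  # prints ['be brief', 'four five', 'six']
```

Leave headroom below the model’s context window for the response itself, since the window covers both prompt and completion.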

5. Model Not Found

The model name is incorrect or you lack access. Access to gpt-4 can depend on your account’s billing status and usage tier. Use gpt-3.5-turbo as a fallback.

6. Timeout Error

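The request took longer than your timeout setting, and the Python SDK raised APITimeoutError. For long generations, raise the limit with the timeout parameter (accepted by the client constructor and by individual requests) or switch to streaming.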

7. Content Policy Violation

The prompt or generated output triggered OpenAI’s content filter. Review the safety guidelines and adjust your prompt.

Practice Questions

  1. What environment variable should hold your OpenAI API key?
  2. How does streaming differ from standard chat completions?
  3. What is the purpose of function calling in GPT-4?
  4. Which model should you use for semantic search embeddings?
  5. How do you count tokens for a GPT-4 request before sending it?

Answers:

  1. OPENAI_API_KEY — never hardcode keys in source files.
  2. Streaming returns tokens incrementally via delta objects, enabling real-time display without waiting for the full response.
  3. Function calling lets the model output structured JSON to invoke external tools or APIs, connecting GPT-4 to your own data sources.
  4. text-embedding-3-small (1536 dimensions) or text-embedding-3-large (3072 dimensions) for higher precision.
  5. Use len(tiktoken.encoding_for_model("gpt-4").encode(text)); encode returns the list of tokens, and its length is the token count.

Challenge: DodaZIP needs a feature that summarizes compressed file contents using GPT-4. Design a function that reads a file, truncates it to fit the context window, sends it to the API, and returns the summary with token usage statistics.

Mini Project: AI-Powered Chat Assistant

Build a CLI chat assistant that streams GPT-4 responses and saves conversations:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "system", "content": "You are a helpful assistant."}]

print("AI Chat Assistant (type 'quit' to exit)")
while True:
    user = input("\nYou: ")
    if user.lower() == "quit":
        break
    messages.append({"role": "user", "content": user})
    stream = client.chat.completions.create(
        model="gpt-4", messages=messages, stream=True
    )
    print("Assistant: ", end="")
    reply = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
            reply += chunk.choices[0].delta.content
    messages.append({"role": "assistant", "content": reply})

Try it: Run the script and have a conversation. The assistant remembers context within the session. Extend it by adding a save/load feature to persist conversations across sessions using JSON file storage.
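
The save/load extension can be sketched with the standard library alone. The function names and the default system prompt argument below are hypothetical; the messages list is already plain JSON-compatible data, so it can be written as-is.

```python
import json
from pathlib import Path


def save_conversation(messages, path):
    """Write the message list to a JSON file, keeping non-ASCII text readable."""
    text = json.dumps(messages, indent=2, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")


def load_conversation(path, default_system="You are a helpful assistant."):
    """Load a saved message list, or start a fresh one if the file does not exist."""
    file = Path(path)
    if not file.exists():
        return [{"role": "system", "content": default_system}]
    return json.loads(file.read_text(encoding="utf-8"))
```

In the chat loop, messages = load_conversation("chat.json") would replace the hardcoded starting list, and save_conversation(messages, "chat.json") could run after each assistant reply or on quit.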

FAQ

What is the difference between GPT-4 and GPT-3.5?
GPT-4 is more capable across reasoning, creativity, and nuanced instruction-following, but costs substantially more per token (roughly 10-20x; check the current pricing page for exact rates). GPT-3.5 is faster and cheaper for simpler tasks. Choose based on your accuracy vs cost trade-off.
How do I handle rate limits in production?
Implement exponential backoff with jitter using libraries like tenacity or backoff. Queue requests and process them within your tier’s rate limit. Upgrade your OpenAI tier for higher limits.
Can I fine-tune GPT-4?
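OpenAI offers fine-tuning for selected GPT-3.5 and GPT-4 family models; check the fine-tuning documentation for the current list. Fine-tuning improves performance on domain-specific tasks. Prepare a training dataset in JSONL format, where each line holds one example conversation as a messages list in the same shape the chat completions endpoint accepts, and use the fine-tuning API.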
How are tokens counted for images in DALL-E?
DALL-E 3 generates images at fixed resolutions (1024x1024, 1792x1024, and 1024x1792). Pricing is per image, not per token, and depends on resolution and quality: standard quality costs between $0.040 and $0.080 per image, and hd quality costs more. Check the pricing page for current figures.
What is the maximum audio file size for Whisper?
Whisper supports files up to 25 MB. For longer recordings, split the audio into chunks before transcription. Use pydub to segment audio programmatically.
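
Splitting starts with deciding where the cuts go. The sketch below computes chunk boundaries in milliseconds; plan_chunks is a hypothetical helper, and the chunk length that keeps each piece under 25 MB depends on your audio's bitrate, so treat the 10-minute value as an assumption.

```python
def plan_chunks(total_ms, chunk_ms):
    """Return (start, end) millisecond boundaries covering the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


# A 25-minute recording cut into 10-minute pieces (assumed chunk length).
TEN_MINUTES_MS = 10 * 60 * 1000
for start, end in plan_chunks(25 * 60 * 1000, TEN_MINUTES_MS):
    # With pydub, an AudioSegment can be sliced by milliseconds: audio[start:end]
    print(start, end)
```

Each slice can then be exported to its own file and transcribed separately, with the resulting texts joined in order.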

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
