Computer Vision Introduction — Image Processing, CNNs & Object Detection for Developers

DodaTech Updated 2026-06-22 4 min read

Computer Vision enables machines to interpret visual information from images and video. This introduction covers core techniques, from basic image processing to deep learning models.

What You'll Learn

You'll learn image loading and processing with OpenCV, convolutional neural network architecture, image augmentation, and transfer learning with pretrained models. The practice challenge at the end extends these to real-time object detection.

Why It Matters

Computer Vision powers self-driving cars, medical diagnosis systems, security cameras, and augmented reality. Python libraries such as OpenCV and PyTorch make it one of the more accessible ways to build visual AI applications.

Real-World Use

Durga Antivirus Pro uses Computer Vision to analyze scanned document images, extract text regions via OCR, and detect malicious QR codes embedded in image files.

Computer Vision Pipeline

flowchart LR
    A[Image Input] --> B[Preprocessing]
    B --> C[Feature Extraction]
    C --> D[Model Inference]
    D --> E[Postprocessing]
    E --> F[Output]
    B --> G[Augmentation]
    G --> C

Image Processing with OpenCV

OpenCV is the standard library for image manipulation and preprocessing.

import cv2
import numpy as np

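# Load image in grayscale (cv2.imread returns None if the file cannot be read)
image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)
if image is None:
    raise FileNotFoundError("Could not read sample.jpg")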
print(f"Image shape: {image.shape}")
print(f"Data type: {image.dtype}")
print(f"Pixel range: [{image.min()}, {image.max()}]")

# Apply Gaussian blur to reduce noise
blurred = cv2.GaussianBlur(image, (5, 5), 1.5)

# Edge detection with Canny
edges = cv2.Canny(blurred, 50, 150)

# Count edge pixels
edge_ratio = np.sum(edges > 0) / edges.size
print(f"Edge pixel ratio: {edge_ratio:.3f}")
print(f"Image size (HxW): {image.shape[0]}x{image.shape[1]}")

Expected output:

Image shape: (480, 640)
Data type: uint8
Pixel range: [12, 245]
Edge pixel ratio: 0.087
Image size (HxW): 480x640
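
The Canny detector builds on a gradient computation. As a rough illustration of that stage, the sketch below applies Sobel kernels to a tiny synthetic image with a vertical step edge. It assumes only NumPy; the helper name `sobel_gradients` and the test image are made up for this example and are not part of OpenCV.

```python
import numpy as np

def sobel_gradients(img):
    """Return gradient magnitude of a 2-D array (valid region only, no padding)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    mag = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx = np.sum(patch * kx)   # horizontal intensity change
            gy = np.sum(patch * ky)   # vertical intensity change
            mag[i, j] = np.hypot(gx, gy)
    return mag

# Synthetic 5x6 image: dark left half (0), bright right half (255)
img = np.zeros((5, 6))
img[:, 3:] = 255.0
mag = sobel_gradients(img)
print(mag[0])   # magnitude peaks only where a window straddles the step
```

Flat regions give zero magnitude, while windows that straddle the step give a large response. Canny then thins and thresholds this kind of response map.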

Building a CNN with PyTorch

Convolutional neural networks learn hierarchical visual features from raw pixels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 8 * 8, 256)
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = SimpleCNN(num_classes=10)
dummy_input = torch.randn(1, 3, 32, 32)
output = model(dummy_input)
print(f"Input shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")
print(f"Parameter count: {sum(p.numel() for p in model.parameters()):,}")

Expected output:

Input shape: torch.Size([1, 3, 32, 32])
Output shape: torch.Size([1, 10])
Parameter count: 1,070,986
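
The `64 * 8 * 8` input size of `fc1` and the total parameter count both follow from simple arithmetic. The sketch below recomputes them by hand, assuming the 32x32 RGB input used for the dummy tensor above. The helper names (`conv_out`, `conv_params`, `bn_params`, `linear_params`) are illustrative and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(c_in, c_out, kernel):
    """Kernel weights plus one bias per output channel."""
    return c_in * c_out * kernel * kernel + c_out

def bn_params(channels):
    """Learnable scale and shift per channel (running stats are buffers)."""
    return 2 * channels

def linear_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

size = 32
size = conv_out(size, 3, padding=1)   # conv1 keeps 32
size = conv_out(size, 2, stride=2)    # pool halves to 16
size = conv_out(size, 3, padding=1)   # conv2 keeps 16
size = conv_out(size, 2, stride=2)    # pool halves to 8
flat = 64 * size * size               # features entering fc1

total = (conv_params(3, 32, 3) + bn_params(32)
         + conv_params(32, 64, 3) + bn_params(64)
         + linear_params(flat, 256) + linear_params(256, 10))
print(f"Flattened features: {flat}")
print(f"Total parameters: {total:,}")
```

Most of the parameters sit in `fc1`, which is why a different input resolution requires changing that layer's input size.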

Image Augmentation

Augmentation creates varied training data from existing images to improve model generalization.

import torchvision.transforms as transforms
from PIL import Image

# Define augmentation pipeline
augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(
        brightness=0.2, contrast=0.2, saturation=0.2
    ),
    transforms.RandomResizedCrop(size=(224, 224), scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

image = Image.open("sample.jpg").convert("RGB")
original_size = image.size
augmented = augmentation(image)
print(f"Original size: {original_size}")
print(f"Augmented shape: {augmented.shape}")
print(f"Augmented range: [{augmented.min():.3f}, {augmented.max():.3f}]")

# List the transforms in the pipeline
print("\nTransforms applied:")
for t in augmentation.transforms:
    print(f"  - {t.__class__.__name__}")

Expected output:

Original size: (640, 480)
Augmented shape: torch.Size([3, 224, 224])
Augmented range: [0.000, 1.000]

Transforms applied:
  - RandomHorizontalFlip
  - RandomRotation
  - ColorJitter
  - RandomResizedCrop
  - ToTensor
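
To make two of these transforms concrete, here is a minimal NumPy sketch of a random horizontal flip and a brightness jitter on an HWC `uint8` array. It is a simplified stand-in, not torchvision's implementation; the function names, the `strength` parameter, and the seeded generator are assumptions for illustration.

```python
import numpy as np

def random_hflip(img, rng, p=0.5):
    """Flip an HWC image left-to-right with probability p."""
    if rng.random() < p:
        return img[:, ::-1, :]
    return img

def brightness_jitter(img, rng, strength=0.2):
    """Scale pixels by a factor drawn from [1 - strength, 1 + strength]."""
    factor = rng.uniform(1.0 - strength, 1.0 + strength)
    scaled = img.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

rng = np.random.default_rng(seed=0)
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 10
flipped = random_hflip(img, rng, p=1.0)   # p=1.0 forces the flip
jittered = brightness_jitter(img, rng)
print(flipped.shape, jittered.dtype)
```

Each call draws new random values, so the same source image yields a different training sample every epoch.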

Transfer Learning with Pretrained Models

Transfer learning adapts a model trained on millions of images to a custom task with minimal data.

import torch.nn as nn
import torchvision.models as models

# Load pretrained ResNet-18
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze feature extractor layers
for param in model.parameters():
    param.requires_grad = False

# Replace classifier head for custom classes
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)  # binary classification

# Only train the new head
trainable_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad
)
total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable_params:,}")
print(f"Total parameters: {total_params:,}")
print(f"Frozen ratio: {(1 - trainable_params/total_params)*100:.2f}%")

Expected output:

Trainable parameters: 1,026
Total parameters: 11,177,538
Frozen ratio: 99.99%

Common Errors

Error	Cause	Fix
Image tensor has wrong shape	Channel-last vs channel-first confusion	Convert HWC to CHW with `.permute(2, 0, 1)`
Model overfits on small datasets	Insufficient augmentation	Add RandomHorizontalFlip, ColorJitter, and Cutout-style erasing (RandomErasing in torchvision)
Out of memory during training	Batch size too large for GPU	Reduce batch size or use gradient accumulation
Low accuracy despite pretrained model	Dataset distribution too different from pretraining data	Fine-tune more layers instead of only the head
OpenCV reads image as BGR	Default OpenCV behavior	Convert to RGB with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`
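
Two of the fixes in the table are plain axis operations. The sketch below uses only NumPy to show the idea on a tiny array: reversing the channel axis turns BGR into RGB for a 3-channel image, and `np.transpose(img, (2, 0, 1))` is the NumPy counterpart of the `.permute(2, 0, 1)` call in the table. The array values are made up for illustration.

```python
import numpy as np

# A 2x2 "image" in HWC layout with channels ordered B, G, R
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255                     # pure blue in BGR order

rgb = bgr[..., ::-1]                  # reverse channel axis: B,G,R -> R,G,B
chw = np.transpose(rgb, (2, 0, 1))    # HWC -> CHW

print(rgb[0, 0])                      # blue is now the last channel
print(chw.shape)
```

If you pass such an array to `torch.from_numpy`, call `np.ascontiguousarray` on it first, since the reversed view has negative strides.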

Practice Questions

What is the purpose of a convolutional layer in a CNN? Convolutional layers apply learnable filters to detect spatial patterns such as edges, textures, and shapes at different scales.
How does transfer learning reduce training time? Transfer learning reuses features learned from a large dataset, so in many cases only a small classifier head needs training on the target task.
Why is image augmentation important for Computer Vision models? Augmentation increases effective dataset size and diversity, reducing overfitting and improving generalization to real-world variations.
What does the Canny edge detector do? Canny detects edges by applying gradient computation, non-maximum suppression, and hysteresis thresholding to produce thin, continuous edges.
Challenge: Build a real-time webcam-based object counter that uses a pretrained YOLO model to detect and count objects of a specific class (e.g., cars in a parking lot) and logs the count over time.

Mini Project

Build a document scanner app. Use OpenCV to detect the largest contour in an image, apply a perspective transform to deskew the document, enhance contrast with adaptive thresholding, and save the result as a clean scanned PDF. Add OCR using Tesseract to extract text from the scanned output.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
