PyTorch Guide — Deep Learning Framework for Research and Production
PyTorch is an open-source deep learning framework originally developed by Meta AI. It combines flexible tensor computation with automatic differentiation, which makes it widely used for both research experimentation and production deployment.
What You’ll Learn
You’ll work with tensors (PyTorch’s multi-dimensional arrays), use autograd for automatic gradient computation, build neural network architectures with nn.Module, implement custom training loops, load data with DataLoader, accelerate training on GPUs with CUDA, use torchvision for image tasks, and save and load model checkpoints.
Why PyTorch Matters
PyTorch has become the dominant deep learning framework in research, and a large share of the papers published at top AI conferences (NeurIPS, ICML, CVPR) use it. Its imperative, Pythonic style (“define-by-run”: the computation graph is built on the fly as your Python code executes) makes debugging intuitive, because you can step through a model with standard Python debuggers such as pdb. DodaTech’s malware classification system uses PyTorch because its dynamic computation graphs let us handle variable-length executable files, and TorchScript lets us deploy the trained model directly into Durga Antivirus Pro without a Python runtime.
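Here is a minimal sketch that makes “define-by-run” concrete. It assumes a hypothetical byte-level encoder (ChunkEncoder is illustrative and is not DodaTech’s production model). The forward pass is an ordinary Python loop whose number of iterations depends on each input’s length, and autograd still differentiates through whatever actually ran.

```python
import torch
import torch.nn as nn

class ChunkEncoder(nn.Module):
    """Hypothetical byte-level encoder: loop length depends on the input length."""
    def __init__(self, chunk_size=4, dim=8):
        super().__init__()
        self.chunk_size = chunk_size
        self.embed = nn.Embedding(256, dim)   # one learned vector per byte value 0-255
        self.proj = nn.Linear(dim, dim)

    def forward(self, byte_seq):
        # byte_seq: 1-D LongTensor of any length
        state = torch.zeros(self.proj.out_features)
        # Plain Python loop; autograd records only the iterations that actually run
        for start in range(0, byte_seq.numel(), self.chunk_size):
            chunk = byte_seq[start:start + self.chunk_size]
            state = torch.tanh(self.proj(self.embed(chunk).mean(dim=0)) + state)
        return state

torch.manual_seed(0)
encoder = ChunkEncoder()
short_file = torch.randint(0, 256, (5,))    # 5 bytes  -> 2 loop iterations
long_file = torch.randint(0, 256, (37,))    # 37 bytes -> 10 loop iterations
out_short = encoder(short_file)
out_long = encoder(long_file)
(out_short.sum() + out_long.sum()).backward()   # one backward pass through both graphs
print(out_short.shape, out_long.shape)  # torch.Size([8]) torch.Size([8])
print(encoder.proj.weight.grad.shape)   # torch.Size([8, 8])
```

A static-graph framework would need padding or special control-flow operators for this. Here, each call simply builds a graph that matches its own input.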
PyTorch Learning Path
Python & NumPy Basics → PyTorch (you are here) → Tensors → Autograd → Neural Networks → Training Loops → Data Loading → GPU Acceleration → Model Deployment
Tensors — The Core Data Structure
Tensors are multi-dimensional arrays similar to NumPy arrays but with GPU acceleration and automatic differentiation:
import torch
import numpy as np
# Creating tensors
data = [[1, 2], [3, 4], [5, 6]]
tensor = torch.tensor(data)
print(f"Tensor:\n{tensor}")
# Output:
# tensor([[1, 2],
#         [3, 4],
#         [5, 6]])
# Tensor from NumPy
np_array = np.array([1.0, 2.0, 3.0])
from_numpy = torch.from_numpy(np_array)
print(from_numpy) # tensor([1., 2., 3.], dtype=torch.float64)  (NumPy's float64 dtype is kept)
# Special tensors
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
random = torch.randn(2, 3) # Standard normal distribution
eye = torch.eye(4) # Identity matrix
print(f"Random tensor:\n{random}")
# Output (random, so your values will differ):
# tensor([[-0.3421,  1.2034, -0.8765],
#         [ 0.5432, -1.2345,  0.9876]])
# Tensor operations
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])
print(f"Add: {x + y}") # tensor([5., 7., 9.])
print(f"Multiply: {x * y}") # tensor([ 4., 10., 18.])
print(f"Dot: {torch.dot(x, y)}") # tensor(32.)
# Reshaping
matrix = torch.arange(12).reshape(3, 4)
print(matrix)
# Output:
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])
Autograd — Automatic Differentiation
Autograd records operations on tensors and computes gradients automatically:
# Simple gradient computation
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
# Define a function: z = (x^2 * y) + (x * y^2)
z = (x ** 2) * y + x * (y ** 2)
# Compute gradients
z.backward()
print(f"dz/dx at x=3, y=2: {x.grad}")
print(f"dz/dy at x=3, y=2: {y.grad}")
# Manual verification:
# z = x^2*y + x*y^2
# dz/dx = 2*x*y + y^2 = 2*3*2 + 4 = 16
# dz/dy = x^2 + 2*x*y = 9 + 12 = 21
# Output:
# dz/dx at x=3, y=2: tensor(16.)
# dz/dy at x=3, y=2: tensor(21.)
# Training loop pattern
weights = torch.randn(3, 1, requires_grad=True)
inputs = torch.randn(10, 3)
targets = 3 * inputs[:, 0:1] + 2 * inputs[:, 1:2] + 1 * inputs[:, 2:3]
# Forward pass
predictions = inputs @ weights # Matrix multiplication
loss = ((predictions - targets) ** 2).mean()
# Backward pass
loss.backward()
# Gradients are now populated
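print(f"Gradients: {weights.grad.shape}")  # Gradients: torch.Size([3, 1])
Output: backward() computes the gradient of the output (z in the first example, loss in the second) with respect to every leaf tensor created with requires_grad=True, and stores it in that tensor’s .grad attribute. Gradients accumulate: a later backward() call adds to the existing .grad values instead of replacing them. Reset them between training steps with optimizer.zero_grad(), or with tensor.grad.zero_() when you manage tensors by hand.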
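The snippet above stops after a single backward pass. The minimal sketch below completes the loop, assuming plain gradient descent with a hand-picked learning rate of 0.1 (no torch.optim optimizer) and 100 samples instead of 10. Each step runs forward, backward, update, and reset, and the weights gradually recover the coefficients used to build the targets.

```python
import torch

torch.manual_seed(0)
# Same relationship as above: targets = 3*x0 + 2*x1 + 1*x2 (no noise)
inputs = torch.randn(100, 3)
true_weights = torch.tensor([[3.0], [2.0], [1.0]])
targets = inputs @ true_weights

weights = torch.zeros(3, 1, requires_grad=True)
learning_rate = 0.1

for step in range(200):
    predictions = inputs @ weights
    loss = ((predictions - targets) ** 2).mean()
    loss.backward()                      # populates weights.grad
    with torch.no_grad():                # the update itself must not be recorded by autograd
        weights -= learning_rate * weights.grad
    weights.grad.zero_()                 # reset; otherwise the next backward() adds to it

print(weights.detach().squeeze())  # close to tensor([3., 2., 1.])
```

Optimizers in torch.optim package the update and reset steps as optimizer.step() and optimizer.zero_grad().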
Building Neural Networks with nn.Module
PyTorch’s nn.Module provides a structured way to build neural networks:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class MalwareClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)
        # Regularization
        self.dropout = nn.Dropout(0.3)
        self.batch_norm = nn.BatchNorm1d(hidden_size)
    def forward(self, x):
        # Define forward pass
        x = F.relu(self.batch_norm(self.fc1(x)))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x  # raw logits (no softmax): nn.CrossEntropyLoss expects logits
# Instantiate the model
model = MalwareClassifier(
input_size=512, # Feature vector size
hidden_size=256, # Hidden layer neurons
num_classes=5 # Malware families
)
print(model)
# Output:
# MalwareClassifier(
#   (fc1): Linear(in_features=512, out_features=256, bias=True)
#   (fc2): Linear(in_features=256, out_features=256, bias=True)
#   (fc3): Linear(in_features=256, out_features=5, bias=True)
#   (dropout): Dropout(p=0.3, inplace=False)
#   (batch_norm): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# )
# Test forward pass
sample_input = torch.randn(32, 512) # Batch of 32
output = model(sample_input)
print(f"Output shape: {output.shape}") # torch.Size([32, 5])
Training Loop — Full Pipeline
A complete training and validation loop:
# Setup
# Assumes train_loader and val_loader from the DataLoader section below
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MalwareClassifier(512, 256, 5).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3)
# Training loop
num_epochs = 10
train_losses = []
val_accuracies = []
for epoch in range(num_epochs):
    # Training phase: dropout active, batch norm uses per-batch statistics
    model.train()
    running_loss = 0.0
    for batch_idx, (data, targets) in enumerate(train_loader):
        data, targets = data.to(device), targets.to(device)
        # Zero gradients left over from the previous step
        optimizer.zero_grad()
        # Forward pass
        outputs = model(data)
        loss = criterion(outputs, targets)
        # Backward pass and optimize
        loss.backward()
        # Gradient clipping (prevents exploding gradients)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)
    train_losses.append(avg_train_loss)
    # Validation phase: dropout off, batch norm uses running statistics
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, targets in val_loader:
            data, targets = data.to(device), targets.to(device)
            outputs = model(data)
            predicted = outputs.argmax(dim=1)  # index of the highest logit
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
    accuracy = 100.0 * correct / total
    val_accuracies.append(accuracy)
    print(f"Epoch [{epoch+1}/{num_epochs}] "
          f"Loss: {avg_train_loss:.4f} "
          f"Val Acc: {accuracy:.2f}%")
    # Learning rate scheduling: lower the LR when the monitored loss stops improving
    scheduler.step(avg_train_loss)
# Example output (your numbers will differ):
# Epoch [1/10] Loss: 1.2345 Val Acc: 45.67%
# Epoch [2/10] Loss: 0.9876 Val Acc: 58.23%
# ...
# Epoch [10/10] Loss: 0.3456 Val Acc: 92.15%
Data Loading with DataLoader
Efficient data loading with batching, shuffling, and parallel loading:
from torch.utils.data import Dataset, DataLoader
import pandas as pd
class MalwareDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        self.data = pd.read_csv(csv_file)
        self.transform = transform
        # Assume last column is the label
        self.features = self.data.iloc[:, :-1].values.astype('float32')
        self.labels = self.data.iloc[:, -1].values.astype('int64')
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        features = torch.tensor(self.features[idx])
        label = torch.tensor(self.labels[idx])
        if self.transform:
            features = self.transform(features)
        return features, label
# Create datasets
train_dataset = MalwareDataset('train_features.csv')
val_dataset = MalwareDataset('val_features.csv')
# Create data loaders
train_loader = DataLoader(
train_dataset,
batch_size=64,
shuffle=True,
num_workers=4, # Parallel loading
pin_memory=True, # Faster GPU transfer
drop_last=True # Drop incomplete batch
)
val_loader = DataLoader(
val_dataset,
batch_size=64,
shuffle=False,
num_workers=4
)
print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(val_dataset)}")
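print(f"Batches per epoch: {len(train_loader)}")
Output: DataLoader handles batching, shuffling, and multi-process loading automatically. Training data is reshuffled at the start of every epoch, while validation data stays in order. With drop_last=True, the number of training batches is len(train_dataset) // 64. pin_memory=True places each batch in page-locked host memory, which speeds up CPU-to-GPU copies.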
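The files train_features.csv and val_features.csv are not provided here. The following self-contained sketch substitutes random tensors wrapped in TensorDataset. The 1,000 x 512 shape, the 5 classes, and the 80/20 random_split are assumptions for illustration. It shows how batch_size and drop_last determine the number of batches per epoch.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)
# Synthetic stand-in for the CSV features: 1,000 samples x 512 features, 5 classes
features = torch.randn(1000, 512)
labels = torch.randint(0, 5, (1000,))
full_dataset = TensorDataset(features, labels)

# 80/20 train/validation split
train_dataset, val_dataset = random_split(full_dataset, [800, 200])

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

print(len(train_loader))  # 12  (800 // 64; the partial last batch is dropped)
print(len(val_loader))    # 4   (ceil(200 / 64); the last batch holds 8 samples)

batch_features, batch_labels = next(iter(train_loader))
print(batch_features.shape, batch_labels.shape)  # torch.Size([64, 512]) torch.Size([64])
```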
GPU Acceleration with CUDA
PyTorch makes GPU usage straightforward:
# Check GPU availability
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("No GPU available, using CPU")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Move tensors to the selected device (.to() returns a new tensor)
cpu_tensor = torch.randn(1000, 1000)
gpu_tensor = cpu_tensor.to(device)
# Move model to the same device
model = MalwareClassifier(512, 256, 5).to(device)
# Benchmark
import time
def benchmark(device, size=1000):
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    # Warm up (the first CUDA calls include one-time setup costs)
    for _ in range(10):
        torch.mm(a, b)
    # CUDA kernels run asynchronously: wait for queued work before starting the timer
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    # Time 100 matrix multiplications
    for _ in range(100):
        torch.mm(a, b)
    if device == 'cuda':
        torch.cuda.synchronize()
    end = time.perf_counter()
    return (end - start) / 100
cpu_time = benchmark('cpu')
if torch.cuda.is_available():
    gpu_time = benchmark('cuda')
    print(f"CPU: {cpu_time*1000:.2f}ms | GPU: {gpu_time*1000:.2f}ms | "
          f"Speedup: {cpu_time/gpu_time:.1f}x")
else:
    print(f"CPU: {cpu_time*1000:.2f}ms | GPU: not available")
torchvision — Computer Vision
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models
# Data transformations (upscale 32x32 CIFAR-10 images to the 224x224 input an ImageNet-pretrained ResNet expects)
transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(
mean=[0.485, 0.456, 0.406], # ImageNet mean
std=[0.229, 0.224, 0.225] # ImageNet std
)
])
# Load CIFAR-10 dataset
train_set = torchvision.datasets.CIFAR10(
root='./data', train=True,
download=True, transform=transform
)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
# Use a pre-trained model
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet weights; replaces the deprecated pretrained=True
# Freeze feature extractor layers
for param in resnet.parameters():
    param.requires_grad = False
# Replace the classifier for our task (10 classes); the new layer is trainable by default
resnet.fc = nn.Linear(resnet.fc.in_features, 10)
# Only train the classifier head
optimizer = optim.Adam(resnet.fc.parameters(), lr=0.001)
Saving and Loading Models
# Save model checkpoint
# model, optimizer, avg_train_loss and accuracy come from the training loop above
checkpoint = {
    'epoch': 10,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': avg_train_loss,
    'accuracy': accuracy
}
torch.save(checkpoint, 'malware_classifier_checkpoint.pth')
# Load model checkpoint: recreate the model and optimizer first, then restore their state
model = MalwareClassifier(512, 256, 5)
optimizer = optim.Adam(model.parameters())
checkpoint = torch.load('malware_classifier_checkpoint.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
print(f"Resumed from epoch {epoch}, loss: {loss:.4f}")
# Save for inference (smaller, no optimizer state)
torch.save(model.state_dict(), 'malware_classifier_final.pth')
# Load for inference
model = MalwareClassifier(512, 256, 5)
model.load_state_dict(torch.load('malware_classifier_final.pth', map_location='cpu'))
model.eval()  # switch dropout and batch norm to inference behavior
# Export to TorchScript for production deployment (loadable from C++ via LibTorch)
scripted_model = torch.jit.script(model)
scripted_model.save('malware_classifier_scripted.pt')
Security Angle: Adversarial Robustness
def detect_adversarial_input(model, sample, epsilon=0.01):
    """Flag potential adversarial examples using prediction stability.
    Returns False if predictions vary strongly under small random noise, True otherwise."""
    model.eval()
    with torch.no_grad():
        original_pred = model(sample.unsqueeze(0))
        original_prob = F.softmax(original_pred, dim=1)
    predictions = []
    for _ in range(10):
        noise = torch.randn_like(sample) * epsilon
        noisy_sample = sample + noise
        with torch.no_grad():
            noisy_pred = model(noisy_sample.unsqueeze(0))
        predictions.append(F.softmax(noisy_pred, dim=1))
    # Check prediction stability: per-class std of probabilities across the 10 noisy copies
    predictions = torch.cat(predictions)
    std = predictions.std(dim=0)
    if std.max() > 0.1:  # High variance indicates a potentially adversarial input
        print(f"Warning: Unstable prediction (std={std.max():.4f}, "
              f"original class={original_prob.argmax(dim=1).item()})")
        return False
    return True
DodaTech uses similar adversarial detection in Durga Antivirus Pro to identify malware samples that attempt to evade neural network classifiers.
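Testing a detector like this requires adversarial inputs. One standard way to craft them is the Fast Gradient Sign Method (FGSM), which shifts every feature by plus or minus epsilon in the direction that increases the model’s loss. The sketch below uses a small hypothetical stand-in classifier, random input features, and epsilon=0.05. It illustrates the attack in general and does not describe Durga Antivirus Pro’s internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, sample, label, epsilon=0.05):
    """Fast Gradient Sign Method: move each feature by +/- epsilon
    in the direction that increases the loss for the true label."""
    model.eval()
    x = sample.detach().unsqueeze(0).clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), torch.tensor([label]))
    loss.backward()                      # gradient of the loss w.r.t. the input features
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.detach().squeeze(0)

torch.manual_seed(0)
# Hypothetical stand-in classifier: 512 features -> 5 classes
model = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 5))
sample = torch.randn(512)
label = 2

adversarial = fgsm_perturb(model, sample, label)
with torch.no_grad():
    clean_loss = F.cross_entropy(model(sample.unsqueeze(0)), torch.tensor([label]))
    adv_loss = F.cross_entropy(model(adversarial.unsqueeze(0)), torch.tensor([label]))
print(f"max perturbation: {(adversarial - sample).abs().max():.3f}")  # 0.050
print(f"loss on clean input: {clean_loss:.4f}, on adversarial input: {adv_loss:.4f}")
```

Because every feature moves by exactly epsilon, the perturbation stays small in the max-norm sense while still pushing the loss upward. Random noise of the same size usually does not have this effect, which is why robustness checks should also include gradient-based attacks.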
Common Mistakes Beginners Make
- Forgetting model.train() and model.eval(): Dropout and batch normalization behave differently during training and evaluation. Forgetting to toggle modes causes incorrect validation results.
- Not calling optimizer.zero_grad(): Gradients accumulate by default. Without zeroing, each backward pass adds to the existing gradients, leading to incorrect updates.
- Calling .numpy() on a tensor that is still attached to the computation graph: .numpy() raises a RuntimeError on tensors with requires_grad=True (and on GPU tensors). Use tensor.detach().cpu().numpy() when you don’t need gradients.
- CPU-GPU data transfer bottleneck: Moving data between the CPU and GPU is slow. Move data to the GPU once per batch, not per operation, and use pin_memory=True in DataLoader.
- Overfitting without regularization: Deep networks easily memorize training data. Use dropout, weight decay, data augmentation, and early stopping.
- Not shuffling training data: Without shuffling, batches may contain correlated samples, which harms convergence. Always set shuffle=True in the training DataLoader.
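The following short sketch reproduces two of the mistakes above on a toy tensor w. It is an illustrative example, not taken from a real training run.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Mistake 1: two backward passes without zeroing, so the gradients add up
for _ in range(2):
    loss = (w ** 2).sum()        # d(loss)/dw = 2w = [2., 4.]
    loss.backward()
print(w.grad)  # tensor([4., 8.])

w.grad.zero_()                   # manual reset; optimizer.zero_grad() handles this for all parameters
(w ** 2).sum().backward()
print(w.grad)  # tensor([2., 4.])

# Mistake 2: calling .numpy() on a tensor that requires grad
try:
    w.numpy()
except RuntimeError as err:
    print(f"RuntimeError: {err}")

# Fix: detach from the graph (and move to the CPU first if the tensor lives on a GPU)
as_array = w.detach().cpu().numpy()
print(as_array)  # [1. 2.]
```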
Practice Questions
- What does requires_grad=True do?
- How does nn.Module simplify neural network construction?
- What is the purpose of model.train() and model.eval()?
- How do you move tensors and models between CPU and GPU?
- Why save optimizer state_dict in checkpoints?
Answers:
- It tells PyTorch to record operations on the tensor so that backward() can compute gradients for it. Only leaf tensors with requires_grad=True have their .grad attribute populated.
- nn.Module provides automatic parameter tracking, to(device) for device management, state_dict() for serialization, and train()/eval() mode switching.
- model.train() enables training behavior: dropout is active and batch normalization uses per-batch statistics. model.eval() disables dropout and makes batch normalization use its running statistics, so inference is deterministic.
- Use tensor.to('cuda') or model.to('cuda'). For tensors, .to() returns a new tensor, so reassign it (tensor = tensor.to('cuda')), while model.to() moves the parameters in place. Check availability with torch.cuda.is_available(). Always move data to the same device as the model.
- Saving optimizer state lets you resume training where you left off, preserving the current learning rate and per-parameter state such as momentum buffers and Adam’s running moment estimates. A learning rate scheduler has its own state_dict, which should be saved as well.
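Here is a minimal sketch of the train()/eval() difference described in the third answer. It uses standalone layers with arbitrary sizes and is illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Dropout: random zeroing in train mode, identity in eval mode
dropout = nn.Dropout(p=0.5)
x = torch.ones(8)
dropout.train()
train_out = dropout(x)   # survivors are scaled by 1 / (1 - p) = 2.0
dropout.eval()
eval_out = dropout(x)    # input passes through unchanged
print(train_out)
print(eval_out)          # tensor([1., 1., 1., 1., 1., 1., 1., 1.])

# BatchNorm: batch statistics in train mode, running statistics in eval mode
bn = nn.BatchNorm1d(3)
batch = torch.randn(16, 3) * 5 + 10
bn.train()
bn_train = bn(batch)     # normalized with this batch's mean/variance; also updates running stats
bn.eval()
bn_eval = bn(batch)      # normalized with bn.running_mean / bn.running_var instead
print(bn_train.mean(dim=0))  # close to zero for every feature
print(bn_eval.mean(dim=0))   # far from zero: the running stats have seen only one batch
```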
Challenge
Build a complete image classifier for the CIFAR-100 dataset: create a custom CNN with convolutional and pooling layers, implement data augmentation (random crop, horizontal flip, color jitter), use learning rate scheduling with cosine annealing, train with early stopping based on validation loss, and visualize feature maps from the first convolutional layer.
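Two of the challenge’s building blocks can be sketched as follows. EarlyStopping is a hypothetical helper, not a PyTorch class. CosineAnnealingLR is the built-in scheduler from torch.optim.lr_scheduler. The placeholder linear model and the validation losses are made up to show when training would stop.

```python
import torch.nn as nn
import torch.optim as optim

class EarlyStopping:
    """Hypothetical helper: signal a stop once validation loss has not improved
    by more than min_delta for `patience` consecutive epochs."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss     # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

model = nn.Linear(10, 100)                # placeholder for your CIFAR-100 CNN
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)  # T_max = planned epochs
stopper = EarlyStopping(patience=3)

fake_val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]  # made-up curve for illustration
stopped_at = None
for epoch, val_loss in enumerate(fake_val_losses):
    # ... train one epoch and compute the real val_loss here ...
    optimizer.step()    # placeholder: no gradients exist yet, so parameters stay unchanged
    scheduler.step()    # advance the cosine schedule once per epoch, after optimizer.step()
    if stopper.step(val_loss):
        stopped_at = epoch + 1
        break
print(f"stopped after epoch {stopped_at}, lr now {scheduler.get_last_lr()[0]:.4f}")
```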
Real-World Task
Fine-tune a pre-trained ResNet-50 for malware family classification: load the model with pre-trained ImageNet weights, replace the final layer for 10 malware families, implement custom data augmentation for binary executable visualizations, use mixed-precision training with torch.cuda.amp, export to TorchScript, and benchmark inference speed on CPU vs GPU.
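Here is a minimal sketch of one mixed-precision training step. It assumes a small placeholder model instead of ResNet-50 and uses random data. It combines torch.autocast with torch.cuda.amp.GradScaler (newer PyTorch releases prefer the equivalent torch.amp.GradScaler("cuda")). Both are disabled when no GPU is present, so the same code also runs on a CPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"   # in this sketch, mixed precision is enabled only on a GPU

# Placeholder model; in the task this would be the fine-tuned ResNet-50
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(32, 512, device=device)
targets = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device.type, enabled=use_amp):
    loss = criterion(model(inputs), targets)   # eligible ops run in float16 on the GPU
scaler.scale(loss).backward()    # scale the loss so small float16 gradients don't underflow to zero
scaler.step(optimizer)           # unscales gradients; skips the update if they contain inf/NaN
scaler.update()                  # adjusts the scale factor for the next iteration
print(f"loss: {loss.item():.4f}")
```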
Try It Yourself
import torch
import torch.nn as nn
# Create a simple neural network
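model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 2),
    nn.Softmax(dim=1)  # outputs probabilities; drop this layer when training with nn.CrossEntropyLoss, which expects logits
)
# Generate dummy data
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))  # labels (unused in this forward-pass demo)
# Forward pass
output = model(X)
print(f"Input shape: {X.shape}, Output shape: {output.shape}")
# Input shape: torch.Size([100, 10]), Output shape: torch.Size([100, 2])
# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
What’s Next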
Related topics: Python, TensorFlow, Keras, Deep Learning
Congratulations on completing this PyTorch tutorial! Here’s where to go from here:
- Practice daily — Consistency is more important than long study sessions
- Build a project — Apply what you learned by building something real
- Explore related topics — Check out other tutorials in the same category
- Join the community — Discuss with other learners and share your progress
Remember: every expert was once a beginner. Keep coding!
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.