
Neural Networks from Scratch — Complete Beginner's Guide

DodaTech 4 min read

In this tutorial, you'll build a neural network from scratch, working through the key concepts with runnable examples along the way.

A neural network is a computational system inspired by biological neurons that learns patterns by adjusting weights through forward propagation and backpropagation, forming the foundation of modern Deep Learning.

What You'll Learn

How to build a fully connected neural network from scratch using only NumPy: the forward pass, activation functions, loss calculation, backpropagation, and gradient descent.

Why It Matters

Frameworks like TensorFlow and PyTorch abstract away the math, but understanding what happens inside the black box is essential for debugging, architecture design, and performance optimization in real projects.

Real-World Use

Every neural network in production — from Durga Antivirus Pro's malware classifier to Doda Browser's content recommendation system — follows the same forward/backward pattern you will build here.

Neural Network Architecture

flowchart LR
    subgraph Input Layer
        I1((x1))
        I2((x2))
        I3((x3))
    end
    subgraph Hidden Layer
        H1((h1))
        H2((h2))
        H3((h3))
        H4((h4))
    end
    subgraph Output Layer
        O1((y1))
        O2((y2))
    end
    I1 --> H1
    I1 --> H2
    I1 --> H3
    I1 --> H4
    I2 --> H1
    I2 --> H2
    I2 --> H3
    I2 --> H4
    I3 --> H1
    I3 --> H2
    I3 --> H3
    I3 --> H4
    H1 --> O1
    H1 --> O2
    H2 --> O1
    H2 --> O2
    H3 --> O1
    H3 --> O2
    H4 --> O1
    H4 --> O2

Every neuron in one layer connects to every neuron in the next layer. The weights on these connections are what the network learns.
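To make "every neuron connects to every neuron" concrete, the sketch below counts the learnable values for the 3-4-2 layout in the diagram. It assumes one bias per receiving neuron, which the diagram does not draw; the variable names are illustrative only.

```python
import numpy as np

# Layer sizes taken from the diagram: 3 inputs, 4 hidden neurons, 2 outputs.
layer_sizes = [3, 4, 2]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = np.zeros((n_in, n_out))  # one weight per arrow between the two layers
    b = np.zeros((1, n_out))     # assumed: one bias per receiving neuron
    total += W.size + b.size
    print(f"{n_in} -> {n_out}: W {W.shape}, b {b.shape}")

print(f"Trainable parameters: {total}")
```

The two weight matrices hold 12 and 8 entries, matching the 20 arrows in the diagram; the 6 biases bring the total to 26.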

Forward Pass

import numpy as np

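def sigmoid(x):
    # Squash any real number into the open interval (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); evaluate sigmoid once.
    s = sigmoid(x)
    return s * (1 - s)

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Small random weights break the symmetry between neurons; biases start at zero.
        self.W1 = np.random.randn(input_size, hidden_size) * 0.1
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.1
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        # Hidden layer: affine transform followed by sigmoid.
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = sigmoid(self.z1)
        # Output layer: same pattern. z1 and a1 are cached for backpropagation.
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = sigmoid(self.z2)
        return self.a2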

nn = NeuralNetwork(input_size=3, hidden_size=4, output_size=1)
X_sample = np.array([[0.5, 0.8, 0.2]])
output = nn.forward(X_sample)
print(f"Input: {X_sample}")
print(f"Output: {output[0][0]:.6f}")

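Expected output (the exact value varies from run to run because the weights are drawn at random and no seed is set):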

Input: [[0.5 0.8 0.2]]
Output: 0.524873

The output lies strictly between 0 and 1 because the final activation is a sigmoid. The initial weights are small (scaled by 0.1) and the biases are zero, so z2 is close to 0 and the output starts near sigmoid(0) = 0.5.
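The claim about the starting output can be checked empirically. The sketch below repeats the initialization scheme from `NeuralNetwork.__init__` (weights scaled by 0.1, zero biases) 1000 times on the same sample input. The seeded `default_rng` generator is an assumption added here so the experiment is repeatable; the tutorial's own code uses the unseeded `np.random.randn`.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability
x = np.array([[0.5, 0.8, 0.2]])

outputs = []
for _ in range(1000):
    # Same scheme as NeuralNetwork.__init__: small weights, zero biases.
    W1 = rng.standard_normal((3, 4)) * 0.1
    W2 = rng.standard_normal((4, 1)) * 0.1
    a1 = sigmoid(x @ W1)
    outputs.append(sigmoid(a1 @ W2)[0, 0])

outputs = np.array(outputs)
print(f"sigmoid(0) = {sigmoid(0.0)}")
print(f"min = {outputs.min():.3f}, max = {outputs.max():.3f}")
```

Every initial output should land in a narrow band around 0.5, which is why an untrained network of this kind carries no information about the input yet.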

Backpropagation

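def backward(self, X, y, output):
    m = X.shape[0]  # number of samples in the batch

    # output - y is the gradient of binary cross-entropy w.r.t. z2 for a sigmoid output.
    # For the MSE that train() reports, this term would also be multiplied by
    # 2 * sigmoid_derivative(self.z2).
    dZ2 = output - y.reshape(-1, 1)
    dW2 = (1 / m) * np.dot(self.a1.T, dZ2)
    db2 = (1 / m) * np.sum(dZ2, axis=0, keepdims=True)
    # Chain rule: push the error back through W2, then through the hidden sigmoid.
    dZ1 = np.dot(dZ2, self.W2.T) * sigmoid_derivative(self.z1)
    dW1 = (1 / m) * np.dot(X.T, dZ1)
    db1 = (1 / m) * np.sum(dZ1, axis=0, keepdims=True)

    return dW1, db1, dW2, db2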

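def train(self, X, y, epochs=1000, lr=0.1):
    losses = []
    for i in range(epochs):
        # Forward pass; the MSE is recorded for monitoring only.
        output = self.forward(X)
        loss = np.mean((output - y.reshape(-1, 1)) ** 2)
        losses.append(loss)

        # Backward pass, then one gradient descent step on every parameter.
        dW1, db1, dW2, db2 = self.backward(X, y, output)
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2

        # Log every 200th epoch (epochs are numbered 0 to epochs - 1).
        if i % 200 == 0:
            print(f"Epoch {i}, Loss: {loss:.6f}")
    return losses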

NeuralNetwork.backward = backward
NeuralNetwork.train = train

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
losses = nn.train(X, y, epochs=1000, lr=0.5)

print("\nFinal predictions:")
predictions = nn.forward(X)
for i, (inp, pred) in enumerate(zip(X, predictions)):
    print(f"XOR({inp[0]}, {inp[1]}) = {pred[0]:.4f}  (expected {y[i][0]})")

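Expected output (illustrative; exact values depend on the random initial weights, and some initializations need more epochs to converge. The loop runs epochs 0 to 999 and logs every 200th, so the last logged epoch is 800):

Epoch 0, Loss: 0.252713
Epoch 200, Loss: 0.250041
Epoch 400, Loss: 0.237993
Epoch 600, Loss: 0.001807
Epoch 800, Loss: 0.001061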

Final predictions:
XOR(0, 0) = 0.0171  (expected 0)
XOR(0, 1) = 0.9804  (expected 1)
XOR(1, 0) = 0.9801  (expected 1)
XOR(1, 1) = 0.0213  (expected 0)

The network learns the XOR function, which a single-layer perceptron cannot solve. This demonstrates why hidden layers are necessary.
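A common way to gain confidence in hand-written backpropagation is a finite-difference gradient check. The sketch below is a standalone re-implementation, not the tutorial's class: it assumes the loss being differentiated is binary cross-entropy, because `dZ2 = output - y` in `backward` is the gradient of that loss for a sigmoid output. The function names `bce_loss`, `analytic_grads`, and `numeric_grads` are hypothetical helpers introduced for this check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def bce_loss(params, X, y):
    # Binary cross-entropy of a 2-layer sigmoid network (assumed loss).
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def analytic_grads(params, X, y):
    # Same formulas as the tutorial's backward(), written as a plain function.
    W1, b1, W2, b2 = params
    m = X.shape[0]
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    dZ2 = a2 - y
    dW2 = a1.T @ dZ2 / m
    db2 = dZ2.sum(axis=0, keepdims=True) / m
    dZ1 = (dZ2 @ W2.T) * a1 * (1 - a1)
    dW1 = X.T @ dZ1 / m
    db1 = dZ1.sum(axis=0, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numeric_grads(params, X, y, eps=1e-6):
    # Central differences: nudge one parameter at a time and re-evaluate the loss.
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = bce_loss(params, X, y)
            p[idx] = old - eps
            minus = bce_loss(params, X, y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
params = [rng.standard_normal((2, 4)) * 0.1, np.zeros((1, 4)),
          rng.standard_normal((4, 1)) * 0.1, np.zeros((1, 1))]

max_diff = max(np.max(np.abs(a - n))
               for a, n in zip(analytic_grads(params, X, y),
                               numeric_grads(params, X, y)))
print(f"largest gradient mismatch: {max_diff:.2e}")
```

A mismatch many orders of magnitude below the gradients themselves suggests the formulas are consistent with cross-entropy. Running the same check against the mean squared error would show a clear disagreement, since the MSE gradient carries an extra `sigmoid_derivative(z2)` factor.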

Testing on Real Data

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

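# random_state must be lowercase; fixing it on both calls makes the data and split repeatable.
X_moons, y_moons = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X_moons, y_moons, test_size=0.2, random_state=42)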

nn2 = NeuralNetwork(input_size=2, hidden_size=10, output_size=1)
nn2.train(X_train, y_train, epochs=2000, lr=0.3)

preds = nn2.forward(X_test)
preds_binary = (preds > 0.5).astype(int).flatten()
accuracy = np.mean(preds_binary == y_test)
print(f"\nTest accuracy on moon dataset: {accuracy:.3f}")

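Expected output (illustrative; exact values depend on the random initial weights. The loop runs epochs 0 to 1999 and logs every 200th, so the last logged epoch is 1800):

Epoch 0, Loss: 0.251988
...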

Test accuracy on moon dataset: 0.990

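In the run shown, a simple 2-layer network with 10 hidden neurons reaches 99% test accuracy on make_moons, a synthetic two-class dataset that is not linearly separable. Accuracy will differ between runs depending on the noise level, the train/test split, and the initial weights.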

Practice Questions

  1. What is the role of the activation function in a neural network?
  2. Why does backpropagation use the chain rule from calculus?
  3. What would happen if you removed the hidden layer from the XOR example?

Frequently Asked Questions

Why not use a single-layer perceptron for XOR?

A single-layer perceptron can only learn linearly separable patterns. XOR is not linearly separable — you cannot draw a straight line to separate the four points. A hidden layer transforms the input into a linearly separable representation.
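The limitation can be seen by brute force. The sketch below tries every linear threshold unit `w1*x1 + w2*x2 + b > 0` on a coarse grid of weights and biases and records the best accuracy on the four XOR points. The grid range and resolution are arbitrary choices made for illustration; a grid search is not a proof, only a demonstration.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

grid = np.linspace(-2, 2, 21)  # candidate values for w1, w2, and b
best = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    # A single-layer perceptron: one weighted sum, one threshold.
    preds = (X @ np.array([w1, w2]) + b > 0).astype(int)
    best = max(best, float(np.mean(preds == y)))

print(f"best accuracy of a linear threshold unit on XOR: {best}")
```

No setting gets more than three of the four points right, so the best accuracy is 0.75. Getting the fourth point right requires a non-linear step, which is what the hidden layer provides.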

What is the vanishing gradient problem?

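The sigmoid's derivative never exceeds 0.25, so in a deep network the gradient is multiplied by a small factor at every layer it passes through on the way back. After many layers it becomes extremely small, and the earliest layers learn very slowly. ReLU, whose derivative is 1 for positive inputs, mitigates the problem in most practical networks.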
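A rough feel for the scale of the effect comes from the sigmoid's slope alone. The sketch below finds the largest value of the sigmoid derivative and raises it to the power of the network depth. This is a simplified upper bound on the activation factors only: it deliberately ignores the weight matrices, which also scale the gradient in a real network.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

xs = np.linspace(-10, 10, 2001)
max_slope = sigmoid_derivative(xs).max()  # the peak sits at x = 0
print(f"largest sigmoid slope: {max_slope}")

# Best case per layer is max_slope, so depth layers contribute at most max_slope ** depth.
for depth in (2, 5, 10, 20):
    print(f"{depth:>2} layers: activation factor <= {max_slope ** depth:.2e}")
```

Even in the best case the factor drops below one in a million by ten layers, which is why sigmoid is rarely used for the hidden layers of deep networks.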

Related Topics

  • Python — language used throughout
  • NumPy Guide — powering all matrix operations
  • TensorFlow Beginners Guide — production framework

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
