Deep Learning Explained — Neural Networks for Beginners

DodaTech | Updated Jun 6, 2026 | 8 min read

Deep Learning is a subset of machine learning that uses multi-layered neural networks to learn complex patterns — powering facial recognition, language translation, and self-driving cars.

What You’ll Learn

In this tutorial, you’ll learn how neural networks work — from a single perceptron to deep architectures — with practical Python code using Keras and TensorFlow.

Why It Matters

Deep learning drives many of the most visible AI achievements: ChatGPT generates fluent text, Tesla’s driver-assistance systems interpret camera images of the road, and Google Photos groups pictures by the faces in them. Understanding the basics helps you grasp what’s possible.

Real-World Use

When you unlock your phone with your face, a deep neural network processes the camera image through millions of calculations in milliseconds. In simplified terms, it extracts distinguishing features (such as the distance between the eyes, nose shape, and jawline) and matches them against your stored face data.

flowchart LR
  A[Input Layer] --> B[Hidden Layer 1]
  B --> C[Hidden Layer 2]
  C --> D[Hidden Layer 3]
  D --> E[Output Layer]
  B -- "Weights & Biases" --> F[Activation: ReLU]
  C -- "Weights & Biases" --> G[Activation: ReLU]
  D -- "Weights & Biases" --> H[Activation: Softmax]

What Is a Neural Network?

Let’s start with the smallest building block: the perceptron.

The Perceptron — A Single Neuron

A perceptron is like a tiny decision-maker. It takes several inputs, weighs their importance, and decides whether to “fire” or not.

Think of it like deciding whether to go for a walk:

Is it sunny? (+5 points)
Do I have free time? (+8 points)
Is it raining? (-10 points)

If the total score is high enough, you go for a walk. That’s exactly what a perceptron does.

import numpy as np

def perceptron(inputs, weights, bias):
    # Multiply each input by its weight, sum them up, add bias
    total = np.dot(inputs, weights) + bias
    # Step activation: if total > 0, fire (return 1), else stay off (return 0)
    return 1 if total > 0 else 0

# Example: should we watch a movie?
# Inputs: [is_weekend, have_time, is_interesting]
inputs = np.array([1, 1, 0])  # Weekend, have time, but not interesting
weights = np.array([3, 4, 5])  # How important each factor is
bias = -2  # Threshold adjustment

decision = perceptron(inputs, weights, bias)
print(f"Watch movie? {'Yes' if decision else 'No'}")

Expected output:

Watch movie? Yes

What’s happening? The perceptron calculates (1×3 + 1×4 + 0×5) + (-2) = 5. Since 5 > 0, it says yes. The movie isn’t interesting, so that input contributes nothing (0×5 = 0), but the weekend and free-time factors alone are enough to push the total above zero.

Activation Functions

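A single perceptron with a step function is too simple: its output is all-or-nothing, and its flat slope gives gradient-based training nothing to work with. Real neural networks use smoother activation functions that allow more nuanced output.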

import numpy as np

def sigmoid(x):
    # Squashes any value into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):
    # np.maximum works element-wise, so this accepts scalars and arrays
    return np.maximum(0, x)

def tanh(x):
    # Squashes any value into the range (-1, 1)
    return np.tanh(x)

values = np.array([-2, -1, 0, 1, 2])
print("Sigmoid:", np.round(sigmoid(values), 3))
print("ReLU:", relu(values))
print("Tanh:", np.round(tanh(values), 3))

Expected output:

Sigmoid: [0.119 0.269 0.5   0.731 0.881]
ReLU: [0 0 0 1 2]
Tanh: [-0.964 -0.762  0.     0.762  0.964]

Why activation functions matter: Without them, neural networks would just be linear transformations regardless of depth. Activation functions introduce non-linearity, allowing networks to learn complex patterns.
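To make this concrete, here is a minimal NumPy sketch. The layer sizes, random weights, and variable names are illustrative assumptions. It stacks two layers with no activation and checks that they collapse into a single matrix, then shows that putting ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Two "layers" with made-up sizes: 3 inputs -> 4 hidden -> 2 outputs
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))  # batch of 5 example inputs

# Without an activation, two layers equal one matrix multiplication
two_layers = (x @ W1) @ W2
one_layer = x @ (W1 @ W2)
linear_collapses = np.allclose(two_layers, one_layer)

# With ReLU in between, the result no longer matches the collapsed layer
with_relu = np.maximum(0, x @ W1) @ W2
relu_differs = not np.allclose(with_relu, one_layer)

print("Linear layers collapse into one:", linear_collapses)
print("ReLU output differs from the collapsed layer:", relu_differs)
```

Both lines should print True: depth only adds expressive power once a non-linear function sits between the layers.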

Building a Neural Network

A neural network is multiple perceptrons organized in layers:

Input layer — receives the raw data
Hidden layers — learn patterns and features
Output layer — produces the final prediction
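The three layers above can be sketched in a few lines of NumPy. This is a toy illustration, not how Keras implements layers internally: the sizes (4 inputs, 5 hidden neurons, 3 outputs), the random weights, and the `forward` helper are all assumptions made up for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def forward(x, W1, b1, W2, b2):
    """Pass one input vector through a hidden layer, then an output layer."""
    hidden = relu(x @ W1 + b1)  # hidden layer: weighted sums, then ReLU
    return hidden @ W2 + b2     # output layer: one raw score per class

rng = np.random.default_rng(42)                 # fixed seed for repeatability
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)   # 4 inputs -> 5 hidden neurons
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)   # 5 hidden -> 3 outputs

x = np.array([0.2, 0.7, 0.1, 0.9])  # input layer: one made-up example
scores = forward(x, W1, b1, W2, b2)
print("Output scores:", scores.round(3))
print("Predicted class:", int(scores.argmax()))
```

With random weights the prediction is meaningless; training is what turns these weights into something useful.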

How Backpropagation Works (Conceptually)

Backpropagation is how neural networks learn. Here’s the idea without the math overload:

Forward pass — data flows through the network and produces a prediction
Calculate error — compare the prediction with the correct answer
Backward pass — the error flows backward through the network, adjusting each weight to reduce the error
Repeat — do this thousands of times until the network is accurate

Think of it like adjusting a shower temperature. You turn the handle (forward pass), feel the water (calculate error), nudge the handle back in proportion to how far off the temperature is (backward pass), and repeat until it’s perfect.
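The four steps can be seen in a deliberately tiny setting. The sketch below trains a single neuron with one weight and one bias on made-up data that follows y = 2x + 1. With only one neuron the "backward pass" is a single derivative; real networks apply the same idea layer by layer using the chain rule. The data, learning rate, and step count are assumptions chosen for illustration.

```python
# Toy data following y = 2x + 1 (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0        # start with an untrained neuron
learning_rate = 0.05

for step in range(2000):
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        pred = w * x + b           # 1. forward pass
        error = pred - y           # 2. calculate error
        grad_w += 2 * error * x    # 3. backward pass: d(error^2)/dw
        grad_b += 2 * error        #    and d(error^2)/db
    # 4. nudge each parameter against its average gradient, then repeat
    w -= learning_rate * grad_w / len(xs)
    b -= learning_rate * grad_b / len(xs)

print(f"Learned w={w:.3f}, b={b:.3f}")
```

The neuron recovers values very close to w = 2 and b = 1, the rule hidden in the data.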

Deep Learning with Keras

Let’s build a real neural network that classifies handwritten digits (the MNIST dataset):

import tensorflow as tf
from tensorflow import keras

# Load the dataset of handwritten digits
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values from 0-255 to 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the neural network
model = keras.Sequential([
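    # Declare the input shape: 28x28 grayscale images
    # (newer Keras versions warn when input_shape is passed to a layer)
    keras.Input(shape=(28, 28)),
    # Flatten 28x28 images into 1D array of 784 pixels
    keras.layers.Flatten(),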
    # Hidden layer with 128 neurons and ReLU activation
    keras.layers.Dense(128, activation='relu'),
    # Dropout reduces overfitting by randomly turning off 20% of neurons during training
    keras.layers.Dropout(0.2),
    # Output layer with 10 neurons (one per digit 0-9) and softmax
    keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate on test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

Expected output (approximate):

Epoch 1/5 -> accuracy: ~0.90
Epoch 2/5 -> accuracy: ~0.95
Epoch 3/5 -> accuracy: ~0.96
Epoch 4/5 -> accuracy: ~0.97
Epoch 5/5 -> accuracy: ~0.98
Test accuracy: ~0.9760

Line-by-line explanation:

Flatten — each image is 28×28 pixels. We flatten it to 784 values so the dense layer can process it
Dense(128, activation='relu') — 128 neurons that detect patterns. ReLU makes negative values zero
Dropout(0.2) — during training, 20% of neurons are randomly ignored. This forces the network to learn redundant patterns and prevents memorization
Dense(10, activation='softmax') — 10 outputs (digits 0-9). Softmax converts them into probabilities that sum to 1
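To see what softmax does to the output layer's raw scores, here is a small standard-library sketch. The `softmax` helper and the three example scores are made up for illustration; Keras applies the same formula across all 10 outputs.

```python
import math

def softmax(scores):
    # Subtract the largest score first; this avoids overflow
    # and does not change the result
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for three classes
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

The largest score gets the largest probability, and the values always sum to 1, which is why the index of the highest output can be read as the predicted digit.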

Security Applications of Deep Learning

Malware classification — Deep Learning models analyze raw bytes of files to classify them as malicious or benign without needing explicit feature engineering.

Network intrusion detection — Deep neural networks process network packet sequences to detect attack patterns that traditional rules would miss.

Adversarial detection — Specialized networks detect when someone tries to fool an AI system with carefully crafted inputs.

Common Mistakes Beginners Make

1. Using too many layers

More layers aren’t always better. Start simple and add complexity only when needed.

2. Not normalizing input data

Neural networks work best when input values are between 0 and 1 or standardized around 0.
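Both options take one line each in NumPy. The feature matrix below (age and income for three people) is invented for the example to show two columns on very different scales.

```python
import numpy as np

# Made-up feature matrix: rows are samples, columns are features
# (age in years, income in dollars) on very different scales
X = np.array([[25.0, 40_000.0],
              [35.0, 60_000.0],
              [45.0, 80_000.0]])

# Option 1: min-max scaling squeezes each column into [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Option 2: standardization gives each column mean 0 and standard deviation 1
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_standard.mean(axis=0).round(6), X_standard.std(axis=0).round(6))
```

In practice, compute the minimum, maximum, mean, and standard deviation on the training set only and reuse those same values for the test set.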

3. Ignoring the learning rate

Too high and the loss can oscillate or diverge, so the model never converges. Too low and training takes far longer than necessary. A common starting point is 0.001, the default for the Adam optimizer in Keras.
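The effect is easy to reproduce on a toy problem. This sketch minimizes f(w) = w² with plain gradient descent at three learning rates; the function, the `descend` helper, and the specific rates are assumptions for illustration and are not recommendations for real networks.

```python
def descend(learning_rate, steps=50, w=1.0):
    """Minimize f(w) = w**2 by gradient descent; returns the final w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w  # the gradient of w**2 is 2*w
    return w

too_high = descend(1.1)      # every step overshoots, so |w| keeps growing
too_low = descend(0.0001)    # barely moves away from the starting point
reasonable = descend(0.1)    # shrinks steadily toward the minimum at 0

print(f"too high:   {too_high:.3e}")
print(f"too low:    {too_low:.3e}")
print(f"reasonable: {reasonable:.3e}")
```

In Keras the rate is set on the optimizer, for example `keras.optimizers.Adam(learning_rate=0.001)` passed to `model.compile`.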

4. Training with too few epochs

The model needs enough passes through the data to learn. Monitor the validation loss curve and stop when it plateaus or starts to rise.
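"Stop when it plateaus" can be turned into a simple rule: stop once the loss has failed to improve for a set number of epochs in a row (often called patience). The helper below is a hypothetical standard-library sketch of that rule, and the loss values are made up. Keras offers the same idea ready-made as the `keras.callbacks.EarlyStopping` callback.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the loss has failed to improve
    for `patience` epochs in a row.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # rule never triggered: all epochs ran

# Made-up validation losses: improving at first, then flat
losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.35, 0.37]
print("Stop after epoch:", stopping_epoch(losses))  # Stop after epoch: 6
```

A larger patience tolerates noisier loss curves at the cost of a few extra epochs.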

5. Not using dropout or regularization

Without dropout or another form of regularization, your network can memorize the training data and perform poorly on new examples.
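For intuition, here is a minimal NumPy sketch of "inverted" dropout, the variant in which surviving values are scaled up during training so nothing needs to change at prediction time. The `dropout` function and its arguments are illustrative assumptions and not the Keras implementation, although the Keras `Dropout` layer likewise rescales the kept units by 1 / (1 - rate).

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero out roughly a fraction `rate` of values."""
    if not training or rate == 0.0:
        return activations  # at prediction time every neuron stays on
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = neuron kept
    # Scale the survivors up so the expected activation is unchanged
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones(10_000)  # stand-in for one layer's outputs

dropped = dropout(hidden, rate=0.2, rng=rng)
print("Fraction zeroed:", np.mean(dropped == 0))
print("Mean after dropout:", dropped.mean().round(3))
```

About 20% of the values end up as zero while the average stays close to 1, so the next layer sees inputs on the same overall scale.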

Practice Questions

What are the three main components of a neural network? Input layer, hidden layers, and output layer. Each layer contains neurons connected by weighted edges.
What does an activation function do? It introduces non-linearity, allowing the network to learn complex patterns. Without it, a deep network would collapse into a single linear transformation, no more expressive than one layer.
What is backpropagation in simple terms? It’s the learning process: calculate the error, then adjust each weight proportionally to its contribution to the error.
Why is ReLU popular? It’s simple (max(0, x)), computationally efficient, and mitigates the vanishing gradient problem that affects sigmoid, because its gradient stays at 1 for positive inputs.
What problem does dropout solve? Overfitting. By randomly disabling neurons during training, the network learns redundant patterns that generalize better.

Challenge

Modify the MNIST example above. Try adding another hidden layer (e.g., 64 neurons). Does accuracy improve? What about changing the activation function to tanh?

Real-World Task

Collect 20 images of your handwriting (digits 0-9, two each). Resize them to 28×28 pixels. Use the trained model to predict them. How many does it get right?
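One detail is easy to miss here: MNIST digits are light strokes on a dark background, while a photo of handwriting is usually dark ink on light paper. The sketch below assumes the image has already been loaded and resized into a 28×28 grayscale NumPy array (loading and resizing are left to whichever image library you use). The `prepare_for_mnist` helper and the fake photo are hypothetical names for illustration.

```python
import numpy as np

def prepare_for_mnist(gray_image):
    """Convert a 28x28 grayscale array (0-255, dark ink on light paper)
    into the format the tutorial's model expects."""
    img = np.asarray(gray_image, dtype=np.float32)
    if img.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {img.shape}")
    img = 255.0 - img            # invert: MNIST digits are light on dark
    img = img / 255.0            # same 0-1 scaling used for the training data
    return img[np.newaxis, ...]  # add a batch dimension -> shape (1, 28, 28)

# Stand-in for a real photo: white paper with a dark vertical stroke
fake_photo = np.full((28, 28), 255, dtype=np.uint8)
fake_photo[4:24, 13:15] = 0

batch = prepare_for_mnist(fake_photo)
print(batch.shape, batch.min(), batch.max())
# With the trained model: predicted = model.predict(batch).argmax(axis=1)
```

If your photos already show light digits on a dark background, skip the inversion step.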

FAQ

Do I need a GPU for deep learning?

For simple examples like MNIST, a CPU works fine. For large datasets and complex models, a GPU can speed up training dramatically, often by an order of magnitude or more depending on the model and hardware.

What’s the difference between AI, ML, and deep learning?

AI is the broad field. ML is a subset where machines learn from data. Deep Learning is a subset of ML using multi-layered neural networks. Think: AI > ML > Deep Learning.

How many layers does a “deep” network need?

There’s no strict rule. Networks with two or more hidden layers are often called “deep,” and some modern networks have hundreds of layers.

Is TensorFlow free?

Yes, TensorFlow and Keras are open-source and free to use.

What’s the best framework for beginners?

Keras is widely regarded as one of the most beginner-friendly options. It ships with TensorFlow, and it’s high-level, readable, and powerful enough for production use.

Mini Project: Digit Recognizer

Take a photo of a handwritten digit, convert it to 28×28 grayscale, and pass it through the MNIST-trained network. Postal services have long used similar neural-network techniques to read zip codes on envelopes.

Security angle: Similar architectures are used in Durga Antivirus Pro to analyze file headers and detect malicious patterns.