Learn Databases: Database Design Explained — A Beginner's Guide

Database Design Explained — A Beginner's Guide

DodaTech Updated Jun 6, 2026 14 min read

Database design is the process of structuring data to reduce redundancy, ensure integrity, and enable efficient querying, typically through normalization and careful relationship modeling.

What You’ll Learn

By the end of this tutorial, you’ll understand the three normal forms (1NF, 2NF, 3NF), how to draw entity-relationship diagrams, design tables with proper relationships, and avoid common database design pitfalls.

Why Database Design Matters

A poorly designed database leads to data duplication, update anomalies, slow queries, and confusing code. A well-designed database is the foundation of every reliable application. At DodaTech, DodaZIP’s file management and Durga Antivirus Pro’s threat database both rely on solid database design to process millions of records efficiently.

Database Design Learning Path

    flowchart LR
  A[SQL Basics] --> B[MySQL]
  B --> C[PostgreSQL]
  C --> D[MongoDB]
  D --> E[Redis]
  E --> F[Database Design]
  F --> G{You Are Here}
  style G fill:#f90,color:#fff

Prerequisites: SQL basics and familiarity with at least one database (MySQL, PostgreSQL). Understanding of tables, rows, and columns is assumed.

What Is Database Design? (The “Why” First)

Think of database design like architectural plans for a building. You wouldn’t build a house without blueprints — walls in wrong places, doors that don’t fit, pipes that cross electrical lines. Database design is the blueprint for your data.

A good design ensures:

No duplicate data (you don’t store a customer’s address in three places)
Data integrity (you can’t create an order for a non-existent customer)
Efficient queries (finding what you need is fast)
Easy changes (adding new features doesn’t break existing data)

Entity-Relationship (ER) Diagrams

An ER diagram visually represents the entities (tables) in your database and how they relate. Think of it as a map of your data.

Example ER Diagram — E-Commerce Database

    erDiagram
    CUSTOMER ||--o{ ORDER : places
    CUSTOMER {
        int customer_id PK
        string first_name
        string last_name
        string email
        string phone
    }
    ORDER ||--|{ ORDER_ITEM : contains
    ORDER {
        int order_id PK
        int customer_id FK
        date order_date
        decimal total
    }
    ORDER_ITEM ||--|| PRODUCT : references
    ORDER_ITEM {
        int order_item_id PK
        int order_id FK
        int product_id FK
        int quantity
        decimal unit_price
    }
    PRODUCT {
        int product_id PK
        string name
        decimal price
        int stock
    }

Symbols Explained

||--o{: One (customer) to zero-or-many (orders)
||--|{: One (order) to one-or-many (order items)
||--||: Exactly one (order item) to exactly one (product)

Reading the Diagram

A Customer can place zero or more Orders
An Order belongs to exactly one Customer
An Order contains one or more Order Items
An Order Item references exactly one Product
A Product can appear in many Order Items

Key Relationships

Relationship	Notation	Example
One-to-one	`
One-to-many	`
Many-to-many	`}o--o{`	Student ↔ Courses

Primary and Foreign Keys

Primary Keys

A primary key uniquely identifies each row in a table. Every table should have one.

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,  -- Explicit primary key
    email VARCHAR(255) UNIQUE     -- Also unique, but not the PK
);

-- Better: auto-incrementing
CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL
);

Foreign Keys

A foreign key links rows between tables, maintaining referential integrity:

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (customer_id)
        REFERENCES customers(customer_id)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);

Without foreign keys, you could create an order with customer_id = 999 even if no customer with that ID exists. Foreign keys prevent this.

Normalization — Eliminating Data Redundancy

Normalization is the process of organizing data to reduce duplication. Think of it as organizing a messy closet — you group similar items together and make sure nothing is in two places.

Unnormalized Data (The Problem)

Imagine tracking orders in a single table:

order_id | customer     | address             | product1   | price1 | product2   | price2
1        | Alice        | 123 Main St, NY     | Mouse      | 25.00  | Keyboard   | 80.00
2        | Bob          | 456 Oak Ave, LA     | Monitor    | 200.00 | NULL       | NULL
3        | Alice        | 123 Main St, NY     | USB Hub    | 40.00  | Mouse      | 25.00

Problems:

Alice’s address is stored twice (inconsistency risk if she moves)
Limited number of products per order (product1, product2, productN)
Hard to query “how many mice were sold?”

First Normal Form (1NF)

Rule: Each column contains atomic (indivisible) values. No repeating groups.

-- Break the single table into orders and order_items

-- Orders table (1NF)
-- order_id | customer_id | order_date
-- 1        | 1           | 2026-06-06

-- Order_items table (1NF)
-- order_item_id | order_id | product_name | quantity | unit_price
-- 1             | 1        | Mouse        | 1        | 25.00
-- 2             | 1        | Keyboard     | 1        | 80.00

Now each cell has exactly one value, and there are no repeating product columns.

Second Normal Form (2NF)

Rule: Must be in 1NF AND every non-key column must depend on the entire primary key (not just part of it).

-- If we had this table (composite primary key: order_id + product_id):
CREATE TABLE order_details (
    order_id INT,
    product_id INT,
    product_name VARCHAR(100),  -- Depends only on product_id, not order_id
    quantity INT,
    unit_price DECIMAL(10,2),
    customer_name VARCHAR(100),  -- Depends only on order_id, not product_id
    PRIMARY KEY (order_id, product_id)
);

Problem: product_name depends only on product_id, and customer_name depends only on order_id. These are partial dependencies.

Fix: Split into three tables:

-- Orders (depends on order_id)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_name VARCHAR(100)
);

-- Products (depends on product_id)
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100)
);

-- Order_Items (depends on order_id + product_id)
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    unit_price DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Third Normal Form (3NF)

Rule: Must be in 2NF AND no non-key column should depend on another non-key column (no transitive dependencies).

-- Violates 3NF:
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    customer_name VARCHAR(100),  -- Depends on customer_id, not order_id
    customer_address VARCHAR(200)  -- Also depends on customer_id
);

Problem: If a customer changes their address, we’d need to update every order they’ve ever made.

Fix: Split into separate tables:

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    address VARCHAR(200)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date TIMESTAMP,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Now customer details are in one place. Change it once, and every order reflects the update.

Normalization Summary

Normal Form	Rule	Why It Matters
1NF	No repeated groups, atomic values	Eliminates multi-valued columns
2NF	No partial dependencies	Each column depends on the full PK
3NF	No transitive dependencies	Each column depends only on the PK
BCNF	Stronger version of 3NF	Every determinant is a candidate key

When to Denormalize

Normalization isn’t always the answer. Denormalization (intentionally duplicating data) is useful for:

Read performance: Fewer joins = faster queries
Reporting: Pre-computed summaries
Caching: Duplicated data in a cache layer (Redis)

Trade-off: Denormalization improves read speed but complicates writes (you must update multiple places).

Practical Design Exercise — Library Management System

Let’s design a library system from scratch:

Step 1: Identify Entities

Books: What the library lends
Authors: Who wrote the books
Members: Who borrows books
Loans: Records of borrowed books

Step 2: Define Relationships

    erDiagram
    BOOK ||--o{ BOOK_AUTHOR : written_by
    AUTHOR ||--o{ BOOK_AUTHOR : writes
    BOOK ||--o{ LOAN : borrowed
    MEMBER ||--o{ LOAN : borrows
    BOOK {
        int book_id PK
        string isbn
        string title
        int publish_year
        int copies
    }
    AUTHOR {
        int author_id PK
        string first_name
        string last_name
    }
    BOOK_AUTHOR {
        int book_id FK
        int author_id FK
    }
    MEMBER {
        int member_id PK
        string first_name
        string last_name
        string email
        date join_date
    }
    LOAN {
        int loan_id PK
        int book_id FK
        int member_id FK
        date borrow_date
        date due_date
        date return_date
    }

Step 3: Create the Schema

-- Library database schema
CREATE TABLE books (
    book_id INT AUTO_INCREMENT PRIMARY KEY,
    isbn VARCHAR(13) UNIQUE NOT NULL,
    title VARCHAR(255) NOT NULL,
    publish_year INT,
    copies INT DEFAULT 1,
    INDEX idx_books_title (title)
);

CREATE TABLE authors (
    author_id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    INDEX idx_authors_name (last_name, first_name)
);

-- Many-to-many relationship between books and authors
CREATE TABLE book_authors (
    book_id INT NOT NULL,
    author_id INT NOT NULL,
    PRIMARY KEY (book_id, author_id),
    FOREIGN KEY (book_id) REFERENCES books(book_id) ON DELETE CASCADE,
    FOREIGN KEY (author_id) REFERENCES authors(author_id) ON DELETE CASCADE
);

CREATE TABLE members (
    member_id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    join_date DATE DEFAULT CURRENT_DATE
);

CREATE TABLE loans (
    loan_id INT AUTO_INCREMENT PRIMARY KEY,
    book_id INT NOT NULL,
    member_id INT NOT NULL,
    borrow_date DATE NOT NULL,
    due_date DATE NOT NULL,
    return_date DATE,
    FOREIGN KEY (book_id) REFERENCES books(book_id) ON DELETE RESTRICT,
    FOREIGN KEY (member_id) REFERENCES members(member_id) ON DELETE RESTRICT,
    INDEX idx_loans_dates (borrow_date, due_date)
);

Step 4: Useful Queries

-- Find all books by a specific author
SELECT b.title, b.isbn
FROM books b
INNER JOIN book_authors ba ON b.book_id = ba.book_id
INNER JOIN authors a ON ba.author_id = a.author_id
WHERE a.last_name = 'Tolkien';

-- Find currently borrowed books (no return date)
SELECT b.title, m.first_name, m.last_name, l.borrow_date, l.due_date
FROM loans l
INNER JOIN books b ON l.book_id = b.book_id
INNER JOIN members m ON l.member_id = m.member_id
WHERE l.return_date IS NULL AND l.due_date < CURRENT_DATE;

-- Find overdue books
SELECT b.title, m.first_name, m.last_name, l.due_date,
       DATEDIFF(CURRENT_DATE, l.due_date) AS days_overdue
FROM loans l
INNER JOIN books b ON l.book_id = b.book_id
INNER JOIN members m ON l.member_id = m.member_id
WHERE l.return_date IS NULL AND l.due_date < CURRENT_DATE
ORDER BY days_overdue DESC;

Common Database Design Mistakes

1. Not Using a Primary Key

Every table needs a primary key. Without one, you can’t uniquely identify rows, and foreign key relationships become impossible.

2. Storing the Same Data in Multiple Places

If a customer’s address is in both orders and customers, you risk inconsistency when the address changes. Store it once, reference it with foreign keys.

3. Using Generic Column Names

-- BAD: Generic names
CREATE TABLE data (id INT, col1 VARCHAR(50), col2 INT, col3 DATE);

-- GOOD: Descriptive names
CREATE TABLE products (product_id INT, name VARCHAR(50), price INT, release_date DATE);

4. Not Planning for Growth

If you expect millions of records, design indexes from the start. Consider partitioning or sharding for very large tables.

5. Ignoring Indexes on Foreign Keys

MySQL doesn’t automatically index foreign keys (some databases do). Always add indexes for columns used in JOINs.

6. Over-Normalization

Breaking everything into tiny tables with excessive joins kills performance. Normalize to 3NF as a starting point, then denormalize for performance where needed.

7. Not Using Constraints

Use NOT NULL, UNIQUE, CHECK, and DEFAULT to enforce data integrity at the database level. Don’t rely only on application code.

Common Mistakes Beginners Make

1. Skipping the Fundamentals

Many beginners jump straight to advanced topics without mastering the basics. Take time to understand the core concepts before moving on.

2. Not Practicing Enough

Reading tutorials without writing code leads to shallow understanding. Code along with every example and experiment on your own.

3. Ignoring Error Messages

Error messages tell you exactly what went wrong. Read them carefully — they usually point to the line and type of issue.

4. Copy-Pasting Without Understanding

It’s tempting to copy code from tutorials, but typing it yourself and understanding each line builds real skill.

5. Giving Up Too Early

Every developer hits frustrating bugs. Take breaks, ask for help, and remember that struggling is part of learning.

Practice Questions

1. What’s the purpose of normalization?

To reduce data redundancy, eliminate update anomalies, and ensure data integrity by organizing data into related tables.

2. What’s the difference between 1NF, 2NF, and 3NF?

1NF: atomic values, no repeating groups. 2NF: 1NF + no partial dependencies on composite keys. 3NF: 2NF + no transitive dependencies (non-key columns don’t depend on other non-key columns).

3. What’s a foreign key and why is it important?

A foreign key links a column in one table to a primary key in another. It maintains referential integrity — preventing orphan records and ensuring relationships stay valid.

4. When would you denormalize a database?

For read-heavy applications, reporting, or when query performance with JOINs is unacceptable. Denormalization trades write complexity and storage for read speed.

5. Challenge: Design a schema for a blog with posts, comments, and tags.

CREATE TABLE posts (
    post_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    content TEXT NOT NULL,
    author_id INT NOT NULL,
    published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
);

CREATE TABLE comments (
    comment_id INT AUTO_INCREMENT PRIMARY KEY,
    post_id INT NOT NULL,
    author_name VARCHAR(100) NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (post_id) REFERENCES posts(post_id) ON DELETE CASCADE
);

CREATE TABLE tags (
    tag_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) UNIQUE NOT NULL
);

CREATE TABLE post_tags (
    post_id INT NOT NULL,
    tag_id INT NOT NULL,
    PRIMARY KEY (post_id, tag_id),
    FOREIGN KEY (post_id) REFERENCES posts(post_id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(tag_id) ON DELETE CASCADE
);

Real-World Task: Redesign a Poor Schema

Here’s a poorly designed table. Identify the issues and redesign it:

-- BAD DESIGN
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_address VARCHAR(200),
    product_1 VARCHAR(100),
    product_1_price DECIMAL(10,2),
    product_2 VARCHAR(100),
    product_2_price DECIMAL(10,2),
    product_3 VARCHAR(100),
    product_3_price DECIMAL(10,2),
    order_date DATE
);

Issues:

Violates 1NF (repeating product columns)
Violates 2NF (customer info duplicated, depends on customer, not order)
Only supports 3 products per order maximum
Can’t easily query “total sales per product”

Fix:

CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    address VARCHAR(200)
);

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);

CREATE TABLE order_items (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

FAQ

What is an entity-relationship diagram?

An ER diagram visually represents database entities (tables) and their relationships. It’s the blueprint for your database schema, showing primary keys, foreign keys, and relationship types (one-to-one, one-to-many, many-to-many).

Do I always need to normalize to 3NF?

Start with 3NF as a baseline, then denormalize where performance testing shows it’s necessary. Most production databases are somewhere between 3NF and fully denormalized, depending on access patterns.

What’s a surrogate key vs natural key?

A surrogate key (auto-increment INT, UUID) has no business meaning — it’s just an identifier. A natural key (email, ISBN, SSN) has real-world meaning. Surrogate keys are preferred because they never change and are simpler to use.

How do I handle many-to-many relationships?

Always with a junction (associative) table. Example: students can enroll in many courses, courses have many students → create a enrollments table with student_id and course_id as a composite primary key.

What is indexing and when should I use it?

Indexes speed up data retrieval at the cost of slower writes. Index columns used in WHERE clauses, JOIN conditions, and ORDER BY. Don’t index every column — monitor slow queries with EXPLAIN and add indexes where needed.

Try It Yourself

Design a database for a task management app:

    erDiagram
    USER ||--o{ PROJECT : owns
    USER ||--o{ TASK : assigned
    PROJECT ||--o{ TASK : contains
    USER {
        int user_id PK
        string name
        string email
    }
    PROJECT {
        int project_id PK
        int owner_id FK
        string name
        date deadline
    }
    TASK {
        int task_id PK
        int project_id FK
        int assignee_id FK
        string title
        boolean completed
        date due_date
    }

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE projects (
    project_id INT AUTO_INCREMENT PRIMARY KEY,
    owner_id INT NOT NULL,
    name VARCHAR(200) NOT NULL,
    deadline DATE,
    FOREIGN KEY (owner_id) REFERENCES users(user_id)
);

CREATE TABLE tasks (
    task_id INT AUTO_INCREMENT PRIMARY KEY,
    project_id INT NOT NULL,
    assignee_id INT,
    title VARCHAR(200) NOT NULL,
    completed BOOLEAN DEFAULT FALSE,
    due_date DATE,
    FOREIGN KEY (project_id) REFERENCES projects(project_id) ON DELETE CASCADE,
    FOREIGN KEY (assignee_id) REFERENCES users(user_id) ON DELETE SET NULL
);

This schema demonstrates proper normalization — each piece of data is stored once, relationships use foreign keys, and the many-to-many between users and projects is handled through the projects table (a project has one owner, but a user can own many projects).

What’s Next

SQL Basics Guide

MySQL Explained

PostgreSQL Explained

What’s Next

Congratulations on completing this Database Design tutorial! Here’s where to go from here:

Practice daily — Consistency is more important than long study sessions
Build a project — Apply what you learned by building something real
Explore related topics — Check out other tutorials in the same category
Join the community — Discuss with other learners and share your progress

Remember: every expert was once a beginner. Keep coding!

Previous Redis Explained — Caching & Data Structures for Beginners Next Oracle Database Guide — SQL, PL/SQL, and Enterprise Features

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Databases