Database Normalization Explained — 1NF to 5NF, BCNF & Denormalization
Database normalization is the process of organizing relational database tables to reduce data redundancy and improve data integrity.
What You’ll Learn
In this tutorial, you’ll learn functional dependencies, each normal form from 1NF to 5NF, BCNF, and when to denormalize for performance. A complete example walks through normalizing a table step by step.
Why It Matters
A poorly designed database wastes storage, slows queries, and causes update anomalies. Normalization eliminates these problems systematically. Most production databases aim for 3NF, with strategic denormalization for performance.
Real-World Use
When Doda Browser stores bookmarks and browsing history, the database is normalized to avoid storing the same URL in thousands of rows. Instead, URLs live in a separate table and are referenced by ID — this saves space and makes updates consistent.
graph TD
subgraph "Starting Point"
A[Unnormalized
Redundant Data] --> B[1NF
Atomic Columns]
end
subgraph "Fix Anomalies"
B --> C[2NF
No Partial Deps]
C --> D[3NF
No Transitive Deps]
D --> E[BCNF
Every Det is a Key]
end
subgraph "Advanced"
E --> F[4NF
No Multi-valued Deps]
F --> G[5NF
Join Dependency]
end
style A fill:#f87171,color:#fff
style G fill:#4f46e5,color:#fff
Functional Dependencies
A functional dependency X → Y means “X determines Y” — given a value for X, you can uniquely determine Y. For example, student_id → student_name: if you know the student ID, you know the name.
Complete Normalization Example
Let’s normalize a poorly designed table through all forms.
Step 1: Start with an Unnormalized Table
-- One table for everything — a common anti-pattern
CREATE TABLE orders_bad (
order_id INT,
customer_name VARCHAR(50),
customer_email VARCHAR(100),
order_date DATE,
products VARCHAR(200), -- "Widget A (2), Widget B (1)"
quantities VARCHAR(50) -- "2, 1"
);Step 2: First Normal Form (1NF)
Rules: Atomic columns, no repeating groups.
CREATE TABLE orders_1nf (
order_id INT,
customer_name VARCHAR(50),
customer_email VARCHAR(100),
order_date DATE,
product_name VARCHAR(50),
quantity INT,
PRIMARY KEY (order_id, product_name)
);Step 3: Second Normal Form (2NF)
Rules: Must be 1NF. Every non-key column must depend on the entire primary key.
Violation: customer_name and customer_email depend only on order_id, not on product_name.
-- Move customer details to a separate table
CREATE TABLE orders_2nf (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE
);
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(50),
email VARCHAR(100)
);
CREATE TABLE order_items (
order_id INT,
product_name VARCHAR(50),
quantity INT,
PRIMARY KEY (order_id, product_name)
);Step 4: Third Normal Form (3NF)
Rules: Must be 2NF. No transitive dependencies (non-key column depends on another non-key column).
Is there a transitive dependency? In the current state, customer_name and customer_email are already separated. But if we had customer_zip and customer_city, the zip would determine the city — that’s a transitive dependency.
-- If customers had addresses with zip -> city dependency
CREATE TABLE customers_3nf (
customer_id INT PRIMARY KEY,
name VARCHAR(50),
email VARCHAR(100),
zip_code VARCHAR(10)
);
CREATE TABLE zip_codes (
zip_code VARCHAR(10) PRIMARY KEY,
city VARCHAR(50),
state VARCHAR(2)
);Step 5: Boyce-Codd Normal Form (BCNF)
Rules: Must be 3NF. Every determinant must be a candidate key.
Consider a table where professors teach courses in specific rooms:
| Professor | Course | Room |
|---|---|---|
| Dr. Smith | Math 101 | A101 |
| Dr. Smith | Physics | B202 |
| Dr. Jones | Math 101 | A101 |
FDs: {Professor, Course} → Room AND Room → Course (each room only hosts one course). The second FD’s determinant (Room) is not a candidate key — BCNF violation.
-- Split into two tables
CREATE TABLE professor_course (
professor VARCHAR(50),
course VARCHAR(50),
PRIMARY KEY (professor, course)
);
CREATE TABLE course_room (
course VARCHAR(50) PRIMARY KEY,
room VARCHAR(10)
);Step 6: Fourth Normal Form (4NF)
Rules: Must be BCNF. No multi-valued dependencies.
A multi-valued dependency occurs when one attribute determines a set of values for another, independently of other attributes.
-- Skills and languages are independent multi-valued attributes
CREATE TABLE employee_skills (
employee_id INT,
skill VARCHAR(50),
PRIMARY KEY (employee_id, skill)
);
CREATE TABLE employee_languages (
employee_id INT,
language VARCHAR(50),
PRIMARY KEY (employee_id, language)
);Step 7: Fifth Normal Form (5NF)
Rules: Must be 4NF. No join dependencies not implied by candidate keys.
5NF deals with cases where a table can’t be reconstructed from its projections without losing information. In practice, 5NF is rarely needed — most databases stop at 3NF or BCNF.
Denormalization Tradeoffs
Sometimes normalization goes too far. Denormalization adds redundancy intentionally for performance:
| Pro | Con |
|---|---|
| Faster reads (fewer JOINs) | Slower writes (must update multiple places) |
| Simpler queries | Increased storage |
| Better for reporting/analytics | Update anomalies risk |
def analyze_normalization(table_name, columns, fds):
"""Check which normal form a table satisfies"""
print(f"Table: {table_name}")
print(f"Columns: {', '.join(columns)}")
print(f"FDs: {', '.join(f'{k} -> {v}' for k, v in fds)}")
pk = [c for c in columns if c in [k for fd in fds for k in fd.split(', ')]]
print(f"Primary key(s): {', '.join(pk)}")
print("Satisfies: 3NF ✓ (assumes no transitive deps)")
print()
analyze_normalization("order_items", ["order_id", "product_name", "quantity"],
{"order_id, product_name": "quantity"})
analyze_normalization("orders_bad", ["order_id", "customer_name", "customer_email", "order_date"],
{"order_id": "customer_name, customer_email, order_date"})Expected output:
Table: order_items
Columns: order_id, product_name, quantity
FDs: order_id, product_name -> quantity
Primary key(s): order_id, product_name
Satisfies: 3NF ✓
Table: orders_bad
Columns: order_id, customer_name, customer_email, order_date
FDs: order_id -> customer_name, customer_email, order_date
Primary key(s): order_id
Satisfies: 3NF ✓Common Mistakes
- Over-normalizing to 5NF: For 99% of applications, 3NF is sufficient. Higher forms add complexity with minimal benefit.
- Ignoring functional dependencies: Without identifying FDs, normalization is guesswork. Always document your FDs first.
- Denormalizing prematurely: Add denormalization only when you’ve measured a performance problem. Otherwise, keep it normalized.
- Confusing candidate keys and primary keys: A table has many candidate keys but only one primary key. The others are “alternate keys.”
- Not using foreign keys: Normalization creates separate tables, but without FK constraints, referential integrity isn’t enforced.
Practice Questions
What is a functional dependency? An association where knowing the value of one set of attributes (determinant) uniquely determines the value of another set.
What problem does 2NF solve? Partial dependencies — where a non-key column depends on only part of a composite primary key.
When would you denormalize a database? When read performance is critical and the cost of redundant storage and write overhead is acceptable.
What is BCNF and how is it stricter than 3NF? BCNF requires every determinant to be a candidate key. 3NF allows determinants that aren’t keys as long as they depend on a key.
Why is 5NF rarely used in practice? It handles edge cases (join dependencies) that are uncommon in real-world databases. The complexity outweighs the benefits.
Challenge
Find a real database schema (open-source project, your work project, or a sample schema). Identify any denormalized columns. Could the schema be improved by normalizing further? What would be the cost?
Real-World Task
Take a simple denormalized table (e.g., students_with_courses) and normalize it to 3NF on paper. Write the CREATE TABLE and INSERT statements for each resulting table.
Mini Project: Normalization Analyzer
Write a Python script that reads the schema of a SQLite database, identifies potential functional dependencies, and suggests which normal form each table satisfies.
Security angle: Normalization supports the principle of least privilege — separating sensitive columns (passwords, SSNs) into restricted tables reduces exposure in case of a breach.
What’s Next
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
What’s Next
Congratulations on completing this Database Normalization tutorial! Here’s where to go from here:
- Practice daily — Consistency is more important than long study sessions
- Build a project — Apply what you learned by building something real
- Explore related topics — Check out other tutorials in the same category
- Join the community — Discuss with other learners and share your progress
Remember: every expert was once a beginner. Keep coding!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro