File Systems Explained — FAT, NTFS, ext4 & Inode Structure
File systems organize how data is stored, retrieved, and managed on disk — translating file names and paths into physical disk locations.
What You’ll Learn
In this tutorial, you’ll learn how FAT, NTFS, and ext4 file systems work, what inodes are, how directories are structured, and the trade-offs between different file allocation methods.
Why It Matters
Every file you save, open, or delete goes through the file system. Choosing the right file system affects performance, reliability, and maximum file size. Corruption can cost hours of recovery, and heavy fragmentation slows down reads on spinning disks.
Real-World Use
When you save a 4 GB video file, the file system splits it into blocks, distributes them across the disk, and records where everything went. Tools like Durga Antivirus Pro rely on file system APIs to scan files for malware signatures across millions of files.
```mermaid
graph TD
    subgraph "ext4 Inode"
        A[Inode #] --> B[Permissions]
        A --> C[Owner / Group]
        A --> D[Size]
        A --> E[Timestamps]
        A --> F[Block Pointers]
        F --> G[12 Direct Blocks]
        F --> H[1 Indirect Block]
        F --> I[1 Double Indirect]
        F --> J[1 Triple Indirect]
    end
    G --> K[Data Block 0]
    G --> L[Data Block 1]
    H --> M[Block of Pointers]
    M --> N[Data Block N]
```
File Allocation Methods
Contiguous Allocation
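Each file occupies a contiguous run of disk blocks. It is simple and fast for sequential reads, because the disk head moves linearly and block i of a file is just the start block plus i. The problem is external fragmentation: deleting files leaves gaps, and a new file may not fit into any single gap even when enough total space is free. Growing an existing file is also hard when the block after it is already taken.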
Linked Allocation
Each block contains a pointer to the next block in the file. This eliminates external fragmentation, since any free block can be used, but random access is slow: reaching block i means following i pointers. A single broken link loses the rest of the file.
Indexed Allocation
Each file has an index block containing pointers to all data blocks. Used by most modern file systems. Supports fast random access but the index block adds overhead.
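The three methods differ mainly in how a file's logical block number is turned into a physical block number. A minimal sketch, assuming a toy disk where physical blocks are plain integers; the function names are hypothetical:

```python
from typing import Dict, List, Optional


def contiguous_lookup(start: int, length: int, i: int) -> int:
    """Contiguous: block i sits at start + i, so lookup is one addition."""
    if not 0 <= i < length:
        raise IndexError(i)
    return start + i


def linked_lookup(start: int, next_ptr: Dict[int, Optional[int]], i: int) -> int:
    """Linked: follow i 'next' pointers from the first block (i extra reads)."""
    if i < 0:
        raise IndexError(i)
    block = start
    for _ in range(i):
        block = next_ptr[block]
        if block is None:  # reached the end of the chain too early
            raise IndexError(i)
    return block


def indexed_lookup(index_block: List[int], i: int) -> int:
    """Indexed: read the index block once, then jump straight to entry i."""
    if not 0 <= i < len(index_block):
        raise IndexError(i)
    return index_block[i]


# A 4-block file stored at physical blocks 7 -> 3 -> 9 -> 2.
next_ptr = {7: 3, 3: 9, 9: 2, 2: None}
print(contiguous_lookup(10, 4, 2))      # 12 (if the file were stored at 10..13)
print(linked_lookup(7, next_ptr, 2))    # 9, after following 2 pointers
print(indexed_lookup([7, 3, 9, 2], 2))  # 9, one index lookup
```

Contiguous and indexed lookups cost the same however large i is; the linked version's cost grows with i, which is the random-access penalty described above.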
FAT (File Allocation Table)
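FAT is a simple, widely compatible file system dating back to 1977. It uses a File Allocation Table: an array with one entry per cluster on disk. Each entry marks its cluster as free, bad, or the last cluster of a file (end-of-chain), or else holds the number of the next cluster in the same file. A file is therefore a linked chain of clusters that starts at the cluster recorded in its directory entry.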
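On a real FAT32 volume the entries are 32-bit values of which only the low 28 bits are used: 0 marks a free cluster, 0x0FFFFFF7 a bad one, and 0x0FFFFFF8 through 0x0FFFFFFF the end of a chain, while clusters 0 and 1 are reserved. The sketch below walks one file's chain over a Python list standing in for the table; the function name and the sample table are hypothetical.

```python
FAT32_MASK = 0x0FFFFFFF    # only the low 28 bits of an entry are significant
FREE = 0x00000000
BAD = 0x0FFFFFF7
END_OF_CHAIN = 0x0FFFFFF8  # any value >= this ends the chain


def cluster_chain(fat, start):
    """Return the list of clusters a file occupies, starting at `start`."""
    chain = []
    cluster = start
    while True:
        chain.append(cluster)
        if len(chain) > len(fat):  # more clusters than exist: the chain loops
            raise ValueError("loop in cluster chain")
        entry = fat[cluster] & FAT32_MASK
        if entry >= END_OF_CHAIN:
            return chain
        if entry == BAD or entry < 2 or entry >= len(fat):  # free, reserved, bad, out of range
            raise ValueError(f"corrupt chain at cluster {cluster}")
        cluster = entry


# Clusters 0 and 1 are reserved; the file starts at cluster 2: 2 -> 5 -> 7 -> end.
fat = [0x0FFFFFF8, 0x0FFFFFFF, 5, FREE, FREE, 7, FREE, 0x0FFFFFFF]
print(cluster_chain(fat, 2))  # [2, 5, 7]
```

The simulation that follows simplifies this by using -1 as its end-of-chain marker and plain Python lists for both the table and the cluster data.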
```python
class FATFileSystem:
    """Toy FAT: one table entry per cluster, chaining each file's clusters."""

    CLUSTER_SIZE = 512  # bytes per cluster in this toy model

    def __init__(self, total_clusters=100):
        self.fat = [0] * total_clusters      # 0 = free, -1 = end of chain, n = next cluster
        self.data = [None] * total_clusters  # contents of each cluster
        self.root = {}                       # root directory: file name -> first cluster

    def write_file(self, name, data, max_clusters=5):
        # Round up to whole clusters; even an empty file gets one cluster here.
        clusters_needed = max(1, (len(data) + self.CLUSTER_SIZE - 1) // self.CLUSTER_SIZE)
        if clusters_needed > max_clusters:
            print(f"File too large for {max_clusters} clusters")
            return
        free_clusters = [i for i, v in enumerate(self.fat) if v == 0]
        if len(free_clusters) < clusters_needed:
            print("Disk full!")
            return
        cluster_list = free_clusters[:clusters_needed]
        for i, c in enumerate(cluster_list):
            # Link each cluster to the next one; the last is marked end of chain.
            self.fat[c] = -1 if i == len(cluster_list) - 1 else cluster_list[i + 1]
            start = i * self.CLUSTER_SIZE
            self.data[c] = data[start:start + self.CLUSTER_SIZE]
        self.root[name] = cluster_list[0]
        print(f"Written '{name}': {len(data)} bytes, start cluster {cluster_list[0]}")

    def read_file(self, name):
        if name not in self.root:
            print(f"'{name}' not found")
            return None
        cluster = self.root[name]
        data = b""
        while cluster != -1:  # follow the chain until the end-of-chain marker
            data += self.data[cluster] or b""
            cluster = self.fat[cluster]
        print(f"Read '{name}': {len(data)} bytes")
        return data


fs = FATFileSystem()
fs.write_file("hello.txt", b"Hello, File System!")
fs.read_file("hello.txt")
```

Expected output:

```text
Written 'hello.txt': 19 bytes, start cluster 0
Read 'hello.txt': 19 bytes
```

NTFS (New Technology File System)
NTFS was introduced with Windows NT 3.1 and is the default file system of current Windows versions. Key features:
- Master File Table (MFT): a table, itself stored as the $MFT file, holding one record for every file and directory
- Journaling: metadata changes are logged before they are applied, enabling recovery after crashes
- Security: ACLs and encryption (EFS)
- Built-in compression
- Hard links, symlinks, junctions, sparse files
- Maximum volume size: 256 TB with 64 KB clusters (Windows addresses at most 2^32 - 1 clusters)
- Maximum file size: 256 TB
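Journaling is easiest to see as a write-ahead log. NTFS keeps its log in the $LogFile metafile; the sketch below does not model that format and only illustrates the ordering rule, assuming a hypothetical ToyJournal class in which a Python list plays the journal and a dict plays the on-disk metadata.

```python
class ToyJournal:
    """Hypothetical redo journal: log the change, log a commit record,
    and only then update the main on-disk structures."""

    def __init__(self):
        self.log = []   # stands in for the on-disk journal
        self.disk = {}  # stands in for file system metadata (e.g. inodes)

    def write(self, txid, changes, crash_before_commit=False):
        self.log.append(("changes", txid, dict(changes)))  # 1. record intent
        if crash_before_commit:
            return                                         # simulated power loss
        self.log.append(("commit", txid, None))            # 2. commit record
        self.disk.update(changes)                          # 3. apply in place

    def recover(self):
        """Redo every committed transaction; ignore uncommitted ones."""
        committed = {txid for kind, txid, _ in self.log if kind == "commit"}
        for kind, txid, changes in self.log:
            if kind == "changes" and txid in committed:
                self.disk.update(changes)


j = ToyJournal()
j.write(1, {"inode 12": "size=4096"})
j.write(2, {"inode 13": "size=100"}, crash_before_commit=True)
j.disk = {}   # pretend the in-place updates never reached the disk
j.recover()
print(j.disk)  # {'inode 12': 'size=4096'}: tx 2 never committed
```

Because the commit record is written before the in-place update, recovery can always tell a finished transaction from a half-written one.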
NTFS MFT Record Structure
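Each MFT record is typically 1 KB. Small files (under roughly 900 bytes, depending on the file's other attributes) are stored resident, directly inside the MFT record. Larger files are non-resident: the record holds a list of data runs, each a starting cluster and a length, pointing to the file's clusters elsewhere on the volume.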
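Because a non-resident file is described by runs, finding a given cluster of the file means walking those runs. A minimal sketch under stated assumptions: the 900-byte threshold is the approximate figure above rather than a fixed NTFS constant, runs are plain (start_lcn, length) tuples instead of NTFS's compact on-disk encoding, and the function names are hypothetical. VCN is the cluster number within the file; LCN is the cluster number on the volume.

```python
from typing import List, Tuple

# Approximate figure from the text; the real limit depends on how much of the
# 1 KB record the file's other attributes already use.
RESIDENT_LIMIT = 900


def is_resident(file_size: int) -> bool:
    """Small files can live entirely inside their MFT record."""
    return file_size < RESIDENT_LIMIT


def vcn_to_lcn(runs: List[Tuple[int, int]], vcn: int) -> int:
    """Map a cluster number within the file (VCN) to a volume cluster (LCN).

    `runs` lists the file's contiguous extents as (start_lcn, length) pairs.
    """
    if vcn < 0:
        raise IndexError(vcn)
    for start_lcn, length in runs:
        if vcn < length:
            return start_lcn + vcn
        vcn -= length
    raise IndexError("VCN is past the end of the file")


runs = [(1000, 4), (5000, 2)]  # 4 clusters at 1000..1003, then 2 at 5000..5001
print([vcn_to_lcn(runs, v) for v in range(6)])
# [1000, 1001, 1002, 1003, 5000, 5001]
```

A fragmented file simply has more runs; a perfectly contiguous one needs only a single run however large it is.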
ext4 (Fourth Extended File System)
ext4 is the default file system for most Linux distributions. It’s a journaling file system with several advanced features.
ext4 Inode Structure
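Each file and directory has an inode (index node) containing metadata but NOT the file name. The block-pointer fields below follow the classic block-map layout that ext4 inherits from ext2/ext3. New ext4 files normally use extents instead: the same 60-byte pointer area holds a small extent tree, where each extent describes a run of up to 32,768 contiguous blocks. The pointer hierarchy is still the clearest way to see how an inode reaches a file's data: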
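| Field | Size | Purpose |
|---|---|---|
| Mode | 2 bytes | File type + permissions |
| Owner | 2 + 2 bytes (high 16 bits of each stored elsewhere) | UID and GID |
| Size | 8 bytes | File size in bytes |
| Timestamps | 4×4 bytes (extra fields in large inodes) | Access, modify, change, delete; large inodes add creation time and sub-second precision |
| Link count | 2 bytes | Number of directory entries (hard links) pointing to this inode |
| Block count | 4 bytes (+2 high bytes) | Space allocated, normally counted in 512-byte units |
| Direct blocks | 12×4 bytes | Pointers to first 12 data blocks (48 KB with 4 KB blocks) |
| Indirect block | 4 bytes | Points to block of pointers (4 MB with 4 KB blocks) |
| Double indirect | 4 bytes | Block of indirect blocks (4 GB) |
| Triple indirect | 4 bytes | Block of double indirect blocks (4 TB) |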
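The pointer hierarchy in the table can be sketched as a function that reports which pointer path leads to a given logical block. This is a hypothetical helper for the classic block-map layout only (it does not model extents), assuming 4 KB blocks and 4-byte pointers, so each pointer block holds 1,024 entries.

```python
def locate_block(n: int, block_size: int = 4096, ptr_size: int = 4):
    """Return the pointer path to logical block n in a block-mapped inode."""
    p = block_size // ptr_size  # pointers per block: 1024 for 4 KB blocks
    if n < 0:
        raise ValueError("block numbers start at 0")
    if n < 12:
        return ("direct", n)  # one of the 12 direct pointers
    n -= 12
    if n < p:
        return ("indirect", n)  # slot in the single indirect block
    n -= p
    if n < p * p:
        return ("double", n // p, n % p)  # slot in top block, slot in second block
    n -= p * p
    if n < p ** 3:
        return ("triple", n // (p * p), (n // p) % p, n % p)
    raise ValueError("beyond the block map's maximum file size")


# Byte offset 5,000,000 lies in logical block 5_000_000 // 4096 = 1220.
print(locate_block(5_000_000 // 4096))  # ('double', 0, 184)
print(locate_block(5))                  # ('direct', 5)
```

Each extra level costs one more block read before the data block itself, which is why small files (direct blocks only) are the cheapest to access.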
```python
class Ext4Inode:
    """Toy model of the classic (ext2/ext3-style) block-mapped inode."""

    BLOCK_SIZE = 4096  # bytes per block
    POINTER_SIZE = 4   # bytes per block pointer

    def __init__(self, size=0):
        self.mode = 0o644
        self.uid = 1000
        self.gid = 1000
        self.size = size
        self.direct = [None] * 12      # 12 direct block pointers
        self.indirect = None           # -> block of pointers
        self.double_indirect = None    # -> block of indirect blocks
        self.triple_indirect = None    # -> block of double indirect blocks

    def max_direct_size(self):
        return 12 * self.BLOCK_SIZE

    def can_store(self, size):
        """True if a file of this size fits in the direct blocks alone."""
        return size <= self.max_direct_size()

    def describe(self):
        print(f"Inode: mode={oct(self.mode)}, size={self.size}")
        used = sum(1 for b in self.direct if b is not None)  # block 0 is a valid number
        print(f"Direct blocks: {used} / 12 used")
        p = self.BLOCK_SIZE // self.POINTER_SIZE  # 1024 pointers per block
        max_blocks = 12 + p + p**2 + p**3
        print(f"Max file size: ~{max_blocks * self.BLOCK_SIZE / 1024**4:.1f} TB")


inode = Ext4Inode(size=32768)
inode.describe()
print(f"Stores in direct blocks: {inode.can_store(32768)}")
print(f"Max direct: {inode.max_direct_size()} bytes ({inode.max_direct_size() / 1024} KB)")
```

Expected output:

```text
Inode: mode=0o644, size=32768
Direct blocks: 0 / 12 used
Max file size: ~4.0 TB
Stores in direct blocks: True
Max direct: 49152 bytes (48.0 KB)
```

File System Comparison
| Feature | FAT32 | NTFS | ext4 |
|---|---|---|---|
| Max volume | 2 TB | 256 TB | 1 EB (4 KB blocks) |
| Max file | 4 GB minus 1 byte | 256 TB | 16 TB (4 KB blocks) |
| Journaling | No | Yes | Yes |
| Compression | No | Yes | No |
| Encryption | No | EFS | Native (fscrypt) |
| Snapshots | No | Via Volume Shadow Copy | No (use LVM snapshots) |
| OS | Cross-platform | Windows | Linux |
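Most of the size limits in the table follow from the width of a counter multiplied by the unit it counts. The arithmetic below is a worked check under the assumptions noted in the comments (512-byte sectors for FAT32, 64 KB clusters for NTFS on Windows, 4 KB blocks for ext4); the helper name is hypothetical, and the results are binary units (TiB, EiB), which the table rounds to TB and EB.

```python
KiB, GiB, TiB, EiB = 2**10, 2**30, 2**40, 2**60


def limit(count_bits: int, unit_bytes: int) -> int:
    """Largest size reachable by a counter of `count_bits` bits over `unit_bytes` units."""
    return 2**count_bits * unit_bytes


fat32_volume = limit(32, 512)      # 32-bit sector count x 512-byte sectors
fat32_file = 2**32 - 1             # 32-bit file size field in the directory entry
ntfs_volume = limit(32, 64 * KiB)  # ~2**32 clusters (Windows) x 64 KB clusters
ext4_file = limit(32, 4 * KiB)     # 32-bit logical block numbers x 4 KB blocks
ext4_volume = limit(48, 4 * KiB)   # 48-bit physical block numbers x 4 KB blocks

print(fat32_volume / TiB, fat32_file / GiB)  # 2.0 TiB and just under 4 GiB
print(ntfs_volume / TiB)                     # 256.0 TiB
print(ext4_file / TiB, ext4_volume / EiB)    # 16.0 TiB and 1.0 EiB
```

Larger blocks or clusters raise every limit proportionally, which is why the table notes the block size it assumes.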
Common Mistakes
- Formatting a large drive as FAT32: FAT32 can't store a single file of 4 GB or more (the limit is 4 GiB minus 1 byte). Use exFAT, NTFS, or ext4 for large media files.
- Confusing inodes with file names: The inode stores metadata, not the name. Directory entries map names to inode numbers. Multiple names can point to the same inode (hard links).
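- Running out of inodes: A disk can have free space but zero free inodes. This happens with millions of tiny files, because ext4 fixes the number of inodes when the file system is created (see the -i and -N options of mkfs.ext4).
- Unplugging a drive without unmounting it: Write caches may hold data not yet on disk. Always unmount (or eject) first, which flushes pending writes; the sync command flushes caches without unmounting.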
- Ignoring fragmentation on HDDs: While ext4 reduces fragmentation, heavily used HDDs still benefit from occasional defragmentation.
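The caching issue behind the unmount advice applies inside programs too: a successful write usually only reaches the operating system's page cache. A minimal sketch using Python's standard os.fsync, assuming a POSIX-like system; durable_write is a hypothetical helper name.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path` and wait until the OS reports it is on stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's user-space buffer into the OS cache
        os.fsync(f.fileno())  # ask the kernel to write the cached pages to the device


path = os.path.join(tempfile.mkdtemp(), "report.txt")
durable_write(path, b"important data")
```

On Linux, making a newly created file's name durable as well also requires calling fsync on the containing directory.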
Practice Questions
What is an inode? A data structure storing file metadata (permissions, size, timestamps, block pointers) but not the file name.
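Why does NTFS support larger files than FAT32? FAT32 stores each file's size in a 32-bit field of its directory entry, which caps a file at 4 GiB minus 1 byte. NTFS records sizes as 64-bit values in the file's MFT record, so its practical limits come from how many clusters Windows can address (2^32 clusters × 64 KB = 256 TB).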
What is journaling? Recording pending changes in a log before applying them. If the system crashes, the journal replays or rolls back incomplete operations.
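How does ext4 handle large files? Classic block-mapped inodes use a hierarchy of indirect, double indirect, and triple indirect pointer blocks. ext4 normally uses extents instead: each extent describes a contiguous run of blocks, and when a file needs more than the four extents that fit inside the inode, they are organized into an extent tree.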
What is the difference between a hard link and a symbolic link? A hard link is a directory entry pointing to the same inode. A symlink is a special file containing a path to another file.
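Both kinds of link can be compared directly with Python's os module. A minimal sketch, assuming a POSIX system where unprivileged users may create symlinks (on Windows, os.symlink may require extra privileges); link_demo is a hypothetical helper name.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link and a symlink, then compare their inodes."""
    d = tempfile.mkdtemp()
    original = os.path.join(d, "original.txt")
    hard = os.path.join(d, "hard.txt")
    soft = os.path.join(d, "soft.txt")

    with open(original, "w") as f:
        f.write("hello")
    os.link(original, hard)     # second directory entry for the same inode
    os.symlink(original, soft)  # separate inode whose content is a path

    ino = os.stat(original).st_ino
    result = {
        "hard_same_inode": os.stat(hard).st_ino == ino,
        "symlink_own_inode": os.lstat(soft).st_ino != ino,  # lstat: don't follow
        "link_count": os.stat(original).st_nlink,           # 2 names -> 1 inode
    }

    os.remove(original)  # drop one name; the inode survives via the other
    with open(hard) as f:
        result["hard_after_delete"] = f.read()
    # The symlink still exists but now points at nothing.
    result["symlink_dangling"] = not os.path.exists(soft) and os.path.lexists(soft)
    return result


print(link_demo())
```

Deleting the original name leaves the data reachable through the hard link, while the symlink is left dangling because it only stored a path.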
Challenge
Research Btrfs and ZFS. How do copy-on-write (CoW) file systems differ from traditional ones like ext4? What advantages do they offer for snapshots?
Real-World Task
Run df -i on Linux to check inode usage. Which file system on your machine has the highest inode utilization? Is it close to running out?
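The inode counts that df -i reports are also available from Python's os.statvfs on Unix-like systems. A minimal sketch; inode_usage is a hypothetical helper, and file systems that allocate inodes dynamically (Btrfs, for example) may report a total of zero.

```python
import os


def inode_usage(path: str = "/"):
    """Return (used, total, percent_used) inodes for the file system holding `path`."""
    st = os.statvfs(path)
    total = st.f_files  # total inodes
    free = st.f_ffree   # free inodes
    used = total - free
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent


used, total, percent = inode_usage("/")
print(f"/: {used} of {total} inodes used ({percent:.1f}%)")
```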
Mini Project: Inode Inspector
Write a Python script that uses os.stat() to read inode information for all files in a directory. Display file size, permissions, owner, timestamps, and inode number for each file.
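One possible starting point, assuming a Unix-like system (on Windows, st_ctime has historically meant creation time rather than metadata-change time); inspect is a hypothetical name and the output format is arbitrary.

```python
import os
import stat
from datetime import datetime, timezone


def inspect(directory: str = "."):
    """Yield one dict of inode metadata per entry in `directory`, sorted by name."""
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        st = entry.stat(follow_symlinks=False)  # describe links themselves
        yield {
            "name": entry.name,
            "inode": st.st_ino,
            "size": st.st_size,
            "mode": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
            "uid": st.st_uid,
            "gid": st.st_gid,
            "atime": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
            "mtime": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            "ctime": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc),
        }


for info in inspect("."):
    print(f"{info['inode']:>10} {info['mode']} {info['uid']}:{info['gid']} "
          f"{info['size']:>10} {info['mtime']:%Y-%m-%d %H:%M} {info['name']}")
```

Passing follow_symlinks=False keeps a symlink's own inode in the listing rather than its target's, which matters when timestamps are used as forensic evidence.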
Security angle: File system forensics tools examine inode timestamps to determine when files were accessed, modified, or changed — crucial for incident response and malware analysis.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
What’s Next
Congratulations on completing this File Systems tutorial! Here’s where to go from here:
- Practice daily — Consistency is more important than long study sessions
- Build a project — Apply what you learned by building something real
- Explore related topics — Check out other tutorials in the same category
- Join the community — Discuss with other learners and share your progress
Remember: every expert was once a beginner. Keep coding!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro