Concurrency & Parallelism Explained — Threads, Mutexes & Deadlock Prevention
Concurrency is about dealing with many things at once; parallelism is about doing many things at once. Understanding this difference is critical for writing fast, correct software.
What You’ll Learn
In this tutorial, you’ll learn the difference between concurrency and parallelism, how threads and processes work, synchronization primitives like mutexes and semaphores, deadlock prevention strategies, and how modern languages handle concurrency.
Why It Matters
Modern CPUs have multiple cores. Programs that don’t use concurrency leave performance on the table. But concurrent code is hard to get right — race conditions and deadlocks can crash your application or corrupt data.
Real-World Use
Durga Antivirus Pro uses concurrent scan workers to check multiple files simultaneously. A database server handles thousands of concurrent queries using connection pooling and thread-safe transactions. Your web browser loads images, runs JavaScript, and handles user input concurrently.
graph TD
subgraph "Concurrency (logical)"
A1[Task A starts] --> B1[Task B starts]
B1 --> A2[Task A resumes]
A2 --> B2[Task B finishes]
end
subgraph "Parallelism (physical)"
C1[Task A] --> D1[Task A done]
C2[Task B] --> D2[Task B done]
end
Concurrency vs Parallelism
Concurrency is about structure — multiple tasks making progress in overlapping time periods. On a single-core CPU, threads share time slices.
Parallelism is about execution — multiple tasks actually running simultaneously on different cores.
import threading
import time
def task(name, duration):
print(f"[{name}] Starting, will run {duration}s")
time.sleep(duration)
print(f"[{name}] Done")
# Sequential execution
print("=== Sequential (no concurrency) ===")
start = time.time()
task("A", 2)
task("B", 2)
print(f"Total: {time.time() - start:.1f}s\n")
# Concurrent execution (single-core illusion)
print("=== Concurrent (threading) ===")
start = time.time()
t1 = threading.Thread(target=task, args=("A", 2))
t2 = threading.Thread(target=task, args=("B", 2))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Total: {time.time() - start:.1f}s")Expected output:
=== Sequential (no concurrency) ===
[A] Starting, will run 2s
[A] Done
[B] Starting, will run 2s
[B] Done
Total: 4.0s
=== Concurrent (threading) ===
[A] Starting, will run 2s
[B] Starting, will run 2s
[A] Done
[B] Done
Total: 2.0sThreads vs Processes
| Feature | Thread | Process |
|---|---|---|
| Memory | Shared within same process | Isolated |
| Creation | Fast | Slow |
| Communication | Direct (shared memory) | IPC (pipes, sockets) |
| Crash impact | Can crash whole process | Independent |
| Use case | Fine-grained concurrency | Isolation, separate programs |
Mutexes and Race Conditions
A race condition happens when two threads access shared data without synchronisation. A mutex (mutual exclusion) ensures only one thread accesses the critical section at a time.
import threading
counter = 0
lock = threading.Lock()
def increment_without_lock():
global counter
for _ in range(100000):
counter += 1 # Not atomic!
def increment_with_lock():
global counter
for _ in range(100000):
with lock:
counter += 1
# Without lock — race condition
counter = 0
t1 = threading.Thread(target=increment_without_lock)
t2 = threading.Thread(target=increment_without_lock)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Without lock: {counter} (expected 200000)")
# With lock — correct
counter = 0
t1 = threading.Thread(target=increment_with_lock)
t2 = threading.Thread(target=increment_with_lock)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"With lock: {counter} (expected 200000)")Expected output:
Without lock: 184721 (expected 200000)
With lock: 200000 (expected 200000)The exact number without lock varies each run — that’s the race condition.
Semaphores
A semaphore controls access to a limited resource pool. A counting semaphore allows N threads to access a resource; a binary semaphore works like a mutex.
import threading
import time
pool = threading.Semaphore(3) # Allow 3 concurrent accesses
def access_database(thread_id):
with pool:
print(f"[Thread {thread_id}] Accessing database...")
time.sleep(1)
print(f"[Thread {thread_id}] Done")
threads = []
for i in range(10):
t = threading.Thread(target=access_database, args=(i,))
threads.append(t)
print("Starting 10 threads with pool limit of 3")
for t in threads:
t.start()
for t in threads:
t.join()Expected output:
Starting 10 threads with pool limit of 3
[Thread 0] Accessing database...
[Thread 1] Accessing database...
[Thread 2] Accessing database...
[Thread 0] Done
[Thread 3] Accessing database...
...At most 3 threads access the resource simultaneously.
Deadlock Prevention
A deadlock occurs when two threads each hold a lock and wait for the other’s lock. Four conditions must hold: mutual exclusion, hold-and-wait, no preemption, circular wait.
Banker’s Algorithm
The banker’s algorithm prevents deadlock by checking if granting a request leaves the system in a safe state.
def is_safe_state(available, allocated, max_need):
n_processes = len(allocated)
n_resources = len(available)
need = [[max_need[i][j] - allocated[i][j]
for j in range(n_resources)]
for i in range(n_processes)]
finished = [False] * n_processes
safe_sequence = []
work = available.copy()
while len(safe_sequence) < n_processes:
found = False
for i in range(n_processes):
if not finished[i] and all(need[i][j] <= work[j]
for j in range(n_resources)):
for j in range(n_resources):
work[j] += allocated[i][j]
finished[i] = True
safe_sequence.append(i)
found = True
if not found:
return False, []
return True, safe_sequence
# Example: 5 processes, 3 resources
available = [3, 3, 2]
allocated = [
[0, 1, 0],
[2, 0, 0],
[3, 0, 2],
[2, 1, 1],
[0, 0, 2],
]
max_need = [
[7, 5, 3],
[3, 2, 2],
[9, 0, 2],
[2, 2, 2],
[4, 3, 3],
]
safe, sequence = is_safe_state(available, allocated, max_need)
print(f"Safe state: {safe}")
if safe:
print(f"Safe sequence: P{', P'.join(map(str, sequence))}")Expected output:
Safe state: True
Safe sequence: P1, P3, P4, P0, P2Lock Ordering
A simpler deadlock prevention strategy: always acquire locks in the same order across all threads.
import threading
import time
lock_a = threading.Lock()
lock_b = threading.Lock()
def thread_1():
with lock_a:
time.sleep(0.1)
with lock_b:
print("Thread 1: got both locks")
def thread_2():
with lock_a: # Same order as thread_1
time.sleep(0.1)
with lock_b:
print("Thread 2: got both locks")
t1 = threading.Thread(target=thread_1)
t2 = threading.Thread(target=thread_2)
t1.start(); t2.start()
t1.join(); t2.join()
print("No deadlock — lock ordering works!")If thread_2 acquired lock_b first, they’d deadlock.
The Dining Philosophers Problem
Five philosophers sit at a table with five forks. Each needs two forks to eat. This classic problem demonstrates deadlock and starvation.
import threading
import time
import random
class Philosopher(threading.Thread):
def __init__(self, index, left_fork, right_fork):
super().__init__()
self.index = index
self.left_fork = left_fork
self.right_fork = right_fork
def run(self):
for _ in range(3):
self.think()
self.eat()
def think(self):
print(f"Philosopher {self.index} thinking...")
time.sleep(random.random())
def eat(self):
# Acquire forks in order of index to prevent deadlock
first = self.left_fork if self.index % 2 == 0 else self.right_fork
second = self.right_fork if self.index % 2 == 0 else self.left_fork
with first:
with second:
print(f"Philosopher {self.index} eating! 🍝")
time.sleep(random.random())
forks = [threading.Lock() for _ in range(5)]
philosophers = [
Philosopher(i, forks[i], forks[(i + 1) % 5])
for i in range(5)
]
for p in philosophers:
p.start()
for p in philosophers:
p.join()
print("All philosophers finished!")The trick: odd-indexed philosophers pick up the right fork first, breaking the circular wait.
Go Goroutines vs OS Threads
Go goroutines are lightweight user-space threads multiplexed onto OS threads by the Go runtime. They start with ~2KB stacks (vs ~1MB for OS threads) and can scale to millions.
package main
import (
"fmt"
"sync"
"time"
)
func worker(id int, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Printf("Worker %d starting\n", id)
time.Sleep(time.Second)
fmt.Printf("Worker %d done\n", id)
}
func main() {
var wg sync.WaitGroup
for i := 1; i <= 5; i++ {
wg.Add(1)
go worker(i, &wg)
}
wg.Wait()
fmt.Println("All workers completed")
}Expected output:
Worker 5 starting
Worker 1 starting
Worker 3 starting
Worker 4 starting
Worker 2 starting
Worker 2 done
Worker 1 done
Worker 5 done
Worker 3 done
Worker 4 done
All workers completedAsync/Await
Async/await enables cooperative concurrency within a single thread. Tasks voluntarily yield control at await points.
import asyncio
async def fetch_data(url, delay):
print(f"Fetching {url}...")
await asyncio.sleep(delay) # Yield control
data = f"Data from {url}"
print(f"Got {len(data)} bytes from {url}")
return data
async def main():
tasks = [
fetch_data("https://api.example.com/users", 2),
fetch_data("https://api.example.com/posts", 1),
fetch_data("https://api.example.com/comments", 3),
]
results = await asyncio.gather(*tasks)
for result in results:
print(f"Result: {result}")
asyncio.run(main())Expected output:
Fetching https://api.example.com/users...
Fetching https://api.example.com/posts...
Fetching https://api.example.com/comments...
Got 29 bytes from https://api.example.com/posts
Got 29 bytes from https://api.example.com/users
Got 32 bytes from https://api.example.com/comments
Result: Data from https://api.example.com/users
Result: Data from https://api.example.com/posts
Result: Data from https://api.example.com/commentsLock-Free Data Structures
Lock-free data structures use atomic operations (like CAS — Compare-And-Swap) instead of mutexes. They avoid deadlocks and priority inversion.
import threading
import atomic # Requires: pip install atomicwrites
# Python's threading provides atomic increment via锁
# but true lock-free structures use CAS
class LockFreeCounter:
def __init__(self):
self._value = 0
self._lock = threading.Lock()
def increment(self):
with self._lock:
self._value += 1
return self._value
def get(self):
return self._valueReal lock-free structures in C++ use std::atomic<int> and compare_exchange_weak. For most application code, well-designed locks are sufficient.
Common Mistakes
1. Not protecting shared state
Accessing a shared list without a lock from multiple threads causes data corruption or crashes.
2. Deadlock from inconsistent lock ordering
Thread A locks X then Y; Thread B locks Y then X → deadlock. Always acquire in a consistent order.
3. Holding locks for too long
Holding a mutex during I/O or network calls blocks other threads. Keep critical sections small.
4. Forgetting to join threads
A program exits before threads finish. Always join threads before the main thread ends.
5. Over-using threads for CPU-bound work
More threads than CPU cores causes context switching overhead. Use os.cpu_count() to limit thread pool size.
6. Using async/await for CPU-heavy code
Async is great for I/O-bound tasks, not CPU-bound work. For CPU-bound, use threads or processes.
Practice Questions
What’s the difference between concurrency and parallelism? Concurrency is about dealing with many tasks at once (structure); parallelism is about executing many tasks at once (hardware).
What four conditions cause deadlock? Mutual exclusion, hold-and-wait, no preemption, circular wait. Break any one to prevent deadlock.
How does the banker’s algorithm prevent deadlock? It checks whether granting a resource request leaves the system in a safe state — one where all processes can eventually complete.
Why are goroutines lighter than OS threads? Goroutines start with ~2KB stacks that grow as needed, while OS threads have ~1MB fixed stacks. Goroutines are multiplexed onto OS threads by the runtime scheduler.
When should you use async/await instead of threads? For I/O-bound work (network, disk, databases) where latency dominates. Use threads or processes for CPU-bound computation.
Challenge
Implement a thread-safe bounded queue (producer-consumer) using a condition variable. The queue should block producers when full and consumers when empty.
Real-World Task
Use Python’s concurrent.futures.ThreadPoolExecutor to download 10 URLs concurrently. Measure the speedup vs sequential downloads. Security angle: malware scanners use this pattern to query multiple threat intelligence feeds in parallel.
FAQ
Mini Project: Concurrent Web Crawler
Build a concurrent web crawler that:
- Takes a list of URLs
- Downloads each URL using a thread pool
- Extracts links from each page
- Discovers new URLs up to a configurable depth
Security angle: Use rate limiting and backoff to avoid overwhelming servers. Add a robots.txt parser to respect crawl policies.
What’s Next
Before moving on, you should understand:
- The difference between threads and processes
- How mutexes and semaphores work
- Deadlock prevention strategies (lock ordering, banker’s algorithm)
- When to use async vs threads
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro