Virtualization & Containers — Hypervisors, Docker & Kubernetes Pod Isolation
Virtualization lets you run multiple operating systems on one physical machine. Containers share the host OS kernel while isolating applications — a lightweight alternative to virtual machines.
What You’ll Learn
In this tutorial, you’ll learn the difference between Type 1 and Type 2 hypervisors, full virtualization vs para-virtualization, hardware-assisted virtualization (Intel VT-x, AMD-V), Linux namespaces and cgroups, Docker internals (containerd, runc), container vs VM comparison, and Kubernetes pod isolation.
Why It Matters
Cloud computing runs on virtualization. Almost every cloud instance, Kubernetes pod, and serverless function runs as a virtualised or containerised workload. Understanding the layers between your code and the hardware helps you optimise performance, diagnose issues, and design resilient systems.
Real-World Use
When you deploy a Docker container on AWS ECS, it runs inside a VM on a Type 1 hypervisor (Nitro). The container uses Linux namespaces for isolation and cgroups for resource limits. Kubernetes orchestrates hundreds of such containers across a cluster. Durga Antivirus Pro runs scanner containers in isolated environments for safe malware analysis.
graph TD
subgraph "Bare Metal"
HARDWARE[Physical Hardware]
end
subgraph "Type 1 Hypervisor"
HYP1[VMware ESXi / Hyper-V / KVM]
VM1[VM 1]
VM2[VM 2]
VM3[VM 3]
HYP1 --> HARDWARE
VM1 --> HYP1
VM2 --> HYP1
VM3 --> HYP1
end
subgraph "Container Runtime"
HOST_OS[Host OS with Namespaces & Cgroups]
DOCKER[Docker Engine]
CT1[Container 1]
CT2[Container 2]
DOCKER --> HOST_OS
CT1 --> DOCKER
CT2 --> DOCKER
end
Hypervisors: Type 1 vs Type 2
| Feature | Type 1 (Bare-metal) | Type 2 (Hosted) |
|---|---|---|
| Runs directly on | Hardware | Host OS |
| Examples | VMware ESXi, Hyper-V, KVM, Xen | VirtualBox, VMware Workstation |
| Performance | Near-native | Some overhead |
| Use case | Data centres, cloud | Development, testing |
| Security | Smaller attack surface | Dependent on host OS |
class VirtualMachine:
    """A simulated VM with a fixed resource allocation."""

    def __init__(self, name, vcpus=1, memory_mb=1024, disk_gb=20):
        self.name = name
        self.vcpus = vcpus
        self.memory_mb = memory_mb
        self.disk_gb = disk_gb
        self.state = 'powered_off'

    def power_on(self, hypervisor_type):
        if hypervisor_type == 'Type 1':
            print(f'[VM] {self.name}: Booting via hardware passthrough')
        else:
            print(f'[VM] {self.name}: Booting via host OS emulation')
        self.state = 'running'
        print(f'[VM] {self.name}: Assigned {self.vcpus} vCPUs, '
              f'{self.memory_mb}MB RAM, {self.disk_gb}GB disk')

    def __repr__(self):
        return f'VM({self.name}, {self.vcpus}vCPU, {self.memory_mb}MB)'


class Hypervisor:
    """A simulated hypervisor that tracks the VMs it has created."""

    def __init__(self, hv_type='Type 1'):
        self.hv_type = hv_type
        self.vms = []

    def create_vm(self, name, **kwargs):
        vm = VirtualMachine(name, **kwargs)
        self.vms.append(vm)
        return vm

    def start_all(self):
        for vm in self.vms:
            vm.power_on(self.hv_type)

    def resources_used(self):
        # Sum the allocations across all VMs, running or not
        total_vcpus = sum(vm.vcpus for vm in self.vms)
        total_mem = sum(vm.memory_mb for vm in self.vms)
        return total_vcpus, total_mem


# Type 1: ESXi on bare metal
esxi = Hypervisor('Type 1')
web_vm = esxi.create_vm('web-server', vcpus=2, memory_mb=4096)
db_vm = esxi.create_vm('database', vcpus=4, memory_mb=8192)
esxi.start_all()
cpu, mem = esxi.resources_used()
print(f'\nTotal: {cpu} vCPUs, {mem}MB RAM allocated')
Expected output:
[VM] web-server: Booting via hardware passthrough
[VM] web-server: Assigned 2 vCPUs, 4096MB RAM, 20GB disk
[VM] database: Booting via hardware passthrough
[VM] database: Assigned 4 vCPUs, 8192MB RAM, 20GB disk

Total: 6 vCPUs, 12288MB RAM allocated
Full Virtualization vs Para-Virtualization
Full virtualization: guest OS runs unmodified. Hypervisor handles privileged instructions via binary translation or hardware support (VT-x/AMD-V).
Para-virtualization: guest OS is modified to use hypercalls instead of privileged instructions. Better performance but requires OS changes. Xen pioneered this approach.
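The difference between the two approaches can be sketched with a toy model. Everything here is hypothetical (the ToyHypervisor class, the set_page_table operation, and the addresses are invented for illustration); the point is only that an unmodified guest reaches the hypervisor through a trap, while a para-virtualized guest calls it explicitly.

```python
class ToyHypervisor:
    """Hypothetical model: counts how privileged operations reach the hypervisor."""

    def __init__(self):
        self.traps = 0
        self.hypercalls = 0
        self.page_table_base = None

    def trap_and_emulate(self, instruction, operand):
        # Full virtualization: the guest executed a privileged instruction,
        # control trapped to the hypervisor, and the effect is emulated here.
        self.traps += 1
        if instruction == "set_page_table":
            self.page_table_base = operand

    def hypercall(self, name, operand):
        # Para-virtualization: the modified guest requests the service directly.
        self.hypercalls += 1
        if name == "set_page_table":
            self.page_table_base = operand


def unmodified_guest(hv):
    # Behaves as if it owned the hardware and issues the privileged instruction.
    hv.trap_and_emulate("set_page_table", 0x1000)


def paravirtualized_guest(hv):
    # Knows it is virtualized and calls the hypervisor explicitly.
    hv.hypercall("set_page_table", 0x2000)


hv = ToyHypervisor()
unmodified_guest(hv)
paravirtualized_guest(hv)
print(hv.traps, hv.hypercalls, hex(hv.page_table_base))  # 1 1 0x2000
```

In a real system the hypercall path tends to be cheaper because the guest can batch work and skip the decode-and-emulate step, which is the performance benefit described above.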
Hardware-Assisted Virtualization (VT-x, AMD-V)
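Intel VT-x and AMD-V add a new, more privileged CPU execution mode for the hypervisor (VMX root mode on Intel, informally called ring -1), so the guest OS can keep running in ring 0 of the less privileged non-root mode. The hypervisor no longer needs binary translation: the CPU traps sensitive instructions in hardware and hands control to the hypervisor.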
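On Linux, one way to check whether a CPU advertises these extensions is to look for the vmx (Intel) or svm (AMD) flag in /proc/cpuinfo. The sketch below assumes an x86 Linux host; the function names are illustrative, and inside a VM the flag usually appears only when nested virtualization is exposed to the guest.

```python
def virtualization_support(cpuinfo_text):
    """Return 'VT-x', 'AMD-V', or None based on the CPU flags line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, rest = line.partition(":")
            flags = set(rest.split())
            if "vmx" in flags:
                return "VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, or return '' where it does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""  # not Linux, or /proc is unavailable


sample = "processor\t: 0\nflags\t\t: fpu vme sse2 vmx ept\n"
print(virtualization_support(sample))          # VT-x
print(virtualization_support(read_cpuinfo()))  # depends on the machine
```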
Without VT-x:
  Ring 0: Host OS
  Ring 3: Apps + Guest OS (binary translated)
With VT-x:
  VMX Root: Hypervisor
  VMX Non-Root, Ring 0: Guest OS
  VMX Non-Root, Ring 3: Guest Apps
Linux Namespaces
Namespaces provide process isolation — each container sees its own view of the system.
| Namespace | Isolates |
|---|---|
| pid | Process IDs |
| net | Network stack (interfaces, routes) |
| mnt | Mount points and file system |
| uts | Hostname and domain |
| ipc | IPC resources (message queues, shared memory) |
| user | User and group IDs |
| cgroup | Control group hierarchy |
| time | Boot and monotonic clock offsets |
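On a real Linux host, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns. The following is a minimal sketch (the list_namespaces helper is an illustrative name, not part of any library) that reads them for the current process and returns an empty result on systems without /proc.

```python
import os


def list_namespaces(ns_dir="/proc/self/ns"):
    """Map namespace name -> identifier such as 'pid:[4026531836]'."""
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, or /proc is not mounted
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry is not a readable symlink
    return result


for name, ident in list_namespaces().items():
    print(f"{name:20} {ident}")
```

Two processes that print the same identifier for a given entry share that namespace; a containerized process normally shows different identifiers from the host's init process.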
class ContainerNamespace:
    """A simulated set of namespaces belonging to one container."""

    def __init__(self, name):
        self.name = name
        self.pids = set()
        self.network = {
            'interfaces': ['lo', 'eth0'],
            'ip': None,
            'routes': [],
        }
        self.mounts = {'/': 'rootfs', '/proc': 'proc'}
        self.hostname = name

    def add_process(self, pid):
        self.pids.add(pid)

    def __repr__(self):
        return (f'Namespace[{self.name}]: '
                f'pids={len(self.pids)}, '
                f'ip={self.network["ip"]}, '
                f'hostname={self.hostname}')


# Simulate creating namespaced containers
containers = [
    ContainerNamespace('web-app'),
    ContainerNamespace('api-server'),
    ContainerNamespace('background-worker'),
]
for i, ctn in enumerate(containers):
    ctn.add_process(1000 + i)
    ctn.network['ip'] = f'10.0.{i}.2'
    print(ctn)
Expected output:
Namespace[web-app]: pids=1, ip=10.0.0.2, hostname=web-app
Namespace[api-server]: pids=1, ip=10.0.1.2, hostname=api-server
Namespace[background-worker]: pids=1, ip=10.0.2.2, hostname=background-worker
Control Groups (cgroups)
cgroups limit and account for the resources a group of processes may use, such as CPU time, memory, block I/O, and the number of processes.
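To see which cgroup a real process belongs to, Linux exposes /proc/<pid>/cgroup, where each line has the form id:controllers:path. The sketch below parses that format; the function names and the sample strings are illustrative, and the version check relies on the convention that cgroup v2 appears as a single 0:: entry.

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup lines of the form 'id:controllers:path'."""
    entries = []
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        entries.append({
            # cgroup v2 shows up as hierarchy 0 with an empty controller list
            "version": 2 if hierarchy_id == "0" and controllers == "" else 1,
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


def read_own_cgroups(path="/proc/self/cgroup"):
    """Return the parsed entries for this process, or [] without /proc."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_proc_cgroup(f.read())
    except OSError:
        return []


v2_sample = "0::/system.slice/docker-abc123.scope\n"
v1_sample = "5:cpu,cpuacct:/docker/abc123\n"
print(parse_proc_cgroup(v2_sample))
print(parse_proc_cgroup(v1_sample))
print(read_own_cgroups())  # depends on the machine
```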
class CGroup:
    """A simulated control group that tracks CPU shares and memory usage."""

    def __init__(self, name, cpu_shares=1024, memory_limit_mb=512):
        self.name = name
        self.cpu_shares = cpu_shares
        self.memory_limit_mb = memory_limit_mb
        self.current_memory_mb = 0
        self.cpu_usage_percent = 0

    def set_limit(self, resource, limit):
        if resource == 'memory':
            self.memory_limit_mb = limit
        elif resource == 'cpu':
            self.cpu_shares = limit

    def account_memory(self, mb):
        # The usage is recorded even when it pushes the group over its limit
        self.current_memory_mb += mb
        if self.current_memory_mb > self.memory_limit_mb:
            print(f'[OOM] Container {self.name} exceeded '
                  f'memory limit ({self.memory_limit_mb}MB)!')
            return False
        return True

    def __repr__(self):
        return (f'cgroup[{self.name}]: '
                f'cpu={self.cpu_shares}, '
                f'mem={self.current_memory_mb}/{self.memory_limit_mb}MB')


# Simulate cgroup limits
web_cgroup = CGroup('web', cpu_shares=2048, memory_limit_mb=1024)
for i in range(5):
    ok = web_cgroup.account_memory(256)
    print(f'Allocated 256MB: {ok} — {web_cgroup.current_memory_mb}MB used')
    if not ok:
        break
Expected output:
Allocated 256MB: True — 256MB used
Allocated 256MB: True — 512MB used
Allocated 256MB: True — 768MB used
Allocated 256MB: True — 1024MB used
[OOM] Container web exceeded memory limit (1024MB)!
Allocated 256MB: False — 1280MB used
Docker Internals (containerd, runc)
Docker uses a layered architecture:
docker CLI → dockerd → containerd → runc
- dockerd: high-level daemon; serves the Docker API and manages images, volumes, and networks
- containerd: container runtime; manages the container lifecycle and launches OCI-compliant runtimes
- runc: low-level OCI runtime; creates and runs containers using Linux namespaces + cgroups
- containerd-shim: sits between containerd and the container process so containers keep running when the daemons restart
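What containerd ultimately hands to runc is an OCI bundle: a root filesystem plus a config.json describing the process, namespaces, and resource limits. The sketch below builds a heavily trimmed dict in that shape. The minimal_oci_config helper and its values are illustrative assumptions; a real config (for example, one generated by runc spec) contains many more fields such as the process user and mounts, so this is not enough to start a container on its own.

```python
import json


def minimal_oci_config(args, memory_limit_bytes, cpu_shares):
    """Build a trimmed-down dict shaped like an OCI runtime config.json."""
    return {
        "ociVersion": "1.0.2",
        "process": {"args": list(args), "cwd": "/"},
        "root": {"path": "rootfs", "readonly": True},
        "hostname": "web",
        "linux": {
            # The OCI spec spells these "network" and "mount", not net/mnt
            "namespaces": [
                {"type": t} for t in ("pid", "network", "mount", "uts", "ipc")
            ],
            "resources": {
                "memory": {"limit": memory_limit_bytes},
                "cpu": {"shares": cpu_shares},
            },
        },
    }


config = minimal_oci_config(
    ["nginx", "-g", "daemon off;"],
    memory_limit_bytes=512 * 1024 * 1024,
    cpu_shares=1024,
)
print(json.dumps(config, indent=2))
```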
import time


class OCIProcess:
    """Simulate an OCI-compliant container process."""

    def __init__(self, pid, rootfs, command):
        self.pid = pid
        self.rootfs = rootfs
        self.command = command
        self.state = 'created'

    def start(self):
        print(f'[runc] Creating container (pid={self.pid})')
        print(f'[runc] rootfs: {self.rootfs}')
        print(f'[runc] Namespaces: pid,net,mnt,uts,ipc')
        print(f'[runc] cgroups: cpu=1024, memory=512MB')
        self.state = 'running'
        print(f'[runc] Started PID {self.pid}: {self.command}')
        return True

    def stop(self, timeout=10):
        print(f'[runc] Sending SIGTERM to PID {self.pid}')
        self.state = 'stopped'
        print(f'[runc] Container stopped')


class Containerd:
    def __init__(self):
        self.containers = {}

    def create_container(self, image, command, container_id):
        print(f'[containerd] Pulling image: {image}')
        time.sleep(0.1)  # stand-in for the image pull
        print(f'[containerd] Creating container instance: {container_id}')
        # Fake PID; str hashes are randomized per process, so it varies per run
        pid = hash(container_id) & 0xFFFF
        proc = OCIProcess(pid, f'/var/lib/containerd/{container_id}', command)
        self.containers[container_id] = proc
        return proc


class DockerEngine:
    def __init__(self):
        self.containerd = Containerd()

    def run(self, image, command, name=None):
        container_id = name or f'container_{int(time.time())}'
        print(f'[docker] docker run {image} {command}')
        proc = self.containerd.create_container(image, command, container_id)
        proc.start()
        return proc


engine = DockerEngine()
container = engine.run('nginx:alpine', 'nginx -g "daemon off;"', name='web')
print()
container.stop()
Expected output:
[docker] docker run nginx:alpine nginx -g "daemon off;"
[containerd] Pulling image: nginx:alpine
[containerd] Creating container instance: web
[runc] Creating container (pid=34952)
[runc] rootfs: /var/lib/containerd/web
[runc] Namespaces: pid,net,mnt,uts,ipc
[runc] cgroups: cpu=1024, memory=512MB
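[runc] Started PID 34952: nginx -g "daemon off;"

[runc] Sending SIGTERM to PID 34952
[runc] Container stopped
The PID shown is illustrative: Python randomizes str hashes per process, so hash('web') & 0xFFFF differs between runs.
Container vs VM Comparison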
| Feature | Container | Virtual Machine |
|---|---|---|
| OS | Shares host kernel | Full OS per VM |
| Boot time | Milliseconds to seconds | Seconds to minutes |
| Size | MBs | GBs |
| Isolation | Namespace/cgroup boundary | Hardware virtualisation |
| Security | Kernel shared (less isolated) | Strong (separate kernel) |
| Density | Hundreds per host | Tens per host |
| Performance | Near-native | Near-native for CPU with hardware assist; I/O overhead varies by workload |
| Migration | Limited | Full VM live migration |
Kubernetes Pod Isolation
Kubernetes schedules pods: groups of containers that share one network namespace (and therefore one IP address and port space) and can mount the same volumes.
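One practical consequence of the shared network namespace is that two containers in the same pod cannot listen on the same port. The toy model below illustrates this; PodNetworkNamespace is a hypothetical class written for this tutorial, not a Kubernetes API, and the IP and port numbers are made up.

```python
class PodNetworkNamespace:
    """Hypothetical model of the one network namespace a pod's containers share."""

    def __init__(self, pod_ip):
        self.pod_ip = pod_ip
        self.listeners = {}  # port -> name of the container listening on it

    def bind(self, container, port):
        owner = self.listeners.get(port)
        if owner is not None:
            # Same namespace means same port space, as with two local processes
            raise OSError(f"port {port} already in use by {owner}")
        self.listeners[port] = container
        return (self.pod_ip, port)


netns = PodNetworkNamespace("10.42.0.10")
print(netns.bind("nginx", 80))            # ('10.42.0.10', 80)

try:
    netns.bind("sidecar", 80)
except OSError as exc:
    print(f"bind failed: {exc}")          # bind failed: port 80 already in use by nginx

print(netns.bind("sidecar", 24224))       # ('10.42.0.10', 24224)
```

Both containers end up reachable on the same pod IP and can talk to each other over localhost, which is why sidecars usually pick a port that the main container does not use.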
class Pod:
    """A simulated pod: containers that share one IP address."""

    def __init__(self, name, namespace='default'):
        self.name = name
        self.namespace = namespace
        self.containers = []
        self.pod_ip = None
        self.status = 'Pending'

    def add_container(self, name, image, ports=None, resources=None):
        container = {
            'name': name,
            'image': image,
            'ports': ports or [],
            'resources': resources or {'cpu': '100m', 'memory': '128Mi'},
        }
        self.containers.append(container)
        return container

    def assign_ip(self, cni_plugin):
        ip = cni_plugin.allocate_ip(self)
        self.pod_ip = ip
        return ip

    def start(self):
        print(f'[K8s] Pod {self.name}/{self.namespace} scheduled to node-1')
        print(f'[K8s] Pod IP: {self.pod_ip}')
        for c in self.containers:
            print(f'[K8s] Container {c["name"]}: {c["image"]}')
            print(f'[K8s] Ports: {c["ports"]}')
            print(f'[K8s] Limits: {c["resources"]}')
        self.status = 'Running'
        print(f'[K8s] Pod status: {self.status}')


class CNIPlugin:
    """A simulated CNI plugin that hands out addresses from 10.42.0.0/16."""

    def __init__(self):
        self.allocated = set()

    def allocate_ip(self, pod):
        # Start at host number 10 and take the first free one
        base = 10
        while base in self.allocated:
            base += 1
        ip = f'10.42.{base // 256}.{base % 256}'
        self.allocated.add(base)
        return ip


cni = CNIPlugin()
pod = Pod('web-app', 'production')
pod.add_container('nginx', 'nginx:1.25',
                  ports=[{'containerPort': 80}],
                  resources={'cpu': '500m', 'memory': '256Mi'})
pod.add_container('sidecar', 'fluentd:v1.16',
                  resources={'cpu': '100m', 'memory': '128Mi'})
pod.assign_ip(cni)
pod.start()
Expected output:
[K8s] Pod web-app/production scheduled to node-1
[K8s] Pod IP: 10.42.0.10
[K8s] Container nginx: nginx:1.25
[K8s] Ports: [{'containerPort': 80}]
[K8s] Limits: {'cpu': '500m', 'memory': '256Mi'}
[K8s] Container sidecar: fluentd:v1.16
[K8s] Ports: []
[K8s] Limits: {'cpu': '100m', 'memory': '128Mi'}
[K8s] Pod status: Running
Common Mistakes
1. Thinking containers are lightweight VMs
Containers share the host kernel, so a container can’t run a different OS kernel (Windows containers run on Windows, Linux containers on Linux). A VM boots its own kernel, so it can run any OS that supports the virtual hardware.
2. Running containers as root
Unless user namespace remapping is enabled, root inside the container is the same UID 0 as root on the host, so a container escape gives the attacker host root. Use the USER directive in Dockerfiles, run with --user, and enable user namespace remapping.
3. Not setting resource limits (cgroups)
Without memory limits, a single container can exhaust host memory and OOM-kill critical processes. Always set --memory and --cpus limits.
4. Over-provisioning VMs
Assigning more vCPUs than physical cores leads to CPU scheduling overhead. Right-size VMs based on actual resource usage.
5. Ignoring the noisy neighbour problem
One VM/container consuming all I/O affects all neighbours. Use I/O throttling (--blkio-weight for Docker, disk IOPS limits for VMs).
6. Not securing the container supply chain
A compromised base image infects all containers built from it. Use minimal base images (Alpine, distroless), scan with Trivy or Clair, and sign images.
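For mistake 3, it can help to see what the --memory and --cpus flags turn into at the cgroup level. The sketch below is an approximation under stated assumptions: a cgroup v2 host, Docker's default 100 ms CPU period, and binary size suffixes (b, k, m, g). The helper names are illustrative, not Docker APIs.

```python
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def memory_to_bytes(value):
    """Convert a size such as '512m' into bytes (the value for memory.max)."""
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(value)  # a plain number is already in bytes


def cpus_to_cpu_max(cpus, period_us=100_000):
    """Convert a CPU count such as 1.5 into a cpu.max 'quota period' string."""
    quota_us = int(cpus * period_us)
    return f"{quota_us} {period_us}"


print(memory_to_bytes("512m"))   # 536870912
print(cpus_to_cpu_max(1.5))      # 150000 100000
print(cpus_to_cpu_max(0.5))      # 50000 100000
```

A quota of 150000 out of every 100000 microseconds means the container's processes may use up to one and a half CPUs' worth of time per period before being throttled.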
Practice Questions
1. What’s the main difference between Type 1 and Type 2 hypervisors?
Type 1 runs directly on hardware (bare-metal). Type 2 runs on top of a host OS. Type 1 is used in production; Type 2 for development.
2. What Linux namespaces does Docker use?
By default Docker gives each container its own pid, net, mnt, uts, and ipc namespaces, plus a cgroup namespace on cgroup v2 hosts. The user namespace is opt-in (userns-remap), and Docker does not set up a time namespace by default. Each namespace isolates a different global system resource.
3. What is the difference between containerd and runc?
containerd is a high-level runtime managing images and container lifecycle. runc is a low-level OCI runtime that creates containers using namespaces and cgroups.
4. Why can’t a Linux container run a Windows application?
Containers share the host kernel. Linux containers use the Linux kernel; Windows containers use the Windows kernel. You can’t run Windows binaries on a Linux kernel.
5. How does Kubernetes isolate pods?
Each pod gets its own network namespace (unique IP), shared by all containers in the pod. Resource limits use cgroups. Pods are scheduled to nodes; the kubelet enforces isolation through the container runtime.
Challenge
Create a shell script that demonstrates namespace isolation using unshare. Create a process with its own PID, mount, and UTS namespace. Show that it has its own hostname and process list. Then use nsenter to enter the namespace and observe.
Real-World Task
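On a Linux system, inspect the cgroups for a running Docker container: run docker stats <container>, then read its memory usage from the cgroup filesystem. The path depends on the host: on cgroup v2 with the systemd cgroup driver it is typically cat /sys/fs/cgroup/system.slice/docker-<container-id>.scope/memory.current, while on cgroup v1 it is cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes. Run docker inspect <container> and examine the network settings and mounts.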
Mini Project: Container Runtime Simulator
Build a simulated container runtime that:
- Creates “containers” using Python objects representing namespaces
- Implements basic cgroup-like resource tracking (CPU shares, memory limit)
- “Runs” a process inside each container
- Reports resource usage per container
Security angle: Container escape vulnerabilities allow a process to break out of namespace isolation. Always drop capabilities in containers, use seccomp profiles, and don’t mount the Docker socket inside containers.
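The hardening advice above maps onto a handful of docker run flags. The sketch below only assembles the command line and does not execute it; the hardened_run_args helper, the UID/GID, and the limit values are illustrative assumptions to adapt per workload. A seccomp profile is applied by Docker by default, so none is passed here.

```python
import shlex


def hardened_run_args(image, command, user="1000:1000", memory="256m", cpus="0.5"):
    """Build a docker run argv that applies common hardening options."""
    return [
        "docker", "run", "--rm",
        "--user", user,                           # do not run as root
        "--cap-drop", "ALL",                      # drop every Linux capability
        "--security-opt", "no-new-privileges",    # block setuid privilege gain
        "--read-only",                            # read-only root filesystem
        "--pids-limit", "100",                    # cap the number of processes
        "--memory", memory,
        "--cpus", cpus,
        image, *command,
    ]


args = hardened_run_args("nginx:alpine", ["nginx", "-g", "daemon off;"])
print(shlex.join(args))
```

Some images need a capability added back (with --cap-add) or a writable tmpfs to start, so treat this as a starting point and relax one option at a time when something breaks. Note that no -v /var/run/docker.sock mount appears anywhere in the list.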
What’s Next
Before moving on, you should understand:
- Type 1 vs Type 2 hypervisors and when to use each
- Full vs para-virtualization and hardware-assisted virtualization
- Linux namespaces and cgroups for container isolation
- Docker’s layered architecture (dockerd → containerd → runc)
- Container vs VM trade-offs
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.