Virtualization & Containers — Hypervisors, Docker & Kubernetes Pod Isolation

DodaTech Updated Jun 20, 2026 10 min read

Virtualization lets you run multiple operating systems on one physical machine. Containers share the host OS kernel while isolating applications — a lightweight alternative to virtual machines.

What You’ll Learn

In this tutorial, you’ll learn the difference between Type 1 and Type 2 hypervisors, full virtualization vs para-virtualization, hardware-assisted virtualization (Intel VT-x, AMD-V), Linux namespaces and cgroups, Docker internals (containerd, runc), container vs VM comparison, and Kubernetes pod isolation.

Why It Matters

Cloud computing runs on virtualization. Nearly every AWS EC2 instance (bare-metal instance types aside), Kubernetes pod and serverless function runs as a virtualised or containerised workload. Understanding the layers between your code and the hardware helps you optimise performance, diagnose issues, and design resilient systems.

Real-World Use

When you deploy a Docker container on AWS ECS with the EC2 launch type, it runs on an EC2 instance, which on current instance types is itself a VM on AWS's Type 1 Nitro hypervisor. The container uses Linux namespaces for isolation and cgroups for resource limits. Kubernetes orchestrates hundreds of such containers across a cluster. Durga Antivirus Pro runs scanner containers in isolated environments for safe malware analysis.

graph TD
  subgraph "Bare Metal"
    HARDWARE[Physical Hardware]
  end
  subgraph "Type 1 Hypervisor"
    HYP1[VMware ESXi / Hyper-V / KVM]
    VM1[VM 1]
    VM2[VM 2]
    VM3[VM 3]
    HYP1 --> HARDWARE
    VM1 --> HYP1
    VM2 --> HYP1
    VM3 --> HYP1
  end
  subgraph "Container Runtime"
    HOST_OS[Host OS with Namespaces & Cgroups]
    DOCKER[Docker Engine]
    CT1[Container 1]
    CT2[Container 2]
    HOST_OS --> HARDWARE
    DOCKER --> HOST_OS
    CT1 --> DOCKER
    CT2 --> DOCKER
  end

Hypervisors: Type 1 vs Type 2

Feature	Type 1 (Bare-metal)	Type 2 (Hosted)
Runs directly on	Hardware	Host OS
Examples	VMware ESXi, Hyper-V, KVM, Xen	VirtualBox, VMware Workstation
Performance	Near-native	Some overhead
Use case	Data centres, cloud	Development, testing
Security	Smaller attack surface	Dependent on host OS

class VirtualMachine:
    """Toy model of a VM: a bundle of virtual CPUs, memory and disk."""

    def __init__(self, name, vcpus=1, memory_mb=1024, disk_gb=20):
        self.name = name
        self.vcpus = vcpus
        self.memory_mb = memory_mb
        self.disk_gb = disk_gb
        self.state = 'powered_off'

    def power_on(self, hypervisor_type):
        # A Type 1 hypervisor runs directly on the hardware; a Type 2
        # hypervisor is an application running on top of a host OS.
        if hypervisor_type == 'Type 1':
            print(f'[VM] {self.name}: Booting on bare-metal hypervisor')
        else:
            print(f'[VM] {self.name}: Booting on hosted hypervisor '
                  f'(via host OS)')
        self.state = 'running'
        print(f'[VM] {self.name}: Assigned {self.vcpus} vCPUs, '
              f'{self.memory_mb}MB RAM, {self.disk_gb}GB disk')

    def __repr__(self):
        return f'VM({self.name}, {self.vcpus}vCPU, {self.memory_mb}MB)'

class Hypervisor:
    def __init__(self, hv_type='Type 1'):
        self.hv_type = hv_type
        self.vms = []

    def create_vm(self, name, **kwargs):
        vm = VirtualMachine(name, **kwargs)
        self.vms.append(vm)
        return vm

    def start_all(self):
        for vm in self.vms:
            vm.power_on(self.hv_type)

    def resources_used(self):
        # Sum of what has been allocated to VMs (not their actual usage)
        total_vcpus = sum(vm.vcpus for vm in self.vms)
        total_mem = sum(vm.memory_mb for vm in self.vms)
        return total_vcpus, total_mem

# Type 1: ESXi on bare metal
esxi = Hypervisor('Type 1')
web_vm = esxi.create_vm('web-server', vcpus=2, memory_mb=4096)
db_vm = esxi.create_vm('database', vcpus=4, memory_mb=8192)
esxi.start_all()
cpu, mem = esxi.resources_used()
print(f'\nTotal: {cpu} vCPUs, {mem}MB RAM allocated')

Expected output:

[VM] web-server: Booting on bare-metal hypervisor
[VM] web-server: Assigned 2 vCPUs, 4096MB RAM, 20GB disk
[VM] database: Booting on bare-metal hypervisor
[VM] database: Assigned 4 vCPUs, 8192MB RAM, 20GB disk

Total: 6 vCPUs, 12288MB RAM allocated

Full Virtualization vs Para-Virtualization

Full virtualization: the guest OS runs unmodified. The hypervisor intercepts its privileged instructions, either by binary translation (rewriting them on the fly) or with hardware support (VT-x/AMD-V).

Para-virtualization: the guest OS is modified to call the hypervisor directly through hypercalls instead of executing privileged instructions. This avoids much of the trapping overhead but requires OS changes. Xen popularized this approach.
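To make the performance argument concrete, here is a toy count of guest-to-hypervisor transitions for page-table updates. It assumes shadow page tables without EPT/NPT, so every guest page-table write traps under full virtualization, and a para-virtualized guest that batches updates into hypercalls, as Xen's PV memory-management hypercalls allow. The function names and the batch size of 64 are hypothetical.

```python
def full_virt_exits(pt_updates: int) -> int:
    """Unmodified guest: each write to a write-protected (shadowed)
    page table traps to the hypervisor individually."""
    return pt_updates


def paravirt_exits(pt_updates: int, batch_size: int) -> int:
    """Modified guest: queues updates and flushes them with one
    hypercall per batch (ceiling division)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return -(-pt_updates // batch_size)


updates = 1000
print("full virtualization exits:", full_virt_exits(updates))
print("para-virtualization hypercalls:", paravirt_exits(updates, batch_size=64))
```

With second-generation hardware assist (EPT/NPT), guest page-table writes no longer trap at all, which is a large part of why hardware-assisted full virtualization closed the gap with para-virtualization.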

Hardware-Assisted Virtualization (VT-x, AMD-V)

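Intel VT-x and AMD-V add a new, more privileged CPU mode for the hypervisor (VMX root operation on Intel, host mode on AMD; often informally called "ring -1"), so the guest OS can run in its normal ring 0 in non-root mode. When the guest executes a sensitive instruction, the CPU itself performs a VM exit to the hypervisor, so no binary translation is needed.

Without VT-x (software virtualization with binary translation):
Ring 0: Hypervisor (VMM)
Ring 1 or 3: Guest OS kernel (deprivileged, binary translated)
Ring 3: Guest apps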

With VT-x:
VMX Root: Hypervisor
VMX Non-Root, Ring 0: Guest OS
VMX Non-Root, Ring 3: Guest Apps
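On Linux, the CPU advertises these extensions as the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. A minimal sketch that parses that file; the function name is illustrative, and the flag may be hidden if firmware disables virtualization or if you are already inside a VM without nested virtualization.

```python
from pathlib import Path


def hw_virt_support(cpuinfo_text: str):
    """Return 'Intel VT-x', 'AMD-V' or None from /proc/cpuinfo contents."""
    for line in cpuinfo_text.splitlines():
        # x86 kernels list CPU features on lines starting with "flags"
        # ("vmx flags" lines on newer kernels are deliberately skipped)
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(hw_virt_support(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not available (not Linux?)")
```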

Linux Namespaces

Namespaces provide process isolation — each container sees its own view of the system.

Namespace	Isolates
pid	Process IDs
net	Network stack (interfaces, routes)
mnt	Mount points and file system
uts	Hostname and domain
ipc	IPC resources (message queues, shared memory)
user	User and group IDs
cgroup	Control group hierarchy
time	Boot and monotonic clocks (CLOCK_BOOTTIME, CLOCK_MONOTONIC offsets)

class ContainerNamespace:
    def __init__(self, name):
        self.name = name
        self.pids = set()
        self.network = {
            'interfaces': ['lo', 'eth0'],
            'ip': None,
            'routes': [],
        }
        self.mounts = {'/': 'rootfs', '/proc': 'proc'}
        self.hostname = name

    def add_process(self, pid):
        self.pids.add(pid)

    def __repr__(self):
        return (f'Namespace[{self.name}]: '
                f'pids={len(self.pids)}, '
                f'ip={self.network["ip"]}, '
                f'hostname={self.hostname}')

# Simulate creating namespaced containers
containers = [
    ContainerNamespace('web-app'),
    ContainerNamespace('api-server'),
    ContainerNamespace('background-worker'),
]

for i, ctn in enumerate(containers):
    ctn.add_process(1000 + i)
    ctn.network['ip'] = f'10.0.{i}.2'
    print(ctn)

Expected output:

Namespace[web-app]: pids=1, ip=10.0.0.2, hostname=web-app
Namespace[api-server]: pids=1, ip=10.0.1.2, hostname=api-server
Namespace[background-worker]: pids=1, ip=10.0.2.2, hostname=background-worker
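The class above only simulates namespaces. On a real Linux host, every process exposes its namespace memberships as symbolic links under /proc/<pid>/ns, and two processes share a namespace exactly when the links show the same inode number. A small sketch that reads them for the current process; it is Linux-only and returns an empty dict elsewhere, and the function name is hypothetical.

```python
import os


def list_namespaces(pid="self"):
    """Map namespace type -> identifier, e.g. {'pid': 'pid:[4026531836]'}."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}  # not Linux, or /proc not mounted
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # some entries may be unreadable without privileges
    return result


for ns_type, ident in list_namespaces().items():
    print(f"{ns_type:18} {ident}")
```

Running the same script on the host and inside a Docker container should show different identifiers for the namespaces the container was given (such as pid, net, mnt, uts and ipc).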

Control Groups (cgroups)

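cgroups limit and account for resource usage: CPU, memory, block I/O and the number of processes. Network bandwidth is usually shaped with traffic control (tc) rather than by cgroups themselves.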

class CGroup:
    """Toy model of a cgroup enforcing a hard memory limit."""

    def __init__(self, name, cpu_shares=1024, memory_limit_mb=512):
        self.name = name
        self.cpu_shares = cpu_shares  # relative CPU weight (cgroup v1 cpu.shares)
        self.memory_limit_mb = memory_limit_mb
        self.current_memory_mb = 0
        self.cpu_usage_percent = 0

    def set_limit(self, resource, limit):
        if resource == 'memory':
            self.memory_limit_mb = limit
        elif resource == 'cpu':
            self.cpu_shares = limit

    def account_memory(self, mb):
        # Check before charging, so usage never exceeds the limit.
        # (The real kernel would reclaim, then OOM-kill a process here.)
        if self.current_memory_mb + mb > self.memory_limit_mb:
            print(f'[OOM] Container {self.name}: allocation would exceed '
                  f'memory limit ({self.memory_limit_mb}MB)!')
            return False
        self.current_memory_mb += mb
        return True

    def __repr__(self):
        return (f'cgroup[{self.name}]: '
                f'cpu={self.cpu_shares}, '
                f'mem={self.current_memory_mb}/{self.memory_limit_mb}MB')

# Simulate cgroup limits
web_cgroup = CGroup('web', cpu_shares=2048, memory_limit_mb=1024)
for i in range(5):
    ok = web_cgroup.account_memory(256)
    print(f'Allocated 256MB: {ok} ({web_cgroup.current_memory_mb}MB used)')
    if not ok:
        break

Expected output:

Allocated 256MB: True (256MB used)
Allocated 256MB: True (512MB used)
Allocated 256MB: True (768MB used)
Allocated 256MB: True (1024MB used)
[OOM] Container web: allocation would exceed memory limit (1024MB)!
Allocated 256MB: False (1024MB used)
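On a real host, Docker's --cpus and --memory flags end up as values in cgroup interface files. Docker documents --cpus=1.5 as equivalent to a CPU quota of 150000 µs per 100000 µs period, which cgroup v2 writes to cpu.max as "150000 100000"; --memory becomes a byte count in memory.max. The sketch below computes those values; the function names are hypothetical and it only handles the b/k/m/g suffixes.

```python
PERIOD_US = 100_000  # default CFS period used for --cpus (100 ms)
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_docker_memory(value: str) -> int:
    """'512m' -> 536870912 bytes; a bare number is taken as bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def cgroup_v2_limits(cpus: float, memory: str) -> dict:
    """Translate --cpus/--memory into cgroup v2 cpu.max and memory.max strings."""
    quota_us = round(cpus * PERIOD_US)
    return {
        "cpu.max": f"{quota_us} {PERIOD_US}",
        "memory.max": str(parse_docker_memory(memory)),
    }


print(cgroup_v2_limits(1.5, "512m"))
```

You can compare the result with the cpu.max and memory.max files of a container started with docker run --cpus=1.5 --memory=512m; where those files live depends on the cgroup driver.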

Docker Internals (containerd, runc)

Docker uses a layered architecture:

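docker CLI → dockerd → containerd → containerd-shim → runc

dockerd: high-level daemon; serves the Docker API and manages images, volumes and networks
containerd: container runtime; pulls and stores images and manages the container lifecycle (OCI-compliant)
runc: low-level OCI runtime; creates the container using Linux namespaces + cgroups, starts its process, then exits
containerd-shim: stays behind as the container's parent process, keeping it running (and holding its stdio and exit status) when containerd or dockerd restarts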

import time

class OCIProcess:
    """Simulate an OCI-compliant container process."""

    def __init__(self, pid, rootfs, command):
        self.pid = pid
        self.rootfs = rootfs
        self.command = command
        self.state = 'created'

    def start(self):
        # runc sets up namespaces and cgroups, then execs the command
        print(f'[runc] Creating container (pid={self.pid})')
        print(f'[runc] rootfs: {self.rootfs}')
        print('[runc] Namespaces: pid,net,mnt,uts,ipc')
        print('[runc] cgroups: cpu=1024, memory=512MB')
        self.state = 'running'
        print(f'[runc] Started PID {self.pid}: {self.command}')
        return True

    def stop(self, timeout=10):
        # Real `docker stop` sends SIGTERM, then SIGKILL after `timeout`
        # seconds; this simulation only models the SIGTERM step.
        print(f'[runc] Sending SIGTERM to PID {self.pid}')
        self.state = 'stopped'
        print('[runc] Container stopped')

class Containerd:
    def __init__(self):
        self.containers = {}
        self.next_pid = 1000  # deterministic fake host PIDs

    def create_container(self, image, command, container_id):
        print(f'[containerd] Pulling image: {image}')
        time.sleep(0.1)  # stand-in for download/unpack time
        print(f'[containerd] Creating container instance: {container_id}')
        # hash() of a str is randomized per run, so use a counter instead
        pid = self.next_pid
        self.next_pid += 1
        proc = OCIProcess(pid, f'/var/lib/containerd/{container_id}', command)
        self.containers[container_id] = proc
        return proc

class DockerEngine:
    def __init__(self):
        self.containerd = Containerd()

    def run(self, image, command, name=None):
        container_id = name or f'container_{int(time.time())}'
        print(f'[docker] docker run {image} {command}')
        proc = self.containerd.create_container(image, command, container_id)
        proc.start()
        return proc

engine = DockerEngine()
container = engine.run('nginx:alpine', 'nginx -g "daemon off;"', name='web')
print()
container.stop()

Expected output:

[docker] docker run nginx:alpine nginx -g "daemon off;"
[containerd] Pulling image: nginx:alpine
[containerd] Creating container instance: web
[runc] Creating container (pid=1000)
[runc] rootfs: /var/lib/containerd/web
[runc] Namespaces: pid,net,mnt,uts,ipc
[runc] cgroups: cpu=1024, memory=512MB
[runc] Started PID 1000: nginx -g "daemon off;"

[runc] Sending SIGTERM to PID 1000
[runc] Container stopped
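What runc actually receives is an OCI bundle: a root filesystem directory plus a config.json following the OCI runtime specification. The sketch below builds a heavily trimmed config with the namespaces and limits the simulation prints. It is illustrative only: a usable config also needs mounts (such as /proc), a hostname and more, which is why `runc spec` generates a complete default you can edit instead of writing one by hand. The function name is hypothetical.

```python
import json

# OCI names for the namespaces the simulation prints as pid,net,mnt,uts,ipc
NAMESPACES = ["pid", "network", "mount", "uts", "ipc"]


def minimal_oci_config(args, rootfs="rootfs", memory_bytes=512 * 1024**2,
                       cpu_shares=1024):
    """Return a trimmed OCI runtime config as a dict (not runnable as-is)."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "args": args,
            "cwd": "/",
            "user": {"uid": 0, "gid": 0},
        },
        "root": {"path": rootfs},
        "linux": {
            "namespaces": [{"type": ns} for ns in NAMESPACES],
            "resources": {
                "memory": {"limit": memory_bytes},  # bytes
                "cpu": {"shares": cpu_shares},      # relative weight
            },
        },
    }


config = minimal_oci_config(["nginx", "-g", "daemon off;"])
print(json.dumps(config, indent=2))
```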

Container vs VM Comparison

Feature	Container	Virtual Machine
OS	Shares host kernel	Full OS per VM
Boot time	Milliseconds to seconds	Seconds to minutes
Size	MBs	GBs
Isolation	Namespace/cgroup boundary	Hardware virtualisation
Security	Kernel shared (less isolated)	Strong (separate kernel)
Density	Hundreds per host	Tens per host
Performance	Near-native	Near-native CPU with hardware assist; overhead is workload-dependent and usually highest for I/O
Migration	Limited	Full VM live migration

Kubernetes Pod Isolation

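Kubernetes schedules pods: groups of containers that share a network namespace (one IP address and port space, so they can reach each other on localhost), an IPC namespace, and any volumes declared in the pod spec. Each container still has its own root filesystem and, by default, its own PID namespace.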

class Pod:
    def __init__(self, name, namespace='default'):
        self.name = name
        self.namespace = namespace
        self.containers = []
        self.pod_ip = None
        self.status = 'Pending'

    def add_container(self, name, image, ports=None, resources=None):
        container = {
            'name': name,
            'image': image,
            'ports': ports or [],
            'resources': resources or {'cpu': '100m', 'memory': '128Mi'},
        }
        self.containers.append(container)
        return container

    def assign_ip(self, cni_plugin):
        ip = cni_plugin.allocate_ip(self)
        self.pod_ip = ip
        return ip

    def start(self):
        # Print the pod as "namespace/name", the way kubectl identifies it
        print(f'[K8s] Pod {self.namespace}/{self.name} scheduled to node-1')
        print(f'[K8s] Pod IP: {self.pod_ip}')
        for c in self.containers:
            print(f'[K8s]   Container {c["name"]}: {c["image"]}')
            print(f'[K8s]   Ports: {c["ports"]}')
            print(f'[K8s]   Limits: {c["resources"]}')
        self.status = 'Running'
        print(f'[K8s] Pod status: {self.status}')

class CNIPlugin:
    """Toy IPAM: hands out pod IPs from 10.42.0.0/16, starting at .10."""

    def __init__(self):
        self.allocated = set()

    def allocate_ip(self, pod):
        # One IP per pod; every container in the pod shares it
        base = 10
        while base in self.allocated:
            base += 1
        ip = f'10.42.{base // 256}.{base % 256}'
        self.allocated.add(base)
        return ip

cni = CNIPlugin()
pod = Pod('web-app', 'production')
pod.add_container('nginx', 'nginx:1.25',
                  ports=[{'containerPort': 80}],
                  resources={'cpu': '500m', 'memory': '256Mi'})
pod.add_container('sidecar', 'fluentd:v1.16',
                  resources={'cpu': '100m', 'memory': '128Mi'})
pod.assign_ip(cni)
pod.start()

Expected output:

[K8s] Pod production/web-app scheduled to node-1
[K8s] Pod IP: 10.42.0.10
[K8s]   Container nginx: nginx:1.25
[K8s]   Ports: [{'containerPort': 80}]
[K8s]   Limits: {'cpu': '500m', 'memory': '256Mi'}
[K8s]   Container sidecar: fluentd:v1.16
[K8s]   Ports: []
[K8s]   Limits: {'cpu': '100m', 'memory': '128Mi'}
[K8s] Pod status: Running
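Kubernetes writes CPU in cores or millicores ("500m" is half a core) and memory with suffixes such as Mi (2^20 bytes). A sketch that converts the quantities used in the pod above and totals them; it handles only plain numbers, the m suffix for CPU, and the Ki/Mi/Gi/Ti and k/M/G/T suffixes for memory, whereas real Kubernetes quantities accept more forms (exponents, for example). The function names are hypothetical.

```python
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_millicores(q: str) -> int:
    """'500m' -> 500, '2' -> 2000, '0.5' -> 500."""
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def memory_bytes(q: str) -> int:
    """'256Mi' -> 268435456, '1G' -> 1000000000, '1024' -> 1024."""
    if q[-2:] in BINARY:
        return int(q[:-2]) * BINARY[q[-2:]]
    if q[-1:] in DECIMAL:
        return int(q[:-1]) * DECIMAL[q[-1:]]
    return int(q)


def pod_totals(containers):
    """Sum per-container resources into pod-level totals."""
    cpu = sum(cpu_millicores(c["resources"]["cpu"]) for c in containers)
    mem = sum(memory_bytes(c["resources"]["memory"]) for c in containers)
    return cpu, mem


containers = [
    {"name": "nginx", "resources": {"cpu": "500m", "memory": "256Mi"}},
    {"name": "sidecar", "resources": {"cpu": "100m", "memory": "128Mi"}},
]
cpu, mem = pod_totals(containers)
print(f"pod total: {cpu}m CPU, {mem // 2**20}Mi memory")
```

Note that the scheduler sums the containers' requests (not their limits) when deciding whether a pod fits on a node.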

Common Mistakes

1. Thinking containers are lightweight VMs

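Containers share the host kernel, so a container can't bring its own kernel: Linux containers need a Linux kernel and Windows containers need a Windows kernel. (Docker Desktop runs Linux containers on Windows and macOS inside a lightweight Linux VM.) VMs can run any guest OS the hypervisor supports.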

2. Running containers as root

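Unless user namespace remapping is enabled, UID 0 inside a container is UID 0 on the host (with a reduced set of capabilities), so a process that escapes the container runs as host root. Use the USER instruction in Dockerfiles, run with --user, and enable user namespace remapping (userns-remap) or Docker's rootless mode.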

3. Not setting resource limits (cgroups)

Without memory limits, a single container can exhaust host memory and OOM-kill critical processes. Always set --memory and --cpus limits.

4. Over-provisioning VMs

Giving a VM more vCPUs than its workload needs, or overcommitting vCPUs far beyond the physical cores, increases CPU scheduling contention for every VM on the host. Right-size VMs based on actual resource usage.

5. Ignoring the noisy neighbour problem

One VM/container consuming all I/O affects all neighbours. Use I/O throttling (--blkio-weight for Docker, disk IOPS limits for VMs).

6. Not securing the container supply chain

A compromised base image infects all containers built from it. Use minimal base images (Alpine, distroless), scan with Trivy or Clair, and sign images.
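Mistake 2 is easy to check from inside a container: /proc/self/uid_map lists how user IDs in the process's user namespace map to host IDs, one "<inside-start> <outside-start> <count>" range per line. Without user namespace remapping the map is the identity ("0 0 4294967295"), so container root is host root. A small parser; the helper names and the remapped range 100000/65536 are illustrative examples, not values Docker is guaranteed to use.

```python
def parse_uid_map(text: str):
    """Parse uid_map lines into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(x) for x in line.split())
            entries.append((inside, outside, count))
    return entries


def host_uid(uid: int, entries):
    """Translate a namespace UID to the host UID, or None if unmapped."""
    for inside, outside, count in entries:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return None


identity = parse_uid_map("         0          0 4294967295\n")
remapped = parse_uid_map("         0     100000      65536\n")  # example range
print("no remapping: root is host UID", host_uid(0, identity))
print("remapped:     root is host UID", host_uid(0, remapped))
```

On Linux you can feed it the real file with parse_uid_map(open("/proc/self/uid_map").read()).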

Practice Questions

What’s the main difference between Type 1 and Type 2 hypervisors? Type 1 runs directly on hardware (bare-metal). Type 2 runs on top of a host OS. Type 1 is used in production; Type 2 for development.
What Linux namespaces does Docker use? By default pid, net, mnt, uts and ipc, plus cgroup on cgroup v2 hosts; the user namespace is used only with userns-remap or rootless mode. Each isolates a different global system resource.
What is the difference between containerd and runc? containerd is a high-level runtime managing images and container lifecycle. runc is a low-level OCI runtime that creates containers using namespaces and cgroups.
Why can’t a Linux container run a Windows application? Containers share the host kernel. Linux containers use the Linux kernel; Windows containers use the Windows kernel. You can’t run Windows binaries on a Linux kernel.
How does Kubernetes isolate pods? Each pod gets its own network namespace (unique IP), shared by all containers in the pod. Resource limits use cgroups. Pods are scheduled to nodes; the kubelet enforces isolation.

Challenge

Create a shell script that demonstrates namespace isolation using unshare. Create a process with its own PID, mount, and UTS namespace. Show that it has its own hostname and process list. Then use nsenter to enter the namespace and observe.

Real-World Task

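On a Linux system, check a running Docker container's resource usage with docker stats <container>, then read its memory usage straight from the cgroup filesystem. On cgroup v2 with the systemd cgroup driver this is typically cat /sys/fs/cgroup/system.slice/docker-<container-id>.scope/memory.current; on cgroup v1 it is typically cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes. Run docker inspect <container> and examine the network settings and mounts.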
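Where a container's cgroup lives depends on the cgroup version and on Docker's cgroup driver (docker info reports both). A sketch that builds the usual path; these are typical defaults rather than guarantees (rootless mode and some distributions differ), the function names are hypothetical, and the path needs the full container ID from docker inspect.

```python
import os


def memory_usage_file(container_id: str, cgroup_v2: bool,
                      systemd_driver: bool) -> str:
    """Return the usual path of a Docker container's memory-usage file."""
    if systemd_driver:
        group = f"system.slice/docker-{container_id}.scope"
    else:
        group = f"docker/{container_id}"
    if cgroup_v2:
        # Unified hierarchy: all controllers share one tree
        return f"/sys/fs/cgroup/{group}/memory.current"
    # cgroup v1: each controller has its own tree, e.g. /sys/fs/cgroup/memory
    return f"/sys/fs/cgroup/memory/{group}/memory.usage_in_bytes"


def host_uses_cgroup_v2() -> bool:
    # On a pure cgroup v2 host, /sys/fs/cgroup is itself a cgroup v2
    # directory, so it contains the cgroup.controllers file
    return os.path.exists("/sys/fs/cgroup/cgroup.controllers")


cid = "<full-container-id>"  # placeholder: the Id field from docker inspect
print(memory_usage_file(cid, host_uses_cgroup_v2(), systemd_driver=True))
```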

FAQ

What is the difference between Docker and Kubernetes?

Docker packages and runs individual containers. Kubernetes orchestrates containers across a cluster — scheduling, scaling, service discovery, rolling updates, and self-healing.

Can a container outgrow its memory limit?

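Not beyond its RAM limit. If processes in the container need more memory than --memory allows, the kernel reclaims what it can from the container's cgroup and then its OOM killer terminates a process inside it. If --memory-swap is set higher than --memory and the host has swap, the container can additionally use swap, up to the --memory-swap total (which counts memory plus swap).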

What is a sidecar container?

A sidecar is an additional container in the same pod that provides supporting functionality (logging, monitoring, proxying) without modifying the main application container.

What is live migration of a VM?

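Moving a running VM from one physical host to another with only a brief pause. In the common pre-copy approach, memory pages are copied while the VM keeps running, pages dirtied in the meantime are re-sent in further rounds, and the VM is paused only for the final copy of the remaining pages and device state. That pause is typically short (often milliseconds) when the guest dirties memory much more slowly than the network can send it.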
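The incremental copy loop (usually called pre-copy) can be modelled in a few lines. This toy model assumes a constant dirty rate, a fixed bandwidth, and that every page dirtied during a round is distinct; the function and parameter names are hypothetical. It shows the key property: rounds shrink geometrically when the dirty rate is below the bandwidth, and never converge otherwise.

```python
def precopy_migration(total_pages, bandwidth_pps, dirty_rate_pps,
                      stop_threshold, max_rounds=30):
    """Simulate iterative pre-copy live migration.

    Returns (pages sent in each live round, pages left for the final
    stop-and-copy, estimated downtime in seconds).
    """
    to_send = total_pages  # round 1 copies all of memory
    rounds = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break  # small enough: pause the VM and copy the rest
        rounds.append(to_send)
        # Pages dirtied while this round was on the wire
        # (integer form of dirty_rate * to_send / bandwidth)
        to_send = min(total_pages, to_send * dirty_rate_pps // bandwidth_pps)
    downtime_s = to_send / bandwidth_pps
    return rounds, to_send, downtime_s


rounds, remaining, downtime = precopy_migration(
    total_pages=1_000_000, bandwidth_pps=100_000,
    dirty_rate_pps=20_000, stop_threshold=1_000)
print(rounds)                                   # [1000000, 200000, 40000, 8000, 1600]
print(remaining, f"{downtime * 1000:.1f} ms")   # 320 3.2 ms
```

When the dirty rate approaches the bandwidth, real hypervisors need a fallback; QEMU, for instance, offers an auto-converge option that throttles the guest's vCPUs.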

Is there performance overhead from namespaces?

Namespaces add negligible CPU overhead. Cgroups add slight overhead for accounting. The main overhead comes from filesystem layers (CoW) and network indirection (bridge, overlay).

Mini Project: Container Runtime Simulator

Build a simulated container runtime that:

Creates “containers” using Python objects representing namespaces
Implements basic cgroup-like resource tracking (CPU shares, memory limit)
“Runs” a process inside each container
Reports resource usage per container

Security angle: Container escape vulnerabilities allow a process to break out of namespace isolation. Drop every capability the workload doesn't need, apply seccomp profiles, and don't mount the Docker socket inside containers.
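To see which capabilities a process actually holds, read the CapEff line of /proc/self/status: a hex bitmask where bit N is capability number N. A sketch that decodes it; the name table follows the numbering in linux/capability.h for capabilities 0 to 40, the function names are hypothetical, and the sample mask 00000000a80425fb decodes to the 14 capabilities Docker grants by default.

```python
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]


def decode_caps(mask_hex: str):
    """Turn a CapEff/CapPrm hex mask into capability names."""
    mask = int(mask_hex, 16)
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            name = CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"UNKNOWN_{bit}"
            names.append("CAP_" + name)
    return names


def current_capeff():
    """Return this process's CapEff mask, or None if unavailable."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("CapEff:"):
                    return line.split()[1]
    except OSError:
        pass
    return None


print(decode_caps("00000000a80425fb"))
mask = current_capeff()
if mask is not None:
    print("this process:", decode_caps(mask))
```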

What’s Next

Review: Process Scheduling

Review: File Systems

Review: Interprocess Communication

Before moving on, you should understand:

Type 1 vs Type 2 hypervisors and when to use each
Full vs para-virtualization and hardware-assisted virtualization
Linux namespaces and cgroups for container isolation
Docker’s layered architecture (dockerd → containerd → runc)
Container vs VM trade-offs

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
