Erlang: Building Concurrent, Fault-Tolerant Systems

DodaTech Updated Jun 20, 2026 6 min read

Erlang is a programming language designed for one purpose: building systems that never stop. Created at Ericsson in 1986, it powers telecommunications switches engineered to run for years without downtime. Its actor-style concurrency and “let it crash” philosophy directly shaped Elixir, which runs on the same VM, and have influenced actor libraries in other ecosystems, such as Akka.

In this tutorial, you’ll learn Erlang’s history, the BEAM VM architecture, processes and message passing, OTP (Open Telecom Platform), supervision trees, and the let-it-crash philosophy.

What You’ll Learn

  • Erlang’s origin at Ericsson and its problem domain
  • BEAM VM: process scheduling, memory isolation, hot code swapping
  • Processes: spawning, linking, monitoring
  • Message passing: send, receive, selective receive
  • OTP: gen_server, supervisor, application
  • Let-it-crash philosophy

Why Erlang Matters

Erlang powers systems that target 99.999% uptime: telecom switches (Ericsson), messaging (WhatsApp has reported more than 2 million connections per server), databases (CouchDB, Riak), and message brokers (RabbitMQ). At DodaTech, the fault-tolerance patterns in Erlang inspire the crash-recovery mechanisms in Durga Antivirus Pro.

Learning Path

Concurrency Concepts → Erlang Basics (you are here) → OTP & Supervisors → Distributed Erlang
Erlang Basics → Elixir (next step)


The BEAM VM

The BEAM is Erlang’s virtual machine — a soft real-time runtime that manages processes, memory, and scheduling:

| Feature | What It Enables |
| --- | --- |
| Preemptive scheduling | No process can starve others |
| Per-process GC | One process’s GC doesn’t pause others |
| Memory isolation | A crash in one process doesn’t affect others |
| Hot code swapping | Update code without restarting |

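Erlang processes are not OS threads. They’re lightweight (a newly spawned process occupies roughly 300 words of memory, which is a few kilobytes on a 64-bit system) and are managed entirely by the BEAM. A single server can run hundreds of thousands to millions of processes, up to the limit set with the +P emulator flag.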
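
To get a feel for how cheap processes are, the following sketch spawns a batch of idle processes and averages what the runtime reports for each. The module name proc_footprint and its measure/1 function are made up for this illustration, and the exact figure depends on the OTP release and word size, so treat the result as a rough indicator rather than a benchmark.

```erlang
-module(proc_footprint).
-export([measure/1]).

%% Spawn N idle processes, return the average memory per process in bytes
%% as reported by the runtime, then shut the processes down.
measure(N) when is_integer(N), N > 0 ->
    Pids = [spawn(fun idle/0) || _ <- lists:seq(1, N)],
    %% process_info/2 returns {memory, Bytes} for a live process
    Sizes = [Bytes || Pid <- Pids,
                      {memory, Bytes} <- [erlang:process_info(Pid, memory)]],
    lists:foreach(fun(Pid) -> Pid ! stop end, Pids),
    lists:sum(Sizes) div length(Sizes).

%% An idle process: blocks until told to stop.
idle() ->
    receive stop -> ok end.
```

Calling proc_footprint:measure(10000) in the shell should return a value in the low thousands of bytes on a typical 64-bit installation.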

Processes and Message Passing

Processes are the fundamental unit of concurrency:

% Spawn a process
-module(greeter).
-export([start/0, loop/0]).

start() ->
    Pid = spawn(?MODULE, loop, []),
    Pid ! {hello, self()},  % Send message
    receive
        {reply, Msg} ->
            io:format("Got: ~p~n", [Msg])
    end.

loop() ->
    receive
        {hello, From} ->
            From ! {reply, <<"Hello back!">>},
            loop();
        stop ->
            ok
    end.

Expected behavior: The greeter process receives {hello, Pid}, sends a reply, and loops back to wait for the next message. The receive block selectively picks messages from the mailbox.
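
The reply in the example above is matched only by its shape, so any {reply, _} message in the mailbox would satisfy it. A common refinement, sketched here under a hypothetical module name greeter_ref, tags each request with a unique reference from make_ref/0 and adds a timeout so the caller cannot wait forever. The message format is an assumption chosen for this illustration.

```erlang
-module(greeter_ref).
-export([start/0, call/2, stop/1]).

start() ->
    spawn(fun loop/0).

%% Send a request tagged with a unique reference and wait for the
%% reply carrying the same reference.
call(Pid, Name) ->
    Ref = make_ref(),
    Pid ! {hello, self(), Ref, Name},
    receive
        {reply, Ref, Greeting} -> {ok, Greeting}
    after 1000 ->
        {error, timeout}          % no answer within one second
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

loop() ->
    receive
        {hello, From, Ref, Name} ->
            From ! {reply, Ref, {hello, Name}},
            loop();
        stop ->
            ok
    end.
```

Because Ref is already bound when the receive runs, only the reply to this particular request matches. This is the same idea that gen_server:call builds on.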

Selective Receive

Erlang’s receive scans the mailbox in order, picking the first matching message:

% Selective receive: only matches our reply
test() ->
    Parent = self(),         % Capture the caller's pid for the child
    Pid = spawn(fun() ->
        timer:sleep(100),    % Simulate work
        Parent ! {self(), response}   % Send a tagged tuple back
    end),
    % Send unrelated messages first
    self() ! {other, noise},
    self() ! {other, more_noise},
    % Only matches {Pid, response}; the noise stays in the mailbox
    receive
        {Pid, response} -> ok
    after 1000 ->
        timeout
    end.

Messages not matching the pattern remain in the mailbox for future receives. This enables complex protocol handling.
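
A small sketch makes the "remain in the mailbox" behavior visible. The module name mailbox_demo and the message shapes are invented for this illustration. It sends three messages to itself, selectively receives the middle one, then drains what is left.

```erlang
-module(mailbox_demo).
-export([run/0]).

%% Returns the messages that the selective receive skipped,
%% in the order they arrived.
run() ->
    self() ! {other, noise},
    self() ! {wanted, 42},
    self() ! {other, more_noise},
    receive
        {wanted, _} -> ok          % skips the first message, takes the second
    end,
    drain([]).

%% Collect the remaining {other, _} messages without blocking.
drain(Acc) ->
    receive
        {other, _} = Msg -> drain([Msg | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

mailbox_demo:run() returns [{other,noise},{other,more_noise}]: both skipped messages were still queued, in their original order. Messages that are never matched accumulate, which is one reason long-running loops usually include a catch-all clause.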

The Let-It-Crash Philosophy

Traditional programming tries to handle every error. Erlang’s philosophy: let processes crash, let supervisors restart them.

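% Supervisor definition
-module(my_supervisor).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init(_Args) ->
    % Give up after more than 5 restarts within 10 seconds
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    ChildSpec = #{
        id => my_worker,
        start => {my_worker, start_link, []},
        restart => permanent,  % Always restart
        shutdown => 5000,
        type => worker,
        modules => [my_worker]
    },
    {ok, {SupFlags, [ChildSpec]}}.

% If my_worker crashes:
% 1. Supervisor receives EXIT signal
% 2. Supervisor restarts worker (permanent = always)
% 3. If more than 5 restarts happen within 10 seconds, the supervisor
%    terminates its children and then itself

% Worker that crashes safely
-module(my_worker).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    {ok, #{state => initial}}.

handle_call({risky_operation, Data}, _From, State) ->
    % If this crashes, the supervisor restarts us
    % A linked process is notified via EXIT signal
    Result = do_risky_thing(Data),
    {reply, Result, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

% Placeholder for the real work (an assumption for illustration):
% crashes with badarith when Data is 0
do_risky_thing(Data) ->
    100 div Data.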

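Expected behavior: When a worker crashes, the supervisor starts a new one and the crash is logged. If restarts exceed the configured threshold (more than 5 within 10 seconds here), the supervisor terminates its children and then itself, passing the failure up to its own supervisor.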
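
To see the mechanism without OTP in the way, here is a hand-rolled restart loop. It is a simplified sketch, not how the supervisor module is implemented: the name mini_sup, the messages it sends back, and the plain restart counter (with no time window) are all assumptions made for illustration.

```erlang
-module(mini_sup).
-export([start/2]).

%% Run WorkerFun in a linked child and restart it on abnormal exit,
%% at most MaxRestarts times. The caller receives either
%% {mini_sup, finished, Restarts} or {mini_sup, gave_up, Reason}.
start(WorkerFun, MaxRestarts) ->
    Caller = self(),
    spawn(fun() ->
        process_flag(trap_exit, true),   % exit signals arrive as messages
        supervise(WorkerFun, MaxRestarts, 0, Caller)
    end).

supervise(WorkerFun, MaxRestarts, Restarts, Caller) ->
    Pid = spawn_link(WorkerFun),
    receive
        {'EXIT', Pid, normal} ->
            %% Worker finished on its own: nothing to restart
            Caller ! {mini_sup, finished, Restarts};
        {'EXIT', Pid, _Reason} when Restarts < MaxRestarts ->
            %% Abnormal exit and budget left: start a fresh worker
            supervise(WorkerFun, MaxRestarts, Restarts + 1, Caller);
        {'EXIT', Pid, Reason} ->
            %% Restart budget exhausted: stop trying and report upward
            Caller ! {mini_sup, gave_up, Reason}
    end.
```

A worker that always fails, for example mini_sup:start(fun() -> exit(boom) end, 3), is restarted three times before the caller receives {mini_sup, gave_up, boom}. A real supervisor adds the time window, restart strategies, and orderly shutdown on top of this idea.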

OTP Behaviours

OTP provides battle-tested behaviours for common process patterns:

| Behaviour | Purpose | Analogy |
| --- | --- | --- |
| gen_server | Stateful server | Like a class instance |
| gen_statem | State machine | Protocol handling |
| gen_event | Event handling | Observer pattern |
| supervisor | Process lifecycle | Parent managing children |
| application | System startup | Entry point |

% gen_server example: counter
-module(counter).
-behaviour(gen_server).

% API
-export([start_link/1, get/1, increment/1]).
% Callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Initial) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Initial, []).

get(Pid) ->
    gen_server:call(Pid, get).

increment(Pid) ->
    gen_server:cast(Pid, increment).

init(Initial) ->
    {ok, Initial}.

handle_call(get, _From, State) ->
    {reply, State, State}.

handle_cast(increment, State) ->
    {noreply, State + 1}.

Expected behavior: The counter stores state. get/1 reads it synchronously (call). increment/1 modifies it asynchronously (cast).

Common Mistakes

1. Ignoring Links and Monitors

Spawning a process without linking means you won’t know if it crashes. Use spawn_link or monitor to track process health.
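
A monitor is the lighter of the two options when you only need to be told about a crash rather than share its fate. The sketch below uses spawn_monitor/1. The module name monitor_demo and the return values are invented for this example.

```erlang
-module(monitor_demo).
-export([watch/1]).

%% Run Fun in a new monitored process and report how it ended.
watch(Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, normal} -> {ok, finished};
        {'DOWN', Ref, process, Pid, Reason} -> {crashed, Reason}
    after 5000 ->
        %% Stop monitoring and discard any late 'DOWN' message
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

monitor_demo:watch(fun() -> exit(boom) end) returns {crashed, boom}, while the calling process keeps running. With spawn_link and no trap_exit, the same crash would have taken the caller down too.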

2. Making Processes Too Heavy

Each process should do one small thing. A process that handles authentication, database queries, AND logging is too coupled.

3. Pattern Matching Without a Catch-All Clause

In a long-running loop, always handle unexpected messages so they don’t pile up in the mailbox:

receive
    {expected, Data} -> handle(Data);
    Other -> log_unexpected(Other), loop()
end.

4. Blocking in a receive Loop

A process that performs long I/O inside its receive loop cannot handle any other message until the I/O finishes, so its mailbox backs up. Spawn a separate worker for blocking operations.
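
One way to apply this, sketched under assumed names (offload_demo, slow_job, job_done), is to hand each slow request to a short-lived helper process and return to the mailbox straight away. timer:sleep/1 stands in for the blocking I/O.

```erlang
-module(offload_demo).
-export([start/0, loop/0]).

start() ->
    spawn(?MODULE, loop, []).

loop() ->
    receive
        {slow_job, From, Millis} ->
            %% Hand the blocking work to a helper so this loop can
            %% go back to its mailbox immediately.
            spawn(fun() ->
                timer:sleep(Millis),          % stands in for slow I/O
                From ! {job_done, Millis}
            end),
            loop();
        {ping, From} ->
            From ! pong,
            loop();
        stop ->
            ok
    end.
```

If you send {slow_job, self(), 300} followed by {ping, self()}, the pong arrives before the job result, which shows the server stayed responsive. The trade-off is that results can come back out of order, so real code usually tags each job with a reference.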

5. Forgetting to Set Process Flags

process_flag(trap_exit, true) must be set before linking if you want to handle exits as messages. Without it, an abnormal exit in a linked process terminates your process too.

6. Not Using OTP Behaviours

Bare receive loops are error-prone. Use gen_server, gen_statem, etc. for robust, tested patterns.

Practice Questions

1. What is a process in Erlang?

A lightweight unit of concurrency managed by the BEAM VM. Processes share nothing and communicate via message passing.

2. What is selective receive?

receive scans the mailbox in order and picks the first message matching the current pattern. Non-matching messages stay in the mailbox.

3. What does “let it crash” mean?

Don’t try to handle every error in a process. Let it crash and let a supervisor restart it — this produces more reliable systems.

4. What is a supervisor?

An OTP process that monitors child processes and restarts them according to a restart strategy (one_for_one, one_for_all, etc.).

5. Challenge: Build a key-value store GenServer.

Implement a GenServer with put(Key, Value) and get(Key) operations. Store data in a map. Handle the case where the key doesn’t exist.

Mini Project: Process Ring

Build N processes connected in a ring. Send a message around the ring M times and measure the time:

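-module(ring).
-export([start/2]).

% Build a ring of N processes, pass a token around it M times,
% and return the elapsed time in microseconds.
start(N, M) when N > 0, M > 0 ->
    % Create ring of N processes
    Pids = [spawn(fun() -> init() end) || _ <- lists:seq(1, N)],
    % Link them in a ring: each process forwards to its successor,
    % and the last one wraps around to the first
    Nexts = tl(Pids) ++ [hd(Pids)],
    lists:foreach(fun({Pid, Next}) -> Pid ! {set_next, Next} end,
                  lists:zip(Pids, Nexts)),
    % Start the message: N * M hops = M full laps
    T0 = erlang:monotonic_time(microsecond),
    hd(Pids) ! {pass, N * M, self()},
    receive
        {done, T1} ->
            lists:foreach(fun(Pid) -> Pid ! stop end, Pids),
            T1 - T0
    end.

% Wait for the successor before handling any token
init() ->
    receive
        {set_next, Next} -> loop(Next)
    end.

loop(Next) ->
    receive
        {pass, 0, From} ->
            From ! {done, erlang:monotonic_time(microsecond)},
            loop(Next);
        {pass, Count, From} ->
            Next ! {pass, Count - 1, From},
            loop(Next);
        stop ->
            ok
    end.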

FAQ

Is Erlang still relevant in 2026?
Yes. Erlang’s concurrency model remains a strong fit for soft real-time, highly concurrent systems. WhatsApp and RabbitMQ run Erlang in production, and Discord runs Elixir on the same BEAM VM.
Should I learn Erlang or Elixir?
Learn Elixir first for practical projects. Learn Erlang to understand the VM deeply. Both compile to BEAM bytecode and run on the same VM.
What does OTP stand for?
Open Telecom Platform, a set of libraries and design principles for building fault-tolerant systems.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
