Database Replication: Setup and Management Guide
Database replication is the process of copying data from one database server to another to improve read scalability, provide high availability, and enable disaster recovery — using topologies like leader-follower, multi-leader, and synchronous or asynchronous replication.
What You’ll Learn
You’ll understand leader-follower and multi-leader topologies, synchronous vs asynchronous tradeoffs, read replica configuration, failover procedures, replication lag detection, conflict resolution strategies, and monitoring with tools like Patroni and Orchestrator.
Why Replication Matters
A single database server is a single point of failure. If it goes down, your entire application goes down. Doda Browser serves thousands of users who expect 24/7 availability. Replication ensures that if one database fails, another takes over instantly, and read-heavy workloads are distributed across multiple servers for better performance.
Database Replication Learning Path
flowchart LR
A[Database Design] --> B[Database Indexing]
B --> C[Database Migrations]
C --> D[Database Replication]
D --> E[Backup and Restore]
D:::current
classDef current fill:#f90,color:#fff,stroke:#333,stroke-width:2px
Replication Topologies
Leader-Follower (Primary-Replica)
The most common topology. One leader handles writes; followers replicate data for reads.
flowchart TD
Client -->|Writes| Leader[(Leader)]
Leader -->|Async/Sync Replication| Follower1[(Follower 1)]
Leader -->|Async/Sync Replication| Follower2[(Follower 2)]
Leader -->|Async/Sync Replication| Follower3[(Follower 3)]
Client -->|Reads| Follower1
Client -->|Reads| Follower2
Use case: Read-heavy applications, reporting, analytics.
Multi-Leader (Active-Active)
Multiple nodes accept writes, which are then replicated to each other.
flowchart LR
Client1 -->|Writes| Leader1[(Leader 1)]
Client2 -->|Writes| Leader2[(Leader 2)]
Leader1 <-->|Bi-directional Replication| Leader2
Leader2 <-->|Bi-directional Replication| Leader3[(Leader 3)]
Leader1 <--> Leader3
Use case: Multi-region deployments, offline-first applications.
Synchronous vs Asynchronous Replication
Synchronous Replication
The leader waits for at least one follower to confirm the write before acknowledging success.
-- PostgreSQL synchronous replication configuration
-- postgresql.conf on leader
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'Pros: No data loss if the leader crashes. Cons: Higher latency for writes.
Asynchronous Replication
The leader acknowledges the write immediately; replication happens in the background.
-- PostgreSQL async replication (default)
-- postgresql.conf on leader
synchronous_standby_names = '' -- Empty = asynchronousPros: Lower write latency. Cons: Small risk of data loss on leader crash.
| Feature | Synchronous | Asynchronous |
|---|---|---|
| Data loss risk | None | Small window |
| Write latency | Higher (wait for replica) | Lower (immediate ack) |
| Read consistency | Strong | Eventual |
| Network requirement | Low latency | More tolerant |
Setting Up Read Replicas
PostgreSQL Streaming Replication Setup
Step 1: Configure the leader (postgresql.conf)
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1024 # in MBStep 2: Create a replication user on the leader
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';Step 3: Configure pg_hba.conf on the leader
host replication replicator 192.168.1.0/24 md5Step 4: Set up the follower
# Stop PostgreSQL on follower
pg_ctl stop -D /var/lib/postgresql/data
# Clone the leader's data directory
pg_basebackup -h leader-ip -D /var/lib/postgresql/data \
-U replicator -P -v -R
# Start PostgreSQL on follower
pg_ctl start -D /var/lib/postgresql/dataThe -R flag creates a standby.signal file and writes connection parameters to postgresql.auto.conf.
MySQL Replication Setup
Step 1: Configure the source (my.cnf)
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = myappStep 2: Create replication user
CREATE USER 'replicator'@'%' IDENTIFIED BY 'secure_password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';Step 3: Configure the replica
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.logCHANGE MASTER TO
MASTER_HOST='source-ip',
MASTER_USER='replicator',
MASTER_PASSWORD='secure_password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=154;
START SLAVE;Failover Procedures
Failover is the process of promoting a replica to become the new leader when the current leader fails.
Manual Failover (PostgreSQL)
# On the replica to promote
pg_ctl promote -D /var/lib/postgresql/data
# Or using SQL
SELECT pg_promote();Automated Failover with Patroni
Patroni is a popular HA (high availability) tool for PostgreSQL that handles automatic failover using distributed consensus (etcd/Consul/ZooKeeper).
# patroni.yml
scope: mycluster
namespace: /db/
name: pg1
consul:
host: 127.0.0.1:8500
postgresql:
data_dir: /data/postgresql
pg_hba:
- host all all 0.0.0.0/0 md5
replication:
username: replicator
password: secure
network: 192.168.1.0/24Patroni in action:
- Leader health-checked via Consul
- If leader fails, Patroni promotes the healthiest replica
- Others automatically re-replicate from the new leader
- Old leader re-joins as a replica when it comes back
Orchestrator (MySQL)
Orchestrator provides similar functionality for MySQL:
# Orchestrator CLI commands
orchestrator-client -c discover -i db-server:3306
orchestrator-client -c topology -i db-server:3306
orchestrator-client -c graceful-master-takeover -i current-master:3306Replication Lag Detection
Replication lag measures how far behind a replica is from the leader. High lag means stale reads and increased data loss risk.
PostgreSQL Lag Detection
-- Check replication lag on the leader
SELECT
client_addr,
state,
sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
ROUND(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::numeric, 2) AS lag_seconds
FROM pg_stat_replication;Expected output:
client_addr | state | sync_state | lag_bytes | lag_seconds
--------------+-------+------------+-----------+-------------
192.168.1.20 | streaming | async | 24576 | 0.34
192.168.1.21 | streaming | sync | 0 | 0.02MySQL Lag Detection
SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_MasterAlert thresholds:
- Warning: lag > 10 seconds
- Critical: lag > 60 seconds
- Emergency: lag > 300 seconds
Conflict Resolution (Multi-Leader)
In multi-leader setups, two leaders might receive conflicting writes for the same row.
Common strategies:
- Last-write-wins (LWW): Timestamp-based — highest timestamp wins (but clock skew is a problem)
- CRDTs: Conflict-free replicated data types (e.g., counters, sets)
- Application-level resolution: Store both versions and let the app decide
- Custom merge functions: Database-specific conflict handlers
-- PostgreSQL logical replication conflict handlers
-- NOTIFY to trigger application-level merge on conflict
CREATE SUBSCRIPTION mysub
CONNECTION 'host=other-leader dbname=mydb'
PUBLICATION mypub
WITH (failover = true, origin = none);Monitoring Replication Health
flowchart TD
A[Replication Monitor] --> B{Health Check}
B -->|Connected, Lag OK| C[✅ Healthy]
B -->|Connected, High Lag| D[⚠️ Investigate]
B -->|Disconnected| E[🚨 Failover]
D --> F{Can fix?}
F -->|Yes| G[WAL replay, network fix]
F -->|No| E
G --> C
E --> H[Promote replica]
H --> I[Repair old leader]
I --> J[Rejoin as replica]
Monitoring Commands
# pg_stat_replication check (run every 30s via cron)
psql -c "SELECT count(*) FROM pg_stat_replication;" | grep -v count
# Patroni health check
patronictl list mycluster
# Orchestrator discovery
orchestrator-client -c which-cluster -i db:3306Common Replication Errors
1. Replication Lag Causing Stale Reads
Reading from a replica that’s behind the leader returns stale data. Critical reads should go to the leader, or use synchronous replication for strong consistency.
2. Split-Brain in Multi-Leader Setups
Two leaders both accept writes thinking the other is dead. Use a consensus mechanism (etcd, Patroni) to prevent this. Configure pg_hba.conf and firewall rules carefully.
3. Network Latency Hurting Sync Replication
Synchronous replication across continents adds 100-300ms to every write. Use async replication for cross-region and sync only within the same datacenter.
4. WAL File Accumulation on Leader
If a replica falls too far behind, the leader accumulates WAL (Write-Ahead Log) files, potentially filling the disk. Set wal_keep_size appropriately and monitor disk usage.
5. Auto-Increment Conflicts in Multi-Leader MySQL
Two MySQL leaders both allocate ID 1001, causing conflicts. Use auto_increment_increment and auto_increment_offset to partition ID ranges per node.
6. Not Testing Failover
A failover procedure that’s never tested will fail when you need it. Run monthly failover drills. Use tools like chaos-monkey to simulate failures.
7. Forgetting to Update Application Connection Strings
After failover, the application still points to the old leader. Use a connection proxy (PgBouncer, ProxySQL, HAProxy) that tracks the current leader.
Practice Questions
1. What’s the main trade-off between synchronous and asynchronous replication?
Synchronous guarantees no data loss but increases write latency. Asynchronous is faster but risks losing recent writes if the leader crashes.
2. How does Patroni automate failover?
Patroni uses distributed consensus (etcd/Consul) to monitor leader health. When the leader fails, it promotes the most up-to-date replica and reconfigures others to replicate from the new leader.
3. What causes replication lag and how do you detect it?
Network congestion, heavy writes on the leader, or under-provisioned replicas. Detect it by comparing pg_current_wal_lsn() with replay_lsn in PostgreSQL, or Seconds_Behind_Master in MySQL.
4. When would you choose multi-leader over leader-follower replication?
Multi-region deployments where users write to their closest datacenter, or offline-first applications that sync when connectivity is available.
5. Challenge: Design a replication topology for a global SaaS.
Your SaaS serves users in US, Europe, and Asia with 50M reads/day and 500K writes/day. Writes must be durable with no data loss. Reads can be slightly stale. Answer: Use leader-follower with synchronous replication within each region (for no-loss writes) and async cross-region replication. Route reads to the nearest replica. Use Patroni for automatic failover in each region. Application writes to the local leader and reads from local replicas.
FAQ
Try It Yourself
Set up a basic leader-follower replication with PostgreSQL:
- Spin up two PostgreSQL instances (Docker recommended)
- Configure the leader with WAL settings
- Set up the follower with
pg_basebackup - Create a table and insert data on the leader
- Verify data appears on the follower
- Promote the follower and verify it accepts writes
What’s Next
You’ve learned replication topologies, sync vs async tradeoffs, failover with Patroni, and monitoring strategies. Start by setting up a simple leader-follower pair with async replication, then add monitoring and test failover before moving to production.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro