Skip to content
Database Replication: Setup and Management Guide

Database Replication: Setup and Management Guide

DodaTech Updated Jun 20, 2026 8 min read

Database replication is the process of copying data from one database server to another to improve read scalability, provide high availability, and enable disaster recovery — using topologies like leader-follower, multi-leader, and synchronous or asynchronous replication.

What You’ll Learn

You’ll understand leader-follower and multi-leader topologies, synchronous vs asynchronous tradeoffs, read replica configuration, failover procedures, replication lag detection, conflict resolution strategies, and monitoring with tools like Patroni and Orchestrator.

Why Replication Matters

A single database server is a single point of failure. If it goes down, your entire application goes down. Doda Browser serves thousands of users who expect 24/7 availability. Replication ensures that if one database fails, another takes over instantly, and read-heavy workloads are distributed across multiple servers for better performance.

Database Replication Learning Path

    flowchart LR
  A[Database Design] --> B[Database Indexing]
  B --> C[Database Migrations]
  C --> D[Database Replication]
  D --> E[Backup and Restore]
  D:::current
  classDef current fill:#f90,color:#fff,stroke:#333,stroke-width:2px
  
Prerequisites: Understanding of basic database concepts, PostgreSQL or MySQL administration, and command-line experience with Linux.

Replication Topologies

Leader-Follower (Primary-Replica)

The most common topology. One leader handles writes; followers replicate data for reads.

    flowchart TD
    Client -->|Writes| Leader[(Leader)]
    Leader -->|Async/Sync Replication| Follower1[(Follower 1)]
    Leader -->|Async/Sync Replication| Follower2[(Follower 2)]
    Leader -->|Async/Sync Replication| Follower3[(Follower 3)]
    Client -->|Reads| Follower1
    Client -->|Reads| Follower2
  

Use case: Read-heavy applications, reporting, analytics.

Multi-Leader (Active-Active)

Multiple nodes accept writes, which are then replicated to each other.

    flowchart LR
    Client1 -->|Writes| Leader1[(Leader 1)]
    Client2 -->|Writes| Leader2[(Leader 2)]
    Leader1 <-->|Bi-directional Replication| Leader2
    Leader2 <-->|Bi-directional Replication| Leader3[(Leader 3)]
    Leader1 <--> Leader3
  

Use case: Multi-region deployments, offline-first applications.

Synchronous vs Asynchronous Replication

Synchronous Replication

The leader waits for at least one follower to confirm the write before acknowledging success.

-- PostgreSQL synchronous replication configuration
-- postgresql.conf on leader
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'

Pros: No data loss if the leader crashes. Cons: Higher latency for writes.

Asynchronous Replication

The leader acknowledges the write immediately; replication happens in the background.

-- PostgreSQL async replication (default)
-- postgresql.conf on leader
synchronous_standby_names = ''  -- Empty = asynchronous

Pros: Lower write latency. Cons: Small risk of data loss on leader crash.

FeatureSynchronousAsynchronous
Data loss riskNoneSmall window
Write latencyHigher (wait for replica)Lower (immediate ack)
Read consistencyStrongEventual
Network requirementLow latencyMore tolerant

Setting Up Read Replicas

PostgreSQL Streaming Replication Setup

Step 1: Configure the leader (postgresql.conf)

wal_level = replica
max_wal_senders = 5
wal_keep_size = 1024    # in MB

Step 2: Create a replication user on the leader

CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';

Step 3: Configure pg_hba.conf on the leader

host replication replicator 192.168.1.0/24 md5

Step 4: Set up the follower

# Stop PostgreSQL on follower
pg_ctl stop -D /var/lib/postgresql/data

# Clone the leader's data directory
pg_basebackup -h leader-ip -D /var/lib/postgresql/data \
  -U replicator -P -v -R

# Start PostgreSQL on follower
pg_ctl start -D /var/lib/postgresql/data

The -R flag creates a standby.signal file and writes connection parameters to postgresql.auto.conf.

MySQL Replication Setup

Step 1: Configure the source (my.cnf)

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = myapp

Step 2: Create replication user

CREATE USER 'replicator'@'%' IDENTIFIED BY 'secure_password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';

Step 3: Configure the replica

[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.log
CHANGE MASTER TO
  MASTER_HOST='source-ip',
  MASTER_USER='replicator',
  MASTER_PASSWORD='secure_password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;
START SLAVE;

Failover Procedures

Failover is the process of promoting a replica to become the new leader when the current leader fails.

Manual Failover (PostgreSQL)

# On the replica to promote
pg_ctl promote -D /var/lib/postgresql/data

# Or using SQL
SELECT pg_promote();

Automated Failover with Patroni

Patroni is a popular HA (high availability) tool for PostgreSQL that handles automatic failover using distributed consensus (etcd/Consul/ZooKeeper).

# patroni.yml
scope: mycluster
namespace: /db/
name: pg1

consul:
  host: 127.0.0.1:8500

postgresql:
  data_dir: /data/postgresql
  pg_hba:
    - host all all 0.0.0.0/0 md5
  replication:
    username: replicator
    password: secure
    network: 192.168.1.0/24

Patroni in action:

  1. Leader health-checked via Consul
  2. If leader fails, Patroni promotes the healthiest replica
  3. Others automatically re-replicate from the new leader
  4. Old leader re-joins as a replica when it comes back

Orchestrator (MySQL)

Orchestrator provides similar functionality for MySQL:

# Orchestrator CLI commands
orchestrator-client -c discover -i db-server:3306
orchestrator-client -c topology -i db-server:3306
orchestrator-client -c graceful-master-takeover -i current-master:3306

Replication Lag Detection

Replication lag measures how far behind a replica is from the leader. High lag means stale reads and increased data loss risk.

PostgreSQL Lag Detection

-- Check replication lag on the leader
SELECT
  client_addr,
  state,
  sync_state,
  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
  ROUND(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::numeric, 2) AS lag_seconds
FROM pg_stat_replication;

Expected output:

 client_addr  | state | sync_state | lag_bytes | lag_seconds
--------------+-------+------------+-----------+-------------
 192.168.1.20 | streaming | async   |     24576 |        0.34
 192.168.1.21 | streaming | sync    |         0 |        0.02

MySQL Lag Detection

SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_Master

Alert thresholds:

  • Warning: lag > 10 seconds
  • Critical: lag > 60 seconds
  • Emergency: lag > 300 seconds

Conflict Resolution (Multi-Leader)

In multi-leader setups, two leaders might receive conflicting writes for the same row.

Common strategies:

  • Last-write-wins (LWW): Timestamp-based — highest timestamp wins (but clock skew is a problem)
  • CRDTs: Conflict-free replicated data types (e.g., counters, sets)
  • Application-level resolution: Store both versions and let the app decide
  • Custom merge functions: Database-specific conflict handlers
-- PostgreSQL logical replication conflict handlers
-- NOTIFY to trigger application-level merge on conflict
CREATE SUBSCRIPTION mysub
CONNECTION 'host=other-leader dbname=mydb'
PUBLICATION mypub
WITH (failover = true, origin = none);

Monitoring Replication Health

    flowchart TD
    A[Replication Monitor] --> B{Health Check}
    B -->|Connected, Lag OK| C[✅ Healthy]
    B -->|Connected, High Lag| D[⚠️ Investigate]
    B -->|Disconnected| E[🚨 Failover]
    D --> F{Can fix?}
    F -->|Yes| G[WAL replay, network fix]
    F -->|No| E
    G --> C
    E --> H[Promote replica]
    H --> I[Repair old leader]
    I --> J[Rejoin as replica]
  

Monitoring Commands

# pg_stat_replication check (run every 30s via cron)
psql -c "SELECT count(*) FROM pg_stat_replication;" | grep -v count

# Patroni health check
patronictl list mycluster

# Orchestrator discovery
orchestrator-client -c which-cluster -i db:3306

Common Replication Errors

1. Replication Lag Causing Stale Reads

Reading from a replica that’s behind the leader returns stale data. Critical reads should go to the leader, or use synchronous replication for strong consistency.

2. Split-Brain in Multi-Leader Setups

Two leaders both accept writes thinking the other is dead. Use a consensus mechanism (etcd, Patroni) to prevent this. Configure pg_hba.conf and firewall rules carefully.

3. Network Latency Hurting Sync Replication

Synchronous replication across continents adds 100-300ms to every write. Use async replication for cross-region and sync only within the same datacenter.

4. WAL File Accumulation on Leader

If a replica falls too far behind, the leader accumulates WAL (Write-Ahead Log) files, potentially filling the disk. Set wal_keep_size appropriately and monitor disk usage.

5. Auto-Increment Conflicts in Multi-Leader MySQL

Two MySQL leaders both allocate ID 1001, causing conflicts. Use auto_increment_increment and auto_increment_offset to partition ID ranges per node.

6. Not Testing Failover

A failover procedure that’s never tested will fail when you need it. Run monthly failover drills. Use tools like chaos-monkey to simulate failures.

7. Forgetting to Update Application Connection Strings

After failover, the application still points to the old leader. Use a connection proxy (PgBouncer, ProxySQL, HAProxy) that tracks the current leader.

Practice Questions

1. What’s the main trade-off between synchronous and asynchronous replication?

Synchronous guarantees no data loss but increases write latency. Asynchronous is faster but risks losing recent writes if the leader crashes.

2. How does Patroni automate failover?

Patroni uses distributed consensus (etcd/Consul) to monitor leader health. When the leader fails, it promotes the most up-to-date replica and reconfigures others to replicate from the new leader.

3. What causes replication lag and how do you detect it?

Network congestion, heavy writes on the leader, or under-provisioned replicas. Detect it by comparing pg_current_wal_lsn() with replay_lsn in PostgreSQL, or Seconds_Behind_Master in MySQL.

4. When would you choose multi-leader over leader-follower replication?

Multi-region deployments where users write to their closest datacenter, or offline-first applications that sync when connectivity is available.

5. Challenge: Design a replication topology for a global SaaS.

Your SaaS serves users in US, Europe, and Asia with 50M reads/day and 500K writes/day. Writes must be durable with no data loss. Reads can be slightly stale. Answer: Use leader-follower with synchronous replication within each region (for no-loss writes) and async cross-region replication. Route reads to the nearest replica. Use Patroni for automatic failover in each region. Application writes to the local leader and reads from local replicas.

FAQ

What’s the difference between streaming and logical replication?
Streaming replication (physical) replicates the exact byte-level changes — identical copy of the database. Logical replication replicates row-level changes and can filter tables, replicate between different PostgreSQL versions, or even to non-PostgreSQL targets.
Can I use replication for backup?
Replication provides high availability but isn’t a backup. A DROP TABLE command replicates instantly to all replicas. Always maintain independent backups (pg_dump, WAL archiving) alongside replication.
How many replicas should I have?
At least 2 for production: one for failover, one for reporting/analytics. More replicas = more read capacity but more replication lag and management overhead.
Does replication work across different database versions?
Physical replication requires matching major versions. Logical replication (PostgreSQL 10+) works across different versions. MySQL replication works across minor versions but major version upgrades may require specific configurations.

Try It Yourself

Set up a basic leader-follower replication with PostgreSQL:

  1. Spin up two PostgreSQL instances (Docker recommended)
  2. Configure the leader with WAL settings
  3. Set up the follower with pg_basebackup
  4. Create a table and insert data on the leader
  5. Verify data appears on the follower
  6. Promote the follower and verify it accepts writes

What’s Next

You’ve learned replication topologies, sync vs async tradeoffs, failover with Patroni, and monitoring strategies. Start by setting up a simple leader-follower pair with async replication, then add monitoring and test failover before moving to production.

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro