Learn Linux: Backup Strategies — rsync, tar, dd, Automated Backup Scripts


Backup Strategies — rsync, tar, dd, Automated Backup Scripts

DodaTech Updated Jun 20, 2026 10 min read

Backup strategies are the safety net of production systems. This guide covers the essential Linux backup tools — rsync, tar, dd, dump/restore — and shows you how to build automated, reliable backup pipelines that protect against data loss, corruption, and disaster.

What You’ll Learn

You’ll master incremental backups with rsync, file-level archival with tar, disk cloning with dd and dump, and automated backup scripts with rotation, encryption, and remote storage. You’ll also see how DodaZIP and Durga Antivirus Pro use multi-tier backup strategies across their infrastructure.

Why Backup Strategies Matter

Data loss happens — human error (rm -rf in the wrong directory), software bugs, hardware failures, ransomware attacks. Without backups, recovery is impossible. A proper backup strategy means you can restore a single deleted file in 5 minutes or rebuild an entire server in hours. At DodaTech, all production systems follow the 3-2-1 rule: 3 copies, 2 different media, 1 offsite copy.

Learning Path

    flowchart LR
  A[Process Management] --> B[Backup Strategies<br/>You are here]
  B --> C[Security Hardening]
  C --> D[Shell Scripting]
  D --> E[Monitoring & Logging]
  style B fill:#f90,color:#fff

The 3-2-1 Backup Rule

The gold standard for data protection:

3 copies of your data (1 primary + 2 backups)
2 different media types (e.g., local disk + cloud storage)
1 copy offsite (different physical location)

rsync — Incremental File Sync

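Rsync is the workhorse of Linux backups. It transfers only the changed parts of files (delta transfer), supports compression, and synchronizes to remote hosts. Rsync does not encrypt anything itself; remote transfers are encrypted by the SSH transport they run over.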

# Basic local sync
rsync -av /source/dir/ /backup/dir/

# Remote sync (push)
rsync -avz /local/dir/ user@remote:/backup/dir/

# Remote sync (pull)
rsync -avz user@remote:/source/dir/ /local/dir/

# With progress and partial transfer
rsync -avzP /source/dir/ user@remote:/backup/dir/

# Archive mode plus ACLs (-A) and extended attributes (-X); add -H to preserve hard links too
rsync -aAXv /source/ /backup/

Key rsync Flags

Flag	Purpose
`-a`	Archive: recursive, and preserves symlinks, permissions, timestamps, owner, group, and device files
`-v`	Verbose
`-z`	Compress during transfer
`-P`	Progress + partial (resume)
`--delete`	Remove files in destination that don’t exist in source
`--exclude`	Exclude patterns
`--link-dest`	Hardlink-based incremental snapshots
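
Because `--delete` removes files from the destination, it is worth previewing a sync before running it for real. The following is a minimal sketch, assuming rsync is installed; the temporary directories and file names are made up for the demonstration. It combines `--delete` and `--exclude` with `-n` (dry run) and `-i` (itemize changes).

```shell
#!/bin/bash
# Sketch: preview what --delete and --exclude would do before running for real.
set -euo pipefail
command -v rsync >/dev/null || { echo "rsync is not installed"; exit 0; }

# Throwaway source and destination (hypothetical contents)
SRC=$(mktemp -d)
DST=$(mktemp -d)
echo "keep"  > "$SRC/site.conf"
echo "noise" > "$SRC/debug.log"
echo "stale" > "$DST/removed-upstream.txt"

# -n (--dry-run) reports changes without making them; -i itemizes each change
PREVIEW=$(rsync -ain --delete --exclude='*.log' "$SRC/" "$DST/")
echo "$PREVIEW"

# The destination is untouched until you rerun the command without -n
ls "$DST"
```

The preview lists the stale file as a deletion and `site.conf` as a new transfer, while `debug.log` does not appear at all. Drop `-n` once the list matches what you expect.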

Hardlink-Based Incremental Backups

This creates daily snapshots where unchanged files are hardlinked to the previous snapshot, so they take almost no extra space (only directory entries are added):

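#!/bin/bash
# daily_snapshot.sh: Hardlink-based daily snapshots
BACKUP_DIR="/backups/server"
DATE=$(date +%Y%m%d)

mkdir -p "$BACKUP_DIR"

# Most recent earlier snapshot (YYYYMMDD names sort chronologically); skip today's on re-runs
LATEST=$(find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*' ! -name "$DATE" | sort | tail -n 1)

mkdir -p "$BACKUP_DIR/$DATE"

if [ -n "$LATEST" ]; then
    rsync -aAXv --link-dest="$LATEST" /source/dir/ "$BACKUP_DIR/$DATE/"
else
    # First run: nothing to link against, so take a full copy
    rsync -aAXv /source/dir/ "$BACKUP_DIR/$DATE/"
fi

# rsync -a copies the source directory's mtime onto the snapshot directory;
# reset it so the age-based cleanup below measures the snapshot's real age
touch "$BACKUP_DIR/$DATE"

# Cleanup: keep 30 days (-mindepth 1 protects $BACKUP_DIR itself)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +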

Expected directory structure:

/backups/server/
├── 20260601/
├── 20260602/   → hardlinks to unchanged files from 20260601
├── 20260603/   → hardlinks to unchanged files from 20260602
└── ...
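
To confirm that unchanged files really share storage between snapshots, you can compare their hard-link counts and inode numbers. The sketch below is only an illustration: it builds two tiny stand-in snapshots with `ln` instead of rsync so that it runs anywhere, and the file names are hypothetical.

```shell
#!/bin/bash
# Sketch: check whether a file in two snapshots is one shared inode or two copies.
set -euo pipefail

SNAP=$(mktemp -d)
mkdir "$SNAP/20260601" "$SNAP/20260602"

# Day 1 snapshot
echo "unchanged" > "$SNAP/20260601/static.conf"
echo "old"       > "$SNAP/20260601/app.log"

# Day 2 snapshot: ln mimics what --link-dest does for an unchanged file,
# while a changed file is stored as a separate copy
ln "$SNAP/20260601/static.conf" "$SNAP/20260602/static.conf"
echo "new" > "$SNAP/20260602/app.log"

for f in static.conf app.log; do
    # %i = inode number, %h = number of hard links pointing at that inode
    a=$(stat -c '%i' "$SNAP/20260601/$f")
    b=$(stat -c '%i' "$SNAP/20260602/$f")
    links=$(stat -c '%h' "$SNAP/20260602/$f")
    if [ "$a" = "$b" ]; then
        echo "$f: shared between snapshots (links=$links)"
    else
        echo "$f: separate copy (links=$links)"
    fi
done
```

On a real snapshot tree, `du -sh /backups/server/*` gives a similar picture: du counts each hard-linked file once, so later snapshots appear to use only the space of what changed.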

tar — File Archival and Compression

Tar creates single-file archives from directories, optionally with compression.

# Create compressed archive
tar -czf backup.tar.gz /path/to/data        # gzip
tar -cjf backup.tar.bz2 /path/to/data       # bzip2 (better compression)
tar -cJf backup.tar.xz /path/to/data        # xz (best compression)

# Extract archive
tar -xzf backup.tar.gz
tar -xzf backup.tar.gz -C /restore/path     # Extract to specific directory

# List contents
tar -tzf backup.tar.gz
tar -tzf backup.tar.gz | grep "config"      # Find files matching pattern

# Exclude patterns
tar -czf backup.tar.gz \
  --exclude="*.log" \
  --exclude="node_modules" \
  --exclude="cache" \
  /var/www/myapp

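# Incremental tar (GNU tar): the snapshot file records what has already been backed up.
# Keep it somewhere persistent (example path below); /tmp is usually cleared on reboot.
tar -czf full-backup.tar.gz \
  --listed-incremental=/var/backups/data.snar /var/data   # First run: full backup

tar -czf incremental-1.tar.gz \
  --listed-incremental=/var/backups/data.snar /var/data   # Later runs: changes since the previous run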
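
Restoring an incremental chain means extracting the full archive first and then each incremental in order. Here is a small end-to-end sketch, assuming GNU tar; everything lives under a throwaway temporary directory and the file names are invented for the demonstration.

```shell
#!/bin/bash
# Sketch: full + incremental backup with GNU tar, then restore the chain in order.
set -euo pipefail

WORK=$(mktemp -d)
mkdir -p "$WORK/data" "$WORK/restore"
echo "v1" > "$WORK/data/a.txt"

# Full backup: the snapshot file does not exist yet, so everything is archived
tar -C "$WORK" -czf "$WORK/full.tar.gz" --listed-incremental="$WORK/data.snar" data

# Make sure the changes get timestamps later than the full backup
sleep 1
echo "v1" > "$WORK/data/b.txt"   # new file
rm "$WORK/data/a.txt"            # deleted file

# Incremental backup: only what changed since the snapshot file was last updated
tar -C "$WORK" -czf "$WORK/inc1.tar.gz" --listed-incremental="$WORK/data.snar" data

# Restore: full archive first, then the incremental.
# --listed-incremental=/dev/null turns on incremental-aware extraction,
# which also removes files that had been deleted by the time of the later backup.
tar -C "$WORK/restore" -xzf "$WORK/full.tar.gz" --listed-incremental=/dev/null
tar -C "$WORK/restore" -xzf "$WORK/inc1.tar.gz" --listed-incremental=/dev/null

ls "$WORK/restore/data"
```

After both extractions the restore directory mirrors the state at the time of the incremental: `b.txt` is present and `a.txt` is gone.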

Compression Comparison

Format	Command	Speed	Approx. size (1 GB of text-heavy data)	Use Case
None	`tar -cf`	Fastest	1.0 GB	Fast archive
gzip	`tar -czf`	Fast	~350 MB	Daily backups
bzip2	`tar -cjf`	Medium	~280 MB	Weekly/monthly
xz	`tar -cJf`	Slow	~220 MB	Long-term archive
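
Compression ratios depend heavily on the data, so it can help to measure them on a sample of your own files. The sketch below is one way to do that; it generates compressible sample text as a stand-in for real data and skips any compressor that is not installed.

```shell
#!/bin/bash
# Sketch: compare archive sizes per compressor on a sample of your own data.
set -euo pipefail

SAMPLE=$(mktemp -d)   # stand-in for the directory you want to back up
OUT=$(mktemp -d)

# Generate roughly 1 MB of compressible text (replace with real files)
seq 1 200000 > "$SAMPLE/sample.txt"

# Uncompressed baseline
tar -C "$SAMPLE" -cf "$OUT/plain.tar" .
printf '%-6s %10d bytes\n' "none" "$(stat -c %s "$OUT/plain.tar")"

# Compress the same tar stream with each available tool
for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" -c "$OUT/plain.tar" > "$OUT/plain.tar.$tool"
        printf '%-6s %10d bytes\n' "$tool" "$(stat -c %s "$OUT/plain.tar.$tool")"
    else
        echo "$tool: not installed, skipped"
    fi
done
```

Wrapping each compressor call in `time` gives you the speed side of the trade-off as well.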

dd — Disk Cloning and Imaging

dd performs byte-for-byte copies of disks, partitions, and files. Use it for full disk backups and forensic imaging, and image a device only while it is unmounted or otherwise idle so that the copy is consistent.

# Clone a partition to an image file
sudo dd if=/dev/sda1 of=/backup/sda1.img bs=4M status=progress

# Clone a disk to another disk
sudo dd if=/dev/sda of=/dev/sdb bs=4M status=progress

# Backup MBR (first 512 bytes)
sudo dd if=/dev/sda of=/backup/mbr.bin bs=512 count=1

# Restore a partition image
sudo dd if=/backup/sda1.img of=/dev/sda1 bs=4M status=progress

# Compress dd output on the fly
sudo dd if=/dev/sda1 bs=4M | gzip > /backup/sda1.img.gz

# Create a 1100 MiB sparse file (count=0 writes no data, so no disk blocks are allocated)
dd if=/dev/zero of=disk.img bs=1M count=0 seek=1100

Expected dd output (for a 10 GiB partition copied with bs=4M):

2560+0 records in
2560+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 45.7323 s, 235 MB/s
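
An image is only useful if it matches its source, so it is worth comparing the two after the copy. The following sketch uses a regular file as a stand-in for a block device so that it runs without root; with a real device you would substitute something like /dev/sda1 for the hypothetical `SOURCE` path.

```shell
#!/bin/bash
# Sketch: take an image with dd and verify it byte-for-byte against the source.
set -euo pipefail

WORK=$(mktemp -d)
SOURCE="$WORK/fake_partition.bin"   # hypothetical stand-in for a block device
IMAGE="$WORK/partition.img"

# Build a 4 MiB source filled with random data
dd if=/dev/urandom of="$SOURCE" bs=1M count=4 status=none

# Take the image; conv=fsync flushes the output to disk before dd exits
dd if="$SOURCE" of="$IMAGE" bs=1M conv=fsync status=none

# cmp -s is silent and returns 0 only when both files are identical
if cmp -s "$SOURCE" "$IMAGE"; then
    echo "image verified"
else
    echo "image differs from source" >&2
    exit 1
fi

# Store a checksum next to the image so it can be re-verified later
(cd "$WORK" && sha256sum partition.img > partition.img.sha256)
```

For compressed images, pipe the decompressed stream into the comparison instead, for example `gzip -dc image.gz | cmp - /dev/sdXN`.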

When to Use dd

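Full disk recovery: clone an entire failing drive
Forensic imaging: bit-for-bit copies also capture unallocated space, where remnants of deleted files may survive
Live USBs: write ISO images to USB drives with sudo dd if=ubuntu.iso of=/dev/sdb bs=4M status=progress conv=fsync (double-check the target device first, because dd overwrites it without asking)
Swap files: create the backing file for swap with dd if=/dev/zero of=/swapfile bs=1M count=4096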

dump and restore — Filesystem-Level Backups

The dump command backs up ext2/ext3/ext4 filesystems at the filesystem level, recording file metadata and inodes.

# Full dump of /dev/sda1 to a file
sudo dump -0uf /backup/rootfs.dump /dev/sda1

# Level 1 incremental (back up files changed since level 0)
sudo dump -1uf /backup/rootfs-inc1.dump /dev/sda1

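# Restore a full dump into the current directory
# (cd into the freshly created, mounted target filesystem first)
sudo restore -rf /backup/rootfs.dump

# Restore a specific file (paths are relative to the root of the dumped filesystem)
sudo restore -xf /backup/rootfs.dump path/to/file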

# Interactive restore shell
sudo restore -if /backup/rootfs.dump

Dump levels: 0 (full), 1-9 (incremental). Each level backs up files changed since the last lower-numbered dump.
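
One common way to use the levels is a weekly cycle: a level 0 dump on Sunday, then levels 1 to 6 on the following days, so each day's dump contains only the changes since the previous day. The sketch below shows that mapping. The helper name `dump_level_for_day` and the output file naming are illustrative assumptions, and the script prints the dump command instead of running it.

```shell
#!/bin/bash
# Sketch: pick a dump level from the day of the week (hypothetical weekly cycle).
set -euo pipefail

# dump_level_for_day is an illustrative helper, not part of the dump package.
# Input: ISO weekday number (1=Monday ... 7=Sunday), as printed by `date +%u`.
dump_level_for_day() {
    case "$1" in
        7)     echo 0 ;;      # Sunday: full dump
        [1-6]) echo "$1" ;;   # Monday to Saturday: levels 1 to 6
        *)     echo "invalid weekday: $1" >&2; return 1 ;;
    esac
}

LEVEL=$(dump_level_for_day "$(date +%u)")

# Print the command rather than executing it; device and directory mirror the examples above
echo "sudo dump -${LEVEL}uf /backup/rootfs-level${LEVEL}.dump /dev/sda1"
```

With this schedule a full restore replays Sunday's level 0 followed by each later level in order. A flatter scheme (level 0 weekly, level 1 every other day) trades larger daily dumps for a two-step restore.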

Automated Backup Script

Here’s an automated backup script with rotation, encryption, and remote upload. Treat it as a starting point and adapt the paths, GPG recipient, and bucket to your environment:

#!/bin/bash
# autobackup.sh — Automated backup with rotation and encryption
# Requires: rsync, tar, gpg, aws CLI (or any remote storage)

BACKUP_NAME="${1:-server}"
BACKUP_DIR="/backups"
SOURCE_DIRS=(
    "/etc"
    "/var/www"
    "/home"
    "/opt/app/config"
)
EXCLUDE=(
    "--exclude=*.log"
    "--exclude=cache"
    "--exclude=tmp"
)
REMOTE_PATH="s3://my-bucket/backups/"
RETENTION_DAYS=30
GPG_RECIPIENT="backup@dodatech.com"

DATE=$(date +%Y%m%d_%H%M%S)
TIMESTAMP_FILE="$BACKUP_DIR/$BACKUP_NAME/timestamp.txt"

# Ensure backup directory exists
mkdir -p "$BACKUP_DIR/$BACKUP_NAME"

# Record backup timestamp
echo "Backup started: $(date)" > "$TIMESTAMP_FILE"

# Step 1: Create tar archive
echo "Creating archive..."
tar -czf "$BACKUP_DIR/$BACKUP_NAME/data_$DATE.tar.gz" \
    "${EXCLUDE[@]}" "${SOURCE_DIRS[@]}"

if [ $? -ne 0 ]; then
    echo "ERROR: Archive creation failed!"
    exit 1
fi

# Step 2: Encrypt the archive (optional)
if [ -n "$GPG_RECIPIENT" ]; then
    echo "Encrypting archive..."
    # Delete the plaintext archive only if encryption succeeded
    if gpg --batch --yes --encrypt --recipient "$GPG_RECIPIENT" \
        --output "$BACKUP_DIR/$BACKUP_NAME/data_$DATE.tar.gz.gpg" \
        "$BACKUP_DIR/$BACKUP_NAME/data_$DATE.tar.gz"; then
        rm "$BACKUP_DIR/$BACKUP_NAME/data_$DATE.tar.gz"
    else
        echo "ERROR: Encryption failed!"
        exit 1
    fi
    BACKUP_FILE="data_$DATE.tar.gz.gpg"
else
    BACKUP_FILE="data_$DATE.tar.gz"
fi

# Step 3: Create checksum (run inside the backup directory so the
# .sha256 file stores a relative file name and stays valid after copying)
echo "Creating checksum..."
(cd "$BACKUP_DIR/$BACKUP_NAME" && sha256sum "$BACKUP_FILE" > "$BACKUP_FILE.sha256")

# Step 4: Remote copy (if configured)
if [ -n "$REMOTE_PATH" ]; then
    echo "Uploading to remote storage..."
    # s3:// URLs need an S3 client; rsync cannot write to S3 directly
    aws s3 sync "$BACKUP_DIR/$BACKUP_NAME/" "$REMOTE_PATH" --exclude "tmp*"

    # Or, for an SSH-reachable host (set REMOTE_PATH to e.g. user@remote:/backups/)
    # rsync -avz "$BACKUP_DIR/$BACKUP_NAME/" "$REMOTE_PATH"
fi

# Step 5: Rotation — remove backups older than retention period
echo "Rotating old backups..."
find "$BACKUP_DIR/$BACKUP_NAME" -name "data_*.tar.gz*" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR/$BACKUP_NAME" -name "*.sha256" -mtime +$RETENTION_DAYS -delete

# Step 6: Verify the backup just created
# (a data_*.tar.gz* glob would also match the .sha256 files, so use $BACKUP_FILE directly)
echo "Verifying last backup..."
if (cd "$BACKUP_DIR/$BACKUP_NAME" && sha256sum -c "$BACKUP_FILE.sha256"); then
    echo "Verification PASSED"
else
    echo "Verification FAILED!"
    exit 1
fi

echo "Backup completed: $(date)"

Expected output (abridged; tar, gpg, and the upload tool may print additional lines):

Creating archive...
Encrypting archive...
Creating checksum...
Uploading to remote storage...
Rotating old backups...
Verifying last backup...
data_20260620_100000.tar.gz.gpg: OK
Verification PASSED
Backup completed: Sat Jun 20 10:00:05 UTC 2026

Disaster Recovery Testing

A backup you never test is a backup you don’t have. Create a recovery test plan:

#!/bin/bash
# test_restore.sh — Test restoring from latest backup
# WARNING: Run in an isolated environment, not on production!

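BACKUP_DIR="/backups/server"
RESTORE_DIR="/tmp/restore_test"

mkdir -p "$RESTORE_DIR"

# Find the latest backup archive (these patterns skip the .sha256 checksum files)
LATEST=$(ls -t "$BACKUP_DIR"/data_*.tar.gz "$BACKUP_DIR"/data_*.tar.gz.gpg 2>/dev/null | head -1)
if [ -z "$LATEST" ]; then
    echo "ERROR: no backup found in $BACKUP_DIR"
    exit 1
fi
echo "Testing restore from: $LATEST"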

# Decrypt if needed
if [[ "$LATEST" == *.gpg ]]; then
    gpg --decrypt --output "$RESTORE_DIR/test_restore.tar.gz" "$LATEST"
else
    cp "$LATEST" "$RESTORE_DIR/test_restore.tar.gz"
fi

# Extract
tar -xzf "$RESTORE_DIR/test_restore.tar.gz" -C "$RESTORE_DIR"
echo "Files restored: $(find "$RESTORE_DIR" -type f | wc -l)"

# Check critical files
for file in "/etc/passwd" "/etc/ssh/sshd_config"; do
    if [ -f "$RESTORE_DIR/$file" ]; then
        echo "OK: $file present"
    else
        echo "MISSING: $file — backup may be incomplete!"
    fi
done

# Cleanup
rm -rf "$RESTORE_DIR"

Common Backup Mistakes

1. Not Testing Backups

The most common mistake. A backup job that runs for months but fails to restore is worthless. Schedule monthly restore drills.

2. One Backup Copy Only

If ransomware encrypts your live data and the only backup sits on the same server or network share, both are lost. Always maintain an offline or air-gapped copy.

3. Ignoring Open File Handles

Tar and rsync copy each file as it is at the moment it is read, so a file that is being written can end up truncated or internally inconsistent in the backup. Use filesystem snapshots (LVM, ZFS) or database-aware backup tools for consistent backups.

4. No Monitoring

A backup that silently fails is worse than no backup. Always add monitoring: check exit codes, verify backup sizes, and alert on failures.
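
A basic health check can be as simple as confirming that the newest backup exists, is not suspiciously small, and is not stale. The sketch below shows one possible shape. The function name `check_backup` and the thresholds are assumptions for illustration, and a real job would send an alert (mail, webhook, monitoring agent) where this one only prints.

```shell
#!/bin/bash
# Sketch: fail loudly when a backup file is missing, too small, or too old.
# check_backup and its thresholds are illustrative, not a standard tool.

check_backup() {
    local file="$1" min_bytes="$2" max_age_min="$3"
    local size

    if [ ! -f "$file" ]; then
        echo "ALERT: $file does not exist"
        return 1
    fi

    size=$(stat -c %s "$file")
    if [ "$size" -lt "$min_bytes" ]; then
        echo "ALERT: $file is only $size bytes (expected at least $min_bytes)"
        return 1
    fi

    # find prints the file only if it was modified more than max_age_min minutes ago
    if [ -n "$(find "$file" -mmin +"$max_age_min")" ]; then
        echo "ALERT: $file is older than $max_age_min minutes"
        return 1
    fi

    echo "OK: $file ($size bytes)"
}

# Demo against throwaway files
DEMO=$(mktemp -d)
head -c 2048 /dev/zero > "$DEMO/good.tar.gz"
: > "$DEMO/empty.tar.gz"

check_backup "$DEMO/good.tar.gz" 1024 1500
check_backup "$DEMO/empty.tar.gz" 1024 1500 || echo "(a real job would alert and exit non-zero here)"
```

Running a check like this from a separate scheduler entry, rather than from the backup script itself, also catches the case where the backup job never started.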

5. Infinite Retention

Keeping every backup forever makes storage grow without bound. Define a retention policy: daily for 7 days, weekly for 4 weeks, monthly for 12 months.
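
A tiered policy like this can be expressed as a small decision function over dated backup names. The following is a rough sketch, assuming GNU date and backups named by YYYYMMDD; `keep_backup` is a hypothetical helper, and the choice of Sunday for weekly copies and the 1st for monthly copies is an assumption.

```shell
#!/bin/bash
# Sketch: decide whether a dated backup survives a 7-daily / 4-weekly / 12-monthly policy.
# keep_backup is a hypothetical helper; dates are YYYYMMDD and GNU date is assumed.

keep_backup() {
    local backup_date="$1" today="$2"
    local age_days
    age_days=$(( ( $(date -u -d "$today" +%s) - $(date -u -d "$backup_date" +%s) ) / 86400 ))

    # Daily tier: everything from the last 7 days
    if [ "$age_days" -le 7 ]; then
        return 0
    fi
    # Weekly tier: Sunday backups from the last 4 weeks
    if [ "$(date -u -d "$backup_date" +%u)" = "7" ] && [ "$age_days" -le 28 ]; then
        return 0
    fi
    # Monthly tier: first-of-month backups from the last 12 months
    if [ "${backup_date:6:2}" = "01" ] && [ "$age_days" -le 365 ]; then
        return 0
    fi
    return 1   # outside every tier: candidate for pruning
}

# Evaluate a few sample dates against a fixed "today"
for d in 20260615 20260607 20260609 20260101 20250501; do
    if keep_backup "$d" 20260620; then
        echo "$d keep"
    else
        echo "$d prune"
    fi
done
```

Printing the decision first and deleting only after reviewing the list is a safer rollout than wiring the function straight into `rm`.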

6. Backing Up Temporary/Cache Data

Including /tmp, browser caches, or node_modules inflates backup time and storage for no benefit. Exclude data that can be regenerated with --exclude.

7. No Encryption for Offsite Backups

Offsite backups cross network boundaries and sit on storage you do not fully control. Encrypt archives with GPG before upload so they stay protected at rest, and transfer them over an encrypted channel such as rsync over SSH.

Practice Questions

1. What’s the difference between rsync -av and rsync -aAXv? -a (archive) preserves permissions, timestamps, owner, group, and recurses. Adding -A preserves ACLs and -X preserves extended attributes.

2. How does --link-dest work in rsync? It creates hardlinks to files from a previous backup if they haven’t changed, saving disk space while providing a complete directory tree for each backup.

3. What backup strategy protects against ransomware? The 3-2-1 rule with an offline/immutable copy. Ransomware that encrypts your live data and mounted backups can’t touch an unmounted or air-gapped backup.

4. Why use dd instead of tar for backups? dd performs byte-for-byte copies including empty space, remnants of deleted files, and filesystem metadata. Use dd for full disk recovery and forensic imaging. Use tar for file-level archival.

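5. Challenge: Design a backup strategy for a PostgreSQL database on a Linux server with the following requirements: point-in-time recovery within 15 minutes, maximum 1 hour of data loss, and 30-day retention. Answer: (1) Take a weekly full base backup with pg_basebackup and enable continuous WAL archiving. (2) Ship completed WAL segments to offsite storage such as S3 through archive_command, and set archive_timeout so a segment is closed at least every 5 minutes, which keeps potential data loss well under the 1-hour limit. (3) Take a daily tar of configuration files. (4) Automate with systemd timers and expire base backups and WAL older than 30 days. (5) Test recovery monthly in a staging environment and time the restore against the 15-minute target.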

Mini Project: Multi-Tier Backup System

Create a backup system that handles three tiers — immediate local backup, daily local archive, and weekly offsite:

#!/bin/bash
# multi_tier_backup.sh: Three-tier backup system
# Tier 1: Immediate rsync snapshot (every hour)
# Tier 2: Daily compressed archive (every day)
# Tier 3: Weekly encrypted offsite (every week)

SOURCE="/var/www/myapp"
BASE="/backups"
DATE=$(date +%Y%m%d)
DAY_OF_WEEK=$(date +%u)  # 1=Monday, 7=Sunday

mkdir -p "$BASE/tier1" "$BASE/tier2" "$BASE/tier3"

# Tier 1: Hourly snapshot (keep 24)
HOUR_DIR="$BASE/tier1/$DATE/$(date +%H)"
mkdir -p "$HOUR_DIR"
rsync -a --delete "$SOURCE/" "$HOUR_DIR/"
touch "$HOUR_DIR"  # rsync -a copies the source's mtime; reset it so age-based cleanup works

# Tier 2: Daily archive (keep 30)
if [ ! -f "$BASE/tier2/$DATE.tar.gz" ]; then
    tar -czf "$BASE/tier2/$DATE.tar.gz" "$SOURCE"
fi

# Tier 3: Weekly offsite (keep 12 weeks), created once on Sunday
WEEK_NUM=$(date +%V)
WEEKLY="$BASE/tier3/week_$WEEK_NUM.tar.gz"
if [ "$DAY_OF_WEEK" = "7" ] && [ ! -f "$WEEKLY.gpg" ]; then
    tar -czf "$WEEKLY" "$SOURCE"
    # Remove the plaintext archive and upload only if encryption succeeded
    if gpg --batch --yes --encrypt --recipient backup@dodatech.com \
        --output "$WEEKLY.gpg" "$WEEKLY"; then
        rm "$WEEKLY"
        # Upload to offsite
        rsync -avz "$WEEKLY.gpg" offsite-backup:/backups/myapp/
    else
        echo "ERROR: encryption of $WEEKLY failed" >&2
    fi
fi

# Cleanup old backups
find "$BASE/tier1" -mindepth 2 -maxdepth 2 -type d -mmin +1440 -exec rm -rf {} +  # older than 24 hours
find "$BASE/tier1" -mindepth 1 -maxdepth 1 -type d -empty -delete                 # now-empty date directories
find "$BASE/tier2" -name "*.tar.gz" -mtime +30 -delete
find "$BASE/tier3" -name "*.tar.gz.gpg" -mtime +84 -delete  # 12 weeks

Run via cron at the top of every hour: 0 * * * * /usr/local/bin/multi_tier_backup.sh
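
If a run ever takes longer than an hour, cron will start the next one on top of it. One way to guard against that, sketched here under the assumption that `flock` from util-linux is available, is to wrap the job in a non-blocking lock. The names `LOCK_FILE` and `run_exclusive` and the exit code 75 are illustrative choices.

```shell
#!/bin/bash
# Sketch: skip a run if the previous one still holds the lock (flock from util-linux).
# LOCK_FILE and run_exclusive are illustrative names.

LOCK_FILE="${TMPDIR:-/tmp}/multi_tier_backup.lock"

run_exclusive() {
    (
        # -n: fail immediately instead of waiting for the lock on file descriptor 9
        if ! flock -n 9; then
            echo "previous backup still running, skipping this run" >&2
            exit 75
        fi
        "$@"
    ) 9>"$LOCK_FILE"
}

# In the real job this would be: run_exclusive /usr/local/bin/multi_tier_backup.sh
run_exclusive echo "backup would run here"
```

The lock is released automatically when the subshell exits, including after a crash, so there is no stale lock file to clean up.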

FAQ

What’s the difference between incremental and differential backups?

Incremental backs up everything changed since the last backup of any type, so a restore needs the full backup plus every incremental taken after it. Differential backs up everything changed since the last full backup; it takes more space but needs only the full backup plus the latest differential to restore.
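
The difference comes down to which reference point you compare against. The sketch below illustrates it with `find -newer` and two marker files whose timestamps stand for "last full backup" and "last backup of any type". The file names and dates are invented for the demonstration, and GNU find and touch are assumed.

```shell
#!/bin/bash
# Sketch: select files for a differential vs. an incremental backup using marker files.
set -euo pipefail

WORK=$(mktemp -d)
mkdir "$WORK/data"

# Marker files record when earlier backups ran (hypothetical dates)
touch -d '2026-06-01 00:00' "$WORK/last_full.marker"     # last full backup
touch -d '2026-06-03 00:00' "$WORK/last_backup.marker"   # most recent backup of any type

touch -d '2026-05-30 12:00' "$WORK/data/old.txt"         # unchanged since the full backup
touch -d '2026-06-02 12:00' "$WORK/data/tuesday.txt"     # changed after the full, before the last backup
touch -d '2026-06-04 12:00' "$WORK/data/thursday.txt"    # changed after the last backup

# Differential: everything newer than the last FULL backup
differential=$(find "$WORK/data" -type f -newer "$WORK/last_full.marker" -printf '%f\n' | sort)

# Incremental: everything newer than the last backup of ANY type
incremental=$(find "$WORK/data" -type f -newer "$WORK/last_backup.marker" -printf '%f\n' | sort)

echo "differential set:"; echo "$differential"
echo "incremental set:";  echo "$incremental"
```

The differential set contains both changed files, while the incremental set contains only the one modified after the most recent backup.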

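How do I back up a running database?

Use database-specific tools: pg_dump for PostgreSQL, mysqldump for MySQL, mongodump for MongoDB. pg_dump takes a consistent snapshot without blocking reads or writes; mysqldump does the same for InnoDB tables when run with --single-transaction, and mongodump needs --oplog on a replica set for a point-in-time-consistent dump. To keep load off the primary, run the backup against a read replica.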

Should I compress backups?

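Yes. Compression typically reduces storage 3-5x for text-heavy data, though already-compressed files such as images, video, and archives barely shrink. Balance compression level against CPU time: fast compression (gzip -1) is better for frequent backups, and xz is best for long-term archival.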

How do I verify backup integrity?

Use checksums (sha256sum) and test restores. Automate both in your backup script. A backup that can’t be restored is just a waste of storage.
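
As a quick illustration of the checksum half, the sketch below records a SHA-256 checksum, verifies it, and then shows that a modified file is caught. The file is a placeholder rather than a real archive, and everything happens in a temporary directory.

```shell
#!/bin/bash
# Sketch: record a checksum, verify it, then show that corruption is detected.
set -u

WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "pretend this is an archive" > data.tar.gz    # placeholder, not a real archive

# Store the checksum next to the backup
sha256sum data.tar.gz > data.tar.gz.sha256

# --status suppresses output; the exit code alone reports success or failure
if sha256sum -c --status data.tar.gz.sha256; then
    FIRST="intact"
else
    FIRST="corruption detected"
fi
echo "before: $FIRST"

# Simulate corruption by appending to the file
echo "bit rot" >> data.tar.gz

if sha256sum -c --status data.tar.gz.sha256; then
    SECOND="intact"
else
    SECOND="corruption detected"
fi
echo "after:  $SECOND"
```

A checksum only proves the file has not changed since it was written, so it complements a test restore rather than replacing it.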

What’s the best backup frequency?

It depends on your RPO (Recovery Point Objective). If you can lose 1 hour of data, back up hourly. If 24 hours, daily is fine. Critical databases may need continuous archiving.

Can I use rsync over the internet?

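Yes, over SSH, which encrypts the transfer: rsync -avz -e ssh /source/ user@remote:/backup/. Use key-based authentication and disable password logins; moving SSH to a non-standard port only cuts down on automated scans and is not a security control by itself.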

What’s Next

Security Hardening

Shell Scripting Guide

Monitoring & Logging

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.
