Backup System

Modified: 2026-05-14

The backup system provides automated, incremental backups of all user home directories to /mnt/data/backup/user_home. It uses GNU tar’s incremental backup feature with zstd compression for efficient storage.

Components

1. Backup Script: bin/backup-user-dirs

Features:

  • Incremental backups using GNU tar’s --listed-incremental with .snar snapshot files
  • High compression with zstd level 18
  • Automatic exclusion of large, reproducible directories (conda, R packages, caches)
  • Full backup whenever the snapshot file is missing or more than 30 days old, incrementals in between
  • Detailed logging to /var/log/backup-user-dirs.log
  • Error handling and verification

Excluded directories:

  • miniconda* - Conda installations
  • .conda, .cache/conda - Conda config and cache
  • R/x86_64-* - R installed packages
  • .cache/R - R cache
  • .local/share/renv, */renv/library - renv package caches
  • .cache, .local/share/Trash - General caches and trash
  • snap/*/*/.cache - Snap caches
  • *.tmp, *~ - Temporary files
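As an illustration of how the list above could be passed to tar, the following bash sketch turns each pattern into an --exclude argument. The variable names EXCLUDE_PATTERNS and EXCLUDE_ARGS are assumptions made for this example and may not match what bin/backup-user-dirs actually uses.

```shell
#!/usr/bin/env bash
# Exclusion patterns copied from the list above.
# Variable names are hypothetical, not taken from bin/backup-user-dirs.
EXCLUDE_PATTERNS=(
  'miniconda*'
  '.conda'
  '.cache/conda'
  'R/x86_64-*'
  '.cache/R'
  '.local/share/renv'
  '*/renv/library'
  '.cache'
  '.local/share/Trash'
  'snap/*/*/.cache'
  '*.tmp'
  '*~'
)

# Build one --exclude=PATTERN argument per entry.
EXCLUDE_ARGS=()
for pattern in "${EXCLUDE_PATTERNS[@]}"; do
  EXCLUDE_ARGS+=("--exclude=${pattern}")
done
```

The array can then be expanded into a tar command line as "${EXCLUDE_ARGS[@]}". By default GNU tar treats exclude patterns as unanchored, so an entry such as .cache matches a directory of that name at any depth, which makes the more specific .cache/conda and .cache/R entries redundant but harmless.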

Backup naming:

  • Format: username_YYYY-MM-DD.tar.zstd
  • Snapshot file: username.snar
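The naming scheme and the incremental tar call can be sketched roughly as follows. This is not the actual script: the function names are hypothetical, home directories are assumed to live under /home, and the exclusion arguments are left out for brevity.

```shell
#!/usr/bin/env bash
BACKUP_DIR="${BACKUP_DIR:-/mnt/data/backup/user_home}"

# Archive path for a user and date (date defaults to today, YYYY-MM-DD).
archive_path() {
  local user="$1" day="${2:-$(date +%F)}"
  printf '%s/%s_%s.tar.zstd\n' "$BACKUP_DIR" "$user" "$day"
}

# Snapshot (.snar) path for a user.
snapshot_path() {
  printf '%s/%s.snar\n' "$BACKUP_DIR" "$1"
}

# Hypothetical backup step. GNU tar writes a full archive when the .snar
# file does not exist yet and an incremental one when it does.
backup_user() {
  local user="$1"
  tar --create \
      --file="$(archive_path "$user")" \
      --listed-incremental="$(snapshot_path "$user")" \
      --use-compress-program='zstd -18' \
      --directory=/home \
      "$user"
}
```

For example, archive_path john 2025-09-06 prints /mnt/data/backup/user_home/john_2025-09-06.tar.zstd when BACKUP_DIR is left at its default.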

2. Restore Script: bin/restore-user-dir

Usage:

sudo ./restore-user-dir <username> [restore_path] [backup_date]

Examples:

# Restore latest backup to original location
sudo ./restore-user-dir john

# Restore to alternate location
sudo ./restore-user-dir john /tmp/john_restore

# Restore specific date
sudo ./restore-user-dir john /home/john 2025-09-01

Features:

  • Automatically restores the complete chain (full backup plus all later incrementals)
  • Can restore to original or alternate location
  • Fixes ownership when restoring to original home
  • Interactive confirmation for overwrites

3. Systemd Timer (Weekly Automation)

Files:

  • systemd/backup-user-dirs.service - Service unit
  • systemd/backup-user-dirs.timer - Timer unit

Schedule:

  • Runs every Sunday at 3:00 AM
  • Randomized delay up to 30 minutes to reduce load spikes
  • Persistent (runs after boot if missed)
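A timer unit matching this schedule might look like the one printed by the sketch below. The contents of the real systemd/backup-user-dirs.timer are not shown in this document, so the Description text and the function name print_timer_unit are assumptions; OnCalendar, RandomizedDelaySec and Persistent are standard systemd timer settings.

```shell
#!/usr/bin/env bash
# Print a timer unit for "every Sunday at 03:00, up to 30 minutes of
# random delay, catch up after boot if a run was missed".
print_timer_unit() {
  cat <<'EOF'
[Unit]
Description=Weekly backup of user home directories

[Timer]
OnCalendar=Sun *-*-* 03:00:00
RandomizedDelaySec=30min
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

The calendar expression can be checked with systemd-analyze calendar 'Sun *-*-* 03:00:00'. A timer named backup-user-dirs.timer activates backup-user-dirs.service by default, so no explicit Unit= line is needed.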

Backup Strategy

Incremental Backup Chain

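The system uses a roughly monthly full backup cycle with weekly incrementals. Because the timer runs weekly and the threshold is 30 days, the run that starts a new cycle may fall a week later than shown, depending on how the script measures the age of the .snar file: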

Week 1: Full backup (new .snar file created)
Week 2: Incremental (only changed files since week 1)
Week 3: Incremental (only changed files since week 2)
Week 4: Incremental (only changed files since week 3)
Week 5: Full backup (reset, new .snar file)
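One way to decide between a full and an incremental run is sketched below. It assumes a separate stamp file that is touched at every full backup, because tar rewrites the .snar file on every run and its modification time therefore reflects the last backup of any kind. The stamp file and the function name are hypothetical; the actual script may track the age differently.

```shell
#!/usr/bin/env bash
# Decide whether the next run should be a full backup.
#   $1: path to the user's .snar file
#   $2: path to a stamp file touched at each full backup (hypothetical)
#   $3: maximum age in days (default 30)
# Returns 0 when a full backup is due, non-zero otherwise.
needs_full_backup() {
  local snar="$1" stamp="$2" max_days="${3:-30}"
  # Without a snapshot or a stamp there is nothing to build on.
  [ -f "$snar" ] && [ -f "$stamp" ] || return 0
  # find prints the stamp only if its age, counted in whole 24-hour
  # periods, is greater than max_days.
  [ -n "$(find "$stamp" -mtime +"$max_days" -print)" ]
}
```

A caller would remove the old .snar file and touch the stamp before starting a full backup, so that tar begins a fresh chain.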

How Incremental Backups Work

  1. Full Backup: When no .snar file exists or it’s >30 days old
    • Creates complete backup of all files
    • Generates new .snar snapshot file with metadata
  2. Incremental Backup: When .snar file exists and is <30 days old
    • Only backs up files modified since last backup
    • Updates .snar file with new state
    • Much smaller than full backups
  3. Restoration: Requires applying backups in sequence
    • Start with full backup
    • Apply each incremental in order
    • The restore script handles this automatically
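The restoration sequence can be sketched as follows. Both function names are hypothetical and this is not the code of bin/restore-user-dir. The sketch lists every archive of a user up to a cutoff date. It does not pick the most recent full backup as the starting point, because the naming scheme described here does not distinguish full from incremental archives.

```shell
#!/usr/bin/env bash
# List a user's archives dated on or before a cutoff, oldest first.
# Relies on YYYY-MM-DD dates sorting chronologically as plain text.
archives_up_to() {
  local dir="$1" user="$2" cutoff="$3" f day
  for f in "$dir/${user}"_????-??-??.tar.zstd; do
    [ -e "$f" ] || continue          # glob matched nothing
    day="${f##*_}"                   # strip everything up to the last "_"
    day="${day%.tar.zstd}"           # strip the extension
    # Skip archives dated after the cutoff (string comparison).
    [ "$day" \> "$cutoff" ] || printf '%s\n' "$f"
  done
}

# Extract a chain in the given order. Using /dev/null as the snapshot file
# puts GNU tar into incremental mode without touching a real .snar file.
restore_chain() {
  local target="$1"; shift
  local archive
  for archive in "$@"; do
    tar --extract --listed-incremental=/dev/null \
        --file="$archive" --directory="$target" || return 1
  done
}
```

In incremental mode, tar also removes files from the target that were deleted between two backups, which is why the archives have to be applied oldest first.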

Storage Efficiency

  • First backup: ~10-150GB per user (full, compressed)
  • Weekly incrementals: ~100MB-5GB (only changes)
  • Monthly cycle prevents chains from becoming too long
  • Compression ratio typically 3:1 to 10:1 with zstd

Manual Operations

Run backup manually

sudo /home/burk/repos/bips-hb/ml-workstation-setup/bin/backup-user-dirs

Check backup status

# View recent log lines and follow new output (Ctrl+C to stop)
sudo tail -f /var/log/backup-user-dirs.log

# Check backup sizes
ls -lh /mnt/data/backup/user_home/

# Check next scheduled run
systemctl list-timers backup-user-dirs.timer

Test backup integrity

# List contents without extracting
tar --zstd -tf /mnt/data/backup/user_home/username_2025-09-06.tar.zstd | head

# Verify archive integrity
tar -tf /mnt/data/backup/user_home/username_2025-09-06.tar.zstd >/dev/null && echo "OK"
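To check every archive in one pass, a loop along these lines could be used. The function name verify_archives is hypothetical. Listing an archive makes tar read and decompress it completely, so truncated or corrupt files are detected, but the contents are not compared against the original home directory.

```shell
#!/usr/bin/env bash
# List every *.tar.zstd archive in a directory and report whether tar can
# read it to the end. Returns non-zero if at least one archive fails.
verify_archives() {
  local dir="$1" status=0 f
  for f in "$dir"/*.tar.zstd; do
    [ -e "$f" ] || continue          # glob matched nothing
    if tar -tf "$f" >/dev/null 2>&1; then
      echo "OK      $f"
    else
      echo "FAILED  $f"
      status=1
    fi
  done
  return "$status"
}
```

A typical call would be sudo bash -c with the function defined, or verify_archives /mnt/data/backup/user_home from a root shell, since the archives are not world-readable.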

Monitoring and Maintenance

Log Rotation

The log file /var/log/backup-user-dirs.log should be added to logrotate:

# /etc/logrotate.d/backup-user-dirs
/var/log/backup-user-dirs.log {
    monthly
    rotate 12
    compress
    missingok
    notifempty
}

Disk Space Management

  • Monitor /mnt/data usage regularly
  • Consider removing backups older than 6 months
  • Full backups can be kept as archives
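  • Incrementals of an old chain can be deleted while keeping its full backup, but never delete a full backup (or an earlier incremental) while keeping later incrementals that depend on it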
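Before removing anything, it helps to review which archives are past the retention period. The sketch below only lists candidates and deletes nothing. The function name and the 180-day default (roughly six months) are assumptions for this example.

```shell
#!/usr/bin/env bash
# Print archives last modified more than N days ago, sorted by name.
# Review-only: nothing is deleted, and .snar files are not listed.
list_old_backups() {
  local dir="$1" days="${2:-180}"
  find "$dir" -maxdepth 1 -type f -name '*.tar.zstd' -mtime +"$days" -print | sort
}
```

Running list_old_backups /mnt/data/backup/user_home shows the candidates. Whether a listed file is a full backup or an incremental still has to be checked by hand before deleting it.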

Backup Verification

Periodically test restoration to ensure backups are valid:

# Test restore to temporary location
sudo ./restore-user-dir username /tmp/test_restore
# Verify contents
ls -la /tmp/test_restore/
# Clean up
sudo rm -rf /tmp/test_restore

Troubleshooting

Common Issues

  1. “Insufficient space” error
    • Check /mnt/data with df -h /mnt/data
    • Remove old backups or increase storage
  2. Backup takes too long
    • Normal for first full backup (can take hours)
    • Incrementals should be much faster
    • Check for large new files in home directories
  3. Restore fails with “snapshot not found”
    • The full backup that starts the chain may be missing, so the incrementals cannot be applied
    • Check available backups with ls /mnt/data/backup/user_home/
  4. Permission errors
    • Ensure running with sudo
    • Check /mnt/data mount permissions
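For the "Insufficient space" case, a pre-flight check of free space on the backup volume could look like this sketch. The function names and the idea of passing a required size in KiB are assumptions. How the actual script estimates the space it needs is not described here.

```shell
#!/usr/bin/env bash
# Print the available space, in KiB, on the filesystem holding a path.
# df -P forces the portable single-line format; column 4 is "Available".
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least the given number of KiB is available.
#   $1: path on the target filesystem
#   $2: required space in KiB
has_space() {
  [ "$(available_kb "$1")" -ge "$2" ]
}
```

For example, has_space /mnt/data 52428800 succeeds only if at least 50 GiB are free on /mnt/data.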

Recovery Scenarios

User accidentally deleted files:

# Restore just their home to original location
sudo ./restore-user-dir username

System migration to new hardware:

# For each user
sudo ./restore-user-dir username /home/username

Investigate historical state:

# Restore old version to temp location
sudo ./restore-user-dir username /tmp/username_old 2025-08-01

Security Considerations

  • Backups are owned by root:emmy with 640 permissions
  • Only root and emmy group members can read backups
  • Backups contain any sensitive data stored in the home directories, so they need the same protection as the homes themselves
  • Consider encryption for highly sensitive environments
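A quick audit of the permissions described above could be sketched like this. The function name list_wrong_mode is hypothetical, and the sketch checks only the mode bits, not the root:emmy ownership.

```shell
#!/usr/bin/env bash
# Print archives whose permission bits are not exactly 640, sorted by name.
list_wrong_mode() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -name '*.tar.zstd' ! -perm 640 -print | sort
}
```

Any file it reports can be corrected with sudo chown root:emmy FILE followed by sudo chmod 640 FILE.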

Future Improvements

Potential enhancements to consider:

  • Email notifications on backup failure
  • Automated backup rotation/cleanup
  • Backup encryption option
  • Network backup destination support
  • Differential backups between full backups
  • Parallel compression for faster backups