Backup System
The backup system provides automated, incremental backups of all user home directories to /mnt/data/backup/user_home. It uses GNU tar’s incremental backup feature with zstd compression for efficient storage.
Components
1. Backup Script: bin/backup-user-dirs
Features:
- Incremental backups using GNU tar’s --listed-incremental with .snar snapshot files
- High compression with zstd level 18
- Automatic exclusion of large, reproducible directories (conda, R packages, caches)
- Full backup every 30 days, incrementals in between
- Detailed logging to /var/log/backup-user-dirs.log
- Error handling and verification
Excluded directories:
- miniconda* - Conda installations
- .conda, .cache/conda - Conda config and cache
- R/x86_64-* - R installed packages
- .cache/R - R cache
- .local/share/renv, */renv/library - renv package caches
- .cache, .local/share/Trash - General caches and trash
- snap/*/*/.cache - Snap caches
- *.tmp, *~ - Temporary files
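One way to hand these patterns to GNU tar is an exclude file passed with --exclude-from. The sketch below only writes that file; the function name write_exclude_file and the file location are assumptions, and the actual script may pass its exclusions differently.

```shell
# Write the exclusion patterns listed above to a file, one per line,
# in the form GNU tar's --exclude-from option reads.
write_exclude_file() {
  cat > "$1" <<'EOF'
miniconda*
.conda
.cache/conda
R/x86_64-*
.cache/R
.local/share/renv
*/renv/library
.cache
.local/share/Trash
snap/*/*/.cache
*.tmp
*~
EOF
}

# Hypothetical usage (the path is a placeholder):
#   write_exclude_file /tmp/backup-excludes.txt
#   tar --create --exclude-from=/tmp/backup-excludes.txt ...
```

Keeping the patterns in one file makes it easy to review what is skipped without reading the tar command line.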
Backup naming:
- Format: username_YYYY-MM-DD.tar.zstd
- Snapshot file: username.snar
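The naming scheme and the features listed above could come together roughly as follows. This is a sketch under stated assumptions, not the contents of bin/backup-user-dirs: the helper names, the location of the .snar file inside the backup directory, and /home as the parent of all home directories are all hypothetical.

```shell
BACKUP_DIR=/mnt/data/backup/user_home

# Archive path for a user and a date in YYYY-MM-DD form.
archive_path() {
  printf '%s/%s_%s.tar.zstd\n' "$BACKUP_DIR" "$1" "$2"
}

# Snapshot file path for a user (assumed to live next to the archives).
snapshot_path() {
  printf '%s/%s.snar\n' "$BACKUP_DIR" "$1"
}

# Hypothetical backup of one user's home directory (assumed under /home).
# tar creates the snapshot file if it is missing (full backup) and
# otherwise archives only what changed since the snapshot was written.
backup_user() {
  user="$1"
  tar --create \
      --listed-incremental="$(snapshot_path "$user")" \
      --use-compress-program='zstd -18' \
      --file="$(archive_path "$user" "$(date +%F)")" \
      --directory=/home "$user"
}
```

Calling archive_path john 2025-09-06 prints /mnt/data/backup/user_home/john_2025-09-06.tar.zstd.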
2. Restore Script: bin/restore-user-dir
Usage:
sudo ./restore-user-dir <username> [restore_path] [backup_date]
Examples:
# Restore latest backup to original location
sudo ./restore-user-dir john
# Restore to alternate location
sudo ./restore-user-dir john /tmp/john_restore
# Restore specific date
sudo ./restore-user-dir john /home/john 2025-09-01
Features:
- Automatically restores full incremental chain
- Can restore to original or alternate location
- Fixes ownership when restoring to original home
- Interactive confirmation for overwrites
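The optional arguments in the usage line could be handled with shell default expansions. The sketch below is illustrative only: the function and variable names are hypothetical, and the defaults (/home/<username> as the original location, the word "latest" for the newest backup) are assumptions inferred from the examples above.

```shell
# Assign the positional arguments, filling in defaults for the optional ones.
parse_restore_args() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 3 ]; then
    echo "Usage: restore-user-dir <username> [restore_path] [backup_date]" >&2
    return 1
  fi
  RESTORE_USER="$1"
  RESTORE_PATH="${2:-/home/$1}"   # assumed original location
  RESTORE_DATE="${3:-latest}"     # assumed marker for the newest backup
}
```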
3. Systemd Timer (Weekly Automation)
Files:
- systemd/backup-user-dirs.service - Service unit
- systemd/backup-user-dirs.timer - Timer unit
Schedule:
- Runs every Sunday at 3:00 AM
- Randomized delay up to 30 minutes to reduce load spikes
- Persistent (runs after boot if missed)
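A timer unit matching this schedule might look like the fragment below. It is a sketch that maps each bullet to a standard systemd.timer directive, not a copy of the shipped systemd/backup-user-dirs.timer; the Description text is made up.

```ini
# Sketch of a timer unit for the schedule described above
[Unit]
Description=Weekly backup of user home directories

[Timer]
# Every Sunday at 03:00
OnCalendar=Sun *-*-* 03:00:00
# Start at a random point within 30 minutes of the scheduled time
RandomizedDelaySec=30min
# Run at next boot if the scheduled time was missed
Persistent=true

[Install]
WantedBy=timers.target
```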
Backup Strategy
Incremental Backup Chain
The system uses a monthly full backup cycle with weekly incrementals:
Week 1: Full backup (new .snar file created)
Week 2: Incremental (only changed files since week 1)
Week 3: Incremental (only changed files since week 2)
Week 4: Incremental (only changed files since week 3)
Week 5: Full backup (reset, new .snar file)
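The 30-day rule for choosing between a full and an incremental run could be checked along these lines. This is a sketch, not the logic of bin/backup-user-dirs: the function name backup_mode and the use of file modification time are assumptions. Because GNU tar rewrites the snapshot file on every incremental run, its modification time reflects the last backup rather than the last full backup, so the sketch accepts any reference file (for example a hypothetical marker file written only when a full backup is taken).

```shell
# Print "full" when the reference file is missing or older than
# max_age_days (default 30), otherwise print "incremental".
backup_mode() {
  ref_file="$1"
  max_age_days="${2:-30}"
  if [ ! -f "$ref_file" ]; then
    echo full
    return 0
  fi
  now="$(date +%s)"
  mtime="$(stat -c %Y "$ref_file")"   # GNU stat: mtime in epoch seconds
  age_days=$(( (now - mtime) / 86400 ))
  if [ "$age_days" -gt "$max_age_days" ]; then
    echo full
  else
    echo incremental
  fi
}
```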
How Incremental Backups Work
- Full Backup: When no .snar file exists or it’s >30 days old
  - Creates complete backup of all files
  - Generates new .snar snapshot file with metadata
- Incremental Backup: When .snar file exists and is <30 days old
  - Only backs up files modified since last backup
  - Updates .snar file with new state
  - Much smaller than full backups
- Restoration: Requires applying backups in sequence
  - Start with full backup
  - Apply each incremental in order
  - The restore script handles this automatically
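The restoration sequence above can be sketched as two small functions: one that selects a user's archives up to a date, and one that extracts them in order. The function names are hypothetical, and the sketch assumes the directory holds a single chain for the user, since the file names alone do not say which archive is the full backup. Extraction uses --listed-incremental=/dev/null, which is how GNU tar is told to replay an incremental archive.

```shell
# List a user's archives dated on or before cutoff (YYYY-MM-DD), oldest first.
# Zero-padded ISO dates sort chronologically as plain strings, and the shell
# expands the glob in sorted order.
archives_up_to() {
  dir="$1"; user="$2"; cutoff="${3:-9999-12-31}"
  prefix="${user}_"
  for archive in "$dir/$prefix"????-??-??.tar.zstd; do
    [ -e "$archive" ] || continue
    name=${archive##*/}
    stamp=${name#"$prefix"}
    stamp=${stamp%.tar.zstd}
    if expr "$stamp" \> "$cutoff" >/dev/null; then
      continue
    fi
    printf '%s\n' "$archive"
  done
}

# Extract the selected archives into dest, full backup first.
restore_chain() {
  dest="$4"
  mkdir -p "$dest"
  archives_up_to "$1" "$2" "$3" | while IFS= read -r a; do
    tar --extract --listed-incremental=/dev/null --zstd \
        --file="$a" --directory="$dest" || exit 1
  done
}

# Hypothetical usage:
#   restore_chain /mnt/data/backup/user_home john 2025-09-01 /tmp/john_restore
```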
Storage Efficiency
- First backup: ~10-150GB per user (full, compressed)
- Weekly incrementals: ~100MB-5GB (only changes)
- Monthly cycle prevents chains from becoming too long
- Compression ratio typically 3:1 to 10:1 with zstd
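For capacity planning, the figures above give a rough upper bound per user and cycle: one full backup plus the incrementals that follow it. The helper below is plain arithmetic with a hypothetical name; the inputs are the upper ends of the ranges quoted above and the three incrementals of the weekly chain.

```shell
# Size of one backup cycle in GB: full backup plus n incrementals.
# Arguments: full_gb incremental_gb number_of_incrementals
cycle_size_gb() {
  echo $(( $1 + $2 * $3 ))
}

# Worst case from the figures above: 150 GB full, three 5 GB incrementals
cycle_size_gb 150 5 3
```

This prints 165, so a user at the top of both ranges needs about 165 GB per cycle.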
Manual Operations
Run backup manually
sudo /home/burk/repos/bips-hb/ml-workstation-setup/bin/backup-user-dirs
Check backup status
# View recent logs
sudo tail -f /var/log/backup-user-dirs.log
# Check backup sizes
ls -lh /mnt/data/backup/user_home/
# Check next scheduled run
systemctl list-timers backup-user-dirs.timer
Test backup integrity
# List contents without extracting (--zstd, not -z, which selects gzip)
tar --zstd -tf /mnt/data/backup/user_home/username_2025-09-06.tar.zstd | head
# Verify archive integrity by reading the whole archive
tar --zstd -tf /mnt/data/backup/user_home/username_2025-09-06.tar.zstd >/dev/null && echo "OK"
Monitoring and Maintenance
Log Rotation
The log file /var/log/backup-user-dirs.log should be added to logrotate:
# /etc/logrotate.d/backup-user-dirs
/var/log/backup-user-dirs.log {
    monthly
    rotate 12
    compress
    missingok
    notifempty
}
Disk Space Management
- Monitor /mnt/data usage regularly
- Consider removing backups older than 6 months
- Full backups can be kept as archives
- Old incremental chains can be deleted after their full backup
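Candidates for cleanup can be listed with find before anything is deleted. The sketch below only prints file names; the function name is hypothetical, 180 days stands in for "6 months", and it assumes each archive's modification time is the time the backup was taken. Since an incremental cannot be restored without the full backup it builds on, review the list chain by chain before removing anything.

```shell
# List archives in a directory last modified more than max_age_days ago
# (default 180). Nothing is deleted.
list_old_backups() {
  find "$1" -maxdepth 1 -type f -name '*.tar.zstd' -mtime +"${2:-180}" -print
}

# Hypothetical usage:
#   list_old_backups /mnt/data/backup/user_home
```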
Backup Verification
Periodically test restoration to ensure backups are valid:
# Test restore to temporary location
sudo ./restore-user-dir username /tmp/test_restore
# Verify contents
ls -la /tmp/test_restore/
# Clean up
sudo rm -rf /tmp/test_restore
Troubleshooting
Common Issues
- “Insufficient space” error
  - Check /mnt/data with df -h /mnt/data
  - Remove old backups or increase storage
- Backup takes too long
  - Normal for first full backup (can take hours)
  - Incrementals should be much faster
  - Check for large new files in home directories
- Restore fails with “snapshot not found”
  - May be attempting incremental restore without full backup
  - Check available backups with ls /mnt/data/backup/user_home/
- Permission errors
  - Ensure running with sudo
  - Check /mnt/data mount permissions
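For the “Insufficient space” case, the df check can also be scripted so that a run is skipped when the target is nearly full. This is a sketch with a hypothetical function name and threshold; how the actual script detects low space is not documented here.

```shell
# Succeed if the filesystem holding the given path has at least
# required_kb KiB available. Arguments: path required_kb
has_free_space() {
  avail_kb="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kb" -ge "$2" ]
}

# Hypothetical usage: require about 200 GiB before starting a backup
#   has_free_space /mnt/data 209715200 || echo "Insufficient space" >&2
```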
Recovery Scenarios
User accidentally deleted files:
# Restore just their home to original location
sudo ./restore-user-dir username
System migration to new hardware:
# For each user
sudo ./restore-user-dir username /home/username
Investigate historical state:
# Restore old version to temp location
sudo ./restore-user-dir username /tmp/username_old 2025-08-01
Security Considerations
- Backups are owned by root:emmy with 640 permissions
- Only root and emmy group members can read backups
- Sensitive data in homes is preserved in backups
- Consider encryption for highly sensitive environments
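The ownership and mode stated above can be audited with GNU stat. The helper below is a sketch with a hypothetical name; it defaults to root:emmy and 640 as described and only reports whether a file matches.

```shell
# Succeed if the file has the expected owner:group and octal mode.
# Arguments: file [owner:group] [mode]
has_expected_perms() {
  [ "$(stat -c '%U:%G' "$1")" = "${2:-root:emmy}" ] &&
    [ "$(stat -c '%a' "$1")" = "${3:-640}" ]
}

# Hypothetical usage: report archives that deviate
#   for f in /mnt/data/backup/user_home/*.tar.zstd; do
#     has_expected_perms "$f" || echo "unexpected permissions: $f"
#   done
```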
Future Improvements
Potential enhancements to consider:
- Email notifications on backup failure
- Automated backup rotation/cleanup
- Backup encryption option
- Network backup destination support
- Differential backups between full backups
- Parallel compression for faster backups