Easy server backups to Amazon S3 with duplicity

We've made a recent push at Icelab to toughen and better codify our server builds. Among other things, we've sought to put in place a simple backup system that could deal with a variety of applications. With thanks to the smart folk at Plus2, we've settled on the very capable duplicity.

duplicity is an open source tool that provides "encrypted bandwidth-efficient backup using the rsync algorithm." There's plenty of goodness packed into that one-line description. The encryption keeps our data secure, and its use of librsync means that incremental backups are built right in. Better still, duplicity supports a number of different backup targets, including Amazon S3. Backing up to S3 is great for us because it is cheap, scalable and reliable, and, most importantly, geographically and network independent from our servers.

Setting up

To get started, you can install duplicity with apt-get:

apt-get install duplicity

Then, there are a couple of things you'll need to do in order to start using it with S3. First, you'll need to generate the GnuPG key used to encrypt your backups:

gpg --gen-key

Accepting the defaults is fine here, but make sure you add a passphrase to the key, since duplicity will not work without it. Then you'll need to set some environment variables to contain this passphrase and your Amazon S3 credentials:

export PASSPHRASE=your_gpg_passphrase
export AWS_ACCESS_KEY_ID=your_s3_access_key
export AWS_SECRET_ACCESS_KEY=your_s3_secret

Later on, we'll encapsulate these in a script to run duplicity.

Backing Up

For now though, we can explore the various incantations that duplicity allows us. At the most basic level, creating a backup works like this:

duplicity source_directory target_url

Your target_url can be things like scp://user@backuphost.com/some_dir or file:///mnt/backup; duplicity supports a whole bunch of schemes. For S3, we'd use something like this:

duplicity /var/www s3+http://com.mycorp.myhost.backup

The first time you run this, it will make a full backup of the entire /var/www directory to the com.mycorp.myhost.backup bucket on S3. Run the same command again and it will automatically make an incremental backup. However, you'll want to make sure you don't keep making incremental backups indefinitely, since this would ultimately make the restore process quite painful. duplicity already has you covered here, with its --full-if-older-than argument:

duplicity --full-if-older-than 30D /var/www s3+http://com.mycorp.myhost.backup

This will make sure you get a full backup every 30 days, with incremental backups in between.
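Once a backup or two exists, you can check the state of the chain. duplicity's collection-status command (shown here against the same example bucket as above, with the same environment variables set) lists the full and incremental backup sets it finds at the target:

```shell
# List the chain of full and incremental backup sets at the target.
# Assumes the example bucket URL and the PASSPHRASE/AWS_* environment
# variables from earlier.
duplicity collection-status s3+http://com.mycorp.myhost.backup
```

This is a handy sanity check that your full backups are actually being rotated every 30 days as intended.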

If you want to back up several directories from disparate places on the filesystem, you would use a command like this:

duplicity --include /var/www --include /var/dbdumps --exclude "**" --full-if-older-than 30D / s3+http://com.mycorp.myhost.backup

The trick here is to tell duplicity to back up the whole filesystem (hence the "/"), but then ignore every file (the --exclude "**"), except for some explicitly included directories.
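To confirm that your includes and excludes caught what you intended, duplicity's list-current-files command enumerates every path recorded in the most recent backup (again using the example bucket from above):

```shell
# Enumerate every file recorded in the latest backup set, so you can
# check the include/exclude rules did what you expected.
duplicity list-current-files s3+http://com.mycorp.myhost.backup
```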

Restoring

Backing up so far has been easy, but the system is only as good as its ability to restore files, right? You can restore simply by flipping around the arguments to duplicity:

duplicity s3+http://com.mycorp.myhost.backup ~/restore

This pulls down the entire backup to the local restore directory. If you want to be more specific, you can specify a particular file or directory to be restored, or a restore from a particular date:

# Restore a file
duplicity --file-to-restore var/dbdumps/app.sql s3+http://com.mycorp.myhost.backup ~/restore

# Restore a directory
duplicity --file-to-restore var/dbdumps s3+http://com.mycorp.myhost.backup ~/restore

# Restore everything from a point in time
duplicity -t 2010-09-22T01:10:00 s3+http://com.mycorp.myhost.backup ~/restore
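One more piece of housekeeping: old backup chains will accumulate on S3 indefinitely unless you prune them. duplicity's remove-older-than command deletes backup sets older than a given age, and it only removes complete chains, so nothing a newer incremental depends on is lost. As a sketch, to keep roughly six months of history:

```shell
# Delete backup chains older than six months. --force actually performs
# the deletion; without it, duplicity only lists what would be removed.
duplicity remove-older-than 6M --force s3+http://com.mycorp.myhost.backup
```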

Automating

To run regular backups, you will want to enclose your favoured duplicity command along with the configuration variables in a script. If you have databases you want to back up, you could also run mysqldump (or whatever other command you need) to prepare the database dumps before backup:

#!/bin/bash

export PASSPHRASE=your_gpg_passphrase
export AWS_ACCESS_KEY_ID=your_s3_access_key
export AWS_SECRET_ACCESS_KEY=your_s3_secret

export MYSQL_PASSWORD=your_mysql_password

for DB in app_1 app_2 app_3; do
  mysqldump -h localhost -u backup_user -p${MYSQL_PASSWORD} ${DB} | gzip > /var/dbdumps/${DB}.sql
done

duplicity --include /var/www --include /var/dbdumps --exclude "**" --full-if-older-than 30D / s3+http://com.mycorp.myhost.backup

Put this script somewhere on your system, say /usr/local/sbin/backup.sh, and make it readable only by root (to protect the plaintext passwords that it includes):

chown root:root /usr/local/sbin/backup.sh
chmod 0700 /usr/local/sbin/backup.sh

Then configure it to run daily in /etc/crontab:

30 4 * * * root /usr/local/sbin/backup.sh

And you're done!
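Well, almost: a backup you've never tested is a backup you can't fully trust. duplicity's verify command compares the files in a backup against the local filesystem and reports any differences. As a sketch against the simple /var/www example from earlier (for the include/exclude variant, you'd pass the same filter options):

```shell
# Compare the latest backup against the local files. Note that in verify
# mode the target URL comes first and the local directory second.
duplicity verify s3+http://com.mycorp.myhost.backup /var/www
```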

Further thoughts

Now it's your turn. Setting up and running duplicity is simple and straightforward, and storage on Amazon S3 is cheap and easy to access. There's really no reason why you shouldn't be running some kind of backup like this, even on the servers running your smallest, simplest or least-critical apps.

It's also worth mentioning that the way we've handled the password storage isn't fully secure, but it's about as convenient as we need in order for the backup script to run regularly and painlessly. Randy Sesser has some thoughts about obscuring it further through a tiny C program that might be useful. If you're after a more flexible backup script that uses duplicity, you might also like to check out John Schember's blog post.

Finally, keep in mind that the approach we've outlined here backs up only select parts of our servers, just those relating to our web apps. We do this because we have a set of scripts that can provision brand new identical servers in case one completely blows up. You should be sure to back up whatever other parts of the system you might need for a full recovery.