Monday, September 29, 2008

Using Duplicity on Windows under Cygwin to back up to Amazon S3

1. Install Cygwin. The setup.exe on the home page hasn't been updated in a while and doesn't support the -P switch, so grab the latest one from http://cygwin.com/setup/snapshots/. Run setup in UI mode first to set the defaults, then install the missing packages:

setup.exe -q -P python
setup.exe -q -P gnupg
setup.exe -q -P gcc
setup.exe -q -P joe #or your other favorite Unix editor; Windows editors produce \r line endings and Cygwin bash doesn't like them
setup.exe -q -P librsync-devel
setup.exe -q -P librsync1
setup.exe -q -P wget



There is some magic way to put all the packages on one line, which I didn't care to master.
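For what it's worth, the one-line form is most likely a comma-separated list passed to -P. This is a sketch that assumes your setup.exe accepts that form (later releases do); it only builds and prints the command so you can review it before running it. The variable names are mine.

```shell
# Packages from step 1, space-separated.
packages="python gnupg gcc joe librsync-devel librsync1 wget"

# Join the names with commas, the form -P is assumed to accept.
pkg_list=$(echo $packages | tr ' ' ',')

# Print the command for review instead of running it.
echo "setup.exe -q -P $pkg_list"
```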

(start Cygwin bash shell)

2. Install GnuPGInterface Python module
(download the .tar.gz, gunzip it, unpack it with tar -xvf, then run python setup.py install in the unpacked directory)

3. Install boto
(same as above)

4. Install Duplicity
(same as above)
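Since steps 2 to 4 repeat the same routine, it can be wrapped in a small shell function. This is only a sketch: install_tarball and the PYTHON override are names I made up, and it assumes the first entry in the archive is the top-level source directory, which is the usual layout for these packages.

```shell
# Hypothetical helper: unpack a source tarball and run its setup.py.
# PYTHON is an assumed override; it defaults to the python found on PATH.
install_tarball() {
    tarball=$1
    # Take the first archive entry and strip everything after the first slash
    # to get the top-level directory name (an assumption about the layout).
    topdir=$(tar -tzf "$tarball" | sed -n '1s|/.*||p')
    [ -n "$topdir" ] || return 1
    # -z handles the gunzip step, so one tar call replaces gunzip + tar -xvf.
    tar -xzf "$tarball" || return 1
    # Run the install from inside the unpacked directory, in a subshell
    # so the caller's working directory is left alone.
    ( cd "$topdir" && "${PYTHON:-python}" setup.py install )
}
```

Usage would be one call per downloaded archive, in the order of the steps above: GnuPGInterface, then boto, then Duplicity.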

5. Start the Unix editor you installed in step 1 and hack away:

#S3 account id
export AWS_ACCESS_KEY_ID=FOOBAR123
#S3 account access key
export AWS_SECRET_ACCESS_KEY=ALOTOFBASE64STUFF
#gnupg password for symmetric encryption
export PASSPHRASE=some_password
export TMPDIR=/tmp

duplicity --full-if-older-than 6D --time-separator=. --archive-dir /home/Serge/dup /home/Serge/docs s3+http://unique-bucket-name

#remove the secrets from the environment once the backup is done
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE


This script forces a full backup once the last one is more than six days old, so if you run it daily you get daily incremental backups and a full one about every week. I recommend this setting for important documents that are relatively small. For photos or anything else large, it's better to do full backups rarely to save bandwidth and S3 costs. For example, for a full backup every 90 days, change 6D to 90D.
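If you back up several directories on different schedules, the interval can become a parameter. The sketch below is a variation on the script above, not a replacement for it: run_backup and the DUPLICITY override are names I invented, the paths and credentials are the same placeholders as before, and the function body is written in parentheses so it runs in a subshell and the exported secrets disappear on their own when it finishes.

```shell
# Hypothetical wrapper: $1 is the full-backup interval (e.g. 6D or 90D),
# $2 is the destination URL. The ( ... ) body runs in a subshell, so the
# exports below never reach the calling shell.
run_backup() (
    export AWS_ACCESS_KEY_ID=FOOBAR123
    export AWS_SECRET_ACCESS_KEY=ALOTOFBASE64STUFF
    export PASSPHRASE=some_password
    export TMPDIR=/tmp
    # DUPLICITY is an assumed override, handy for a dry run with echo.
    "${DUPLICITY:-duplicity}" --full-if-older-than "$1" --time-separator=. \
        --archive-dir /home/Serge/dup /home/Serge/docs "$2"
)
```

Calling it with DUPLICITY=echo in front prints the arguments instead of starting a backup, which is a cheap way to check the command line before pointing it at a real bucket.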

You may notice that TMPDIR is set explicitly. Otherwise Python gets confused and uses the Windows temp directory.

Cygwin bash doesn't like the default duplicity time separator, hence the option to change it to a dot.

--archive-dir is used by duplicity to keep a local copy of the hashes. It's optional, but it makes backups faster and uses less bandwidth.

The bucket name needs to be unique across all buckets for all accounts at Amazon, so you need to come up with something creative.

You need to schedule this script to run periodically. Cygwin cron or NT at.exe can do the job.
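For the cron route, a daily run is a single crontab entry. The sketch below just formats that entry; cron_line is a made-up helper and /home/Serge/backup.sh is a hypothetical location for the script from step 5, so adjust both to taste.

```shell
# Hypothetical helper: print a crontab entry that runs a script once a day.
# $1 = hour (0-23), $2 = minute (0-59), $3 = path to the script.
cron_line() {
    hour=$1 minute=$2 script=$3
    # Crontab field order is: minute hour day-of-month month day-of-week command.
    echo "$minute $hour * * * $script"
}

# Example: every day at 02:30 (the script path is an assumption).
cron_line 2 30 /home/Serge/backup.sh
```

The printed line can be pasted into the editor that crontab -e opens, assuming cron is installed and running under Cygwin.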

You may notice that I used symmetric encryption for the backup files. Some people like to use a public/private key pair. The problem with that approach is that you need to back up your private key. This is a non-trivial task by itself: if you don't do it right, then after a truly catastrophic event you may end up with an S3 backup you can't decrypt. My approach is to use symmetric encryption and choose a reasonably long and complex passphrase that I can still remember.