Monday, September 29, 2008

Using Duplicity on Windows under Cygwin to backup to Amazon S3

1. Install Cygwin. The setup.exe on the homepage hasn't been updated in a while and doesn't support the -P switch, so grab the latest one from http://cygwin.com/setup/snapshots/. Run setup in UI mode first to set the defaults, then install the missing packages:

setup.exe -q -P python
setup.exe -q -P gnupg
setup.exe -q -P gcc
setup.exe -q -P joe # or another favorite editor; Windows editors produce \r line endings, which Cygwin bash doesn't like
setup.exe -q -P librsync-devel
setup.exe -q -P librsync1
setup.exe -q -P wget



There is some magic way to put all the packages on one line, which I didn't care to master.
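A commenter below notes that setup.exe accepts a comma-separated package list after -P. A minimal sketch, assuming that behavior, which builds the single command from the package list in step 1. The variable names are mine, and the command is only printed, not run:

```shell
# Package names are the ones from step 1; the variable names are illustrative.
packages="python gnupg gcc joe librsync-devel librsync1 wget"

# Turn the space-separated list into the comma-separated form for -P.
pkg_list=$(printf '%s' "$packages" | tr ' ' ',')

# Print the command instead of running it, so it can be checked first.
echo "setup.exe -q -P $pkg_list"
```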

(start Cygwin bash shell)

2. Install GnuPGInterface Python module
(download .tar.gz, gunzip, tar -xvf, python setup.py install)

3. Install boto
(same as above)

4. Install Duplicity
(same as above)
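Steps 2 to 4 repeat the same unpack-and-install routine, so it can be wrapped in a small function. This is only a sketch: install_from_tarball is a hypothetical helper of mine, it assumes the archive unpacks into a single top-level directory containing setup.py, and it uses tar -xzf to combine the gunzip and tar -xvf steps.

```shell
# Hypothetical helper, not part of any of the three packages.
# Usage: install_from_tarball some-package-x.y.tar.gz
install_from_tarball() {
    tarball="$1"
    # Name of the top-level directory inside the archive.
    srcdir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    # -z does the gunzip step, -x extracts.
    tar -xzf "$tarball" || return 1
    # Run the installer from inside the unpacked directory.
    (cd "$srcdir" && python setup.py install)
}
```

Call it once per downloaded archive, in the order GnuPGInterface, boto, Duplicity.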

5. Start the Unix editor you installed in step 1 and hack away:

#S3 account id
export AWS_ACCESS_KEY_ID=FOOBAR123
#S3 account access key
export AWS_SECRET_ACCESS_KEY=ALOTOFBASE64STUFF
#gnupg password for symmetric encryption
export PASSPHRASE=some_password
export TMPDIR=/tmp

duplicity --full-if-older-than 6D --time-separator=. --archive-dir /home/Serge/dup /home/Serge/docs s3+http://unique-bucket-name

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=


This script forces a full backup once the last one is more than six days old, so if you run it daily you get daily incremental backups and a full one every week. I recommend this setting for important documents that are relatively small. For photos or other large data it's better to do full backups rarely to save bandwidth and S3 costs. For example, for a full backup every 90 days, change 6D to 90D.
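To make the interval easy to vary per data set, the command can be built from parameters. A rough sketch: backup_cmd is a hypothetical helper, the photos directory and both bucket names are placeholders, and the function only prints the command so it can be reviewed before use.

```shell
# Hypothetical helper: print the duplicity command line for a given
# full-backup interval, source directory and target URL.
backup_cmd() {
    interval="$1"
    src="$2"
    dest="$3"
    echo "duplicity --full-if-older-than $interval --time-separator=. $src $dest"
}

# Documents: full backup roughly weekly, as in the script above.
backup_cmd 6D /home/Serge/docs s3+http://unique-bucket-name
# Photos (hypothetical directory and bucket): full backup every 90 days.
backup_cmd 90D /home/Serge/photos s3+http://another-unique-bucket-name
```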

Note that TMPDIR needs to be set explicitly; otherwise Python gets confused and uses the Windows temp directory.
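If the script might be started from different environments, a small guard at the top can make sure TMPDIR points at a usable directory before duplicity runs. A sketch only: ensure_tmpdir is a hypothetical function name, and the /tmp fallback matches the value used in the script.

```shell
# Hypothetical guard for the top of the backup script.
ensure_tmpdir() {
    # Fall back to /tmp when TMPDIR is unset or empty.
    TMPDIR="${TMPDIR:-/tmp}"
    export TMPDIR
    # Succeed only if the directory exists and is writable.
    [ -d "$TMPDIR" ] && [ -w "$TMPDIR" ]
}

ensure_tmpdir || echo "TMPDIR=$TMPDIR is not a writable directory" >&2
```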

Cygwin doesn't cope well with the default duplicity time separator (a colon), hence the option to change it to a dot.

--archive-dir tells duplicity where to keep a local copy of the hashes. It's optional, but it makes backups faster and uses less bandwidth.

The bucket name needs to be unique across all buckets for all accounts at Amazon, so you need to come up with something creative.

You need to schedule this script to run periodically. Cygwin cron or NT at.exe can do the job.
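For the cron route, a crontab entry along these lines should do. This is a sketch under assumptions: the Cygwin cron package is installed and its service is running (it is not in the package list from step 1), and the script path, log path and 02:00 schedule are all made up for illustration.

```shell
# Hypothetical script path, log path and schedule (02:00 every day).
cron_line='0 2 * * * /home/Serge/backup.sh >> /home/Serge/backup.log 2>&1'

# Print the entry for review. To install it while keeping existing
# entries, something like this should work:
#   (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
echo "$cron_line"
```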

You may notice that I used symmetric encryption for the backup files. Some people prefer a public/private key pair. The problem with that approach is that you need to back up your private key, which is a non-trivial task by itself. If you don't do it right, a real catastrophic event may leave you with an S3 backup you can't decrypt. My approach is to use symmetric encryption and choose a reasonably long and complex passphrase that I can still remember.

7 comments:

Anonymous said...

Great post Serge. It helped tremendously to get me going. The Cygwin command line method to install multiple packages is to separate them with a comma. So you could reduce #1 down to:

setup.exe -q -P python,gnupg,gcc,librsync-devel,librsync1,wget,joe

That should shave a couple minutes off the setup time.

oei said...

Thanks for a very comprehensive overview. I've been meaning to look into this for ages - now I can just use your notes! :)

Julian said...

Or you could save yourselves loads of time and effort by picking up Duplicati by Kenneth Skovhede! It is a Windows port of Duplicity with some nice features like a GUI. Not quite fully backwards compatible with Duplicity but it runs on Linux too using Mono.

--
Regards,
Julian Knight, http://it.knightnet.org.uk

Mark said...

Thanks for the tips - was able to install correctly, but not able to back up data successfully - I get errors in Cygwin about the number of files that can be open. Have you not encountered this? (I got this with both duplicity 0.6.02 and 0.6.18.)

"Max open files of -1 is too low, should be >= 1024.
Use 'ulimit -n 1024' or higher to correct.
"

Serge Khorun said...

Mark, I haven't updated my Cygwin and duplicity in a while, so I don't see this issue (yet).

SanskritFritz said...

Couple of observations:
- http://cygwin.com/setup/snapshots/ doesn't exist anymore.
- ulimit -n 1024 is needed
- duplicity fails with the following message:
GPGError: GPG Failed, see log below:
===== Begin GnuPG log =====
===== End GnuPG log =====
I have read somewhere else (http://jager.no/projects/windows/duplicity-on-windows) that gpg.py needs to be patched, but that patch is dated and won't work with newer versions of duplicity.

Anonymous said...

If you do not want all the Cygwin stuff, you might want to try out Duplicati. Duplicati started as a Windows clone of duplicity but went its own way later on. Nonetheless, it creates encrypted, incremental backups and uploads them to remote file servers like Amazon S3, FTP, SSH, WebDAV and others.

You can find Duplicati at http://www.duplicati.com/