Automatic Cloud Backups: How to copy your Amazon EC2 instance snapshots into S3 bit buckets.

With cloud computing you can delete entire computers at the touch of a button! So what if it happens by accident? No problem, you have a backup… right? Here is a script you may find useful when you want instances to be backed up automatically by a central server (or you can have each instance back itself up individually, but don’t copy your AWS credentials anywhere unsafe!)

Data Integrity Prerequisites and Other Considerations

You should probably lock your DBMS first, to avoid data corruption. If files are being copied while users are writing to MySQL, Oracle, or some other database… well, let’s just say the DBMS could get confused. The script will attempt to do this if you are using MySQL, but you will need your MySQL account information. Remember to back things up in a way you can restore them: a backup is only as good as your ability to restore it. Store this script and all your credentials in a safe place (chmod 700 is probably also a good idea). This is intended for backing up small instances; if your instance holds more than 10 GB, consider using EBS for persistent storage, taking EBS snapshots, and keeping copies of those.
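
For that larger-instance case, here is a minimal sketch of taking a daily EBS snapshot instead, written in the same os.system style as the backup script below. The volume ID is a placeholder you would replace with your own, and it assumes the EC2 API tools (which provide ec2-create-snapshot) are installed alongside the AMI tools, with EC2_PRIVATE_KEY and EC2_CERT pointing at your credentials:

#!/usr/bin/python
# Hypothetical sketch: snapshot an EBS volume instead of bundling the root filesystem.
# Assumes the EC2 API tools are installed and EC2_PRIVATE_KEY / EC2_CERT are set.
import os

ebs_volume_id = 'vol-xxxxxxxx'     # placeholder: the EBS volume attached to your instance
api_tools_path = '/etc/ec2/bin/'   # use trailing slash, same convention as the script below

# ec2-create-snapshot takes the volume ID and an optional description
os.system('nice %sec2-create-snapshot %s -d "daily EBS backup"' % (api_tools_path, ebs_volume_id))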

Account Credentials Needed

You will need to access your Account Credentials page via aws.amazon.com. You’ll need your private key (.pem), your X.509 certificate, your AWS account ID, and your access key ID and secret access key.

Once you have all that, just edit this script, drop it into /etc/cron.daily, or schedule it for whatever time works for you:

#!/usr/bin/python 
# Asher Bond 2010
# Backup your EC2 instances to S3 every day
# an enhanced script based on backup.py from Paul Kenjora of Aware Labs
# http://blog.awarelabs.com/2009/painless-amazon-ec2-backup/

# apt-get install python-mysqldb if you don't have this module, but
# it's not needed if you don't run MySQL or another DBMS on the instance.
import MySQLdb

# you will need this to make system calls
import os

from datetime import date


# get these from aws.amazon.com (under Security Credentials; replace the example values below with your own)
pem_file = '/root/.ssh/pk-APKAI6XFAKE1RVFIDNDA.pem'
cert_file = '/root/.ssh/cert-1J7DNJST6ET0YERMOMSREAL897B26UMS.pem'
user_id = '163265130428'
platform = 'i386'
bucket = 'backups.asherbond.com'

access_key = 'AKIAJ4QMQYJ5MXUTA4GA'
secret_key = '1WQjBA8i45CW3mnzzB/59jAFTuPl32d6u1YDgHvF'
ec2_path = '/etc/ec2/bin/' #use trailing slash

# it's a good idea to lock your DBMS so that nobody is writing to your DBMS files while the snapshot is taken.
# again, you can comment this whole database section out if you don't have a DBMS on the instance you're imaging.
conn = MySQLdb.connect (host = "localhost",
                        user = "backup",
                        passwd = "ordie",
                        db = "mysql")

cursor = conn.cursor ()
cursor.execute ("FLUSH TABLES WITH READ LOCK")
cursor.close ()


# 3 steps to back up the files to S3
days = ('monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday')
manifest = days[date.today().weekday()]

# remove any leftover bundle files from last week's backup for this weekday
step_1 = 'nice rm -f /mnt/%s*' % (manifest,)

# bundle the root filesystem into an image named after the weekday
step_2 = 'nice %sec2-bundle-vol -p %s -d /mnt -k %s -c %s -u %s -r %s' % (ec2_path, manifest, pem_file, cert_file, user_id, platform)

# build the upload command up front (it runs last, after the bundle exists)
step_3 = 'nice %sec2-upload-bundle -b %s -m /mnt/%s.manifest.xml -a %s -s %s' % (ec2_path, bucket, manifest, access_key, secret_key)

# remove last week's bundle and create this week's while the tables are still locked
print step_1
os.system(step_1)
print step_2
os.system(step_2)

# the bundle is written, so unlock the DBMS
cursor = conn.cursor ()
cursor.execute ("UNLOCK TABLES")
cursor.close ()

# close this script's connection to the DBMS
conn.close ()

# upload the bundle to S3
print step_3
os.system(step_3)

You should expect the process to take about 30 minutes on an m1.small; a t1.micro or a larger instance may back up faster (unless, of course, it has more stuff on disk).

Here is what my output looks like:

asherbond:~$ /etc/cron.daily/backup.py
Copying / into the image file /mnt/thursday…
Excluding:
/sys
/proc/bus/usb
/proc
/dev/pts
/dev
/media
/mnt
/proc
/sys
/mnt/thursday
/mnt/img-mnt
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.001796 s, 584 MB/s
mke2fs 1.41.3 (12-Oct-2008)
Bundling image file…
Splitting /mnt/thursday.tar.gz.enc…
Created thursday.part.000
Created thursday.part.001
Created thursday.part.002
Created thursday.part.003
Created thursday.part.004
…. etc ….
Created thursday.part.150
Generating digests for each part…
Digests generated.
Creating bundle manifest…
ec2-bundle-vol complete.
Uploading bundled image parts to the S3 bucket backups.asherbond.com …
Uploaded thursday.part.000
Uploaded thursday.part.001
Uploaded thursday.part.002
Uploaded thursday.part.003
…. etc ….
Uploaded thursday.part.150
Uploading manifest …
Uploaded manifest.
Bundle upload completed.

Be sure to log into your S3 bit bucket to make sure the backups are there. You should probably also restore each backup to a test instance and see if things look right.
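
If you want to script that test restore, here is a minimal sketch, again in the style of the backup script, that registers an uploaded bundle as an AMI you can boot from. The AMI name is a made-up placeholder, and it assumes the EC2 API tools (which provide ec2-register) are installed with EC2_PRIVATE_KEY and EC2_CERT set:

#!/usr/bin/python
# Hypothetical sketch: register an uploaded bundle as an AMI for a test restore.
# Assumes the EC2 API tools are installed and EC2_PRIVATE_KEY / EC2_CERT point at your credentials.
import os

api_tools_path = '/etc/ec2/bin/'   # use trailing slash, as in the backup script
bucket = 'backups.asherbond.com'
manifest = 'thursday'              # whichever weekday's bundle you want to verify

# ec2-register points EC2 at the manifest that ec2-upload-bundle placed in the bucket;
# it prints an AMI ID you can then launch a test instance from.
os.system('%sec2-register %s/%s.manifest.xml -n %s-restore-test' % (api_tools_path, bucket, manifest, manifest))

Once the AMI ID comes back, launch an instance from it (through the console or ec2-run-instances) and check that your data came through intact.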

Posted in cloud computing, designing scalable systems
8 comments on “Automatic Cloud Backups: How to copy your Amazon EC2 instance snapshots into S3 bit buckets.”
  1. linux backup says:

    OS backups do not need to be a chore. A good backup plan will serve you for ages.

  2. Ashleigh Milne says:

    Thanks for this post and the link to Archie Hendryx’s blog; I must say there are some fantastic articles on the Cloud there. Cloud definitely needs simplifying.

    • Asher Bond says:

      Thanks for reading. I’m glad you were able to enjoy the article. Service-orientation is inevitable when technology is applied: it means we end up providing service to people who don’t understand, don’t care about, or don’t have time for how this stuff works.

  3. Buck Helson says:

    Personally, I prefer local storage; I like being 100% in control of my data. Backing up data in the cloud is pretty risky (hackers and other third parties could potentially access it). I back up my data monthly, to a flash drive as well as an external hard drive.

    • Asher Bond says:

      That makes sense for keeping your data more private, but I will say that public cloud providers like AWS and Rackspace work really hard at maintaining security. Keep in mind that on a larger scale we are helping companies do this inside their own datacenters, which is often a compliance requirement. In many cases, for consumers who can’t keep their own data safe, a cloud drive may be more secure. My advice is to make as many backups as you can efficiently and cost-effectively store and restore… Some of my backups are in the private cloud, some are in the public cloud, and some are even on jump drives, tapes, and discs.

    • Asher Bond says:

      Buck, we serve clients who have no-third-party data persistence requirements for compliance reasons; we sometimes build them their own cloud on premises. Cloud doesn’t need to mean outsourcing if you are using the technology as a way to automate and leverage infrastructure. Often a storage area network or network-attached storage appliance can help with backups. Zmanda, for example, will let you persist your SQL backups in the public or private cloud depending on how it is configured. Working on the McDevOps project has led to similar requirements, and we have decided to include on- and off-premises backup options in all future releases.

  4. Jeffrey Smith says:

    Hi Asher,

    How would I restore from this backup?

    Thanks.

  5. Alan says:

    Hi Jeffrey – I think you should be able to register it under AMIs, then launch it with a spot request.