Backup to disk on Amazon Glacier or other cloud storage

As an enterprise CIO you may have a business requirement to keep enterprise backups on-site for 30 days but retain backups off-site for 180 days. A few years ago your only choice was to back up to disk for the short term and to tape for the longer term, and then contract with a tape vaulting service like Iron Mountain to truck your tapes off-site for safekeeping.

Now that cloud providers like Amazon offer Glacier as a low-cost, disk-based archive for backups, you have the option of getting rid of tape completely if you so choose. Recent announcements like “Panzura supports Amazon Glacier” are worth noting in this regard.

It is a good first step that Amazon offers Glacier for storing data long term at a penny per GB per month, but how do you back up your in-house services to the Amazon Glacier repository? We’ll try to answer this in stages.
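To get a feel for what a penny per GB per month means for a real backup set, here is a minimal sketch of the arithmetic. It assumes a flat $0.01 per GB per month storage price and 1 TB = 1000 GB, and it ignores any retrieval, request, or transfer charges your provider may add. The function names and the 5 TB example are mine, purely for illustration.

```python
from decimal import Decimal

# Assumption: flat storage price of $0.01 per GB per month, nothing else billed.
PRICE_PER_GB_MONTH = Decimal("0.01")


def monthly_storage_cost(size_gb: int) -> Decimal:
    """Dollars per month to keep size_gb of archived data."""
    return PRICE_PER_GB_MONTH * size_gb


def retention_cost(size_gb: int, months: int) -> Decimal:
    """Dollars to keep size_gb of archived data for the given number of months."""
    return monthly_storage_cost(size_gb) * months


if __name__ == "__main__":
    # 1 TB (taken as 1000 GB) for one month.
    print(monthly_storage_cost(1000))
    # Hypothetical 5 TB backup set kept for 6 months (roughly 180 days).
    print(retention_cost(5000, 6))
```

Storage alone is only part of the bill, so treat the result as a lower bound rather than a quote.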

Amazon AWS offers you EC2 instances, which are just Xen virtual “machines-on-demand” running on commodity x86 servers. The instance storage that comes with an EC2 instance is non-persistent (literature majors would say “ephemeral” or “transient”). What this means is that when you shut down the EC2 instance you lose the data on it. Amazon S3, by contrast, provides persistent storage for data from EC2 instances.

Glacier is Amazon’s archive service for data that doesn’t need to be retrieved in undue haste. If you are in a hurry to retrieve your data, leave it on S3; if you have time to stop and smell the roses (for a few hours, mind you!) while your data is being retrieved, then use Glacier.

Getting back to “Panzura supports Amazon Glacier”, this is how I believe it would work: If you are a Symantec shop, you’ve invested (perhaps too heavily in hindsight) in products like Symantec NetBackup to back up data from your servers. You decide to archive these backups in the cloud. For that you need a way to dedupe, compress and possibly encrypt your backup data before it leaves the sanctity of your data center, so you deploy a cloud gateway (also called a cloud controller), which is an appliance made by a vendor like Panzura (or Quantum for that matter). The Panzura appliance happens to be a 1U or 2U appliance (the U refers to a rack unit) containing SSDs that act as a read/write cache, so you don’t notice the latencies introduced by bringing a public cloud into your data’s workflow.

Next you have the knotty question of “If I save backups to the cloud, can I recycle the oldest backup so I don’t pay the cloud vendor fees to retain more than a finite set of backups?” The answer is yes, provided you use a protocol like NFS/CIFS to access your Panzura cloud controller.
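Here is a rough sketch of what recycling the oldest backups could look like when the controller is reached over NFS/CIFS. It assumes the share is mounted as an ordinary directory, that each backup set is a single file, and that file modification time reflects backup age. The function name, file names and the keep count are hypothetical. Whether and when the appliance then frees the matching objects in the cloud is up to the vendor, so check that before relying on it.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def prune_oldest(backup_dir: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` newest files in backup_dir; return the deleted paths."""
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    # Newest first, judged by modification time (assumption: mtime == backup age).
    files = sorted(
        (p for p in backup_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    removed = files[keep:]
    for path in removed:
        path.unlink()
    return removed


if __name__ == "__main__":
    # Stand-in for the mounted share: a temp directory with five fake backups.
    with tempfile.TemporaryDirectory() as tmp:
        share = Path(tmp)
        for day in range(5):
            backup = share / f"backup-day{day}.img"
            backup.write_bytes(b"x")
            stamp = 1_000_000 + day * 86_400  # one fake backup per day
            os.utime(backup, (stamp, stamp))
        removed = prune_oldest(share, keep=3)
        print(sorted(p.name for p in removed))
```

In practice your backup software would normally expire images according to its own retention policy; the point of the sketch is only that a file protocol makes "delete the oldest" an ordinary file operation.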

If you don’t want to place the controller appliance in your data center, you have the option of buying an Amazon Machine Image (AMI) version of the cloud controller, which runs on an EC2 instance. Since EC2 instances provide only transient and not persistent storage, you decide to store your backups on S3. Remembering that you are on a tight IT budget, you decide to tier the backup sets from S3 to Glacier, whose $0.01 per GB/month ($10 per TB/month) price doesn’t keep you awake at night.
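If you manage the S3 bucket yourself rather than leaving tiering to the controller, the S3-to-Glacier step can be expressed as a bucket lifecycle rule. The sketch below only builds and prints the rule as JSON; it does not call AWS. The layout follows the S3 lifecycle configuration format as I understand it, so verify it against the current AWS documentation before applying it. The `backups/` prefix and the rule ID are hypothetical, and the 30 and 180 day figures are borrowed from the retention example at the top of this post.

```python
import json


def lifecycle_rule(prefix: str, to_glacier_after_days: int, expire_after_days: int) -> dict:
    """Build a lifecycle configuration that tiers objects to Glacier, then expires them."""
    if not 0 <= to_glacier_after_days < expire_after_days:
        raise ValueError("transition must happen before expiration")
    return {
        "Rules": [
            {
                "ID": "tier-backups-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Move objects to the Glacier storage class after N days.
                "Transitions": [
                    {"Days": to_glacier_after_days, "StorageClass": "GLACIER"}
                ],
                # Delete objects once the retention window has passed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_rule("backups/", 30, 180), indent=2))
```

The expiration entry is what keeps the bill bounded: without it the backup sets would sit in Glacier indefinitely.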

Now backup is only 50% of the equation. How do you restore the data to your application servers in the event of a crash? Because the Panzura controller is also available as an AMI, you can restore from your backup sets archived in Glacier (or in S3) to your on-premises application servers.

Do you have a solution if you use CommVault Simpana instead of Symantec NetBackup? Yes, in this case you’d just use the cloud storage connector in Simpana to back up to the cloud of your choice (Amazon, Nirvanix, Iron Mountain).

Now are we restricting this solution only to enterprise backup? No, you can also archive SharePoint data via the cloud storage controller to Amazon’s Glacier.

Are there TCO studies that evaluate whether archiving on Amazon Glacier is really more cost-efficient than archiving to tape? Curtis Preston’s blog article says yes and provides an analysis.

Is Glacier the best thing since sliced bread? Some would say no. Are there alternatives to Amazon Glacier? Yes, you could just as well use in-house cloud controllers to send data to another vendor’s cloud, like HP Cloud Services built on OpenStack™ technology, or to Quantum’s Q-Cloud. If you decide to go the Q-Cloud route, you’d end up using a Quantum DXi deduplicating appliance in your data center and have NetBackup treat this appliance as a backup disk target. This Quantum DXi appliance will then replicate the backups to a remote DXi appliance in the Quantum Q-Cloud using replication software like Symantec Optimized Duplication.

In conclusion, if you want the security of having an off-site backup but don’t want to be bothered with tape and tape archive services, you should consider Glacier or other disk-based archiving services in the cloud.
