Using Ruby with Amazon Web Services, an example

I’ve recently been extolling the virtues of Ruby on the LOPSA IRC channel, so
I thought it would be fun to take a real-world problem and write a
Ruby-based solution. This particular problem has to do with
manipulating snapshots in Amazon’s EC2. Those who administer EC2
instances know (or should know) that the storage associated with an
instance does not outlast the instance. If you want persistent storage
you have to create an Elastic Block Store (EBS) volume. Not only will it
outlast the instance, it can be moved between instances and it can be
backed up into something called a snapshot.

On most of our instances we have a cron job that creates a snapshot of
the elastic block store data volume every night. Over time we tend to
accumulate a very large number of snapshots. So I wanted to write
a simple script that would
purge snapshots older than a certain number of days. I could have
written this in shell using the output of the command
ec2-describe-snapshots, but using Ruby to do this is much more fun.

Glenn Rempe has written a Ruby library that provides all the EC2 API
calls in an easy-to-use object. This is available as a gem
(the Ruby equivalent of a CPAN module). If you have a Ruby environment
with gem installed then the EC2 gem is easy to retrieve and install.
I had some trouble getting the gem to work, and I will relate my struggles
in a separate blog entry.

All of the EC2 API calls are available as methods in an EC2
object. Create the object (with your credentials) and you can use it
to make any of the calls. Since all these calls return XML, the EC2
object does something interesting and clever with the responses. It
creates an object with methods that correspond to the XML tags in the
response. Thus, following the documentation for a particular call, you
know exactly how to get to the data you need.
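
To make the idea concrete, here is a minimal sketch of how an object can answer
method calls named after the keys of parsed data. The TagAccessor class is a
hypothetical illustration that I made up for this article. It is not the gem's
actual implementation, which may work quite differently.

```ruby
# Hypothetical wrapper (not the gem's code): turns nested hashes into
# objects that answer methods named after their keys.
class TagAccessor
  def initialize(data)
    @data = data
  end

  # Called by Ruby when no regular method matches the name.
  def method_missing(name, *args)
    key = name.to_s
    return super unless @data.key?(key)
    wrap(@data[key])
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end

  private

  # Wrap nested hashes (and hashes inside arrays) so calls can be chained.
  def wrap(value)
    case value
    when Hash  then TagAccessor.new(value)
    when Array then value.map { |v| wrap(v) }
    else value
    end
  end
end

response = TagAccessor.new(
  'snapshotSet' => {
    'item' => [{ 'snapshotId' => 'snap-78a54011', 'status' => 'completed' }]
  }
)
puts response.snapshotSet.item.first.snapshotId
```

Asking for a name that is not present in the data falls through to Ruby's
normal NoMethodError, which is usually what you want when a tag is misspelled.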

Since we will be calling DescribeSnapshots let’s look at the XML
we get back for that request. Assume an example call like this:

snapshots = ec2.describe_snapshots(:owner => 'self')

This sends the request DescribeSnapshots to the EC2 server
associated with the object ec2 and specifies that we are only
interested in snapshots owned by our account. The response to this
request might look like this:

<DescribeSnapshotsResponse xmlns="http://ec2.amazonaws.com/doc/2010-06-15/">
  <snapshotSet>
    <item>
      <snapshotId>snap-78a54011</snapshotId>
      <volumeId>vol-4d826724</volumeId>
      <status>completed</status>
      <startTime>2010-07-31T12:51:50.000Z</startTime>
      <progress>100%</progress>
      <ownerId>218213537122</ownerId>
      <volumeSize>40</volumeSize>
    </item>
  </snapshotSet>
</DescribeSnapshotsResponse>

This information is stored in the object snapshots. We can dig down
into the response using accessors named after the XML tags in the response.
So snapshots.snapshotSet returns the set and
snapshots.snapshotSet.item returns an array of
the items in the set. Each item has methods to access snapshotId,
volumeId, startTime, and so on. These accessor methods all
return strings, so if we need the data in another form we will have to
convert the value.
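
As a rough sketch of those conversions, the following uses Struct stand-ins
that mimic the shape of the response, so that it runs without the gem or an AWS
account. The names SnapshotItem, SnapshotSet, SnapshotsResponse and
typed_fields are my own inventions for illustration. The field values are
copied from the sample XML above.

```ruby
require 'time'

# Hypothetical stand-ins for the response object; the real one comes from the gem.
SnapshotItem = Struct.new(:snapshotId, :volumeId, :status, :startTime,
                          :progress, :ownerId, :volumeSize)
SnapshotSet = Struct.new(:item)
SnapshotsResponse = Struct.new(:snapshotSet)

SAMPLE_RESPONSE = SnapshotsResponse.new(
  SnapshotSet.new([
    SnapshotItem.new('snap-78a54011', 'vol-4d826724', 'completed',
                     '2010-07-31T12:51:50.000Z', '100%', '218213537122', '40')
  ])
)

# Convert the string fields of one item into richer Ruby values.
def typed_fields(item)
  {
    :id       => item.snapshotId,
    :started  => Time.parse(item.startTime),  # String -> Time
    :size_gib => Integer(item.volumeSize),    # String -> Integer
    :percent  => item.progress.to_i           # "100%" -> 100
  }
end

fields = typed_fields(SAMPLE_RESPONSE.snapshotSet.item.first)
puts "#{fields[:id]}: #{fields[:size_gib]} GiB, started #{fields[:started].utc.iso8601}"
```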

So let’s put the whole thing together and walk through a sample program
that will obtain a list of snapshots we own, determine which ones need to
be removed, and remove them. If you’ve never seen ruby code before then
here are a few things to look for. Every value is an object and the methods
for an object can be invoked by appending a dot and the name of the method
to the object’s name. Thus obj.select invokes the select method
for object obj. Methods accept arguments of any class. A hash can be
passed to a method using a notation like :keyname => value. The colon
is used to quote a literal. A block of code can also be passed to a method.
The block is enclosed in curly braces and can start with parameters of its
own. The block’s parameter list is enclosed in vertical bars, for example
|a,b|. The named parameters are then used in the body of the block,
for example {|a| a < 10}. Finally, the operator <=>
is a comparison that returns -1, 0, or 1 if its lefthand operand is less than,
equal to, or greater than its righthand operand, respectively.
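
Here is a small, self-contained sketch of those features in action. The method
name describe and the sample numbers are made up purely for illustration.

```ruby
# A method that takes a hash of options, in the style of the EC2 calls
# used in this article. The name "describe" is hypothetical.
def describe(options)
  "owner=#{options[:owner]}"
end

# The hash is written inline; :owner is a symbol used as the key.
puts describe(:owner => 'self')

# A block with one parameter, passed to select.
small = [3, 12, 7, 25].select { |a| a < 10 }
puts small.inspect      # [3, 7]

# A block with two parameters, using <=> to sort in descending order.
sorted = [3, 12, 7].sort { |a, b| b <=> a }
puts sorted.inspect     # [12, 7, 3]

# <=> on its own returns -1, 0, or 1.
puts(1 <=> 2, 2 <=> 2, 3 <=> 2)
```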

First we declare the things that we will need to make the program work:

require 'rubygems'
require 'time'   # provides Time.parse, used below
require 'AWS'

Now we need the access key and secret key which are the
credentials used for any EC2 API call. In a full utility we would look
for them in some sort of configuration file, but for this example we
will just retrieve them out of the environment.

accesskey = ENV['AMAZON_ACCESS_KEY_ID']
secretkey = ENV['AMAZON_SECRET_ACCESS_KEY']
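
If either variable is unset, the lookups above quietly return nil and the
problem only shows up later. One possible way to fail early is sketched below.
The helper load_credentials is hypothetical and not part of the gem. It accepts
any hash-like object so that it can be tried out without touching the real
environment.

```ruby
# Hypothetical helper: read both credentials from an environment-like hash
# and raise a clear error if either one is missing or empty.
def load_credentials(env = ENV)
  %w[AMAZON_ACCESS_KEY_ID AMAZON_SECRET_ACCESS_KEY].map do |name|
    value = env[name]
    raise ArgumentError, "#{name} is not set" if value.nil? || value.empty?
    value
  end
end

begin
  accesskey, secretkey = load_credentials
  puts "credentials loaded"
rescue ArgumentError => e
  warn e.message
end
```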

Next we create the EC2 object using the access and secret keys:

ec2 = AWS::EC2::Base.new(:access_key_id => accesskey, :secret_access_key => secretkey)

Then we calculate a cutoff time 30 days in the past. Any snapshot with a timestamp
older than this needs to be removed. Here we invoke the class method
Time.now to get the current time and subtract 30 days expressed in seconds.

older_than = Time.now - 30 * 24*60*60
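
Subtracting a number from a Time moves it back by that many seconds, which is
why the 30 days are multiplied out. The sketch below wraps the same arithmetic
in a hypothetical helper, cutoff_time, that takes the current time as a
parameter so the result can be checked against a fixed date.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: the point in time "days" days before "now".
def cutoff_time(days, now = Time.now)
  now - days * SECONDS_PER_DAY
end

now = Time.utc(2010, 8, 30, 12, 0, 0)
puts cutoff_time(30, now)   # 2010-07-31 12:00:00 UTC
```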

The rest of the program can be expressed with a single statement.
I’m going to break this down into several steps to make it easier
to follow the descriptions. But notice the dot at the end of each of the
lines indicating that another method invocation follows.
We call DescribeSnapshots to get a list of the current
snapshots, passing a parameter that indicates we only want our own.
Then we retrieve the array of snapshots from within the snapshotSet.

ec2.describe_snapshots(:owner => 'self').snapshotSet.item.

We use select on the results to grab just the objects in the
array whose start time is old enough. The block of code passed to
select includes an invocation of the class method Time::parse
(provided by the standard time library), which reads a string and
converts it into a time value. This step is
necessary in order to compare the snapshot start time to our previously
calculated cutoff time.

  select{|a| Time::parse(a.startTime) < older_than}.

We sort the resulting array on start time. For the sort we are
relying on the fact that every start time uses the same fixed-width UTC
format, so the lexical ordering of the strings matches chronological order.

  sort{|a,b| a.startTime <=> b.startTime}.
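
As a quick sanity check of that assumption, the sketch below compares a plain
string sort with a sort on the parsed times. The helper same_order? and the
sample timestamps (other than the one from the XML above) are made up for
illustration. The check only holds when every timestamp uses the same
fixed-width UTC format.

```ruby
require 'time'

# Hypothetical helper: true when sorting the strings lexically gives the
# same order as sorting them by their parsed time values.
def same_order?(stamps)
  stamps.sort == stamps.sort_by { |s| Time.parse(s) }
end

stamps = [
  '2010-07-31T12:51:50.000Z',
  '2009-12-01T00:00:00.000Z',
  '2010-07-04T23:59:59.000Z'
]
puts same_order?(stamps)   # true
```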

Finally we invoke the iterative method each to step through
the list of snapshots. For each snapshot we print information about it,
then call DeleteSnapshot with its snapshotId. We check
the return status of the call to ensure that it succeeded and print something
appropriate. Note we are passing a block of code to the method
each but in this case it is marked with the words do and
end and it spans multiple lines.

  each do |s|
    printf "%-15s %-15s (%s) ... ", s.snapshotId, s.volumeId, s.startTime
    a = ec2.delete_snapshot(:snapshot_id => s.snapshotId)
    puts a.return ? "deleted" : "failed"
  end

The entire final statement in the program looks like this:

ec2.describe_snapshots(:owner => 'self').snapshotSet.item.
  select{|a| Time::parse(a.startTime) < older_than}.
  sort{|a,b| a.startTime <=> b.startTime}.
  each do |s|
    printf "%-15s %-15s (%s) ... ", s.snapshotId, s.volumeId, s.startTime
    a = ec2.delete_snapshot(:snapshot_id => s.snapshotId)
    puts a.return ? "deleted" : "failed"
  end
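
Before pointing a script like this at real snapshots, it may be worth trying
the selection logic on sample data. The following dry-run sketch uses a Struct
stand-in instead of the gem, so nothing is deleted. The Snapshot struct, the
expired_snapshots helper, and all the sample values except the first snapshot
are hypothetical.

```ruby
require 'time'

# Hypothetical stand-in for the items returned by describe_snapshots.
Snapshot = Struct.new(:snapshotId, :volumeId, :startTime)

# Hypothetical helper: the same select and sort steps as above, applied to
# any list of objects that respond to startTime.
def expired_snapshots(items, older_than)
  items.select { |a| Time.parse(a.startTime) < older_than }.
        sort { |a, b| a.startTime <=> b.startTime }
end

items = [
  Snapshot.new('snap-78a54011', 'vol-4d826724', '2010-07-31T12:51:50.000Z'),
  Snapshot.new('snap-00000002', 'vol-4d826724', '2010-08-15T12:00:00.000Z'),
  Snapshot.new('snap-00000003', 'vol-4d826724', '2010-06-01T12:00:00.000Z')
]
older_than = Time.utc(2010, 8, 1)

# Print what would be deleted instead of calling delete_snapshot.
expired_snapshots(items, older_than).each do |s|
  printf "%-15s %-15s (%s) would be deleted\n", s.snapshotId, s.volumeId, s.startTime
end
```

With this sample data the June snapshot is listed first, followed by the July
one, and the August snapshot is left alone.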

I hope that this gives you a flavor of the Ruby language, and that it
demonstrates the language’s expressive power and flexibility while solving
a real-world problem.