OK, so you know how to get data into AWS S3. What about getting it out? Previously, we uploaded entries from an imagined photo contest into a bucket. We sent a pair of files: a JSON file with the form data, and the image itself. Let’s presume there’s a Rails app. Its details don’t matter, but it has a ContestEntry model that we want to populate from the S3 data. We’re going to write a script to do the import.
When a script needs to load Rails, you do something like:
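Here is a minimal sketch of that line. It assumes the script lives one directory below the Rails root, for example a hypothetical scripts/import_entries.rb. The require itself is left as a comment so the snippet runs outside a Rails app:

```ruby
# Assumption: this file sits one directory below the Rails root,
# e.g. the hypothetical <rails_root>/scripts/import_entries.rb.
# expand_path resolves '..' relative to this file's directory
# (falling back to the working directory if __dir__ is nil).
environment_path = File.expand_path('../config/environment', __dir__)

# Inside a real Rails app you would now boot Rails, which makes
# models such as ContestEntry available:
#   require environment_path
puts environment_path
```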
The exact path for config/environment will depend on where the script is. In this case I’m presuming a subdirectory under the Rails root.
Loading Rails gives us the model. Now we need the S3 files. As before, we use the aws-sdk gem, which should be in your Gemfile.
I cover the basics of authenticating to S3 here. The code below assumes credentials are coming from the environment (or an AWS credential file in dev).
Getting our bucket is easy:
As is getting the files (objects) in the bucket.
But from there it gets a little convoluted. The object is actually an Aws::S3::ObjectSummary. It has metadata about the object and can perform operations like moving, copying, or deleting it, but it isn’t the S3 object itself. To fetch the actual object, you have to call #get on the summary.
Once you have the actual object (really the Ruby object that wraps the HTTP calls that access the S3 object), you can get its #body, which is actually a StringIO object. Confused?
Code brings clarity.
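To make the layering concrete without an AWS account, here is a sketch built on hypothetical stand-ins. FakeSummary and GetResponse are made up for illustration and are not aws-sdk classes. They mirror the shape described above: a summary whose #get returns a response whose #body is a StringIO. The sketch also shows one consequence of body being an IO: reading it consumes it.

```ruby
require 'stringio'

# Hypothetical stand-ins that only mimic the nesting:
#   summary (metadata) -> #get -> response -> #body (an IO)
GetResponse = Struct.new(:body)

FakeSummary = Struct.new(:key, :content) do
  # In the real SDK, #get performs the GET request against S3.
  def get
    GetResponse.new(StringIO.new(content))
  end
end

summary = FakeSummary.new('entry.json', '{"name":"Ada"}')
body = summary.get.body   # an IO, not a String

first_read  = body.read   # the full JSON text
second_read = body.read   # "" because the IO is already at EOF
body.rewind               # move back to the start to read again
third_read  = body.read

puts first_read
```

If your own code needs the body twice, rewinding it (or keeping the string from the first read) avoids a silently empty second read.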
We’ll find all of the JSON objects in the bucket:
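The selection itself is plain Enumerable filtering on each summary’s key. Here is a sketch over a hypothetical list of keys. In the real script you would call select on bucket.objects and test summary.key instead.

```ruby
# Hypothetical object keys: each contest entry uploaded a
# UUID-named JSON file plus a photo with the same UUID.
keys = [
  '1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11.json',
  '1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11.jpg',
  '8c3d2a90-6e1f-4b2a-8d47-5f6e7a8b9c22.json',
  '8c3d2a90-6e1f-4b2a-8d47-5f6e7a8b9c22.png'
]

# Real script (assumption): bucket.objects.select { |s| s.key.end_with?('.json') }
json_keys = keys.select { |key| key.end_with?('.json') }
json_keys.each { |key| puts key }
```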
Grab the first one, using #get to fetch the actual object:
Then get its contents with #read (since it’s an IO class):
Finally, we parse that JSON and get a hash:
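Here are the read-and-parse steps together, in a sketch where a StringIO stands in for the body returned by #get. The JSON field names are hypothetical form data. JSON.parse returns a hash with string keys unless you ask for symbols:

```ruby
require 'json'
require 'stringio'

# Stand-in for summary.get.body; the field names are hypothetical.
body = StringIO.new('{"name":"Ada Lovelace","email":"ada@example.com","title":"Analytical Sunrise"}')

# Default: string keys.
data = JSON.parse(body.read)
puts data['name']

# symbolize_names: true gives symbol keys instead.
body.rewind
symbolized = JSON.parse(body.read, symbolize_names: true)
puts symbolized[:title]
```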
I find that interface a little funky, but bam! Now we have our form data, which we can save in our model:
(You’re going to validate that data and not accept it blindly, right?)
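One way to avoid accepting the data blindly is to keep only the fields you expect and to check the required ones before handing the hash to ContestEntry. The sketch below does this in plain Ruby. The field names and rules are hypothetical, and in a Rails app the model’s validations would normally do most of this work.

```ruby
# Hypothetical whitelist of form fields a ContestEntry accepts.
PERMITTED = %w[name email title].freeze
REQUIRED  = %w[name email].freeze

def clean_entry(data)
  attrs = data.slice(*PERMITTED) # drop unexpected keys
  missing = REQUIRED.select { |k| attrs[k].to_s.strip.empty? }
  raise ArgumentError, "missing: #{missing.join(', ')}" unless missing.empty?
  attrs
end

raw = { 'name' => 'Ada', 'email' => 'ada@example.com', 'admin' => true }
attrs = clean_entry(raw)
puts attrs.inspect
# In the Rails script (assumption): ContestEntry.create!(attrs)
```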
We saved the ObjectSummary object so we could get the JSON file’s name, which is our original UUID:
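Recovering the UUID means stripping the extension from the key. Here is a sketch using File.basename, assuming keys of the form <uuid>.json (the key below is made up). As a side effect, File.basename also drops any “folder” prefix.

```ruby
# Hypothetical key; in the script this would be the JSON summary's key.
json_key = '1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11.json'

# Passing the suffix as a second argument strips that extension.
uuid = File.basename(json_key, '.json')
puts uuid

# A "folder" prefix is dropped too, leaving only the UUID.
nested_uuid = File.basename("processed/#{json_key}", '.json')
```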
And with that we can find the photo we uploaded:
Note the switch to detect (there can be only one) and the lovely negative lookbehind regexp! Again, we need to get the actual S3 object:
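Here is one plausible way to write such a pattern, an assumption rather than necessarily the exact regexp used here. It matches keys that start with the UUID and whose end is not preceded by .json, using the negative lookbehind (?<!\.json) anchored at the end of the string. The keys are hypothetical.

```ruby
uuid = '1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11'
keys = [
  "#{uuid}.json",
  "#{uuid}.jpg",
  '8c3d2a90-6e1f-4b2a-8d47-5f6e7a8b9c22.png'
]

# \A anchors at the start and Regexp.escape guards the interpolated UUID.
# (?<!\.json)\z means "the key must not end in .json".
photo_pattern = /\A#{Regexp.escape(uuid)}\..+(?<!\.json)\z/

# detect returns the first matching element (or nil), not an array.
photo_key = keys.detect { |key| key.match?(photo_pattern) }
puts photo_key
```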
Which we could save locally with something like:
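Saving locally just means writing the body’s bytes to a file. In this sketch a StringIO stands in for the photo’s body, and the destination is a hypothetical file in the system temp directory. With the real SDK, #get also accepts a response_target: option that streams the body straight to a file path.

```ruby
require 'stringio'
require 'tmpdir'

# Stand-in for the photo's get.body; these bytes are fake.
photo_body = StringIO.new("\xFF\xD8\xFF\xE0fake-jpeg-bytes".b)

# Hypothetical destination; binwrite avoids any newline/encoding conversion.
path = File.join(Dir.tmpdir, '1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11.jpg')
File.binwrite(path, photo_body.read)
puts "wrote #{File.size(path)} bytes to #{path}"
```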
Or process it with CarrierWave or Paperclip, or even leave it in S3 and serve it directly from there.
We have the data in our app and can do whatever it was we wanted with it. All that remains is to mark the entries as processed, since we don’t want to import them multiple times. The simplest way to do that is to delete them, which the ObjectSummary can do directly with #delete.
If you’d rather back up the data instead of throwing it away, you can rename the files with #move_to. The simplest way to use it is to pass in a string in the form “target-bucket-name/target-key”:
If you don’t want to use a separate bucket, you could put the files in a “subfolder” instead:
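Both kinds of target are just strings. This sketch builds them from a key. The bucket names and the processed/ prefix are hypothetical, and the commented move_to call shows where one would be used:

```ruby
SOURCE_BUCKET  = 'contest-entries'          # hypothetical
ARCHIVE_BUCKET = 'contest-entries-archive'  # hypothetical

# Target in a separate bucket: "target-bucket-name/target-key".
def archive_target(key)
  "#{ARCHIVE_BUCKET}/#{key}"
end

# Target in the same bucket under a "processed/" pseudo-folder.
def processed_target(key)
  "#{SOURCE_BUCKET}/processed/#{key}"
end

key = '1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11.json'
puts archive_target(key)
puts processed_target(key)
# With the SDK (variable name is an assumption):
#   summary.move_to(processed_target(summary.key))
```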
But keep in mind that “folders” in S3 are an illusion. They are really just part of the file’s name. As a result, listing the bucket’s objects will return all the files, no matter how deeply nested. You can filter by using the “prefix” option:
and then filter the initial
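Because keys are flat strings, a “folder” is nothing more than a shared prefix. In this sketch a plain array of hypothetical keys stands in for the bucket listing. It shows three things: the full listing includes processed files, a prefix narrows the listing, and the import should skip anything already moved.

```ruby
# Hypothetical listing: one unprocessed entry, one already moved.
all_keys = [
  '8c3d2a90-6e1f-4b2a-8d47-5f6e7a8b9c22.json',
  '8c3d2a90-6e1f-4b2a-8d47-5f6e7a8b9c22.png',
  'processed/1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11.json',
  'processed/1f0e7c52-4b7e-4d7c-9a35-2b8f9c1d0a11.jpg'
]

# Roughly what bucket.objects(prefix: 'processed/') would return.
processed = all_keys.select { |key| key.start_with?('processed/') }

# Import only top-level JSON files: no "/" means no pseudo-folder.
to_import = all_keys.select { |key| key.end_with?('.json') && !key.include?('/') }

puts processed
puts to_import
```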
And with that, I end the S3 upload series. You now have the tools to use S3 as a job queue. Use them wisely.