Sometimes you have to write a new Rails app for old data. One approach is to keep the old format and carefully craft your ActiveRecord models to work with the existing schema. But odds are the old schema has become a mess over time and wasn't particularly Rails-friendly to start with. When time and resources allow, I prefer a do-over.
That means Extracting, Transforming, and Loading (ETL) the old data. And for that, I like a Rake task.
First you need to choose the correct gem to connect to your database, hopefully MySQL or PostgreSQL, and, if you aren’t already using it, add it to your Gemfile:
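Assuming MySQL, the Gemfile entry might look something like this (pin the version to whatever suits your app):

```ruby
# Gemfile
gem "mysql2"
```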
Then you need to define the database connection. Since I already have
a process for managing the
database.yml file, I like to put it there.
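A sketch of what that entry might look like; the connection name, host, and credentials here are placeholders:

```yaml
# config/database.yml
legacy:
  adapter: mysql2
  host: legacy-db.example.com        # hypothetical host
  username: import
  password: <%= ENV["LEGACY_DB_PASSWORD"] %>
  database: legacy_production
```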
(The connection name isn’t important.) You can then pull it back with:
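One way to read the entry back, assuming Rails is loaded (`database_configuration` returns the parsed `database.yml` as a hash of hashes):

```ruby
# Pull the "legacy" entry back out of config/database.yml.
config = Rails.application.config.database_configuration["legacy"]
```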
This works well if you are using the mysql2 gem, as you can pass the hash directly to it. The pg gem uses a slightly different format, so you might need to tweak the config from the standard database.yml format.
For the sake of brevity, I'm going to use MySQL examples. PostgreSQL has a different syntax but all the same functionality; see the docs.
Create a new file lib/tasks/import_legacy.rake (it needs the .rake extension for Rails to load it automatically). Require your gem at the top:
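For example, with the mysql2 gem:

```ruby
# lib/tasks/import_legacy.rake
require "mysql2"
```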
and create a helper function to connect to the legacy production database at the bottom:
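A minimal sketch of such a helper, assuming the "legacy" entry in database.yml shown earlier; it memoizes the client so repeated calls reuse one connection:

```ruby
# At the bottom of lib/tasks/import_legacy.rake.
def legacy_database
  @legacy_database ||= Mysql2::Client.new(
    Rails.application.config.database_configuration["legacy"].symbolize_keys
  )
end
```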
Now we need to namespace our tasks:
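The namespacing looks like this (the `:customers` task shown here is a stub; we'll fill it in next):

```ruby
# In a .rake file under lib/tasks, rake is already loaded;
# the require is only needed when running this standalone.
require "rake"

namespace :db do
  namespace :import_legacy do
    desc "import legacy customers"
    task :customers do
      # import code goes here
    end
  end
end
```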
This names our tasks like
db:import_legacy:whatever. I like to
put them under db so it’s clear they mess with data, but call them
what makes sense to you and yours.
Now we need an import task:
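A sketch reconstructed from the description that follows; the legacy `customers` table (with `id` and a single `name` column) and the new model's `first_name`/`last_name` attributes are assumptions:

```ruby
desc "import legacy customers"
task :customers => :environment do
  customers = legacy_database.query("SELECT * FROM customers")
  customers.each do |customer|
    # Namae.parse returns an array of parsed names; take the first.
    name = Namae.parse(customer["name"]).first
    Customer.create!(
      id:         customer["id"],   # preserve the legacy id
      first_name: name&.given,
      last_name:  name&.family
    )
  end
end
```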
Let’s break this down.
desc "import legacy customers" labels the task. It's optional, but without it, the task will not be listed when you run rake -T.
task :customers => :environment do defines the task. The => :environment syntax says "call the 'environment' task before running." 'environment' is a built-in task that loads Rails. Omitting it makes tasks that don't need Rails, say clearing out temp files, run much faster; however, we need it here to access our new models.
customers = legacy_database.query("SELECT * FROM customers") selects the customer data from the legacy database. For PostgreSQL you would use exec instead of query.
name = Namae.parse(customer['name']) is a sample transformation, handling a common case where the legacy schema stored names in a single column and we now want separate first and last names. Namae is a gem that does very smart parsing of names into their component parts, handling things like titles, appellations, and suffixes.
Finally, I create the new customer. This is normal ActiveRecord stuff except that I’m passing in the id. Typically, you want to preserve the original id so that you don’t have to rebuild relationships. For example, our next task might import orders:
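A sketch of what that orders task might look like; the legacy column names (`customer_id`, `total`) and the skip-and-log handling of missing customers are assumptions:

```ruby
desc "import legacy orders"
task :orders => :environment do
  orders = legacy_database.query("SELECT * FROM orders")
  orders.each do |order|
    # Legacy data has errors: don't assume the customer exists.
    customer = Customer.find_by(id: order["customer_id"])
    unless customer
      puts "Skipping order #{order['id']}: no customer #{order['customer_id']}"
      next
    end
    Order.create!(
      id:       order["id"],        # preserve the legacy id
      customer: customer,
      total:    order["total"]      # hypothetical column
    )
  end
end
```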
This presumes that both the legacy and new schemas have the same sort of has_many/belongs_to relationships. In real life, you won't be so lucky and the legacy schema will be much hairier. One-to-one matches of legacy tables to models are great when you have them, but more often than not you won't.
Legacy data often has errors, which is why this code doesn’t assume it will always find a matching user. Anywhere you have a relationship, you’ll want to make sure it’s valid.
The above gives you a framework for the "E" and the "L"; at the end of the day, though, the hard part is the "T", transformation. You'll need to work the details out yourself, but hey, that's why you get the big bucks.