Migrating Legacy Data

You've done it: you've found a greenfield project. For any developer this is an exciting prospect. A new project where you can make all the right choices you got wrong on previous ones. A new project to start clean with... but then it happens. The client says: "Of course we'll want all the data from the legacy app pulled over." Your heart sinks, and your mind jumps straight to the awful legacy schema your beautiful new app has just been saddled with. If you're dealing with a simple legacy data model you might be all right: generate some CSV files containing user information, do some quick scripting, and you're done. But what if the legacy data is big, complex, and nasty? In this post we'll describe our favorite way of bringing legacy data gracefully into your shiny new Rails app.

Using the tools we already have

If you're writing a Rails application, you're likely very familiar with our friend ActiveRecord. The gist of our legacy data strategy is: set up an ActiveRecord connection to your legacy database, build models that represent the legacy tables, and use them to translate the data into your shiny new models. The following lays out the particulars of how we go about it.

Get connected

It's not something we use very often, but Rails has no problem connecting to multiple databases. The first step is to add the connection details for the legacy database to config/database.yml:

legacy_development:
  adapter: mysql2
  host: localhost
  database: super_legacy_app_dev

Next, we like to define an abstract subclass of ActiveRecord::Base that all of our legacy models inherit from (instead of inheriting from ActiveRecord::Base directly). Ours looks something like this:

class Legacy::Base < ActiveRecord::Base
  self.abstract_class = true
  # Use the legacy_<env> entry from database.yml (legacy_development, etc.)
  establish_connection :"legacy_#{Rails.env}"
end

The above code isn't anything fancy: abstract_class tells ActiveRecord that Legacy::Base has no table of its own, and every subclass shares its legacy connection. It gives us a jumping-off point for all of our new legacy model classes. Note that we've namespaced everything under Legacy. You don't have to do this, but if you don't, you'll probably wish you had. It keeps the legacy models segregated in the project's file structure and, more importantly, prevents naming collisions with the real application's models. Since the whole point of the exercise is to rebuild these models in the new application, the chance of a naming collision is high.
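Because Legacy::Base builds its configuration key from Rails.env, every environment the app runs in (including test) needs a matching legacy_ entry in database.yml. A minimal sketch of a sanity check, using only Ruby's standard YAML library; missing_legacy_configs is a hypothetical helper, and it assumes database.yml contains no ERB (or has already been rendered):

```ruby
require "yaml"

# Hypothetical helper: Legacy::Base asks for "legacy_#{Rails.env}", so each
# environment listed here needs a matching legacy_ entry in database.yml.
# Returns the environments that lack one. ERB in the file is NOT evaluated.
def missing_legacy_configs(database_yml, environments)
  config = YAML.safe_load(database_yml, aliases: true) || {}
  environments.reject { |env| config.key?("legacy_#{env}") }
end

database_yml = <<~YML
  legacy_development:
    adapter: mysql2
    host: localhost
    database: super_legacy_app_dev
YML

p missing_legacy_configs(database_yml, %w[development test production])
# prints ["test", "production"]
```

Passing the environment list explicitly, rather than inferring it from the file's top-level keys, avoids mistaking a shared default: block for an environment.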

Stop, model time

Now that the foundation is laid, we can move on to getting some work done. The main gotcha when building out models is that a schema from a non-Rails application probably doesn't follow Rails naming conventions, so you'll have to account for that in your model definitions. Most likely the table you're working with has some wacky name.

"What should we call the table keeping all the account records? How about file_info? Done." - some other developer you now despise

ActiveRecord has you covered on table names, though; just set table_name explicitly:

class Legacy::Account < Legacy::Base
  self.table_name = 'file_info'
end

Beyond basic access to the legacy attributes, it's also helpful to link the legacy models to each other with normal Rails associations. The main stumbling block is that when your legacy model names don't match the actual table and column names, you'll have to specify class names and foreign keys yourself, like so:

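class Legacy::Contest < Legacy::Base
  self.table_name = 'application_contest'

  # Self-referential many-to-many through the legacy join table. Both keys
  # would default to contest_id, so name them explicitly. parent_id and
  # child_id are hypothetical; use the legacy join table's real columns.
  has_and_belongs_to_many :children,
    join_table: 'application_contest_children',
    class_name: 'Legacy::Contest',
    foreign_key: 'parent_id',
    association_foreign_key: 'child_id'
end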

You fancy huh?

One of the main benefits of this approach (for us) is that it legitimizes the data import process. The import is no longer a crazy ball of CSV-parsing code that spews models into your database; it's a fully functioning part of your app. We tend to give our legacy models a to_new_model method that handles all the logic of mapping old data to new. As an added benefit, keeping that logic in one method makes it super easy to unit test.

class Legacy::Account < Legacy::Base
  self.table_name = 'file_info'

  ...

  def to_new_model
    # ::Account is the new app's model, not Legacy::Account
    account = ::Account.new
    account.name = some_legacy_column
    account # return the (unsaved) new record
  end

end
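Because to_new_model only maps attributes, its logic can be exercised without touching either database. A minimal sketch, assuming the legacy row exposes some_legacy_column (the placeholder from the example above) and the new model has a name attribute; the Struct classes are hypothetical stand-ins for Legacy::Account and ::Account so the example runs without Rails:

```ruby
# Hypothetical stand-in for the new app's ::Account model.
NewAccount = Struct.new(:name, keyword_init: true)

# Hypothetical stand-in for Legacy::Account, carrying the same mapping method.
LegacyAccount = Struct.new(:id, :some_legacy_column, keyword_init: true) do
  # Build the new record from the legacy one and return it, unsaved.
  def to_new_model
    account = NewAccount.new
    account.name = some_legacy_column
    account
  end
end

legacy = LegacyAccount.new(id: 7, some_legacy_column: "Acme Corp")
account = legacy.to_new_model
puts account.name # prints Acme Corp
```

A unit test then only needs to build a legacy object, call to_new_model, and assert on the attributes of what comes back.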

Something else we tend to throw in, which has proven particularly handy for debugging issues after import, is a legacy_id column on the new model. During the import we set it to the legacy record's id, linking each new record to the row it came from. Once the data is fully migrated and you're in production, just write a migration that removes the column.
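A minimal sketch of what that link buys you, in plain Ruby. The Structs are hypothetical stand-ins for the legacy and new ActiveRecord models, import_accounts and legacy_source_for are illustrative helpers rather than anything Rails provides, and it assumes legacy ids are unique and never reused. Besides tracing a suspicious record back to its source row, the stored legacy_id lets a re-run skip rows that were already imported:

```ruby
require "set"

# Hypothetical stand-ins: a legacy row and the new model with a legacy_id column.
LegacyRow = Struct.new(:id, :name, keyword_init: true)
MigratedAccount = Struct.new(:name, :legacy_id, keyword_init: true)

# Import every legacy row that hasn't been imported yet. Each new record keeps
# the id of the row it came from, so running the import again adds no duplicates.
def import_accounts(legacy_rows, imported)
  seen = imported.map(&:legacy_id).to_set
  pending = legacy_rows.reject { |row| seen.include?(row.id) }
  fresh = pending.map { |row| MigratedAccount.new(name: row.name, legacy_id: row.id) }
  imported + fresh
end

# When a migrated record looks wrong, legacy_id points straight at its source row.
def legacy_source_for(account, legacy_rows)
  legacy_rows.find { |row| row.id == account.legacy_id }
end

rows = [LegacyRow.new(id: 1, name: "Acme"), LegacyRow.new(id: 2, name: "Globex")]
first_run = import_accounts(rows, [])
second_run = import_accounts(rows, first_run)
puts second_run.size                               # prints 2
puts legacy_source_for(second_run.last, rows).name # prints Globex
```

In the real app both lookups would be queries against the legacy_id column rather than in-memory scans.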

Wrapping up

In the end, this approach isn't for every situation. Plenty of times a simple CSV import script will do the trick. That said, you'll often be asked to move complex relational data from a legacy application into your awesome new Rails app, and in those cases we think this strategy comes out on top for several reasons: it's testable, it's easier to reproduce, and it's easier to work with in general. There are also less tangible benefits: to model your legacy data with ActiveRecord, you have to really understand how it works, and your implementation will be all the better for it.

...and maybe the previous developer who gave that table such a goofy name wasn't so awful after all. Happy coding!

Ed Schmalzle

Ed is a principal and lead developer at Back Forty.