Why Ruby on Rails Migrations Don't Work
Migrations in Ruby on Rails provide some nice features for manipulating the database structure. They are used mainly for a few reasons:
- they are a nice way of keeping track of database structure changes
- you don’t need to know SQL
- the generated SQL code can be used for many database engines
The first point is OK (this is a must, you really should keep such a track of database changes). The second would be nice… but it simply doesn't work. I don't think that not knowing SQL is such a huge advantage.
Why do many programmers think it's a virtue not to know something?
Knowing SQL is really a necessity when you write database software on top of a relational database.
A Migration Example
A simple example of a migration. Each migration lives in its own file, and all of them are placed in the db/migrate directory of the Rails project.
class SampleMigration < ActiveRecord::Migration
  def self.up
  end

  def self.down
  end
end
The whole migration is one class with two functions: up and down. The up function is used for migrating in the up direction, and down in the opposite one. In an ideal world the down function should roll back all changes created by the up function.
Unfortunately that's usually impossible (OK, it's possible, but only in really simple migrations and only sometimes).
Here is a slightly more complex example of a migration:
class SampleMigration < ActiveRecord::Migration
  def self.up
    create_table :people do |t|
      t.column :name, :string
    end
  end

  def self.down
    drop_table :people
  end
end
The up function creates a table named people with one column name of type string. Additionally the migration adds a couple of columns on its own (like id), but never mind. The down function just drops that table.
That looks quite nice and simple, but of course creating or dropping a table is the simplest task. Normally you have to manage more complicated tasks, such as creating and filling a column on a big table, or creating a new column from two existing ones and dropping those two afterwards.
Problems with migrations
No foreign keys
There are no functions for easily creating foreign keys. You can always add a foreign key using, for example, the migration below, but this is a real pain as you have to write your own SQL together with some Ruby code.
I thought that migrations were there so you wouldn't have to write SQL. If I have to write my own SQL for such simple things, then I'd rather do them on my own from the very beginning using just plain SQL.
class ExampleMigration < ActiveRecord::Migration
  def self.up
    create_table :products do |t|
      t.references :category
    end

    # add a foreign key
    execute <<-SQL
      ALTER TABLE products
        ADD CONSTRAINT fk_products_categories
        FOREIGN KEY (category_id) REFERENCES categories(id)
    SQL

    add_column :users, :home_page_url, :string
    rename_column :users, :email, :email_address
  end

  def self.down
    rename_column :users, :email_address, :email
    remove_column :users, :home_page_url
    execute "ALTER TABLE products DROP FOREIGN KEY fk_products_categories"
    drop_table :products
  end
end
No checks
The same problem exists with checks. It seems like the Rails folks agree that data checks should be done in Active Record models, and not in the database.
Just another myth. There is nothing wrong with creating checks in Rails models; that really helps with presenting information about wrong data to the user. But not checking all data in the database is a problem similar to checking form data only with JavaScript and omitting the check on the server side.
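A check constraint can still be created from a migration, but only with raw SQL. A minimal sketch (the price column here is just an illustration, not something from the examples above):

class AddPriceCheck < ActiveRecord::Migration
  def self.up
    # no migration helper for this, so plain SQL it is
    execute <<-SQL
      ALTER TABLE products
        ADD CONSTRAINT chk_products_price_positive CHECK (price >= 0)
    SQL
  end

  def self.down
    execute "ALTER TABLE products DROP CONSTRAINT chk_products_price_positive"
  end
end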
Some time ago I read a story about such an approach: all data validations were made in the model layer. No checks. No unique indices. No foreign keys in the database.
One day they had to give access to the database to some external company, so that company could build their own modules. They of course used just the database, with a different programming language, so they simply couldn't reuse the application-level checks.
At the same time the main company realised that their importer was rather slow, so to speed it up they wrote some complicated SQL queries replacing the Active Record classes. All was fine, the external module was working hard, the importer was faster… However after some time they realised that the data in the database was not valid. It was just garbage, not proper data. And unfortunately it was too late.
No stored procedures
There is no API for creating a stored procedure, nor for calling one. You just write plain SQL and run it using execute from the migration. Maybe not a big problem.
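For illustration, a minimal sketch of what that looks like in practice (the function itself is made up, only to show the mechanism):

class AddTouchFunction < ActiveRecord::Migration
  def self.up
    # the whole procedure body is raw SQL passed to execute
    execute <<-SQL
      CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
      BEGIN
        NEW.updated_at := now();
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql
    SQL
  end

  def self.down
    execute "DROP FUNCTION touch_updated_at()"
  end
end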
Funny unique constraints
Unique constraints are usually (as I saw in almost all migration examples that I found) set only in the AR classes, and they are checked only in the model layer. No unique indexes in the database, because supposedly they are not needed.
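A minimal sketch of that application-level-only approach (the User model here is just an illustration):

class User < ActiveRecord::Base
  # the only "unique constraint" -- checked in the application, not in the database
  validates_uniqueness_of :login
end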
That can lead to big problems. Let's look at this procedure: we have a users table where the login column is of course supposed to be unique.
The procedure of adding a new user to database looks like this:
- Check if there is such a user
- If there is not, add it
Looks good, doesn't it? But what if we've got two web page users who just happen to want the same login? Active Record objects would do it this way:
- User A: Check if there is such a user (found that there is not)
- User B: Check if there is such a user (found that there is not)
- User A: add user with login ‘simon’
- User B: add user with login ‘simon’
And suddenly there are two user records with the same login.
I can see only two ways of improving that without a unique index:
- Lock the whole users table before the first step, which causes serious efficiency problems.
- Introduce some other lock around inserting a record into this table, which causes some not so nice hacks around the insert.
What's more, it's known behaviour, as we can read in the official Ruby on Rails guide:
„Validations such as validates_uniqueness_of are one way in which models can enforce data integrity. The :dependent option on associations allows models to automatically destroy child objects when the parent is destroyed. Like anything which operates at the application level these cannot guarantee referential integrity and so some people augment them with foreign key constraints.”
– 7. Active Record and Referential Integrity
This is where I fully agree: adding validates_uniqueness_of in the model can help to show a nice error message to the user. However, there is no other efficient and easy to set up way of ensuring that all data in a column is unique other than creating a unique index.
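For completeness, the index itself can be created from a migration with add_index; a minimal sketch:

class AddUniqueLoginIndex < ActiveRecord::Migration
  def self.up
    # from now on the database, not the model, guarantees uniqueness
    add_index :users, :login, :unique => true
  end

  def self.down
    remove_index :users, :login
  end
end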
No partial indices
In PostgreSQL there are partial and functional indices. Such nice things as:
CREATE INDEX i_users_active ON users (login) WHERE is_active = true;
CREATE INDEX i_users_active_lower ON users ( lower(login) )
WHERE is_active = true;
Unfortunately without hacks or custom SQL I cannot do that in migrations.
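If I really want such an index from a migration, I'm back to raw SQL again; a sketch along these lines:

class AddActiveUsersIndex < ActiveRecord::Migration
  def self.up
    # a partial index is not expressible with add_index, so raw SQL
    execute "CREATE INDEX i_users_active ON users (login) WHERE is_active = true"
  end

  def self.down
    execute "DROP INDEX i_users_active"
  end
end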
No special database types
Some databases have their own special data types, or let you define your own. In PostgreSQL you've got e.g. hstore or arrays. There is no way to use those in migrations. Of course you can stick to the standard data types, but if you want to use a special one, you've got a problem.
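The workaround is, once again, raw SQL. A minimal sketch adding an array column (the tags column is only an illustration):

class AddTagsToProducts < ActiveRecord::Migration
  def self.up
    # text[] is unknown to the migration DSL, so the column is added with SQL
    execute "ALTER TABLE products ADD COLUMN tags text[]"
  end

  def self.down
    remove_column :products, :tags
  end
end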
Migration up/down is atomic
Each migration is atomic and runs in one transaction. The problem appears when you've got a huge (huge using this definition) database and want to update too many records. During such an update all those records will be locked for writing, so it is rather a bad idea to do that on a huge, heavily used table.
Example of such a query:
UPDATE adverts
SET valid_till = valid_from + '2 days'::interval
WHERE created_when < '2009-01-01';
So there is a table with adverts, and we set all adverts added before the year 2009 to be valid for two days. What if there are 2 million such adverts, and they are being changed at the same time, for example by an importer? Such a query can totally block all other queries.
What’s more, there is a huge chance that during the query execution there will be a deadlock, so the whole operation will be started again.
Another solution is to lock the whole table; there should be no deadlocks then, but again all other queries will wait, and so will all the applications.
And there will be a call from the management soon… and we don’t like it.
Usually such a change can be made in small parts. Let’s say using 100 adverts at a time. The simplest query would look like this:
UPDATE adverts
SET valid_till = valid_from + '2 days'::interval
WHERE created_when < '2009-01-01'
AND id in
( SELECT id
FROM adverts
WHERE created_when<'2009-01-01'
ORDER BY id
LIMIT 100 OFFSET ?
);
The ? parameter should be replaced with a number like 0, 100, 200 and so on. Using this approach the whole data change could take days or weeks, but it won't affect the applications and won't block them during the whole update process.
Unfortunately there is no way to do that using migrations.
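A minimal sketch of how such a batched update could be driven from a plain Ruby script outside migrations, assuming an ActiveRecord connection is already established (this is only an illustration, not production code):

offset = 0
batch  = 100
loop do
  # connection.update returns the number of rows affected by the statement
  affected = ActiveRecord::Base.connection.update(<<-SQL)
    UPDATE adverts
       SET valid_till = valid_from + '2 days'::interval
     WHERE created_when < '2009-01-01'
       AND id IN (SELECT id FROM adverts
                   WHERE created_when < '2009-01-01'
                   ORDER BY id
                   LIMIT #{batch} OFFSET #{offset})
  SQL
  break if affected.zero?
  offset += batch
  sleep 1  # give other queries room to run between batches
end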
Engine independence myth
All of the Active Record API and migrations are created with database independence in mind. Great. But that simply doesn't work. The differences between database engines are just too big.
There is usually a moment when you have to write a custom SQL query instead of an ORM-generated one, just to speed things up. And such a query usually uses some unique engine features, for example partial indices or the hstore/ltree types in PostgreSQL.
After such a query the application isn't engine independent any more, so why do you think that independence is important? I haven't met so far any advanced, more complicated application that would be engine independent. And if such independence is not really achievable, then maybe just don't bother and use your database engine's features from the very beginning? That includes creating stored procedures, triggers, checks, foreign keys and so on. So maybe use migrations with custom SQL all the time, or write your own plugins for generating that custom SQL?
Some examples are terrible
Some examples of migration usage show how to use ActiveRecord classes in migrations. But why do almost all such migrations look like this?
class ChangeStatus < ActiveRecord::Migration
  def self.up
    # loads every product into memory and updates them one by one
    Product.find(:all).each do |p|
      p.update_attribute(:status, 10)
    end
  end

  def self.down
  end
end
The huge problem is that during the up migration this function loads all the rows from the table, changes them, and updates each one in the database. All of that could be done with a simple SQL query such as:
UPDATE products SET status=10;
Usually when a programmer writes code like this, he or she tests it on a small, very small, developer database. The problems appear when the same code is run on a production server, where the database can be much bigger.
What then? Use one set of migrations for production, and a different one for developers?
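The same change expressed directly in a migration would be a single statement; a sketch:

class ChangeStatus < ActiveRecord::Migration
  def self.up
    # one statement executed in the database, no row-by-row loading
    execute "UPDATE products SET status = 10"
  end

  def self.down
  end
end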
Schema dumping
Ruby on Rails migrations can dump the schema into a file, so later, when you want to create a new database, you don't have to run all the migrations; you can just load this one file.
That is nice… but there is a small gotcha. Normally the database structure is dumped into a Ruby file, and that file contains only the Ruby function calls, without any of the custom SQL. If you want the dump to contain all the SQL needed to create such a database, you have to change the dumping method and use the db:structure:dump Rake task. This task uses the database engine's own dump program to create a plain structure dump. In PostgreSQL it would be something like:
pg_dump -s
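The switch to the SQL dump format itself is, as far as I remember, a one-line setting in the Rails configuration, something along these lines:

# config/environment.rb -- dump the structure with the engine's own tool
# instead of generating the Ruby schema file
config.active_record.schema_format = :sql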
No Slony Like Solution
Slony is a master-multi-slave replication system for PostgreSQL. When using Slony, I have to run all DDL queries (DDL – Data Definition Language – so all those queries that change the data structure) through a special script. That script executes the DDL on the master database, and later all those DDLs are replicated to the slaves.
When all the database changes are generated by Ruby classes from migrations, there is a problem with getting the SQL out of those Ruby files. David Wheeler has a similar point of view.
Final Thoughts
I think of a database as a good data storage where all data is valid. Valid all the time. That's why I need all those checks, foreign keys, triggers and so on. The Ruby on Rails folks just treat the database as a simple bag full of data.
Unfortunately, when the application grows much bigger, that simple bag becomes rather dirty. It ends up looking like the GIGO approach: Garbage In, Garbage Out. Do whatever you want, but you should know that once the data in the database is not valid, it's just too late and you can throw such a database away.
Better Migrations?
The whole idea of migrations is quite nice. It allows for keeping track of database structure changes, and for keeping that history in a simple version control system like git. The rake mechanism is also very nice. What I don't like about migrations is stated above.