Why Ruby on Rails Migrations Don't Work
Migrations in Ruby on Rails provide some nice features for manipulating the database structure. They are used mainly for a few reasons:
- they are a nice way of keeping track of database structure changes
- you don’t need to know SQL
- the generated SQL code can be used for many database engines
The first point is OK (this is a must, you really should keep such a track of database changes). The second would be nice… but it simply doesn't work. I don't think that not knowing SQL is such a huge advantage.
Why do many programmers think it's a virtue not to know something?
Knowing SQL is really a necessity when you write database software on top of a relational database.
A Migration Example
A simple example of a migration. Each migration lives in its own file, and all of them are placed in the db/migrate directory of the Rails project.
class SampleMigration < ActiveRecord::Migration
  def self.up
  end

  def self.down
  end
end
The whole migration is one class with two functions: up and down. The up function is used for migrating in the up direction, and down in the opposite one. In an ideal world the down function should roll back all changes created by the up function.
Unfortunately that's usually impossible (OK, it's possible, but only in really simple migrations and only sometimes).
Here is a slightly more complex example of a migration:
class SampleMigration < ActiveRecord::Migration
  def self.up
    create_table :people do |t|
      t.column :name, :string
    end
  end

  def self.down
    drop_table :people
  end
end
The up function creates a table named people with one column name of type string. Additionally the migration adds a couple of columns on its own (like id), but never mind. The down function just drops that table.
That looks quite nice and simple, but of course creating or dropping a table is the simplest task. Normally you have to manage more complicated tasks, such as creating and filling a column on a big table, or creating a new column from two existing ones and dropping those two afterwards.
Problems with migrations
No foreign keys
There are no functions for easily creating foreign keys. You can always add a foreign key using, for example, the migration below, but this is a real pain as you have to write your own SQL together with some Ruby code.
I thought that migrations were there so you wouldn't have to write SQL. If I have to write my own SQL for such simple things, then I'd rather do them on my own from the very beginning using just plain SQL.
class ExampleMigration < ActiveRecord::Migration
  def self.up
    create_table :products do |t|
      t.references :category
    end

    # add a foreign key
    execute <<-SQL
      ALTER TABLE products
        ADD CONSTRAINT fk_products_categories
        FOREIGN KEY (category_id) REFERENCES categories(id)
    SQL

    add_column :users, :home_page_url, :string
    rename_column :users, :email, :email_address
  end

  def self.down
    rename_column :users, :email_address, :email
    remove_column :users, :home_page_url
    execute "ALTER TABLE products DROP FOREIGN KEY fk_products_categories"
    drop_table :products
  end
end
No checks
The same problem exists with checks. It seems like the Rails folks agree that data checks should be done in Active Record models, and not in the database.
Just another myth. There is nothing wrong with creating checks in Rails models; that really helps with presenting information about wrong data to the user. But not checking all data in the database is a problem similar to checking form data only with JavaScript and omitting the check on the server side.
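A check constraint can still be created from a migration, but only with raw SQL. A minimal sketch (the price column here is just an illustration, not something from the examples above):

class AddPriceCheck < ActiveRecord::Migration
  def self.up
    # no migration helper for this, so plain SQL it is
    execute <<-SQL
      ALTER TABLE products
        ADD CONSTRAINT chk_products_price_positive CHECK (price >= 0)
    SQL
  end

  def self.down
    execute "ALTER TABLE products DROP CONSTRAINT chk_products_price_positive"
  end
end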
Some time ago I read a story about such an approach: all data validations were made in the model layer. No checks. No unique indices. No foreign keys in the database.
One day they had to give access to the database to some external company, so that company could build their own modules. They of course used just the database, with a different programming language, so they simply couldn't reuse the application-level checks.
At the same time the main company realised that their importer was rather slow, so to speed it up they wrote some complicated SQL queries replacing the Active Record classes. All was fine, the external module was working hard, the importer was faster… However after some time they realised that the data in the database was not valid. It was just garbage, not proper data. And unfortunately it was too late.
No stored procedures
There is no API for creating a stored procedure, nor for calling one. You just write plain SQL and run it using execute from the migration. Maybe not a big problem.
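For illustration, a minimal sketch of what that looks like in practice (the function itself is made up, only to show the mechanism):

class AddTouchFunction < ActiveRecord::Migration
  def self.up
    # the whole procedure body is raw SQL passed to execute
    execute <<-SQL
      CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
      BEGIN
        NEW.updated_at := now();
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql
    SQL
  end

  def self.down
    execute "DROP FUNCTION touch_updated_at()"
  end
end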
Funny unique constraints
Unique constraints are usually (as I saw in almost all migration examples that I found) set only in the AR classes, and they are checked only in the model layer. No unique indexes in the database, because supposedly they are not needed.
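A minimal sketch of that application-level-only approach (the User model here is just an illustration):

class User < ActiveRecord::Base
  # the only "unique constraint" -- checked in the application, not in the database
  validates_uniqueness_of :login
end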
That can lead to big problems. Let's look at this procedure: we have a users table where the login column is of course supposed to be unique.
The procedure of adding a new user to database looks like this:
- Check if there is such a user
- If there is not, add it
Looks good, doesn't it? But what if we've got two web page users who just happen to want the same login? Active Record objects would do it this way:
- User A: Check if there is such a user (found that there is not)
- User B: Check if there is such a user (found that there is not)
- User A: add user with login ‘simon’
- User B: add user with login ‘simon’
And suddenly there are two user records with the same login.
I can see only two ways of improving that without a unique index:
- Lock the whole users table before the first step, which causes serious efficiency problems.
- Introduce some other lock around inserting a record into this table, which causes some not so nice hacks around the insert.
What's more, it's known behaviour, as we can read in the official Ruby on Rails guide:
„Validations such as validates_uniqueness_of are one way in which models can enforce data integrity. The :dependent option on associations allows models to automatically destroy child objects when the parent is destroyed. Like anything which operates at the application level these cannot guarantee referential integrity and so some people augment them with foreign key constraints.”
– 7. Active Record and Referential Integrity
This is where I fully agree: adding validates_uniqueness_of in the model can help to show a nice error message to the user. However, there is no other efficient and easy to set up way of ensuring that all data in a column is unique other than creating a unique index.
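For completeness, the index itself can be created from a migration with add_index; a minimal sketch:

class AddUniqueLoginIndex < ActiveRecord::Migration
  def self.up
    # from now on the database, not the model, guarantees uniqueness
    add_index :users, :login, :unique => true
  end

  def self.down
    remove_index :users, :login
  end
end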
No partial indices
In PostgreSQL there are partial and functional indices. Such nice things as:
CREATE INDEX i_users_active ON users (login) WHERE is_active = true;
CREATE INDEX i_users_active_lower ON users ( lower(login) )
WHERE is_active = true;
Unfortunately without hacks or custom SQL I cannot do that in migrations.
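If I really want such an index from a migration, I'm back to raw SQL again; a sketch along these lines:

class AddActiveUsersIndex < ActiveRecord::Migration
  def self.up
    # a partial index is not expressible with add_index, so raw SQL
    execute "CREATE INDEX i_users_active ON users (login) WHERE is_active = true"
  end

  def self.down
    execute "DROP INDEX i_users_active"
  end
end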
No special database types
Some databases have their own special data types, or let you define your own. In PostgreSQL you've got e.g. hstore or arrays. There is no way to use those in migrations. Of course you can stick to the standard data types, but if you want to use a special one, you've got a problem.
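The workaround is, once again, raw SQL. A minimal sketch adding an array column (the tags column is only an illustration):

class AddTagsToProducts < ActiveRecord::Migration
  def self.up
    # text[] is unknown to the migration DSL, so the column is added with SQL
    execute "ALTER TABLE products ADD COLUMN tags text[]"
  end

  def self.down
    remove_column :products, :tags
  end
end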
Migration up/down is atomic
Each migration is atomic and runs in one transaction. The problem appears when you've got a huge (huge using this definition) database and want to update too many records. During such an update all those records will be locked for writing, so it is rather a bad idea to do that on a huge, heavily used table.
Example of such a query:
UPDATE adverts
SET valid_till = valid_from + '2 days'::interval
WHERE created_when < '2009-01-01';
So there is a table with adverts, and we set all adverts added before the year 2009 to be valid for two days. What if there are 2 million such adverts, and they are being changed at the same time, for example by an importer? Such a query can totally block all other queries.
What’s more, there is a huge chance that during the query execution there will be a deadlock, so the whole operation will be started again.
Another solution is to lock the whole table; there should be no deadlocks then, but again all other queries will wait, and so will all the applications.
And there will be a call from the management soon… and we don’t like it.
Usually such a change can be made in small parts. Let’s say using 100 adverts at a time. The simplest query would look like this:
UPDATE adverts
SET valid_till = valid_from + '2 days'::interval
WHERE created_when < '2009-01-01'
AND id in
( SELECT id
FROM adverts
WHERE created_when<'2009-01-01'
ORDER BY id
LIMIT 100 OFFSET ?
);
The ? parameter should be replaced with a number like 0, 100, 200 and so on. Using this approach the whole data change could take days or weeks, but it won't affect the applications and won't block them during the whole update process.
Unfortunately there is no way to do that using migrations.
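A minimal sketch of how such a batched update could be driven from a plain Ruby script outside migrations, assuming an ActiveRecord connection is already established (this is only an illustration, not production code):

offset = 0
batch  = 100
loop do
  # connection.update returns the number of rows affected by the statement
  affected = ActiveRecord::Base.connection.update(<<-SQL)
    UPDATE adverts
       SET valid_till = valid_from + '2 days'::interval
     WHERE created_when < '2009-01-01'
       AND id IN (SELECT id FROM adverts
                   WHERE created_when < '2009-01-01'
                   ORDER BY id
                   LIMIT #{batch} OFFSET #{offset})
  SQL
  break if affected.zero?
  offset += batch
  sleep 1  # give other queries room to run between batches
end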
Engine independence myth
All of the Active Record API and migrations are created with database independence in mind. Great. But that simply doesn't work. The differences between database engines are just too big.
There is usually a moment when you have to write a custom SQL query instead of an ORM-generated one, just to speed things up. And such a query usually uses some unique engine features, for example partial indices or the hstore/ltree types in PostgreSQL.
After such a query the application isn't engine independent any more, so why do you think that independence is important? I haven't met so far any advanced, more complicated application that would be engine independent. And if such independence is not really achievable, then maybe just don't bother and use your database engine's features from the very beginning? That includes creating stored procedures, triggers, checks, foreign keys and so on. So maybe use migrations with custom SQL all the time, or write your own plugins for generating that custom SQL?
Some examples are terrible
Some examples of migration usage show how to use ActiveRecord classes in migrations. But why do almost all such migrations look like this?
class ChangeStatus < ActiveRecord::Migration
  def self.up
    # loads every product into memory and updates them one by one
    Product.find(:all).each do |p|
      p.update_attribute(:status, 10)
    end
  end

  def self.down
  end
end
The huge problem is that during the up migration this function loads all the rows from the table, changes them, and updates each one in the database. All of that could be done with a simple SQL query such as:
UPDATE products SET status=10;
Usually when a programmer writes code like this, he or she tests it on a small, very small, developer database. The problems appear when the same code is run on a production server, where the database can be much bigger.
What then? Use one set of migrations for production, and a different one for developers?
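The same change expressed directly in a migration would be a single statement; a sketch:

class ChangeStatus < ActiveRecord::Migration
  def self.up
    # one statement executed in the database, no row-by-row loading
    execute "UPDATE products SET status = 10"
  end

  def self.down
  end
end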
Schema dumping
Ruby on Rails migrations can dump the schema into a file, so later, when you want to create a new database, you don't have to run all the migrations; you can just load this one file.
That is nice… but there is a small gotcha. Normally the database structure is dumped into a Ruby file, and that file contains only the Ruby function calls, without any of the custom SQL. If you want the dump to contain all the SQL needed to create such a database, you have to change the dumping method and use the db:structure:dump Rake task. This task uses the database engine's own dump program to create a plain structure dump. In PostgreSQL it would be something like:
pg_dump -s
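The switch to the SQL dump format itself is, as far as I remember, a one-line setting in the Rails configuration, something along these lines:

# config/environment.rb -- dump the structure with the engine's own tool
# instead of generating the Ruby schema file
config.active_record.schema_format = :sql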
No Slony Like Solution
Slony is a master-multi-slave replication system for PostgreSQL. When using Slony, I have to run all DDL queries (DDL – Data Definition Language – so all those queries that change the data structure) through a special script. That script executes the DDL on the master database, and later all those DDLs are replicated to the slaves.
When all the database changes are generated by Ruby classes from migrations, there is a problem with getting the SQL out of those Ruby files. David Wheeler has a similar point of view.
Final Thoughts
I think of a database as a good data storage where all data is valid. Valid all the time. That's why I need all those checks, foreign keys, triggers and so on. The Ruby on Rails folks just treat the database as a simple bag full of data.
Unfortunately, when the application grows much bigger, that simple bag becomes rather dirty. It ends up looking like the GIGO approach: Garbage In, Garbage Out. Do whatever you want, but you should know that once the data in the database is not valid, it's just too late and you can throw such a database away.
Better Migrations?
The whole idea of migrations is quite nice. It allows for keeping track of database structure changes, and for keeping that history in a simple version control system like git. The rake mechanism is also very nice. What I don't like about migrations is stated above.