Ruby on Rails Migrations
Migrations in Ruby on Rails provide some nice faetures for manipulating database structure. It is used mainly for two reasons:
- this is a nice way of keeping database structure changes
- you don’t need to know SQL
- the generated SQL code can be used for many database engines
The first point is OK (this is a must, you really should keep such a track of database changes). The second would be nice…. but that simply doesn’t work. I don’t think that not knowing SQL is such a huge advantage. Knowing SQL is really a need when you write some database software using relational databases.
Migration Example
A simple example of a migration. One migration in one file, all placed in the db/migration directory in the Rails project.
class SampleMigration < ActiveRecord::Migration def self.up end def self.down end end
The whole migration is in one class with two functions: up and down. Function up is used for migrating in the up direction, and down in the opposite. In the ideal world the down function should roll back all changes created by the up function.
Unfortunatelly that’s usually impossible (OK, that’s possible but only in really simple migrations and only sometimes), but about that later on.
Here is just a more complex example of a migration:
class SampleMigration < ActiveRecord::Migration
def self.up
create_table :people do |t|
t.column :name, :string
end
end
def self.down
drop_table :people
end
end
The up function creates a table named ‘people’ with one colum ‘name’ of a type ’string’. Additionally migration adds some more, but nevermind.
The down function just drops that table.
That looks quite nice and simple but of course creating or dropping a table is the simplest task. Normally you’ve got to manage more complicated tasks such as creating and fillind a column on a big table or create a new column from two existing and dropping those existing columns after that.
Problems with migrations.
No foreign keys
There are no functions for easy creating foreign keys. You can always add a foreign key using below migration but this is a real pain as you have to write your own SQL together with some Ruby code. I thought that migrations are for writing no SQL. If I have to write my own SQL for such a simple things then I’d rather do them on my own from the very beginning using just plan SQL.
class ExampleMigration < ActiveRecord::Migration
def self.up
create_table :products do |t|
t.references :category
end
#add a foreign key
execute <<-SQL ALTER TABLE products ADD CONSTRAINT fk_products_categories FOREIGN KEY (category_id) REFERENCES categories(id) SQL
add_column :users, :home_page_url, :string
rename_column :users, :email, :email_address
end
def self.down
rename_column :users, :email_address, :email
remove_column :users, :home_page_url
execute "ALTER TABLE products DROP FOREIGN KEY fk_products_categories"
drop_table :products
end
end
No checks
The same problem is with checks. All the Rails folks agree that all database checks have to be done in AR models and not in the database. Another myth. There is nothing wrong in creating check in Rails models, that really helps writing information about wrong data to the user. But not checking all data in database is a problem similar to checking form data only using Javascript and omiting the check at the server side.
Some time ago I read a story about such an approach, all data validations were made using the model layer, no checks, unique indices or foreign keys in database. One day they had to give access to the database to some external company so they could make their own modules. They of course used just database with some different programming language so they had a problem with using the application model layer and all the checks. The same time the main company realised that their importer is rather slow so to speed it up, they just wrote some complicated SQL queries instead using AR classes. All were fine, the external module was working hard, the importer was faster… and after some time they realised that all data in the database are not valid. It was just garbage, not proper data. But unfortunately it was too late
No stored procedures
There is no api for creating a stored procedure or for using that. You just write a simple SQL and can run it using execute from the migration. Maybe not a big problem.
Funny unique constraints
Unique constraints are usually (as I saw in almost all migration examples that I found) set only in the AR classes and all of them are checked only in the model layer. No unique indexes in databases because they are not needed.
That can lead to some great problems. Let’s see this procedure, we have a table users where the login column is of course unique.
The procedure of adding a new user to database looks like this:
- Check if there is such a user
- If there is not, add it
Look good, isn’t it? But what if we’ve got two web page users who just want to have the same login? AR objects would do it this way:
- User A: Check if there is such a user (found that there is not)
- User B: Check if there is such a user (found that there is not)
- User A: add user with login ’simon’
- User B: add user with login ’simon’
And suddenly there are two user records with the same login (from the moment one can say that the database sucks… but more professional would be: sucks a lot… data is garbage so throw it away).
I can see only two ways of improving that without the unique index:
- Lock the whole users table before the step 1… (what causes great efficiensy problems).
- Make some other locks for inserting a record to this table (what causes some not so nice hacks around inserting a record).
What’s more it’s known behaviour, as we can read in the official Ruby on Rails guide:
„Validations such as validates_uniqueness_of are one way in which models can enforce data integrity. The :dependent option on associations allows models to automatically destroy child objects when the parent is destroyed. Like anything which operates at the application level these cannot guarantee referential integrity and so some people augment them with foreign key constraints.”
7. Active Record and Referential Integrity
This is where I fully agree, adding validate_uniqueness_of in the model can help to write nice error message for the user, but there is not other efficient and easy to set way of ensuring that all data in a column is unique other than creating a unique index.
No partial indices
In PostgreSQL where are partial and functional indices. Such nice things as:
CREATE INDEX i_users_active ON users (login) WHERE is_active = true; CREATE INDEX i_users_active_lower ON users ( lower(login) ) WHERE is_active = true;
Unfortunately without hacks or custom SQL I cannot do that in migrations.
No special database types
Some databases can have their special datatypes or you can define your own. In PostgreSQL you’ve got e.g. hstore or arrays. There is no way to use those in migrations. Of course you can use just all the standard data types, but if you want to use some special, you’ve got a problem.
Migration up/down is atomic
Is atomic and is run in one transaction. The problem appears when you’ve got a huge (huge using this definition) database and want to update too many records. During such an update all records would be blocked for writing so it is rather stupid to make that on a huge, very used table. Example of such a query:
UPDATE adverts SET valid_till = valid_from + '2 days'::interval WHERE created_when < '2009-01-01';
So there is a table with adverts and we set all adverts added before the year 2009 to be valid for two days. What if there are 2 million of such adverts being changed all the time (for example by the importer)? Such a query can totally block all other queries waiting for releasing this query locks. What’s more, there is a huge chance that during this query execution there will be a deadlock so the whole operation could be started again.
Another solution is to lock the whole table, there should be no deadlocks then, but again all other queries would wait and all applications would wait and there would be a phone from the management soon… and we don’t like it.
Usually such a change can be made in small parts. Let’s say using 100 adverts at a time. The simplest query would look like this:
UPDATE adverts
SET valid_till = valid_from + '2 days'::interval
WHERE created_when < '2009-01-01'
AND id in
( SELECT id
FROM adverts
WHERE created_when<'2009-01-01'
ORDER BY id
LIMIT 100 OFFSET ?
);
In the ? parameter there should be inserted numbers 0, 100, 200 and so on. Using this approach the whole data change could be made in days or weeks, but that won’t affect all applications and won’t block them during the whole update process.
Unfortunatelly there is no way to do that using migrations.
Engine independency myth
All the Active Record api and migrations are created with the database independency in mind. Great. But that simply doesn’t work. There are to huge differences between database engines. There usually is a moment when you have to write your custom SQL query instead of an ORM generated one just for speeding that up. And such a query usually uses some special unique engine features such as for example partial indices or hstore/ltree types in PostgreSQL. After such a query, the application isn’t engine independent any more, so why do you think that it is important? I haven’t met so far any advanced and more complicated applications that would be engine independent. And if such an independancy is not really achievable, then maybe just don’t bother and use your db engine features from the very beginning. That includes creating stored procedures, triggers, checks, foreign keys and so on. So use migrations with custom sql all the time or write your own plugins for generating custom sql?
Some examples are terrible
In some examples of the migration usage there is shown how to use AR classes in migrations. But why almost all such migrations look like this:
class ChangeStatus < ActiveRecord::Migration
def self.up
Product.find(:all) do |p|
p.update_attribute(:status, 10)
end
end
def self.down
end
end
The problem here is that during the UP migration, this function gets all records from a table, changes that and updates in database. All that could be done with a simple sql query such as:
UPDATE products SET status=10;
Usually when a programmer writes such a code, he or she tests that on a small, very small, developer’s database. Problems appear when such a code is used on a production server.
Schema dumping
Ruby on Rails migrations can make a schema dump into a file so that later when you want to create a new database, you don’t have to make all the migrations but only load this one file. That is nice… but a small gotcha here. Normally the database structure is dumped into a Ruby file. But there are only all those Ruby function calls without any custom SQL. If you want to have dumped all SQL needed for creating such a database, you have to change the dumping method and use db:structure:dump Rake task. This task will use the special program for the used database engine for creating a simple database dump. In PostgreSQL it would be something like:
pg_dump -s
No Slony Like Solution
Slony is a master-multi slave replication system for PostgreSQL. While using Slony I’ve got to run all DDL queries (DDL – Data Definition Language – so all thoser queries that change data structure) using a special script. That script executes that DDL queries on master database and later all those DDLs are replicated on slaves.
When all the database changes are generated by ruby classes from migrations, there is a problem with getting the SQL out of those Ruby files. David Wheeler has similar point of view.
Final Thoughts
I think of a database as a good data storage where all data is valid. All the time is valid. That’s why I need all those checks, foreign keys, triggers and so on. Ruby on Rails folks just treat database as a simple bag full of data, but when the application grows much bigger it looks like a GIGO = Garbage In, Garbage Out approach. Do whatever you want but you should know that when data in database is not valid, it’s just too late and you can throw such a database away.
Better Migrations?
The whole migration idea is quite nice, simply it allows keeping track of the database structure changes and allows keeping that in a simple version control system like subversion or git. The rake mechanism also is very nice. What I don’t like about migrations is stated earlier.
As I use mainly PostgreSQL, some time ago I started writing my own migrations for PostgreSQL. Some Rake tasks to load up and down migrations on a database, but all changes are stored in simple SQL files, no Ruby code inside. Of course that won’t be database independent, but who cares. I’ll publish that when will be finished.
Related posts:
- Playing Rails Migrations Last time I keep on playing Ruby on Rails, quite...
- Ruby, Rails… AWESOME I really don’t get it, why almost each post or...
- Datamapper and Ruby On Rails Aaaaaargh. Some time ago I wrote about problems with merb....
- Indentation Checking in Ruby In Python blocks have to be indented just because the...
- How to check PostgreSQL privileges. Someone asked me how to list all tables that a...
- Why Use ORMs? Why Use ORMs? ORM aka Object Relational Mapping is a...













Plain and simple, most rails people are completely clueless when it comes to the database. Script kiddies. Most of them shouldn’t be allowed within a mile data worth more than a nickel.
great post as usual!
Yea, leave real database work to the professionals who will take the responsibility and integrity and leave the rails script kiddies out of the picture.
“you don’t need to know SQL”
Nice one.
i disagree with this post..
i work with a team of developers on a RoR project.. we have over 350 migrations and it has saved us so much time it is crazy..
i agree it is not perfect but for the (rare-ish) times that they don’t cover what you need, you can do things manually…
I recently started a Catalyst project, and migrations is one of the main things i miss about RoR..
sure you can write your own thats gonna be just perfect for your needs, but why bother when this does 95% of stuff and it is already there and works fine…
if you got spare time, you could maybe mod the rails migrations to accept plain SQL then you’ve built what you suggest is a better solution and already have all the other bits in place..
i note your other comments are from what sound like dba’s… i lol at that.. the point is migrations are *really* good, the people commenting sound like they haven’t tried doing dev with and without them…
(sure if you are a CMMI 5 shop and you have a dba team/test team/UAT team/dev team and architects, then you probably don’t need migrations but if you’re trying to get things done fast on the web with a small team, they’re perfect)
@jon
.
Sorry for that long time when your comment was waiting to be published, but I was on holidays, just without any network connection
I agree that migrations are helpful, especially in the beginning of a project. When you have so less data that no one notices dropping a database and recreating the database from a dump.
Doing things manually means that normal rake db:something doesn’t work. Last time I made my own sql in a rails migration, I noticed that later rake db:schema:dump doesn’t load that changes to a file. So once using your own sql, you have to remember that and later make dumps without rake db:schema:dump && load. Maybe you know some better way to accomplish that?
Oh yea, if I would have spare time
Yes, my point of view is partly changed by my partly DBA background. This is because the most important thing in every application is the database, of course if the application uses the database. And this is the fact, not the DBA point of view. Data is usually carefully maintained, while it is used by many applications in many different ways. Data is migrated from one database to another, while applications are sometimes thrown away.
Migrations are really nice and can help programmer to write some code changing databases without knowing any sql. For me this is not an advantage – this way you just don’t use your database specific features (e.g. you buy an expensive oracle licence just for using the database like sqlite).
And I fully agree with the last paragraph: you can use databases and write programms fast, cheap or good – choose two of those. The rails migrations like look now allow for fast and cheap – that doesn’t mean good programming. Migrations are becoming better and better, some time ago one migration wasn’t run in a transaction – now it uses transactions. Maybe one day rails will be the best framework with the best migrations ever created. In my opinion now it is not, althought it is quite useful tool.
Simon, although I agree with you that the missing features of migrations are a big issue, Jon nailed it when he said that migrations do ‘95% of the stuff.’ Migrations are a great starting point and tracking mechanism for your database changes – but it’s not the end all be all.
A lot of databases have many simple tables that migrations work fine for. Even though you will have to go back and write some SQL code to insert Foreign Keys and such, it’s code you would have had to write regardless – except now you’ve just cut out a lot of the tedious parts. The issue comes in with how do we make it easier to implement simple yet necessary things such AS foreign keys.
For the developers who neglect the database/datastore – then those are very, VERY uninformed developers. You can argue that the database IS the application. Without data, you’ve got nothing. If a developer doesn’t know SQL (at least the very basic), then he’s probably a drag and drop designer, not a developer.
Although the criticisms are well deserved of migrations, dismissing them is beyond stupid. Instead of the pointless “script kiddies” banter, people should look at migrations and say “hmm, this isn’t a bad idea. How can we make this better?” I’ve seen my share of “professionals” who don’t know anything (e.g. attaching a database). I believe migrations are a step in the right direction for making our work easier and cutting out a lot of the repetitive tasks. Is it perfect? No. When someone out there decides to polish it up, and call it something like dbBuilder and it becomes a widely used pattern/protocol/software/etc, I wouldn’t be the least bit surprised.
And no, outside of small hobby projects, I’m new to Rails. I’m on the Microsoft enterprise side of things for my day job.