PostgreSQL Collation - part 1

by Szymon Lipiński

PostgreSQL is (IMHO) much better than MySQL but unfortunately it still lacks one of the greatest MySQL features: encoding.

MySQL Collation

Locale settings in MySQL are great. You can have different collation set for each database, table and even column (the same as it is in Interbase/Firebird). This gives the opportunity to simplify the database structure.

For example (I know, a very simplified example) when you have even a simple blog database and want to create a multilingual blog. Then it is enough to create one table for posts with two columns for body, each with different collation.

The PostgreSQL Collation

In PostgreSQL situation is much different. So far you define collation only during running the initdb. That’s during the cluster initialization. The cluster is the whole PostgreSQL where you keep all your databases. Because of that you can’t have two columns with different collation. You even can’t have two databases with different language settings. For having such databases you need to have two different PostgreSQL installations: different installations or two processes run with pg_ctl -D <datadir>.

This is not a problem to have two exactly different databases but this a real pain when you need (and I needed that too) two columns with quite different collation.

PostgreSQL Changes

In the future version of PostgreSQL the collation problem looks like will be fixed. From the version 8.4 there will be the possibility of creating two databases in one cluster, each with a different collation. Unfortunately that won’t fix the problem with setting collation per column.

What the Collation is

The collation is responsible for the alphabetical order, and for working the functions like upper and lower. If there is wrong collation then you will have problems with correct sorting of the texts in the database.