Skip to content


PostgreSQL Collation – part 2

Due to many questions about the PostgreSQL collation that were asked after my last short article about that, I decided to write some more about the current collation situation in PostgreSQL.

What The Collation Is

The collation setting is responsible for correct working functions such as

  • converting letter to upper
  • converting letter to lower
  • sorting texts
  • ability to use indexes in `like` clauses

If we have text written in chineese or polish and we’ve got the japaneese collation, we can be sure that sorting texts won’t look like we want it.

Types of Language Settings

Generally there are two types of languge settings in PostgreSQL. One is can be set for each connection and the other only during the initdb and can’t be changed .

How to Set Collation For The Cluster

There is only one way for setting the cluster collation – during initdb. Initdb is a program for creating initial files and directory structure for the database cluster. Once the collation is set, you can’t change it. For setting the collation, you can initialize the cluster with the command:

initdb --locale=fr_CA

Of course you can use the initdb without any option, then the language settings will be taken from the environment variables.

How to Set Language Settings for Client

For client connection there can be set language settings such as:

  • LC_MESSAGES – language of messages
  • LC_MONETARY – formatting of currency amounts
  • LC_NUMERIC – formatting of numbers
  • LC_TIME – formatting of dates and times

Setting the above variables won’t change anything for the current database or the whole cluster. You can’t change or improve that way how texts are sorted.

The above options can be changed for the whole database using:

ALTER DATABASE mydb SET LC_NUMERIC TO ‘my value’;

How to Check The Variables

For checking all the above variables you can connect to you database and write a query such as:

show LC_COLLATE;

Possible output is:

# show LC_COLLATE;
lc_collate
————-
en_US.UTF-8
(1 row)

Other options can be checked the same way.

How To Reinitdb

That’s not so easy. There is no way to, so called, reinitdb. You can initdb and nothing more. To change those two language settings that can’t be changed normally you can do:

  1. Just make dump of all databases
  2. pg_dumpall > file.sql

  3. Stop the server.
  4. Delete the whole PostgreSQL data directory.
  5. Run initdb once again but with other language settings.
  6. Load the dump that has been made in the first point.

What next?

Fortunately in the next version 8.4 there will be some progress in playing with the languge settings. I wrote that before in PostgreSQL Collation.

Related posts:

  1. PostgreSQL Collation – part 1 PostgreSQL is (IMHO) much better than MySQL but unfortunately it ...
  2. PostgreSQL Collation – 8.4 In the last PostgreSQL version (8.4) there is a small...
  3. PostgreSQL 9.1 Best Features PostgreSQL 9.1 is almost out. That’s great. There are many...
  4. How to Store Application Settings Some ideas on how to store application settings in a...
  5. PostgreSQL FOUND Problem FOUND is a global variable that exists in the plpgsql...
  6. PostgreSQL 9.0 High Performance Review There is another great book on PostgreSQL: PostgreSQL 9.0 High...

Posted in database, software.

Tagged with , .


One Response

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.

  1. Jonnathan Guambe says

    Thanks man, i was having problem changing to pt_PT colattion, this post helped me to Reinitdb.
    Cheers



Some HTML is OK

or, reply to this post via trackback.



Better Tag Cloud