There is a huge number of books about TDD. Those books usually concentrate on using test for units of work. Units of work are understood in many different ways, usually it means a class. So the books say: write a lot of tests, make code in such a way the the tests pass. All external resources should be mocked so you can test only this unit.
That’s cool, but unfortunately in many cases all of the testing stops at this moment. There are some queries (written by hand or generated by some ORMs) but they are not tested, usually. Some programmers test those using integration tests – tests which connect to a real database and perform real queries. This usually means testing the happy path – I’ve got my ORM so it takes care of everything, I don’t need to think.
Database is usually the most valuable asset a company owns. Applications are rewritten from time to time. Old applications are thrown away, new applications are installed. Nobody throws away the database full of data when changing the applications. The database is carefully migrated. There are systems where many different applications use the same database at the same time. That’s why it is important to have good database model full of constraints and why database should be treated with care. You really don’t want to destroy the data consistency as this could cost your company too much.
This article is about often forgotten database testing. Integration tests using real databases. It really doesn’t matter what kind of database engine you are using. You can use PostgreSQL, MySQL, Oracle or even those funny noSQL databases like MongoDB. Below rules can be applied to all kinds of databases and all kinds of applications. Maybe not all of them, as noSQL databases cannot enforce data integrity.
Your application usually consists of many different parts. There is some <put here any language you like> code, some configuration files, some SQL queries, some external systems. Testing an application means testing each part separately (only because that makes finding bugs easier) and testing how all the parts cooperate. Database is one of suchs parts, and you should test it thoroughly. Continue reading →
I was explaining the PostgreSQL version naming convention to my colleague a couple of days ago. The differences between e.g. 8.3 and 8.4 and what is different between 8.4 and 9.0 and what should be used during database migration to another version. I knew all that just because somebody told me that many years ago (I think it was depesz – thanks).
However I was unaware that there is documentation of the whole versioning issue. All you can find here.
Another reason why love this project. And yes, you should have that versioning policy in you project too (it really doesn’t matter if it looks different – better to have a convention than not).
I observe the endless war about TDD. On one side there are those who claim that TDD is the best thing since sliced bread. On the other side there are those who claim that TDD is just a waste of time and is too slow.
But what about databases? Nobody writes about them too seriously.
In some books there are some wise words like:
we should use TDD for DAO
we should use TDD for ORM objects
we should use TDD for queries
This is very good for testing unique, foreign and primary keys. Very good for all checks. What is missing here? Continue reading →
During the last database training that I led, I was talking about selecting rows from database and sorting them (including some problems with collations). Later I was talking about triggers. For this entry assume that there are only row level triggers.
When you perform e.g. INSERT in PostgreSQL, there are fired triggers (if they exists, of course). At the very beginning there are fired all BEFORE INSERT triggers. Then INSERT is made, and later all triggers AFTER INSERT are fired.
If more than one trigger is defined for the same event on the same relation, the triggers will be fired in alphabetical order by trigger name. trigger-definition
And then suddenly I was asked:
- OK, that’s clear, but which collation is used for ordering the triggers?
This story is a just another example of what I call
Using varchar(255) for everything won’t make you a sorcerer.
Every database has many different types, they are there on purpose, not by accident. I’ve seen that too many times: a programmer (most companies don’t have DBAs for that) creates a database structure using ONE type. Sometimes they use more… but hardly ever they use all those nice features they database engine provides. This is a common pattern:
Pay thousands of dollars for the database – use that like CSV file.
A couple of weeks ago I had a quite strange situation: a PostgreSQL table with small number of huge geometries. A program had to query the table really often, so the speed was a crucial thing. Continue reading →
A couple of days ago a colleague of mine had a quite normal problem. The query was slow. The whole application was written in Java, using Hibernate, tables were small. There were about 30 rows in one table, and 10 in another. One small join, one call of PostGIS function. The query took several seconds to return just about 20 rows. Quite bad. Continue reading →
In the simple project I work sometimes on, I need to generate a lot of random data in PostgreSQL database. I’ve got a couple of functions for that, but all of them are based on the simple random() function.
The random() returns value from the range [0, 1):
# select random();