Last time on many forums there were endless discussions about database joins that they are very bad, very unefficient and terrible and the best is to avoid them. Generally that’s not true and I’ll try to show this here (taking some small 10 million records tables as on one forum was suggested).
First let’s create a simple table:
create table t1 (
id serial primary key,
i integer not null,
t text not null
);
now let’s fill this having 10 million records:
10 times run this:
insert into t1 (i,t) select generate_series(1,1000000), '';
And now (just to have this table bigger as some state that joins on small tables are fast, on big tables are bad):
update t1 set t = i::text;
And now let’s add another table:
create table t2 (
id serial primary key,
t1_id integer not null references t1
);
and filling this:
insert into t2(t1_id) select id from t1;
Now the tables look like this:
mtest=> \d t1
Table "public.t1"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null
default nextval('t1_id_seq'::regclass)
i | integer | not null
t | text | not null
Indexes:
"t1_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "t2" CONSTRAINT "t2_t1_id_fkey" FOREIGN KEY (t1_id) REFERENCES t1(id)
mtest=> \d t2
Table "public.t2"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null
default nextval('t2_id_seq'::regclass)
t1_id | integer | not null
Indexes:
"t2_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"t2_t1_id_fkey" FOREIGN KEY (t1_id) REFERENCES t1(id)
How many records are there?
mtest=> select count(*) from t1; count ---------- 10000000 (1 row) Time: 17340,848 ms mtest=> select count(*) from t2; count ---------- 10000000 (1 row) Time: 3522,489 ms
Let’s check the table size:
mtest=>
select pg_size_pretty(pg_relation_size('t1'));
pg_size_pretty
----------------
1204 MB
(1 row)
mtest=>
select pg_size_pretty(pg_relation_size('t2'));
pg_size_pretty
----------------
346 MB
(1 row)
But tables with indices are of course a little bit bigger::
mtest=>
select pg_size_pretty(pg_total_relation_size('t1'));
pg_size_pretty
----------------
1847 MB
(1 row)
mtest=>
select pg_size_pretty(pg_total_relation_size('t2'));
pg_size_pretty
----------------
560 MB
(1 row)
Let’s check the execution plan, both tables have 10 million of records:
explain select * from t1 left join t2 on t1.id = t2.t1_id order by t1.id desc limit 100;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (cost=1443640.26..1443645.31 rows=100 width=17)
-> Merge Left Join (cost=1443640.26..2558613.62 rows=22085541 width=17)
Merge Cond: (t1.id = t2.t1_id)
-> Index Scan Backward using t1_pkey on t1 (cost=0.00..814767.66 rows=22085541 width=9)
-> Materialize (cost=1443640.26..1568639.98 rows=9999977 width=8)
-> Sort (cost=1443640.26..1468640.21 rows=9999977 width=8)
Sort Key: t2.t1_id
-> Seq Scan on t2 (cost=0.00..144247.77 rows=9999977 width=8)
(8 rows)
As you can see there is that terrible sequential scan on the t2 table. That’s not good.
Let’s see how long it takes to run this query:
explain analyze select * from t1 left join t2 on t1.id = t2.t1_id order by t1.id desc limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1443640.26..1443645.31 rows=100 width=17) (actual time=34728.242..34729.371 rows=100 loops=1)
-> Merge Left Join (cost=1443640.26..2558613.62 rows=22085541 width=17) (actual time=34728.237..34729.186 rows=100 loops=1)
Merge Cond: (t1.id = t2.t1_id)
-> Index Scan Backward using t1_pkey on t1 (cost=0.00..814767.66 rows=22085541 width=9) (actual time=0.045..0.180 rows=100 loops=1)
-> Materialize (cost=1443640.26..1568639.98 rows=9999977 width=8) (actual time=34728.167..34728.653 rows=100 loops=1)
-> Sort (cost=1443640.26..1468640.21 rows=9999977 width=8) (actual time=34728.154..34728.287 rows=100 loops=1)
Sort Key: t2.t1_id
Sort Method: external merge Disk: 175872kB
-> Seq Scan on t2 (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.058..12331.185 rows=10000000 loops=1)
Total runtime: 34794.735 ms
(10 rows)
34 seconds, not so good, let’s add some index on the column t1_id from the t2 table so that the index can be used in the join condition:
create index i_t2_t1_id on t2(t1_id);
Checking the index size:
mtest=>
select pg_size_pretty(pg_total_relation_size('i_t2_t1_id'));
pg_size_pretty
----------------
214 MB
(1 row)
and the whole t2 table size:
mtest=>
select pg_size_pretty(pg_total_relation_size('t2'));
pg_size_pretty
----------------
774 MB
(1 row)
Let’s check the execution plan once again with the new index:
explain select * from t1 left join t2 on t1.id = t2.t1_id order by t1.id desc limit 100;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (cost=0.00..883.30 rows=100 width=17)
-> Nested Loop Left Join (cost=0.00..195082023.35 rows=22085541 width=17)
-> Index Scan Backward using t1_pkey on t1 (cost=0.00..814767.66 rows=22085541 width=9)
-> Index Scan using i_t2_t1_id on t2 (cost=0.00..8.78 rows=1 width=8)
Index Cond: (t1.id = t2.t1_id)
(5 rows)
Looks quite nice, the execution plan has not sequential scans and is using the new index.
Let’s check how the real plan for this query:
explain analyze select * from t1 left join t2 on t1.id = t2.t1_id order by t1.id desc limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..883.30 rows=100 width=17) (actual time=0.177..2.111 rows=100 loops=1)
-> Nested Loop Left Join (cost=0.00..195082023.35 rows=22085541 width=17) (actual time=0.173..1.863 rows=100 loops=1)
-> Index Scan Backward using t1_pkey on t1 (cost=0.00..814767.66 rows=22085541 width=9) (actual time=0.032..0.214 rows=100 loops=1)
-> Index Scan using i_t2_t1_id on t2 (cost=0.00..8.78 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=100)
Index Cond: (t1.id = t2.t1_id)
Total runtime: 2.352 ms
(6 rows)
Nice but this data could ba chached, let’s check some data not from the very beginning.
explain analyze select * from t1 left join t2 on t1.id = t2.t1_id order by t1.id desc limit 100 offset 100000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=885952.01..886837.96 rows=100 width=17) (actual time=1170.478..1171.741 rows=100 loops=1)
-> Nested Loop Left Join (cost=0.00..88595200.54 rows=10000000 width=17) (actual time=0.060..1089.453 rows=100100 loops=1)
-> Index Scan Backward using t1_pkey on t1 (cost=0.00..633484.54 rows=10000000 width=9) (actual time=0.031..136.602 rows=100100 loops=1)
-> Index Scan using i_t2_t1_id on t2 (cost=0.00..8.78 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=100100)
Index Cond: (t1.id = t2.t1_id)
Total runtime: 1171.944 ms
(6 rows)
Over 1 second, that can be a lot of time but… this is only my notebook, it is not so fast, some other programs are running in the background, there is much less ram than on normal database servers, there is a slower hdd and much less memory available in the config for the database than on a normal server etc. On the production database this query can be much faster.
In reality joins on tables with millions of records are not a big problem.
Another, usually funny, argument is that the noSQL movement is stronger and stronger and more and more companies just take all the nonsql databases so that they can omit all that terrible joins. In fact joins are not that terrible, joins are quire usefull but you must have a good DBA (I think I know one
) that can improve some database internals or can create the query in a quite good way. Unfortunetally more and more programmers know that they can use the SQL database without knowing the SQL just because they use ORMs. Due to these ORM they usually turn off thinking, that’s sad.
count
———-
10000000
(1 row)
Time: 17340,848 ms
mtest=> select count(*) from t2;
count
———-
10000000
(1 row)
Time: 3522,489 ms
Related posts:
- PostgreSQL Strange Timings A very simple query. PostgreSQL 8.4. I have no idea...
- How to Allow For Max 4 Rows in Update (PostgreSQL) The Problem Question found on the net: how to restrict...
- Testing Database – Small Reply On Ovid’s blog I found lately this entry. First of...













yeah, you are right Simon.
unfortunately programmers rarely thinks about what are they doing when they are using ORM. sometimes ORM makes good job but often it is wrong. result of using it is that DBA has got a lot of funny problems :/
without join there is no life in the world of DB