Beginning with PostgreSQL 7.1 it has been possible
to control the query planner to some extent by using the explicit JOIN
syntax. To see why this matters, we first need some background.
In a simple join query, such as
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
the planner is free to join the given tables in any order. For
example, it could generate a query plan that joins A to B, using
the WHERE condition a.id = b.id, and then
joins C to this joined table, using the other WHERE
condition. Or it could join B to C and then join A to that result.
Or it could join A to C and then join them with B --- but that
would be inefficient, since the full Cartesian product of A and C
would have to be formed, there being no applicable condition in the
WHERE clause to allow optimization of the join. (All
joins in the PostgreSQL executor happen
between two input tables, so it's necessary to build up the result
in one or another of these fashions.) The important point is that
these different join possibilities give semantically equivalent
results but may have hugely different execution costs. Therefore,
the planner will explore all of them to try to find the most
efficient query plan.
When a query only involves two or three tables, there aren't many join
orders to worry about. But the number of possible join orders grows
exponentially as the number of tables expands. Beyond ten or so input
tables it's no longer practical to do an exhaustive search of all the
possibilities, and even for six or seven tables planning may take an
annoyingly long time. When there are too many input tables, the
PostgreSQL planner will switch from exhaustive
search to a genetic probabilistic search
through a limited number of possibilities. (The switch-over threshold is
set by the GEQO_THRESHOLD run-time
parameter described in the PostgreSQL 7.3 Administrator's Guide.)
The genetic search takes less time, but it won't
necessarily find the best possible plan.
When the query involves outer joins, the planner has much less freedom
than it does for plain (inner) joins. For example, consider
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
Although this query's restrictions are superficially similar to the
previous example, the semantics are different because a row must be
emitted for each row of A that has no matching row in the join of B and C.
Therefore the planner has no choice of join order here: it must join
B to C and then join A to that result. Accordingly, this query takes
less time to plan than the previous query.
The PostgreSQL query planner treats all
explicit JOIN syntaxes as constraining the join order, even though
it is not logically necessary to make such a constraint for inner
joins. Therefore, although all of these queries give the same result:
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
but the second and third take less time to plan than the first. This effect
is not worth worrying about for only three tables, but it can be a
lifesaver with many tables.
You do not need to constrain the join order completely in order to
cut search time, because it's OK to use JOIN operators in a plain
FROM list. For example,
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
forces the planner to join A to B before joining them to other tables,
but doesn't constrain its choices otherwise. In this example, the
number of possible join orders is reduced by a factor of 5.
If you have a mix of outer and inner joins in a complex query, you
might not want to constrain the planner's search for a good ordering
of inner joins inside an outer join. You can't do that directly in the
JOIN syntax, but you can get around the syntactic limitation by using
subselects. For example,
SELECT * FROM d LEFT JOIN
(SELECT * FROM a, b, c WHERE ...) AS ss
ON (...);
Here, joining D must be the last step in the query plan, but the
planner is free to consider various join orders for A, B, C.
Constraining the planner's search in this way is a useful technique
both for reducing planning time and for directing the planner to a
good query plan. If the planner chooses a bad join order by default,
you can force it to choose a better order via JOIN syntax --- assuming
that you know of a better order, that is. Experimentation is recommended.