[osmosis-dev] nodes column in way table and way_nodes table

Thu Dec 22 10:40:11 GMT 2011

Hi Michel,

Yes, there is redundant data stored in the pgsnapshot schema.  It's there
for performance reasons.  If you compare the pgsnapshot schema to the
earlier pgsimple schema you'll notice that the way.nodes column is new, but
that the way_nodes table has always existed.  Both way->node mechanisms are
used for different purposes.

The pgsnapshot schema relies heavily on the PostgreSQL CLUSTER feature
where table contents can be sorted to align with one of the indexes.  The
nodes and ways tables are both organised by geographical location which
drastically improves performance when performing bounding box style queries
because data in the same geographical area will be stored close together on
disk.  The way_nodes table can't make use of geographical clustering which
means that joining the ways table to the nodes table via way_nodes is very
slow for large amounts of data due to large amounts of disk seeks.
Therefore the way->node information is duplicated inside the ways table for
fast lookup of nodes within ways.  This schema includes tags directly
against the node, way and relation tables in hstore columns for exactly the
same reason.  On typical SATA disks, the pgsnapshot schema performs
approximately 10 times faster than the pgsimple schema for bounding box
queries.

I can't drop the way_node table because it is still needed for two reasons
(may be others I've forgotten).  1. It is used for replication when a node
changes and the related way geometries have to be updated. 2. It is used
for bounding box queries if the ways table doesn't have any geometric
columns configured.

I hope that makes it clearer.
Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/osmosis-dev/attachments/20111222/fd8bcd2e/attachment.html>