[osmosis-dev] Changes to Osmosis Pgsql Schema

Sat Aug 7 12:00:07 BST 2010

Hi All,

I'm currently working on some changes to the Osmosis "simple" schema which
may be of interest to others.  I'd be interested to hear if anybody has any
major issues with this, or any better suggestions.

The current schema performs poorly, largely due to the data for typical
queries being spread across the disk.  It is well indexed, but retrieving
large numbers of rows requires huge numbers of disk seeks.  Performance
would be better if data was physically grouped according to geospatial
location.  I am planning several changes to address this:

   - CLUSTER the nodes table by the geom column index, and ways column by
   the (optional) linestring column index.  I've already tested this out for
   bbox style queries and it makes queries on these tables significantly
   quicker.  It takes a long time to perform the CLUSTER operation, but
   subsequent queries are then improved.
   - Move the tags tables into hstore tags columns on the nodes, ways and
   relations tables.  This will avoid the need to join to external tables, and
   will allow the tags data to also be clustered geospatially by the geospatial
   indexes.  For entities with large numbers of tags or large tags the data may
   be stored externally (
   http://www.postgresql.org/docs/8.4/interactive/storage-toast.html), but
   this should be the exception and most tags should fit inline in the table.
   - Create a nodes column on the ways table.  This will contain an array
   which holds only the ids of nodes that make up the way.  For typical
   bounding box style queries this will allow "completeWays" style
   functionality to be performed more efficiently without having to join to
   large numbers of rows in the way_nodes table.  For bbox style queries in
   some use cases it will also be possible to create synthetic node entities
   (without tag or user info) for missing nodes lying outside the bounding box
   which will further improve performance.

So far I've written a migration script for moving tags data into hstore
columns, and I've figured out how to get Java and JDBC playing nicely with
hstore columns.  The next step is to update existing tasks to use these
columns.  As part of this change I will also change the way the bounding box
queries work so that they store more data in the temporary tables to avoid
having to join back to the main data tables.  Again, this will significantly
reduce disk seeking.

I'll move onto the addition of a way.nodes column after I've finished the
tags changes.

I'm not sure when I'll find time to finish all of this, but it's the main
thing I'm working on.

Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/osmosis-dev/attachments/20100807/6a8785fd/attachment.html>