[OSM-dev] Osmosis and Postgresql

Mon Jun 22 02:29:24 BST 2009

Graham Jones wrote:
> Brett,
>
> I'll have a look at --write-pgsql-dump and add some words about that 
> too.  Then I'll have a go at MySQL...
>
> The other thing I would like to add is a write-up on choosing a 
> database to use, because it is not obvious to me which would be the 
> best, but I suspect this has been looked at a lot before now.  I think 
> the choices are:
I'm not aware of a page summarising this, so it sounds very useful.  
Lots of people ask about it.
>
>     * PostgreSQL/PostGIS - Can handle big datasets, and has geographic
>       extensions.  What I do not know is whether these extensions are
>       fast or not, or if for simple things like selecting for a
>       bounding box it would be quicker to just select on lat/lon
>       directly.   Mapnik uses this database, but the schema is
>       optimised for rendering.
>
There's actually 3 PostgreSQL schemas which causes much confusion:
1. The API database.  This is the database behind the main OSM 
ruby-based API.  This is supported by the osmosis "apidb" tasks.  The 
only db capable of holding full history.
2. The Mapnik database.  Used by the mapnik renderer.  Loaded using 
osm2pgsql.  Not supported by osmosis.  Uses PostGIS extensions.  Holds a 
filtered subset of data appropriate for rendering.
3. The pgsql "simple" schema.  Badly named, but this is a schema I 
created for holding OSM snapshots (ie. doesn't maintain history).  It 
does support replication for keeping it up to date.  Uses PostGIS 
extensions.  Used by ROMA servers.

All three schemas above can support minutely updates, 1 and 3 by 
osmosis, 2 by osm2pgsql.
>
>    *
>
>
>     * MySQL - Can handle big datasets, but does not have geographic
>       estensions.
>
This is the same schema as the PostgreSQL API schema, but implemented in 
MySQL.  Provides identical features, but no longer the preferred 
database of choice.  It's being maintained in the short term, but I 
suspect it will get dropped over time if PostgreSQL continues to work well.
>
>     * Sqlite - I have seen a discussion about this and I think there
>       are doubts over how well it works for big datasets?
>
No idea on this one, but yes I'd be dubious about its use for large 
databases.  Not supported by osmosis.

On this note, I've also messed about with the Berkeley DB Java Edition 
which didn't work well on large datasets.  I've long since deleted the 
osmosis tasks that used it.
>
>     * GT.M - I know nothing about this, but the xapi servers seem to
>       use it?
>
I don't know anything about this either.
>
>    *
>
>
> I would also like to include some example SQL queries to deal with 
> selecting nodes / ways etc. in a given bounding box, or searching for 
> points of interest.   These are not too difficult to do, but I suspect 
> that some ways are more efficient than others, so it would be nice to 
> include 'optimum' examples.
Sounds reasonable.  You may wish to check out the osmosis "pgsql" 
queries in this space, it uses true geo-spatial queries if the optional 
way "linestring" and "bbox" columns are available.
>
> Do you know if this has already been done and written up somewhere?    
> If not I will set up a template and ask for help filling it in!
I haven't seen it before (not that I've looked ...).  Sounds great to me!

A couple of other db mechanisms are available, details are on the wiki.
TRAPI - Stores data using a tile-based mechanism. 
http://wiki.openstreetmap.org/wiki/TRAPI
osmbin - A compact custom storage mechanism.  Supported by osmosis via a 
plugin. 
http://wiki.openstreetmap.org/wiki/Osmosis/DetailedUsage#--write-osmbin-0.6

Brett