[OSM-dev] PostGIS Based Bounding Box Extraction

Brett Henderson brett at bretth.com
Mon Mar 31 13:28:23 BST 2008

Hi All,

I've finished coding the new bounding box extraction in osmosis.  It 
seems to work correctly, although I admit I haven't tested it 
thoroughly yet.  If anybody wishes to take a look, or better yet 
improve the PostGIS usage, I'll welcome all feedback.

I've had a number of abortive attempts at implementing a bounding box 
algorithm that fixes the issues in the existing one, which is why it's 
taken me so long.  I initially wrote a file-based solution, but I hit 
indexing limits that appeared beyond my ability to fix in a reasonable 
amount of time.  I then moved on to a Berkeley DB (Java Edition) 
implementation, but it didn't seem as scalable as I'd hoped either, 
and again I ran into limitations in how to implement indexing.  I gave 
up trying to provide a simple, 100% command line driven approach and 
created a PostGIS solution.

I've created a relatively simple schema that is similar to the 
production MySQL schema but with a few critical differences (a rough 
SQL sketch follows this list).
* It separates node tags into their own table.
* It represents node locations as a geometry POINT type.
* It adds a geometry BBOX column to the way table for indexing 
purposes and to allow "correct" bounding box extraction, including 
ways with no nodes inside the bounding box.
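
To give a feel for it, here is a rough sketch of the core tables.  
This is illustrative only (the table and column names here aren't 
necessarily the real ones); the schema script linked below is the 
authoritative definition.

-- Illustrative sketch only; see the schema script for the real DDL.
CREATE TABLE nodes (
    id BIGINT NOT NULL PRIMARY KEY
);
-- The node location is a real PostGIS POINT geometry (SRID -1).
SELECT AddGeometryColumn('nodes', 'geom', -1, 'POINT', 2);

-- Node tags are split out into their own table.
CREATE TABLE node_tags (
    node_id BIGINT NOT NULL REFERENCES nodes (id),
    k TEXT NOT NULL,
    v TEXT NOT NULL
);

CREATE TABLE ways (
    id BIGINT NOT NULL PRIMARY KEY
);
-- Each way carries its own bounding box geometry so a spatial index
-- can find ways even when none of their nodes fall inside the query
-- box.
SELECT AddGeometryColumn('ways', 'bbox', -1, 'GEOMETRY', 2);

CREATE INDEX idx_nodes_geom ON nodes USING GIST (geom);
CREATE INDEX idx_ways_bbox ON ways USING GIST (bbox);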

The schema script is available here:

A population script (associated with the --write-pgsql-simple-dump task) 
is available here:

There are several osmosis tasks related to this bounding box functionality.
* --truncate-pgsql-simple - This task deletes all existing data from 
the database.
* --write-pgsql-simple - This task writes data into a database.  This 
is the simplest method for importing a planet or a subset thereof.  
(Note that a full planet may take a long time, if not forever; I 
haven't tested it yet.)  A task such as --read-xml should be fed into 
this task.
* --write-pgsql-simple-dump - This task writes data into files that 
can be loaded using PostgreSQL COPY commands.  The script mentioned 
above must be used to import these files because there are some 
additional steps required.  A task such as --read-xml should be fed 
into this task.  This is faster than the --write-pgsql-simple task but 
less convenient.
* --read-pgsql-simple - This task exposes the database to downstream 
tasks, accepting a new type of input called a "dataset".  This task 
doesn't actually read data, it just exposes the database via a special 
dataset interface.
* --dataset-dump - This task reads the entire contents of a "dataset" 
created by a "dataset" reading task (i.e. --read-pgsql-simple) and 
feeds it to the next task in the pipeline.  This should be linked to 
something like --write-xml.
* --dataset-bounding-box - This task extracts all data within a 
bounding box from a "dataset" created by a "dataset" reading task 
(i.e. --read-pgsql-simple) and feeds it to the next task in the 
pipeline.  This should be linked to something like --write-xml.

I can't seem to reach the wiki at the moment to double-check the 
syntax, but some sample command lines are provided below.
To import into a database:
osmosis --rx mydatafile.osm --write-pgsql-simple host=localhost 
database=osmds user=osm password=xxxx

To extract a bounding box:
osmosis --read-pgsql-simple host=localhost database=osmds user=osm 
password=xxxx --dataset-bounding-box left=-1 right=1 bottom=-1 top=1 
--write-xml mybbox.osm
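
To dump the entire database back to XML via the --dataset-dump task 
(again from memory, so the exact syntax may be slightly off):
osmosis --read-pgsql-simple host=localhost database=osmds user=osm 
password=xxxx --dataset-dump --write-xml mydump.osm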

The bounding box implementation is in the 
com.bretth.osmosis.core.pdb.v0_5.impl.PostgreSqlDatasetReader class in 
the iterateBoundingBox method.

Although it's Java, the SQL queries should be understandable by 
anybody with PostGIS knowledge.  Note that "?" in the queries is 
replaced by a bounding box polygon using JDBC bind variables.  If 
running the queries manually, replace the "?" with something like:
GeomFromText('POLYGON((144.93912192855174 -37.82981987499741, 
144.93912192855174 -37.79310006709244, 144.98188026000003 
-37.79310006709244, 144.98188026000003 -37.82981987499741, 
144.93912192855174 -37.82981987499741))', -1)

It uses temporary tables, which may or may not be the best approach.
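
To make that concrete, here is a simplified sketch of the kind of 
queries involved.  The table and column names (ways, nodes, way_nodes, 
bbox, geom) are illustrative rather than the exact schema; the real 
queries are in the iterateBoundingBox method.  The polygon matches the 
left=-1 right=1 bottom=-1 top=1 example above.

-- Collect the ids of all ways whose stored bounding box overlaps the
-- query polygon.  The && operator is a bounding box overlap test and
-- can use the GIST index on the bbox column.
CREATE TEMPORARY TABLE bbox_ways AS
SELECT id
FROM ways
WHERE bbox && GeomFromText('POLYGON((-1 -1, -1 1, 1 1, 1 -1, -1 -1))', -1);

-- Collect all nodes inside the box, plus every node used by a
-- selected way, so that ways crossing the box edge come out complete.
CREATE TEMPORARY TABLE bbox_nodes AS
SELECT id
FROM nodes
WHERE geom && GeomFromText('POLYGON((-1 -1, -1 1, 1 1, 1 -1, -1 -1))', -1)
UNION
SELECT wn.node_id
FROM way_nodes wn
INNER JOIN bbox_ways bw ON wn.way_id = bw.id;

The contents of these temporary tables (joined back to the tag tables) 
are then streamed to the next task in the pipeline.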

If this is to become feasible for everyday use I need to be able to 
apply changesets to an existing database.  This will allow a database 
to be kept up to date with the latest production data (with a maximum 
lag of 10 minutes or so if 1 minute diffs are used) without requiring 
full imports all the time.  Not sure when I'll get to this; it could 
be a while.  It's on my list, but if somebody else wishes to do it 
first I'll be very happy.

Anyway, as I mentioned earlier, any feedback would be appreciated.

