[OSM-dev] PostGIS Based Bounding Box Extraction
brett at bretth.com
Mon Mar 31 13:28:23 BST 2008
I've finished coding the new bounding box extraction in osmosis. It
seems to work correctly, although I admit I haven't tested it
thoroughly yet. If anybody wishes to take a look, or better yet improve
the PostGIS usage, I'll welcome all feedback.
I've had a number of abortive attempts at implementing a bounding box
algorithm that fixes the issues in the existing one, which is why it's
taken me so long. I initially wrote a file-based solution, but I
reached indexing limits that appeared beyond my ability to fix in a
reasonable amount of time. I then moved on to a Berkeley DB (Java
Edition) implementation, but it didn't seem as scalable as I'd hoped
either, and again I ran into limitations in how to implement indexing.
I gave up trying to provide a simple, 100% command-line driven approach
and created a PostGIS solution.
I've created a relatively simple schema that is similar to the
production MySQL schema but with a few critical differences.
* It separates node tags into their own table.
* It represents node locations as a geometry POINT type.
* It adds a geometry BBOX column to the way table for indexing purposes
and to allow "correct" bounding box extraction, including ways with no
nodes inside the bounding box.
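To make the above concrete, here is a rough sketch of what a schema
along those lines might look like. This is purely illustrative; the
table and column names are my own invention, not the actual osmosis
schema script:

```sql
-- Illustrative sketch only, not the actual osmosis schema.
CREATE TABLE node (
    id BIGINT PRIMARY KEY,
    geom GEOMETRY  -- node location stored as a PostGIS POINT
);

-- Node tags separated into their own table.
CREATE TABLE node_tag (
    node_id BIGINT REFERENCES node (id),
    k TEXT,
    v TEXT
);

CREATE TABLE way (
    id BIGINT PRIMARY KEY,
    bbox GEOMETRY  -- bounding box covering all of the way's nodes
);

-- A GiST index on the way bbox makes geometry overlap tests fast.
CREATE INDEX idx_way_bbox ON way USING GIST (bbox);
```

The point of the per-way bbox geometry is that a way can be matched
against a query bounding box even when none of its nodes fall inside
that box.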
The schema script is available here:
A population script (associated with the --write-pgsql-simple-dump task)
is available here:
There are several osmosis tasks related to this bounding box functionality.
* --truncate-pgsql-simple - This task deletes all existing data from
the database.
* --write-pgsql-simple - This task writes data into a database. This is
the simplest method for importing a planet or a subset thereof. (Note
that a full planet may take a very long time, if it completes at all; I
haven't tested it yet.) A task such as --read-xml should be fed into
this task.
* --write-pgsql-simple-dump - This task writes data into files that can
be loaded using PostgreSQL COPY commands. The script mentioned above
must be used to import these files because there are some additional
steps required. A task such as --read-xml should be fed into this
task. This is faster than the --write-pgsql-simple task but less
convenient.
* --read-pgsql-simple - This task exposes the database to downstream
tasks that accept a new type of input called a "dataset". It doesn't
actually read data, it just exposes the database via a special
"dataset" interface.
* --dataset-dump - This task reads the entire contents of a "dataset"
created by a "dataset" reading task (i.e. --read-pgsql-simple) and
feeds it to the next task in the pipeline. This should be linked to
something like --write-xml.
* --dataset-bounding-box - This task extracts all data within a
bounding box from a "dataset" created by a "dataset" reading task (i.e.
--read-pgsql-simple) and feeds it to the next task in the pipeline.
This should be linked to something like --write-xml.
I can't seem to reach the wiki at the moment to double-check syntax,
but some sample command lines are provided below:
To import into a database:
osmosis --rx mydatafile.osm --write-pgsql-simple host=localhost
database=osmds user=osm password=xxxx
To extract a bounding box:
osmosis --read-pgsql-simple host=localhost database=osmds user=osm
password=xxxx --dataset-bounding-box left=-1 right=1 bottom=-1 top=1
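Presumably (I'm inferring the pipeline shape from the task descriptions
above, so treat this as an untested sketch) dumping an entire database
back to XML would look something like:

```
osmosis --read-pgsql-simple host=localhost database=osmds user=osm
password=xxxx --dataset-dump --write-xml file=dump.osm
```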
BOUNDING BOX IMPLEMENTATION
The bounding box implementation is in the
com.bretth.osmosis.core.pdb.v0_5.impl.PostgreSqlDatasetReader class in
the iterateBoundingBox method.
Although it's Java, the SQL queries should be understandable by anybody
with PostGIS knowledge. Note that "?" in the queries is replaced by a
bounding box polygon using JDBC bind variables. If running the queries
manually, replace the "?" with something like:

GeomFromText('POLYGON((144.93912192855174 -37.82981987499741,
144.93912192855174 -37.79310006709244, 144.98188026000003
-37.79310006709244, 144.98188026000003 -37.82981987499741,
144.93912192855174 -37.82981987499741))', -1)
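That bind-variable value can be generated mechanically from the four
bounding box edges. The following is an illustrative sketch only (not
the actual osmosis code; the class and method names are invented) of
building the closed polygon ring from left/right/bottom/top values:

```java
// Illustrative sketch, not the actual osmosis code: build the polygon
// expression that stands in for the "?" JDBC bind variable.
public class BboxPolygon {

    // left/right are longitudes, bottom/top are latitudes. The first
    // point is repeated at the end because polygon rings must be
    // closed, and -1 means "no SRID assigned".
    public static String polygonSql(double left, double bottom,
                                    double right, double top) {
        return "GeomFromText('POLYGON((" +
                left + " " + bottom + ", " +
                left + " " + top + ", " +
                right + " " + top + ", " +
                right + " " + bottom + ", " +
                left + " " + bottom + "))', -1)";
    }
}
```

For example, polygonSql(-1.0, -1.0, 1.0, 1.0) produces a unit square
around the origin in the same form as the Melbourne coordinates above.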
It uses temporary tables, which may or may not be the best approach.
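As a rough illustration, the kind of temporary-table query involved
might look like the following. This is purely a sketch; the table and
column names are assumptions on my part rather than the actual osmosis
SQL:

```sql
-- Illustrative sketch, not the actual osmosis SQL: collect ways whose
-- stored bbox geometry overlaps the requested bounding box polygon
-- into a temporary table for later joins against nodes and tags.
CREATE TEMPORARY TABLE bbox_way AS
SELECT w.*
FROM way w
WHERE w.bbox && GeomFromText(
    'POLYGON((-1 -1, -1 1, 1 1, 1 -1, -1 -1))', -1);
```

The && operator is the PostGIS bounding-box overlap test, which is what
lets the GiST index on the bbox column do the heavy lifting.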
If this is to become feasible for everyday use, I need to be able to
apply changesets to an existing database. This will allow a database to
be kept up to date with the latest production data (with a maximum lag
of 10 minutes or so if 1-minute diffs are used) without requiring full
imports all the time. I'm not sure when I'll get to this; it could be a
while. It's on my list, but if somebody else wishes to do it first I'll
be very happy.
Anyway, as I mentioned earlier, any feedback would be appreciated.