[OSM-dev] API/XAPI caching proxy server

Thu Dec 16 23:39:30 GMT 2010

I'll add some background info to the discussion which may be useful.

If you do use a relational database, the latest pgsnapshot schema (i.e.
pgsimple with hstore tags) for Osmosis is at least worth a look.  Note that
Osmosis 0.38 still calls it the simple schema, but in the latest development
release I now differentiate between the old style "pgsimple" schema and the
new style "pgsnapshot" schema, so that a separate tags table can still be
used with the old style schema if desired.  The newer hstore-based
pgsnapshot schema is the only realistic one for good performance though.

In my experience the biggest limitation in performance is disk seeking,
rather than the amount of data returned.  The earlier pgsimple schema was
well indexed, but because of the order in which data is created in OSM
(i.e. light scatterings of changes across the globe every day), the rows for
any given area tend to be scattered across the entire disk, which is
disastrous for performance.  The newer pgsnapshot schema clusters data
geographically, which reduces the disk seeking considerably.  It works quite
well for bounding box style queries, but I don't know how well hstore GIST
indexes will work for tag queries.
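
To illustrate the seek problem, here is a toy Python simulation.  It is only
a sketch and not Osmosis code: the "pages" are just fixed-size runs of a
list, the Z-order (Morton) key is my stand-in for "clustered geographically"
and is not necessarily how pgsnapshot does it, and the point count, page
size and bounding box are made-up values.  It counts how many pages a
bounding box query has to touch when rows sit in arrival order versus
spatially sorted order.

```python
import random


def morton_key(x, y, bits=16):
    """Interleave the bits of two non-negative ints into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def quantise(lon, lat, bits=16):
    """Map lon/lat onto an integer grid so the bits can be interleaved."""
    scale = (1 << bits) - 1
    return (int((lon + 180.0) / 360.0 * scale),
            int((lat + 90.0) / 180.0 * scale))


def pages_touched(points, bbox, page_size):
    """Count distinct pages holding at least one point inside bbox.

    The list order stands in for on-disk order: page n holds
    points[n * page_size:(n + 1) * page_size].
    """
    left, bottom, right, top = bbox
    pages = set()
    for pos, (lon, lat) in enumerate(points):
        if left <= lon <= right and bottom <= lat <= top:
            pages.add(pos // page_size)
    return len(pages)


rng = random.Random(42)
# Rows in the order they arrive: scattered all over the globe.
arrival_order = [(rng.uniform(-180, 180), rng.uniform(-90, 90))
                 for _ in range(20000)]
# The same rows rewritten in spatial order.
clustered = sorted(arrival_order, key=lambda p: morton_key(*quantise(*p)))

bbox = (10.0, 45.0, 30.0, 55.0)  # left, bottom, right, top (illustrative)
scattered_pages = pages_touched(arrival_order, bbox, page_size=100)
clustered_pages = pages_touched(clustered, bbox, page_size=100)
print("pages touched, arrival order:", scattered_pages)
print("pages touched, clustered:    ", clustered_pages)
```

Each page touched is roughly one seek on a spinning disk, so the gap between
the two counts is the effect I'm describing.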

A tile based storage approach resolves much of the disk seeking issue by
design, but you'll have to do much of the heavy lifting yourself to come up
with your own storage mechanism.  Note also that any custom storage needs
the ability to be kept up to date with minute diffs which makes it more
challenging.  This may well be the best approach, I really don't know.
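
As a rough idea of what the addressing side of a tile based store could look
like, here is a Python sketch.  The choice of the standard slippy map tile
numbering and the function names are my own assumptions for illustration;
nothing here is an existing Osmosis feature, and a real store would still
need the actual on-disk format and diff handling.

```python
import math


def tile_for(lon, lat, zoom):
    """Return the (x, y) slippy-map tile containing a point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon=180 and latitudes beyond the Mercator limit stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(left, bottom, right, top, zoom):
    """List every tile a bounding box query would need to read."""
    x_min, y_min = tile_for(left, top, zoom)      # top-left corner
    x_max, y_max = tile_for(right, bottom, zoom)  # bottom-right corner
    return [(x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


def dirty_tiles(changed_points, zoom):
    """Tiles that must be rewritten after a diff touching these lon/lat points."""
    return {tile_for(lon, lat, zoom) for lon, lat in changed_points}


# Illustrative 1x1 degree box; the zoom level is an arbitrary choice.
print(len(tiles_for_bbox(11.0, 48.0, 12.0, 49.0, zoom=12)))
```

The dirty_tiles function is where the minute diffs come in: every changed
node marks a tile for rewriting, and ways or relations that span tiles make
the real thing a good deal harder than this.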

If anybody wishes to experiment with Osmosis, use the latest "pgsql_simple"
scripts in the 0.38 distribution, or "pgsnapshot" scripts in Subversion.
Don't try to import a full planet initially, because building the indexes
will take several days; just try a small extract for experimentation.

To see how Osmosis does the queries, run a command like:
osmosis -v 5 --read-pgsql authFile=myAuthFile.txt --dataset-bounding-box
left=xxxx right=xxxx top=xxxx bottom=xxxx --write-xml myresult.osm

The "-v 5" option will cause Osmosis to dump the queries as it is running,
but they'll be mixed up in some very verbose output.

Also try adding the "completeWays=true" option to the --dataset-bounding-box
task, which will run some additional queries to select all nodes for the area
that are used by ways.  From memory that adds approximately 20% overhead,
though it might be a little more.  In other words:
osmosis -v 5 --read-pgsql authFile=myAuthFile.txt --dataset-bounding-box
completeWays=true left=xxxx right=xxxx top=xxxx bottom=xxxx --write-xml
myresult.osm
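
If you want to drive this from a script (for instance to time the two
variants against each other), the command line can be assembled along these
lines.  This is only a sketch: the helper name and the coordinates are
made up, and it uses nothing beyond the options shown in the commands above.

```python
import shlex


def build_bbox_command(auth_file, left, bottom, right, top, out_file,
                       complete_ways=False, verbosity=None):
    """Build the osmosis argument list for a bounding box extract."""
    cmd = ["osmosis"]
    if verbosity is not None:
        cmd += ["-v", str(verbosity)]
    cmd += ["--read-pgsql", f"authFile={auth_file}", "--dataset-bounding-box"]
    if complete_ways:
        cmd.append("completeWays=true")
    cmd += [f"left={left}", f"right={right}", f"top={top}", f"bottom={bottom}"]
    cmd += ["--write-xml", out_file]
    return cmd


# Illustrative coordinates only; substitute your own bounding box.
cmd = build_bbox_command("myAuthFile.txt", 11.0, 48.0, 12.0, 49.0,
                         "myresult.osm", complete_ways=True, verbosity=5)
print(shlex.join(cmd))
```

The list can be passed straight to subprocess.run() once Osmosis is on the
PATH and the database has been loaded.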

There doesn't seem to be a completeRelations option currently although I
could have sworn I implemented it (must be getting confused with something
else), but it shouldn't be terribly difficult to implement.

I'm not able to help out at the moment, but I have implemented a /map call
replacement in the past (commercially) using standard Osmosis code
internally.  In other words, I created a simple JEE servlet which invoked
the Osmosis tasks directly.  From memory, on a relatively slow disk
sub-system it took approximately 10 minutes to retrieve a 1x1 degree area
around Munich.  London took a similar amount of time.  I don't know how this
compares to XAPI as I've never tried it myself.

Brett