[OSM-dev] API/XAPI caching proxy server

Fri Dec 17 00:45:44 GMT 2010

Hi Brett

Thanks very much for your detailed instructions.

> In my experience the biggest limitation in performance is disk seeking,
> rather than the amount of data returned.

If that's the bottleneck (or the amount of data returned before
processing), then pl/pgsql or pl/python could help, since stored
procedures are close to the data.

If topology is an issue, then perhaps the (future) topology data type
could help (http://trac.osgeo.org/postgis/wiki/UsersWikiPostgisTopology).
But I would first test the performance of relational (and hstore)
structures.

Finally, this really sounds like a Postgres optimization task, when
you speak of several days for full planet indexing.

As for optimization: I think this begins with the whole db
architecture. I have heard of a typical architecture with a master
and a slave disk: the master is updated by the diffs, while the
slave is replicated from it (Postgres 9.0 can do that now!) and indexed.
=> Seems to be a case for our Postgres/PostGIS gurus :->

Just my 2 cents...

Yours, S.

2010/12/17 Brett Henderson <brett at bretth.com>:
> I'll add some background info to the discussion which may be useful.
>
> If you do use a relational database, the latest pgsnapshot schema (ie.
> pgsimple with hstore tags) for Osmosis is at least worth a look.  Note that
> I still called it the simple schema in Osmosis 0.38, but I am now
> differentiating between old style "pgsimple" schema and new style
> "pgsnapshot" schema in the latest development release to allow use of a
> separate tags table in the old style schema if desired.  The newer
> hstore-based pgsnapshot schema is the only realistic one for good
> performance though.
>
> In my experience the biggest limitation in performance is disk seeking,
> rather than the amount of data returned.  The earlier pgsimple schema used
> to be well indexed, but due to the order in which data is created in OSM
> (ie. light scatterings of changes across the globe every day), the results
> will tend to be scattered across the entire disk which is disastrous for
> performance.  The newer pgsnapshot schema clusters data geographically which
> reduces the disk seeking considerably.  It works quite well for bounding box
> style queries, but I don't know how well hstore GIST indexes will work for
> tag queries.
>
> A tile based storage approach resolves much of the disk seeking issue by
> design, but you'll have to do much of the heavy lifting yourself to come up
> with your own storage mechanism.  Note also that any custom storage needs
> the ability to be kept up to date with minute diffs which makes it more
> challenging.  This may well be the best approach, I really don't know.
>
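
To make the tile idea above concrete, here is a minimal sketch of how
nodes could be bucketed by tile so that everything for one area is stored
together. It assumes the usual slippy-map tile numbering and an arbitrary
zoom level of 12; neither comes from Brett's mail, and the names tile_key
and group_by_tile are made up for illustration.

```python
import math
from collections import defaultdict

# Assumption: slippy-map (Web Mercator) tiling, which is undefined near the poles.
MAX_LAT = 85.0511


def tile_key(lat, lon, zoom=12):
    """Return (zoom, x, y) of the slippy-map tile containing the point."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that edge values (e.g. lon=180) stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return zoom, x, y


def group_by_tile(nodes, zoom=12):
    """Group (node_id, lat, lon) tuples into per-tile lists of node ids."""
    tiles = defaultdict(list)
    for node_id, lat, lon in nodes:
        tiles[tile_key(lat, lon, zoom)].append(node_id)
    return dict(tiles)
```

A store keyed this way can answer a bounding box query by reading a few
whole tiles instead of seeking for individual rows. Applying minute diffs
would still mean rewriting every tile a change touches, which is the hard
part Brett mentions.
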
> If anybody wishes to experiment with Osmosis, use the latest "pgsql_simple"
> scripts in the 0.38 distribution, or "pgsnapshot" scripts in Subversion.
> Don't try to import a full planet initially because that will take several
> days to build indexes, just try a small extract for experimentation.
>
> To see how Osmosis does the queries, run a command like:
> osmosis -v 5 --read-pgsql authFile=myAuthFile.txt --dataset-bounding-box
> left=xxxx right=xxxx top=xxxx bottom=xxxx --write-xml myresult.osm
>
> The "-v 5" option will cause Osmosis to dump the queries as it is running
> but they'll be mixed up in some very verbose output.
>
> Also try adding the "completeWays=true" option to the --dataset-bounding-box
> task which will run some additional queries to select all nodes for the area
> that are used by ways.  That adds from memory approximately a 20% overhead,
> but might be a little more.  In other words:
> osmosis -v 5 --read-pgsql authFile=myAuthFile.txt --dataset-bounding-box
> completeWays=true left=xxxx right=xxxx top=xxxx bottom=xxxx --write-xml
> myresult.osm
>
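
For anyone scripting these runs, a small sketch that assembles the two
command lines above from a bounding box. It only uses the options quoted
in Brett's mail; the function name build_osmosis_command and the example
coordinates (roughly a 1x1 degree box around Munich) are my own
assumptions, and it only builds the argument list, it does not run Osmosis.

```python
import shlex


def build_osmosis_command(auth_file, left, right, top, bottom,
                          output_file, complete_ways=False, verbosity=5):
    """Build the Osmosis argument list for a bounding box extract."""
    cmd = [
        "osmosis", "-v", str(verbosity),
        "--read-pgsql", f"authFile={auth_file}",
        "--dataset-bounding-box",
    ]
    if complete_ways:
        # Also select all nodes used by the ways in the area.
        cmd.append("completeWays=true")
    cmd += [
        f"left={left}", f"right={right}", f"top={top}", f"bottom={bottom}",
        "--write-xml", output_file,
    ]
    return cmd


if __name__ == "__main__":
    # Illustrative coordinates only; substitute your own bounding box.
    args = build_osmosis_command("myAuthFile.txt", 11.0, 12.0, 48.6, 47.6,
                                 "myresult.osm", complete_ways=True)
    print(shlex.join(args))
```

The list can be handed to subprocess.run(args) as-is, which avoids shell
quoting problems with the file names.
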
> There doesn't seem to be a completeRelations option currently although I
> could have sworn I implemented it (must be getting confused with something
> else), but it shouldn't be terribly difficult to implement.
>
> I'm not able to help out at the moment, but I have implemented a /map call
> replacement in the past (commercially) using standard Osmosis code
> internally.  In other words, I created a simple JEE servlet which invoked
> the Osmosis tasks directly.  From memory, on a relatively slow disk
> sub-system it took approximately 10 minutes to retrieve a 1x1 degree area
> around Munich.  London took similar amounts of time.  I don't know how this
> compares to XAPI as I've never tried it myself.
>
> Brett