[Talk-GB] Efficient processing of map data for rendering (BOINC).

Steve Hill steve at nexusuk.org
Sat Jan 24 10:38:35 GMT 2009


On Fri, 23 Jan 2009, Matt Amos wrote:

> this might be helpful
> http://svn.openstreetmap.org/applications/utils/export/tile_expiry/

Yes, I had a look at that script, but it only expires tiles with nodes on 
them, which I think is rather too simplistic.  The readme says that it is 
unusual for the gap between nodes to be larger than a tile, but in my 
experience this just isn't true at all.
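
To illustrate, here's a rough Python sketch using the standard slippy-map
lon/lat-to-tile formula (the coordinates are made up for the example): two
consecutive nodes only about 2km apart end up around ten tiles apart at
zoom 17, so expiring just the tiles containing the nodes leaves everything
in between untouched.

import math

def lonlat_to_tile(lon, lat, zoom):
    # Standard slippy-map conversion from lon/lat to tile x/y at a zoom level.
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Two made-up consecutive nodes roughly 2km apart on a rural way:
a = lonlat_to_tile(-1.50, 52.00, 17)
b = lonlat_to_tile(-1.47, 52.00, 17)
print(a, b)   # the x coordinates differ by about 11, so ~10 tiles in between
              # would never be marked dirty from the node positions alone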

So my idea was to work on the postgis objects themselves during import 
(there's a rough sketch of what I mean just after the list below).  This 
should have some advantages:
1. We don't need to duplicate any work in translating OSM objects into the 
objects that are actually rendered - osm2pgsql already does this and we 
don't have to know or care how.
2. We don't need to duplicate work parsing the OSM XML file - this should 
give some speed improvements.
3. There should be a reduced number of database lookups because the only 
extra things we need to look up in the database are the postgis objects 
that are being deleted.
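
To make that concrete, here's a rough sketch (Python, not actual osm2pgsql
code) of the kind of import-time hook I have in mind: take the lat/lon
bounding box of the object being written or deleted and record every
max-zoom tile it covers in a dirty_tiles table.  The table name, zoom
level and helper function are just assumptions for the example.

import math

MAX_ZOOM = 17   # assumed maximum zoom level for the layer

def lonlat_to_tile(lon, lat, zoom):
    # Standard slippy-map conversion, as in the sketch further up.
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def mark_bbox_dirty(cur, min_lon, min_lat, max_lon, max_lat):
    # Top-left and bottom-right tiles covering the object's bounding box.
    x1, y1 = lonlat_to_tile(min_lon, max_lat, MAX_ZOOM)
    x2, y2 = lonlat_to_tile(max_lon, min_lat, MAX_ZOOM)
    for x in range(x1, x2 + 1):
        for y in range(y1, y2 + 1):
            cur.execute(
                "INSERT INTO dirty_tiles (z, x, y, dirtied_at) "
                "VALUES (%s, %s, %s, now())",
                (MAX_ZOOM, x, y))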

The plan is to have osm2pgsql insert a list of dirty tiles for the maximum 
zoom level into a postgres table.  I wrote a script that goes through each 
zoom level, starting at the maximum and working back to 0.  Each zoom 
level has a minimum age associated with it and when the tile has been 
dirty for that long it is deleted and the coordinates for the tile at 
zoom-1 are inserted into the table.  The idea is that low-zoom tiles 
change more frequently than high-zoom tiles, but are less interesting and 
more effort to render, so they shouldn't be re-rendered immediately.
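
Something along these lines - a rough Python sketch only, assuming a
dirty_tiles(z, x, y, dirtied_at) table, tiles stored as
/tiles/<z>/<x>/<y>.png and made-up minimum ages, not the finished script:

import os
import psycopg2

MAX_ZOOM = 17

def min_age(zoom):
    # Seconds a tile must have been dirty before it is expired; lower zooms
    # wait longer because they are dirtied constantly and cost more to render.
    return (MAX_ZOOM - zoom + 1) * 3600

def expire_pass(conn, tile_dir="/tiles"):
    cur = conn.cursor()
    for z in range(MAX_ZOOM, -1, -1):
        cur.execute(
            "DELETE FROM dirty_tiles "
            "WHERE z = %s AND dirtied_at < now() - %s * interval '1 second' "
            "RETURNING x, y",
            (z, min_age(z)))
        for x, y in cur.fetchall():
            path = os.path.join(tile_dir, str(z), str(x), "%d.png" % y)
            if os.path.exists(path):
                os.unlink(path)              # force a re-render of this tile
            if z > 0:
                # Propagate the dirtiness up to the parent tile at zoom-1
                # (a real script would de-duplicate these entries).
                cur.execute(
                    "INSERT INTO dirty_tiles (z, x, y, dirtied_at) "
                    "VALUES (%s, %s, %s, now())",
                    (z - 1, x // 2, y // 2))
    conn.commit()

if __name__ == "__main__":
    # Connection details are just an example.
    expire_pass(psycopg2.connect("dbname=gis"))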

>> From my experience, the number crunching doesn't really seem to be the
>> limiting factor - database I/O is the biggest overhead for OpenPisteMap
>> (although that may be partly down to the massive amount of SRTM contour
>> data it has to handle while rendering each tile).
>
> +1
>
> this is one case where one big raid array is much better than many
> distributed disks.

I was wondering if anyone had done any tests on the speed of a database 
that is distributed over a cluster of servers.  I would imagine that there 
would be speed improvements, but I'm not sure what the overhead is like 
for actually working out which server contains the data you're after.

Another possible solution is to have a number of completely independent 
rendering machines with their own copy of the database and just 
round-robin the rendering requests between them.  This is obviously not 
something that could be done with BOINC or similar - not many people would 
want to dedicate 60GB of their hard drive to the OSM postgis database. :) 
But it could be done with a cluster of dedicated servers.

However, I would be really interested to see just how much load there 
would be on the rendering servers if tiles were only rendered on demand 
when they hadn't been rendered before or had genuinely become dirty since 
the last render.  It may well be that there is no need to chuck lots of 
hardware at the problem if tile expiry is done well.
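
Roughly what I have in mind is something like this - a sketch only, not
mod_tile or any other existing code; the render() callable and the
dirty_tiles table are assumptions:

import os

def tile_path(tile_dir, z, x, y):
    return os.path.join(tile_dir, str(z), str(x), "%d.png" % y)

def is_dirty(cur, z, x, y):
    # The tile has been marked dirty by the import/expiry machinery.
    cur.execute("SELECT 1 FROM dirty_tiles WHERE z = %s AND x = %s AND y = %s",
                (z, x, y))
    return cur.fetchone() is not None

def serve_tile(cur, render, z, x, y, tile_dir="/tiles"):
    path = tile_path(tile_dir, z, x, y)
    if not os.path.exists(path) or is_dirty(cur, z, x, y):
        render(z, x, y, path)   # expensive: only happens for new/dirty tiles
        cur.execute("DELETE FROM dirty_tiles WHERE z = %s AND x = %s AND y = %s",
                    (z, x, y))
    return path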

  - Steve
    xmpp:steve at nexusuk.org   sip:steve at nexusuk.org   http://www.nexusuk.org/

      Servatis a periculum, servatis a maleficum - Whisper, Evanescence




