[Talk-GB] Efficient processing of map data for rendering (BOINC).
steve at nexusuk.org
Sat Jan 24 10:38:35 GMT 2009
On Fri, 23 Jan 2009, Matt Amos wrote:
> this might be helpful
Yes, I had a look at that script, but it only expires tiles that have nodes
on them, which I think is too simplistic. The readme says that it is
unusual for the gap between nodes to be larger than a tile, but in my
experience that just isn't true.
So my idea was to work on the postgis objects themselves during import.
This should have some advantages:
1. We don't need to duplicate any work in translating OSM objects into the
objects that are actually rendered - osm2pgsql already does this and we
don't have to know or care how.
2. We don't need to duplicate work parsing the OSM XML file - this should
give some speed improvements.
3. There should be a reduced number of database lookups because the only
extra things we need to look up in the database are the postgis objects
that are being deleted.
The plan is to have osm2pgsql insert a list of dirty tiles for the maximum
zoom level into a postgres table. I wrote a script that goes through each
zoom level, starting at the maximum and working back to 0. Each zoom
level has a minimum age associated with it and when the tile has been
dirty for that long it is deleted and the coordinates for the tile at
zoom-1 are inserted into the table. The idea is that low-zoom tiles
change more frequently than high-zoom tiles, but are less interesting and
more expensive to render, so they shouldn't be re-rendered immediately.
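To make the cascade concrete, here is a minimal sketch of that expiry pass. The table name, columns, maximum zoom, and per-zoom minimum ages are all assumptions for illustration (the real table would live in Postgres; sqlite3 just stands in here), and each expired tile's parent at zoom-1 is simply (x//2, y//2):

```python
import sqlite3

MAX_ZOOM = 17  # assumed maximum zoom level
# Hypothetical minimum ages (seconds) per zoom: lower zooms wait longer
MIN_AGE = {z: (MAX_ZOOM - z) * 3600 for z in range(MAX_ZOOM + 1)}

db = sqlite3.connect(":memory:")  # stands in for the Postgres table
db.execute("CREATE TABLE dirty_tiles (z INTEGER, x INTEGER, y INTEGER, "
           "dirty_since REAL, PRIMARY KEY (z, x, y))")

def mark_dirty(z, x, y, now):
    # osm2pgsql would do this for the maximum zoom level during import
    db.execute("INSERT OR IGNORE INTO dirty_tiles VALUES (?, ?, ?, ?)",
               (z, x, y, now))

def expire_pass(now, expire_tile):
    """One sweep from the maximum zoom down to 0, as described above."""
    for z in range(MAX_ZOOM, -1, -1):
        rows = db.execute(
            "SELECT x, y FROM dirty_tiles WHERE z = ? AND dirty_since <= ?",
            (z, now - MIN_AGE[z])).fetchall()
        for x, y in rows:
            expire_tile(z, x, y)   # e.g. delete the rendered tile from disk
            db.execute("DELETE FROM dirty_tiles WHERE z=? AND x=? AND y=?",
                       (z, x, y))
            if z > 0:              # propagate dirtiness to the parent tile
                mark_dirty(z - 1, x // 2, y // 2, now)
```

A freshly propagated parent gets a new dirty_since timestamp, so it waits out its own (longer) minimum age before being expired in a later pass.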
>> From my experience, the number crunching doesn't really seem to be the
>> limiting factor - database I/O is the biggest overhead for OpenPisteMap
>> (although that may be partly down to the massive amount of SRTM contours
>> data it has to handle while rendering each tile).
> this is one case where one big raid array is much better than many
> distributed disks.
I was wondering if anyone had done any tests on the speed of a database
that is distributed over a cluster of servers. I would imagine that there
would be speed improvements, but I'm not sure what the overhead is like
for actually working out which server contains the data you're after.
Another possible solution is to have a number of completely independent
rendering machines with their own copy of the database and just
round-robin the rendering requests between them. This is obviously not
something that could be done with BOINC or similar - not many people would
want to dedicate 60GB of their hard drive to the OSM postgis database. :)
But it could be done with a cluster of dedicated servers.
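The round-robin part is trivial; a sketch, with the server names obviously made up:

```python
import itertools

# Hypothetical independent rendering machines, each with its own DB copy
SERVERS = ["render1.example.org", "render2.example.org", "render3.example.org"]
_next_server = itertools.cycle(SERVERS)

def dispatch(tile_request):
    """Hand each rendering request to the next server in turn."""
    server = next(_next_server)
    return server, tile_request
```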
However, I would be really interested to see just how much load there
would be on the rendering servers if tiles were rendered on demand, only
when they had never been rendered before or had genuinely become dirty
since the last render. It may be that there is no need to chuck lots of
hardware at the problem if tile expiry is done well.
xmpp:steve at nexusuk.org sip:steve at nexusuk.org http://www.nexusuk.org/
Servatis a periculum, servatis a maleficum - Whisper, Evanescence