[OSM-dev] Slippy Map, automatic Tile Rendering

Thu Jan 18 16:54:24 GMT 2007

On Thu, Jan 18, 2007 at 03:17:38PM +0000, Robert (Jamie) Munro wrote:
> Jochen Topf wrote:
> > On Thu, Jan 18, 2007 at 11:40:50AM +0100, Frederik Ramm wrote:
> >> (a) Which tiles are outdated? One could of course just re-generate  
> >> all existing tiles but that would be very time-consuming (I guess  
> >> about 20 days of round-the-clock computing for a modern PC; less of  
> >> course if many take part in the distributed computing). I will try to  
> >> evaluate the last-change-timestamp in the planet file and use that to  
> >> determine which tiles are out of date.
> >>
> >> (b) If I want to regularly find outdated tiles, the perfect way would  
> >> of course be to process a full change log instead of the limited RSS  
> >> feed. Ideally, I would want to ask the server "please send all nodes  
> >> changed or added since <timestamp>", then use this regularly to re- 
> >> generate all affected tiles. Does someone know if a feature like this  
> >> is planned or even likely? - The other approach is,  again, using the  
> >> weekly planet files to find out which nodes have changed. But this  
> >> would lead to weekly updates only.
> >>
> >> I am not saying that we should abandon the current tile regeneration  
> >> based on the RSS feed, but it seems to me that it needs to be  
> >> supplemented by something else if it alone cannot guarantee that our  
> >> tiles are halfway current.
> > 
> > How about this: For every update the API could increment a version counter
> > for the tile this update is in. So you have an ever-increasing counter
> > for every lowest-level tile. Whenever you render a tile you store the
> > current counter for this tile. It is now very easy to find out how many
> > changes there were between the time you rendered the tile and the
> > current counter. The details on how to handle higher-level tiles have to
> > be worked out of course.
> 
> This sounds like a bad idea because you would have to perform a huge
> number of queries, one for each tile (is tile[n].current version >
> tile[n].last version). Why not just use last update timestamp or a
> globally incrementing counter, then you need only ask "Send me all tiles
> with timestamp > [last time I checked]" which is really simple for the
> DB to answer (as long as you index the field).

That's not the question I am asking. I am trying to find out which tile
to render next. For that I want to know which tiles have changed and which
haven't. And not only that, I want to know which one changed the most,
because I want to render that one first.

But you are right, that would mean I have to go through all tiles and
calculate the difference between the counter stored at rendering time and
the current counter. That's not really efficient, even if we have
everything in memory. So we need some more ideas here.
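
To make the counter idea concrete, here is a rough sketch of the naive
version: one ever-increasing counter per lowest-level tile, the counter
value stored at render time, and a full scan to find the tiles with the
most changes. The names (TileCounters, record_update, mark_rendered,
most_changed) and the (x, y) tile keys are made up for illustration; this
is not existing API code.

```python
import heapq


class TileCounters:
    """Hypothetical in-memory change counters, kept outside the main database."""

    def __init__(self):
        self.current = {}   # tile -> number of updates seen so far
        self.rendered = {}  # tile -> counter value when the tile was last rendered

    def record_update(self, tile):
        # Called once for every update that falls into this tile.
        self.current[tile] = self.current.get(tile, 0) + 1

    def mark_rendered(self, tile):
        # Remember the counter value at the time of rendering.
        self.rendered[tile] = self.current.get(tile, 0)

    def most_changed(self, n=1):
        # Naive full scan: compute the difference for every known tile
        # and return the n tiles with the largest number of changes.
        stale = []
        for tile, count in self.current.items():
            diff = count - self.rendered.get(tile, 0)
            if diff > 0:
                stale.append((diff, tile))
        return [tile for diff, tile in heapq.nlargest(n, stale)]
```

The scan in most_changed touches every tile that has ever been updated,
which is exactly the inefficiency described above.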

> > The API would need some code to calculate the tile number from the
> > lat/lon coordinates. The biggest problem would probably be that for
> > changes in segments and ways you have to get the participating nodes first
> > and find their coordinates and then tile numbers. But with some clever
> > optimization this could be fast enough. We don't have to check both ends
> > of a segment for instance, because generally they are very close. So it
> > wouldn't really matter in most cases.
> > 
> > The benefit of this approach is that it is totally generic. Every
> > renderer can use the same counter infrastructure to find outdated tiles.
> 
> Only if the different renderers always use the same tiles. Currently,
> they don't - e.g. there are renderers that use a mercator projection,
> and others that use a simple rectangular latitude and longitude for use
> in products like Google Earth or WorldWind.

Ok, I didn't know that. We'll either have to standardise on a tile
layout if that is feasible, do the whole thing several times over, or
accept that it doesn't work. But we were talking about the slippy map
here. Do the slippy maps use different tile layouts?
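
For the Mercator case, the lat/lon-to-tile calculation mentioned in the
quoted text could look roughly like this. The formula is the x/y/zoom
numbering commonly used for slippy map tiles; that the counters would use
exactly this layout is an assumption, and the function name
latlon_to_tile is made up for illustration.

```python
import math

# Latitude limit of the square Web Mercator world (assumed tile layout).
MAX_LAT = 85.0511


def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) tile number containing a point at the given zoom."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    # Clamp latitude so the Mercator projection stays finite.
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid range so lon = 180 or lat = -MAX_LAT
    # does not fall off the grid.
    x = max(0, min(n - 1, x))
    y = max(0, min(n - 1, y))
    return x, y
```

A renderer using a plain rectangular lat/lon grid would need a different
function, so its counters would have to be kept separately.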

> > This is important because we'll get more and more renderers and special
> > maps and layers. Also it doesn't use the main database at all, the
> > counters can be kept somewhere separate, maybe even in memory. So it has
> > no performance impact on our most critical asset. Compare this to any
> > kind of "diff" approach where the database has to do all the work.
> 
> The database is there to do work - that is what it is for. Storing
> things in memory only reduces the amount of memory the database has to
> use for doing the work itself. If you want to avoid loading the

I think we have to do everything to reduce load on the database.
Whatever we do, the database will always be our bottleneck, because of
the huge number of writes which have to go through a single point.
(At least as long as we can't split up the database into smaller chunks
based on geography or something like it, which opens up loads of other
problems so we want to avoid it as long as we can.)

I am trying to come up with ideas to prolong the useful life of the
database. In case you haven't noticed, it is slow quite often (although
better in the last few days). The load is getting worse every day, and it
should, because it means the project is getting more popular.

Of course this counter thing would not run on the same machine as the
database, so it wouldn't take away any memory from the database.

> database, use a mirror of the database on another machine (using the
> tools to load planet.osm into a new database). People here always seem
> to write scripts that duplicate huge chunks of the database into memory
> e.g.:
> script to measure the length of all the segments:
> http://svn.openstreetmap.org/utils/osm-length
> vs. my sql query that does the same thing:
> http://lists.openstreetmap.org/pipermail/dev/2007-January/002861.html

Of course it would be better to do this in the database, but I don't have
access to the database, and neither do most people here. We can only
access it through the API, so that's what we use. It's a stopgap. I am not
happy with it, but that's what I can work with at the moment.

Generally you are right: Why re-invent the wheel, when the database can
do the work for you? But sometimes special-purpose databases or data
structures are so much faster that it makes sense to roll your own. The
OSM database is a case in point, because it doesn't use the PostGIS
approach for its geo-data, but uses its own scheme. So I think we have to keep an open
mind and consider solutions in the database as well as solutions outside
it. Because all database access is through the API we have an excellent
point where we can hook in all sorts of other things.

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298