[OSM-dev] Proposal: Database accelerator and dirty tile marker based on simple algorithm.

Nick Hill nick at nickhill.co.uk
Thu Sep 21 09:43:09 BST 2006


Hello Lars

Lars Aronsson wrote:
> What I see here is urban legends, myths and guesswork.  Yes, you 
> made some changes to the SQL, for example on August 31.  But what 
> was the load before that and how did it change afterwards?  I see 
> no statistics at all on how many requests are served.  So how can 
> you say that "demand" has increased?  My guess is that since the 
> system appears to be slower, you guess that demand has increased. 
> Since I don't like to guess, I'm asking: What do you base this 
> statement on?

The rate of growth of the dataset is a fair indication that the number 
of requests served has accelerated, and that the new systems have 
lowered the bottlenecks since they were installed.

When the system was transferred, the dataset was around 4GB. Today it 
stands at around 13GB, and nearly 1.3GB has been shaved off the GPS 
point database through efficiency measures, making today's total 
equivalent to 14.3GB. That is growth of roughly 3.5 times in 4.5 months.
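The arithmetic behind that growth figure can be sketched as follows 
(the sizes are the ones quoted above; the variable names are mine):

```python
# Figures quoted in the message above (sizes in GB).
size_at_transfer = 4.0   # dataset size when the system was transferred
size_today = 13.0        # dataset size today
gps_savings = 1.3        # space shaved off the GPS point database

# Without the efficiency savings the dataset would be this large today.
equivalent_today = size_today + gps_savings

growth_factor = equivalent_today / size_at_transfer
print(f"equivalent size today: {equivalent_today:.1f} GB")
print(f"growth over 4.5 months: roughly {growth_factor:.2f}x")
```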

> 
> Now the tile server works so-so on some days and is completely 
> dead on other days, but the reason is a complete unknown.  All I 
> know for sure is that *I* am not trusted with a login account on 
> the tile server, so *I* cannot look in the log files what's going 
> on.  I'm strapped to the backseat, so you can blame me for being a 
> backseat driver all you want.  It's quite frustrating.

We do have data from munin. 
http://www.openstreetmap.org/munin/openstreetmap/tile.openstreetmap.html
Any data needs to be analysed and interpreted. From experience, I can 
interpret those graphs with a fair degree of confidence.

> 
>> Looking for method changes which can improve performance 
>> possibly by another order of magnitude is really what we need.
> 
> No, what we need is measurement.  How many calls are served, how 
> much time does each call take, and what part of the call is taking 
> time?  Without measurement, every change is a guesswork.

Measuring calls on tile would not yield useful results, because you 
would not be measuring the processing load of the call you make; you 
would be measuring the background system load, and we already have that 
data. As any scientist knows, we need controlled experiments: 
experiments which isolate external effects.
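To illustrate the point: timing the same call repeatedly at least makes 
the background load visible as spread between runs, rather than folding 
it silently into a single number. A minimal sketch (the timing helper 
and the workload passed to it are mine, not anything running on tile):

```python
import time

def time_call(fn, runs=10):
    """Time repeated executions of fn(); the spread between runs
    reflects background load rather than the cost of the call itself."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

# Illustrative stand-in workload; in practice fn would fetch a tile.
samples = time_call(lambda: sum(range(100_000)), runs=5)
print(f"min {min(samples):.4f}s, max {max(samples):.4f}s, "
      f"spread {max(samples) - min(samples):.4f}s")
```

The minimum over many runs approximates the call's own cost; the spread 
is a rough proxy for everything else competing for the machine.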

If you would like to implement the tile server rendering functionality 
at home against the planet data set, it should be pretty obvious where 
the bottlenecks are.

I would welcome your suggestions on how to improve the performance of 
tile, or submissions of optimised code.

More information about the dev mailing list