[Tile-serving] OSM Tile Expiration

Sat Jul 25 07:13:51 UTC 2020

On 24/07/2020 at 22:50, Sarah Hoffmann wrote:
> Hi,
>
> On Fri, Jul 24, 2020 at 06:02:53PM +0200, Christian Quest wrote:
>> I've also been looking recently for some documentation about the expiration
>> process on the OSMF tile servers and only found (so far) the chef cookbooks
>> as answers.
>>
>> After the osm2pgsql run, a python script takes the .osc file just processed
>> by osm2pgsql and does the expiration on the metatiles based on it and the
>> flatnodes file (to get the location of the nodes that are not in the .osc
>> file).
>>
>> My question is: why do it that way rather than the expire_list /
>> render_expired way?
>>
>> Is it faster?
> The scripts that are used on the OSMF servers are much more light-weight
> than osm2pgsql's expiry mechanism at the price of occasionally missing
> a tile.
>
> The OSMF tile script only uses change files and the flatnode store to do
> the expiry. That means that it does not catch deletes or when something is
> moved out of a tile. (It only sees the situation after the update was
> applied.) It also does not care about relations at all. So it might miss
> a tile that is entirely within a multipolygon and does not catch boundary
> changes that just consist of members being added and deleted.
>
> osm2pgsql's expiry catches all these cases at the price of additional
> database accesses. Also, the whole implementation is not exactly efficient.
> We have a long-standing feature request to overhaul that piece of code.

Thank you Sarah, that confirms my understanding of the Python tile
expiration code.
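
To make the node-based approach described above concrete, here is a minimal
sketch of how node positions can be mapped to metatiles. It is not the OSMF
script: the function names and the metatile size of 8 are assumptions made
for illustration, and like the approach Sarah describes it only looks at
node positions after the update (no deletes, no relations).

```python
import math

METATILE = 8  # assumed metatile width in tiles


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) slippy-map tile containing a WGS84 point."""
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the edge of the projection stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def metatiles_to_expire(node_positions, zoom):
    """Collect the metatiles touched by a set of node positions.

    node_positions is an iterable of (lon, lat) pairs, for example read
    from the .osc file or looked up in the flatnode store.
    """
    dirty = set()
    for lon, lat in node_positions:
        x, y = lonlat_to_tile(lon, lat, zoom)
        # Round down to the top-left tile of the enclosing metatile.
        dirty.add((zoom, x - x % METATILE, y - y % METATILE))
    return dirty
```

Two nodes that fall in the same metatile produce a single entry, which is
what keeps this kind of expiry cheap.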

Is there no other mechanism to re-render tiles globally?

>> Does it invalidate metatiles in a better way? (for example, fewer tiles
>> invalidated for nothing)
> In theory the OSMF script invalidates less. In practice, it probably does
> not matter, so I'd go for the light-weight script version.

I'll try to compare the expired-tile lists produced by the two approaches.
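
A comparison like this could be sketched as follows, assuming both tools
write one tile per line in z/x/y form and at the same zoom level. The
function names are hypothetical.

```python
def read_expire_list(lines):
    """Parse 'z/x/y' lines into a set of (z, x, y) tuples."""
    tiles = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        z, x, y = (int(part) for part in line.split("/"))
        tiles.add((z, x, y))
    return tiles


def compare_expire_lists(a, b):
    """Return the tiles found only in a, only in b, and in both."""
    return {"only_a": a - b, "only_b": b - a, "both": a & b}
```

In practice the `lines` argument can simply be an open file object for each
expire list.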

>
>> On OSM-FR server, we face some saturation on the tile regeneration process
>> and in order to find what to improve first, I'm investigating each piece of
>> the puzzle, from postgres setup, SQL queries, style sheet, cache policy,
>> expiration, etc...
>>
>> I've done a lot of statistics to understand the ratio of tiles rendered,
>> cached, requested, purgeable. By analyzing the renderd log, I found that
>> some tiles get rendered up to 25 times a day!
> I suspect that people are just busy editing. So there is probably not much
> you can do there except expiring less often.

What surprised me is that, despite our renderd being saturated, it still
rendered the same tile 25 times in one day!

Expiring less often is not that easy. Re-rendering less often could be
done by looking at the tile creation timestamp (mtime + 20 years): if a
tile is expired but not that old, mod_tile could give it a low rendering
priority and serve it as is to the client.
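
The idea can be sketched as below. This is only an illustration of the
heuristic, not existing mod_tile behaviour: it assumes an expired tile has
had its mtime shifted back by exactly 20 years of 365 days, and the names
and the `min_age_seconds` threshold are hypothetical.

```python
# Assumed offset subtracted from mtime when a tile is marked expired.
TWENTY_YEARS = 20 * 365 * 24 * 3600


def original_render_time(expired_mtime):
    """Recover the time the tile was really rendered (mtime + 20 years)."""
    return expired_mtime + TWENTY_YEARS


def defer_rerender(expired_mtime, now, min_age_seconds):
    """Return True if an expired tile is recent enough to serve as is.

    A tile rendered less than min_age_seconds ago is served stale and
    queued at low priority instead of being re-rendered immediately.
    """
    age = now - original_render_time(expired_mtime)
    return age < min_age_seconds
```

With a threshold of one hour, a tile rendered ten minutes before it was
expired would be deferred, while one rendered two hours earlier would not.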

> Both expiry methods still work on raw change data. They do not take into
> account if the data that has been changed actually has an effect on the
> rendering. osm2pgsql could in theory do an expiry based on the rendering
> tables but that would require a major restructuring because it doesn't
> have a notion of data change yet. Every change is a delete+add operation.

A "perfect" expiration method would have to be based on the style sheet. 
It would be complex, but could be worth the effort.

I'm thinking of instrumenting renderd to compare tiles before and after
re-rendering, to get statistics on how many re-rendered tiles are identical
to the expired ones, i.e. the percentage of useless re-rendering.
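
A rough sketch of the bookkeeping, independent of renderd itself: keep a
digest of the old tile, hash the new one, and count matches. The class and
function names are hypothetical, and a byte-level comparison assumes the
image encoder produces identical output for identical input.

```python
import hashlib


def digest(tile_bytes):
    """Return a SHA-256 digest of the encoded tile."""
    return hashlib.sha256(tile_bytes).digest()


class RerenderStats:
    """Count how many re-renders produced a byte-identical tile."""

    def __init__(self):
        self.total = 0
        self.unchanged = 0

    def record(self, old_digest, new_bytes):
        """Register one re-render, given the digest of the old tile."""
        self.total += 1
        if old_digest == digest(new_bytes):
            self.unchanged += 1

    def useless_percentage(self):
        """Percentage of re-renders that changed nothing."""
        if self.total == 0:
            return 0.0
        return 100.0 * self.unchanged / self.total
```

Storing only the digest of the expired tile means the old image does not
have to be kept around while the new one is being rendered.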

-- 
Christian Quest - OpenStreetMap France