[OSM-dev] osm2pgsql tile expiry freaks me out

Thu Nov 5 12:04:16 GMT 2009

On Thu, Nov 5, 2009 at 1:23 AM, Frederik Ramm <frederik at remote.org> wrote:
> Jon Burgess wrote:
>> I have discussed the expiry scripts with Matt a couple of times and he
>> found that the osm2pgsql based approach tended to hit the DB quite hard.
>
> I haven't measured this thoroughly, but on the machine where I do
> updates every 15 minutes, the updates normally took around 150 seconds,
> and with expiry switched on it's more like 250 seconds or so.

when i tried using the osm2pgsql expiry on the minutely no-names tile
server it slowly went out of sync. after about a week of running it
was 2 days behind. admittedly, that EC2 server it's running on doesn't
exactly have stellar disk performance.
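
as a back-of-envelope check on what that lag implies (a sketch only, assuming the server fell behind at a constant rate and was running the whole time; the function name is made up for illustration):

```python
def seconds_per_diff(days_elapsed, days_behind, diff_interval_s=60):
    """Average wall-clock seconds spent per diff, given the lag after a run.

    Assumes a constant processing rate over the whole period.
    """
    days_processed = days_elapsed - days_behind
    if days_processed <= 0:
        raise ValueError("no diffs were processed at all")
    return days_elapsed / days_processed * diff_interval_s

# one week of running, two days behind, minutely diffs
avg = seconds_per_diff(7, 2)
```

in other words, five days' worth of minutely diffs got applied in seven days, so each diff took well over the 60 seconds available on average.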

>> He found that even though the osm2pgsql code should in theory produce
>> more accurate results, the ruby scripts tended to work better overall.
>
> Matt's scripts don't do relations and this is probably the reason why
> they work well. My experiments have shown the following (for changes
> covering a three-hour interval):

yeah, the assumption is that any change to a relation is (probably)
going to involve changes to one or more of its nodes and ways, so the
relation as a whole (probably) doesn't need expiring. of course, there
are cases where it is necessary, but the scripts are only intended as
a quick-and-dirty approximation to proper expiry.

> So there are relations, especially boundary relations, where a little
> change to the relation expires a couple million level-18 tiles. (The
> largest way, #35421140, a riverbank, expires half a million.)

is it expiring the bbox of that way, or just the tiles touching the boundary?
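
to show why the answer matters, here is a rough sketch comparing the two counts. it is not osm2pgsql's code: the function names and the sample ring are made up, the tile maths is the usual slippy-map formula, and the boundary walk is a sampling approximation.

```python
import math

def lonlat_to_tile_xy(lon, lat, zoom):
    """Fractional slippy-map tile coordinates of a WGS84 point."""
    n = 2 ** zoom
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    return x, y

def bbox_tile_count(points, zoom):
    """Number of tiles covering the bounding box of the points."""
    xy = [lonlat_to_tile_xy(lon, lat, zoom) for lon, lat in points]
    xs = [math.floor(x) for x, _ in xy]
    ys = [math.floor(y) for _, y in xy]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)

def boundary_tiles(points, zoom):
    """Set of tiles touched by the line through the points.

    Approximation: each segment is sampled every half tile or less, so a
    tile that is only clipped at a corner can be missed.
    """
    xy = [lonlat_to_tile_xy(lon, lat, zoom) for lon, lat in points]
    tiles = set()
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        steps = max(1, math.ceil(2 * max(abs(x1 - x0), abs(y1 - y0))))
        for i in range(steps + 1):
            t = i / steps
            tiles.add((math.floor(x0 + t * (x1 - x0)),
                       math.floor(y0 + t * (y1 - y0))))
    return tiles

# hypothetical closed way: a 1 x 1 degree square ring at the equator
ring = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
bbox_count = bbox_tile_count(ring, 12)
outline_count = len(boundary_tiles(ring, 12))
```

the bbox count grows with the area of the feature and the outline count only with its length, so at level 18 the gap between the two gets very large for something like a long riverbank.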

> I suspect that at least the large results for relations are due to an
> inefficiency; probably the whole circumference of the relation is marked
> dirty if a little bit changes here or there, something which would not
> be necessary. (In theory, of course, a rendering rule depending on the
> polygon area could flick over and render a whole country pink instead of
> gray just because its area has changed minimally...)
>
> Also, if the geometry of a way changes (and not its tags), then I could
> probably compare the new geometry to the old one and expire only where
> they differ - at least if expiring the whole length of the way means
> half a million tiles or so.

indeed. and there are operations, such as reversing the way, which
might not change the rendering at all. i've been working on something
that takes the approach you describe: it runs a diff/patch algorithm
over the way_nodes and relation_members to find insertions, deletions
and changes, i.e. the "real" changes. it's then much easier to expire
only those - assuming, of course, that your area-based colour change
rules are absent ;-)
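
a minimal sketch of that idea, using python's difflib as a stand-in for the diff step. this is an illustration, not the code mentioned above: the function name is made up, the node lists are plain lists of node ids, and treating a pure reversal as "no change" is an assumption that breaks for direction-dependent rendering such as oneway arrows.

```python
from difflib import SequenceMatcher

def changed_way_nodes(old_nodes, new_nodes):
    """Node ids around which a way's geometry really changed.

    Returns an empty set for identical or purely reversed node lists.
    Each changed run is widened by one node on either side so that the
    segments joining it to the unchanged part are covered too.
    """
    old_nodes, new_nodes = list(old_nodes), list(new_nodes)
    if new_nodes == old_nodes[::-1]:
        return set()  # assumption: reversal alone does not affect rendering
    changed = set()
    matcher = SequenceMatcher(a=old_nodes, b=new_nodes, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed.update(old_nodes[max(i1 - 1, 0):i2 + 1])
        changed.update(new_nodes[max(j1 - 1, 0):j2 + 1])
    return changed

# one node in the middle of a way replaced by another
moved = changed_way_nodes([1, 2, 3, 4, 5], [1, 2, 9, 4, 5])
```

only the tiles around the returned nodes (old and new positions) would then need expiring, rather than the whole length of the way. the same matching could be run over relation_members.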

> But as for tagging changes, we're quickly getting into terrain where
> expiry and render rules intermingle; if someone changes the "source" tag
> on a very large polygon way, do I really need to expire half a million
> tiles? But what if the same way's landuse tag is changed? It is probably
> a bug or an inefficiency that we have such a high number of expired
> tiles at the moment, but even with perfectly functioning software of
> course it would be possible that e.g. a large boundary gets a new
> admin_level or so and expiry of a very large number of tiles is actually
> required...

indeed. it's a complex problem. there's a quick-and-dirty solution,
but to do it properly, efficiently and accurately is very hard. i
think jon was saying in the pub last night that diff updates and
expiry already take up more resources than rendering tiles on yevaud.
and that's with the quick-and-dirty solution.
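
for the tag side of the problem, the crudest possible filter would be to ignore keys the stylesheet never looks at. a sketch under loud assumptions: the key list below is invented for illustration and would have to be derived from the actual render rules, and the function name is hypothetical.

```python
# assumption: keys that no rendering rule depends on; purely illustrative
NON_RENDERED_KEYS = frozenset({"source", "created_by", "note", "fixme"})

def rendering_tags_changed(old_tags, new_tags, ignored=NON_RENDERED_KEYS):
    """True if the tags differ in any key that might affect rendering."""
    def relevant(tags):
        return {k: v for k, v in tags.items() if k not in ignored}
    return relevant(old_tags) != relevant(new_tags)

old = {"landuse": "forest", "source": "survey"}
source_only = rendering_tags_changed(old, {"landuse": "forest", "source": "gps"})
landuse_edit = rendering_tags_changed(old, {"landuse": "meadow", "source": "survey"})
```

this skips the "source" edit on a huge polygon but still expires on a landuse change. it does nothing for the case where a large boundary really does get a new admin_level, which needs the full expiry regardless.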

cheers,

matt