[OSM-dev] Extended diffs

Komяpa me at komzpa.net
Thu May 3 12:43:47 BST 2012


Hello!

After some time thinking about what's needed to keep an osm2pgsql
database up to date, I've come to the idea of extended diffs.

Usual OSM diffs contain only the objects that were changed. If a
node was dragged, you'll get just that node, and will have to look it
up somewhere in your cache to check whether it belongs to some way,
whether that way belongs to some relation, and then for that relation
fetch all the member ways, and all the nodes of all those ways.
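
The lookup cascade described above can be sketched roughly as follows.
This is only an illustration: the function name and the two dicts are
hypothetical in-memory stand-ins for whatever node/way/relation cache a
consumer keeps (for osm2pgsql, the slim tables), not an actual API.

```python
def objects_to_rebuild(changed_node, way_nodes, relation_ways):
    """Collect everything needed to rebuild geometry after one node moves.

    way_nodes:     dict way_id -> list of node ids      (hypothetical cache)
    relation_ways: dict relation_id -> list of way ids  (hypothetical cache)
    """
    # Ways that contain the changed node.
    ways = {w for w, ns in way_nodes.items() if changed_node in ns}
    # Relations that contain any of those ways.
    relations = {r for r, ms in relation_ways.items() if ways.intersection(ms)}
    # Every member way of those relations; ways absent from the cache
    # are the ones that end up as broken geometries.
    missing_ways = set()
    for r in relations:
        for w in relation_ways[r]:
            if w in way_nodes:
                ways.add(w)
            else:
                missing_ways.add(w)
    # All nodes of all collected ways.
    nodes = {changed_node}
    for w in ways:
        nodes.update(way_nodes[w])
    return nodes, ways, relations, missing_ways


way_nodes = {10: [1, 2], 11: [2, 3], 12: [4, 5]}
relation_ways = {100: [10, 11, 99]}  # way 99 is not in the local cache
result = objects_to_rebuild(1, way_nodes, relation_ways)
```

Every step here is a query against local state, so one dragged node
can pull in a whole relation, and any member missing from the cache
(way 99 in the example) leaves the geometry incomplete.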

That often leads to broken multipolygons or way geometries (if some
nodes or ways went missing from your local database when
osmosis/osm2pgsql failed to apply some diff for some reason), and
basically happens on every machine in the world that tries to
reconstruct geometry from OSM diffs.

That work also takes time. On slow consumer machines (a.k.a. "home
servers") applying a diff may even take longer than the period that
diff covers. A minutely diff that takes almost a minute to apply is
quite common.

To make life easier for such consumers, I think it's worth
implementing "extended diffs". These can be implemented as a small
(osm.pbf? osm.bz2?) file shipped alongside a diff, holding every
object that is touched by applying that diff, not only the ones that
were edited directly. The diff itself can then be reduced to a small
list of things that were deleted.

Cons: higher load on OSM servers to select the objects, more traffic.
Pros: all the osm2pgsql databases can be halved in size (slim tables
can simply be dropped for most applications), and diffs can then be
applied blazingly fast on any hardware (DELETE all the IDs that
were changed, COPY all the new/touched stuff - no need to query
the database for nodes, ways and relations).
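
A minimal sketch of that DELETE-then-COPY step, under loud assumptions:
sqlite3 stands in for PostgreSQL so the example runs anywhere,
executemany() stands in for COPY, and the "features" table, its columns
and the apply_extended_diff() function are hypothetical, not the real
osm2pgsql schema or any existing tool.

```python
import sqlite3


def apply_extended_diff(conn, deleted_ids, touched_rows):
    """Apply one extended diff without looking anything up.

    deleted_ids:  ids of objects removed from OSM
    touched_rows: list of (osm_id, geom_wkt) with complete geometries,
                  as they would arrive in the extended diff file
    """
    # Everything deleted or touched is stale and gets removed first.
    stale = [(i,) for i in deleted_ids] + [(i,) for i, _ in touched_rows]
    with conn:  # single transaction: commit on success, rollback on error
        conn.executemany("DELETE FROM features WHERE osm_id = ?", stale)
        # Stand-in for COPY: bulk-load the new/touched objects.
        conn.executemany(
            "INSERT INTO features (osm_id, geom_wkt) VALUES (?, ?)",
            touched_rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (osm_id INTEGER, geom_wkt TEXT)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(1, "POINT(0 0)"), (2, "POINT(1 1)"), (3, "POINT(2 2)")],
)
apply_extended_diff(
    conn,
    deleted_ids=[3],
    touched_rows=[(1, "POINT(5 5)"), (4, "POINT(9 9)")],
)
rows = conn.execute(
    "SELECT osm_id, geom_wkt FROM features ORDER BY osm_id"
).fetchall()
```

The function never reads from the database: every geometry arrives
complete, which is why the slim tables would no longer be needed.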

Comments? Ideas? Patches?

-- 
Darafei "Komяpa" Praliaskouski
OSM BY Team - http://openstreetmap.by/
xmpp:me at komzpa.net mailto:me at komzpa.net


