[OSM-dev] Diff issues: minutely overwritten by later data/incompl. daily diffs

mmd mmd.osm at gmail.com
Mon Aug 15 17:20:27 UTC 2016

On 14.08.2016 at 12:31, Paul Norman wrote:
> On 8/14/2016 2:56 AM, mmd wrote:
>> Processing daily diffs is in fact a very convenient way to load 4 years'
>> worth of OSM data (needed for full history), rather than downloading
>> more than 2 million minutely diffs.
> I don't know about the data issues, but I suggest you use the full
> history. Over half the data in OSM is covered by those diffs. The full
> history is available in the faster-to-parse PBF format, and an initial
> import is generally faster than consuming diffs.

I've been trying to populate Overpass API with full history data for the
last 2 years, without much success.

Part of the issue is that I cannot simply convert the whole .osh.pbf
file into a single change file: doing so would require an estimated 8 TB
of main memory.

I recently found that splitting the large full-history file into daily
".osh.pbf" chunks and then recreating daily change files from them via
libosmium looks like the way to get around these memory constraints.
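The daily chunking step can be sketched roughly as below. This is a hypothetical, simplified Python model, not libosmium's API: `Version`, `action_for` and `daily_changes` are made-up names, and the action rule (version 1 means create, an invisible version means delete, anything else means modify) is an assumption. Note that this rule misclassifies objects whose earlier versions were removed by the redaction, which is one of the problem cases mentioned below.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One object version from a full-history file (hypothetical minimal record)."""
    kind: str        # "node", "way" or "relation"
    id: int
    version: int
    timestamp: str   # ISO 8601 UTC, e.g. "2007-06-25T15:15:37Z"
    visible: bool    # False marks a deleted version in full-history data


def action_for(v: Version) -> str:
    """Map a version to an osmChange action (assumed rule, see lead-in)."""
    if not v.visible:
        return "delete"
    return "create" if v.version == 1 else "modify"


def daily_changes(versions):
    """Group versions by UTC day, then by action, in timestamp order."""
    days = defaultdict(lambda: defaultdict(list))
    for v in sorted(versions, key=lambda v: (v.timestamp, v.kind, v.id, v.version)):
        day = v.timestamp[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        days[day][action_for(v)].append(v)
    return days


if __name__ == "__main__":
    # Objects from Example 1 below.
    history = [
        Version("way", 8464411, 1, "2007-06-25T15:15:37Z", True),
        Version("node", 139534, 1, "2007-11-03T18:05:49Z", True),
    ]
    for day, actions in sorted(daily_changes(history).items()):
        print(day, {a: [(v.kind, v.id, v.version) for v in objs]
                    for a, objs in actions.items()})
```

Processing one day at a time keeps memory proportional to a single day's edits rather than to the whole history, which is the point of the daily split.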

Also, there are the data inconsistencies from the early OSM days (see
below for examples) and the objects missing due to the redaction, both of
which have a fatal effect on database update processing. I still have no
idea how to deal with those cases.

In the usual workflow, i.e. installing the first ODbL planet from Sep 2012
and applying subsequent diffs, neither redaction issues nor data
inconsistencies have appeared so far. There were some very infrequent
diff file glitches in the past, but they were resolved very quickly.

As mentioned before, I'm a bit unsure why daily diffs don't always match
their minutely diff counterparts.
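One way to pin down such mismatches is to compare the object versions in a daily diff against those in the minutely diffs covering the same day. The sketch below is hypothetical: the function names are made up, selecting which minutely sequence numbers belong to a given daily diff is left to the caller, and it assumes every version should appear in both. If daily diffs deliberately collapse intermediate versions, comparing only the highest version per object would be the fairer test.

```python
import xml.etree.ElementTree as ET


def object_versions(osc_xml: str) -> set:
    """Collect (kind, id, version) tuples from one osmChange document."""
    root = ET.fromstring(osc_xml)
    found = set()
    for action in root:      # <create>, <modify>, <delete>
        for obj in action:   # <node>, <way>, <relation>
            found.add((obj.tag, int(obj.get("id")), int(obj.get("version"))))
    return found


def compare_daily_to_minutely(daily_xml: str, minutely_xmls: list) -> tuple:
    """Return (missing_from_daily, extra_in_daily) as sets of (kind, id, version)."""
    daily = object_versions(daily_xml)
    minutely = set().union(*(object_versions(x) for x in minutely_xmls))
    return minutely - daily, daily - minutely
```

A non-empty first set would point at versions that the daily diff lost, for example because later data overwrote them.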

Example 1:

way  8464411, v1: 2007-06-25T15:15:37Z, references node 139534
node  139534, v1: 2007-11-03T18:05:49Z


Example 2:

Similar case: a way references a node that does not exist yet.
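Cases like these could be flagged before import with a simple consistency check. The following is a hypothetical sketch: `dangling_node_refs` and its input layout (a dict from node id to the timestamp of the node's first version, and ways as (id, timestamp, node refs) tuples) are assumptions. It only detects the problem and does not decide how to repair it.

```python
def dangling_node_refs(node_first_seen: dict, ways: list) -> list:
    """Return (way_id, node_id) pairs where a way version references a node
    that either has no known version at all (e.g. removed by the redaction)
    or whose first version is dated after the way version."""
    problems = []
    for way_id, way_ts, node_refs in ways:
        for ref in node_refs:
            first = node_first_seen.get(ref)
            # ISO 8601 UTC strings of identical format compare chronologically.
            if first is None or first > way_ts:
                problems.append((way_id, ref))
    return problems


# Data from Example 1 above.
nodes = {139534: "2007-11-03T18:05:49Z"}
ways = [(8464411, "2007-06-25T15:15:37Z", [139534])]
print(dangling_node_refs(nodes, ways))  # [(8464411, 139534)]
```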


