[OSM-dev] Incomplete diffs?

marqqs at gmx.eu marqqs at gmx.eu
Mon Nov 7 11:43:35 GMT 2011


Hello Frederik,

ok, it really must have been late. :-)
Thank you for the explanation, sounds perfect.

I wouldn't call it a bug at all because it may be necessary to keep such delete requests:

Let's say you found an out-of-date .osm file and want to update it. You guess that the file is from last Saturday 12:00, but you're not sure. Therefore you accumulate the replication diffs for the time range between Saturday 10:00 (2 hours earlier) and today.

Let's further assume that a node had been created at 10:15 and was deleted at 11:45. This node would be excluded from an "ideal" simplified diff.

If the old .osm file in question in fact has the state of Saturday 11:00, it would know about the created node but never become aware of its deletion.
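
To make the scenario concrete, here is a small toy model in Python. It is only a sketch and not Osmosis code: the tuple layout, the node id 1 and the function names are made up for illustration. "Keep highest" mimics the behaviour Frederik describes below, and "ideal" drops objects that were both created and deleted within the range.

```python
# Toy model: each change is (action, node_id, version, timestamp).
# All names and values here are hypothetical, chosen only for illustration.
changes = [
    ("create", 1, 1, "Sat 10:15"),
    ("delete", 1, 2, "Sat 11:45"),
]

def simplify_keep_highest(changes):
    """Keep only the highest version of each node, even if it is a delete."""
    latest = {}
    for change in changes:
        _, node_id, version, _ = change
        if node_id not in latest or version > latest[node_id][2]:
            latest[node_id] = change
    return list(latest.values())

def simplify_ideal(changes):
    """Additionally drop nodes that were created and deleted within the range."""
    created = {c[1] for c in changes if c[0] == "create"}
    return [c for c in simplify_keep_highest(changes)
            if not (c[0] == "delete" and c[1] in created)]

def apply_changes(snapshot, changes):
    """Apply changes to a set of node ids; deleting a missing node is a no-op."""
    result = set(snapshot)
    for action, node_id, _, _ in changes:
        if action == "delete":
            result.discard(node_id)
        else:
            result.add(node_id)
    return result

snapshot_1100 = {1}    # file state at Saturday 11:00: the node already exists
snapshot_1000 = set()  # file state at Saturday 10:00: the node does not exist yet

print(apply_changes(snapshot_1100, simplify_keep_highest(changes)))  # set()
print(apply_changes(snapshot_1100, simplify_ideal(changes)))         # {1}
```

With the 11:00 snapshot, the "ideal" diff leaves the deleted node in the file, while the diff that keeps the delete request removes it. With the 10:00 snapshot both variants end up with the same (empty) result.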

In the end: I'm happy about this "bug". :-)

However, this doesn't make it easier to determine how much data you lose by using the normal diffs instead of the replication diffs. But eventually I will get the answer... somehow.
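
One possible way to approach this, sketched in Python rather than with grep: collect every (action, type, id, version) combination from both change files and look at the set difference. The function name and the tiny inline sample documents are assumptions for illustration; real files would have to be decompressed and read from disk first.

```python
import xml.etree.ElementTree as ET

def object_versions(osc_xml):
    """Return the set of (action, type, id, version) found in an osmChange document."""
    root = ET.fromstring(osc_xml)
    found = set()
    for action in root:        # <create>, <modify>, <delete>
        for obj in action:     # <node>, <way>, <relation>
            found.add((action.tag, obj.tag, obj.get("id"), obj.get("version")))
    return found

# Hypothetical miniature inputs modelled on the node discussed in this thread.
replication = """<osmChange>
  <create><node id="1490162262" version="1"/></create>
  <delete><node id="1490162262" version="2"/></delete>
</osmChange>"""
normal = "<osmChange></osmChange>"

# Object versions present in the replication diff but absent from the normal diff.
missing = object_versions(replication) - object_versions(normal)
print(len(missing))  # 2
```

For full daily files this reads everything into memory, so a streaming parser might be the better choice there.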

Markus


-------- Original Message --------
> Date: Mon, 07 Nov 2011 09:06:32 +0100
> From: Frederik Ramm <frederik at remote.org>
> To: marqqs at gmx.eu
> CC: dev at openstreetmap.org
> Subject: Re: [OSM-dev] Incomplete diffs?

> Hi,
> 
> On 11/07/2011 02:24 AM, marqqs at gmx.eu wrote:
> > # normal diff
> > $ zcat 20111103-20111104.osc.gz |grep -c "timestamp=\"2011-11-03T12:"
> > 58968
> >
> > # replication diff
> > $ cat 1103-1104.osc |grep -c "timestamp=\"2011-11-03T12:"
> > 59068
> >
> > And yes, I thought of cumulating the versions in the second file before I
> > started counting with grep.
> 
> I think you may have found a bug in Osmosis' --simplify-change 
> algorithm. (Or, if you created the above 1103-1104.osc file yourself, 
> you have re-implemented a bug already present in Osmosis.)
> 
> Both the normal diff and the daily diff are correct as far as I can see, 
> but the simplified version that you created - the one with 59068 
> elements - is not.
> 
> An object created earlier on that particular day and deleted between 
> 12:00 and 13:00 will not show up in the normal daily diff:
> 
> $ zgrep -A1 -B1 '<node id="1490162262"' 20111103-20111104.osc.gz
> $
> 
> It will show up twice in the replication diff, once for creation and 
> once for deletion:
> 
> $ zgrep -A1 -B1 '<node id="1490162262"' 1103-1104.osc.gz
>      <node id="1490162261" version="1" timestamp="2011-11-03T08:09:48Z" 
> uid="419929" user="hoti" changeset="9728137" lat="47.4399545" 
> lon="16.4376938"/>
>      <node id="1490162262" version="1" timestamp="2011-11-03T08:09:48Z" 
> uid="547666" user="Igor Kurvanor" changeset="9728123" lat="45.7510611" 
> lon="6.2813975"/>
>    </create>
>    <delete>
>      <node id="1490162262" version="2" timestamp="2011-11-03T12:42:36Z" 
> uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611" 
> lon="6.2813975"/>
>    </delete>
> $
> 
> Now if such a replication diff is simplified with Osmosis, in my opinion 
> it should drop the node altogether, but what it does is it always keeps 
> the highest version even if that corresponds to a deletion that 
> counteracts a previous creation:
> 
> $ osmosis -q --read-xml-change 1103-1104.osc.gz --simc 
> --write-xml-change - | grep -A1 -B1 '<node id="1490162262"'
>    <delete>
>      <node id="1490162262" version="2" timestamp="2011-11-03T12:42:36Z" 
> uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611" 
> lon="6.2813975"/>
>    </delete>
> $
> 
> Now this is a minor bug because I don't know any consumer that will trip 
> over a deletion request for a non-existing object, but still it is a 
> behaviour that I would not have expected. Anyway, it should explain the 
> discrepancy you are seeing.
> 
> Bye
> Frederik


