[OSM-dev] Incomplete diffs?
marqqs at gmx.eu
Mon Nov 7 11:43:35 GMT 2011
Hello Frederik,
ok, it really must have been late. :-)
Thank you for the explanation, sounds perfect.
I wouldn't call it a bug at all because it may be necessary to keep such delete requests:
Let's say you found an out-of-date .osm file and want to update it. You guess that the file is from last Saturday, 12:00, but you're not sure. Therefore you merge the replication diffs for the time range between Saturday 10:00 (2 hours earlier) and today.
Let's further assume that a node had been created at 10:15 and was deleted at 11:45. This node would be excluded from an "ideal" simplified diff.
If the old .osm file in question in fact has the state of Saturday 11:00, it would know about the created node but never become aware of its deletion.
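
The scenario can be played through in a few lines of Python. This is only a minimal sketch under simplifying assumptions: a change is reduced to a hypothetical (action, id, version) tuple, the .osm file is reduced to a set of node ids, and the two simplify functions are illustrations of the behaviours discussed in this thread, not Osmosis code.

```python
# Two changes to the same node, as in the example: created 10:15, deleted 11:45.
changes = [
    ("create", 1, 1),
    ("delete", 1, 2),
]

def simplify_keep_highest(changes):
    """Keep only the highest version of each object, even if it is a delete."""
    latest = {}
    for action, obj_id, version in changes:
        if obj_id not in latest or version > latest[obj_id][2]:
            latest[obj_id] = (action, obj_id, version)
    return list(latest.values())

def simplify_ideal(changes):
    """Additionally drop objects that were both created and deleted in the diff."""
    created = {obj_id for action, obj_id, _ in changes if action == "create"}
    return [c for c in simplify_keep_highest(changes)
            if not (c[0] == "delete" and c[1] in created)]

def apply_changes(snapshot, changes):
    """Apply changes to a set of node ids; deleting an unknown id is a no-op."""
    result = set(snapshot)
    for action, obj_id, _ in changes:
        if action == "delete":
            result.discard(obj_id)
        else:
            result.add(obj_id)
    return result

# The file really has the state of 11:00, so it already contains node 1.
stale_file = {1}
after_ideal = apply_changes(stale_file, simplify_ideal(changes))
after_keep_highest = apply_changes(stale_file, simplify_keep_highest(changes))
print("ideal simplification leaves:", after_ideal)
print("keep-highest simplification leaves:", after_keep_highest)
```

With the "ideal" variant the deleted node stays in the updated file, while the keep-highest variant removes it. For a file that really is from 10:00, both variants give the same result, because deleting an unknown node does nothing.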
In the end: I'm happy about this "bug". :-)
However, this doesn't make it any easier to determine how much data you lose by taking the normal diffs instead of the replication diffs. But eventually I will get the answer... somehow.
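
One possible way to approach this is to compare the two diffs object version by object version instead of counting grep matches. The following is a rough sketch, not a tested tool: the function names are made up for illustration, and the in-memory sample is a reduced stand-in modelled on the node from this thread, not real diff data. It lists every (action, type, id, version) that appears in the replication diff but not in the normal diff.

```python
import gzip
import io
import xml.etree.ElementTree as ET

def open_osc(path):
    """Open an osmChange file, handling .gz transparently (hypothetical helper)."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")

def object_versions(source):
    """Collect (action, type, id, version) for every object in an osmChange document."""
    found = set()
    for action in ET.parse(source).getroot():   # <create>, <modify>, <delete>
        for obj in action:                       # <node>, <way>, <relation>
            found.add((action.tag, obj.tag, obj.get("id"), obj.get("version")))
    return found

def only_in_replication(daily_source, replication_source):
    """Object versions present in the replication diff but absent from the daily diff."""
    return object_versions(replication_source) - object_versions(daily_source)

# Reduced sample: the daily diff omits a node that was created and deleted
# on the same day, the replication diff contains both versions.
daily = b'<osmChange version="0.6"></osmChange>'
replication = b"""<osmChange version="0.6">
<create><node id="1490162262" version="1"/></create>
<delete><node id="1490162262" version="2"/></delete>
</osmChange>"""

extra = only_in_replication(io.BytesIO(daily), io.BytesIO(replication))
for entry in sorted(extra):
    print(entry)
```

For real files, the two arguments could be opened with open_osc("20111103-20111104.osc.gz") and open_osc("1103-1104.osc"). Note that this only shows which intermediate versions are missing; whether that counts as lost data depends on what the diffs are used for.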
Markus
-------- Original Message --------
> Date: Mon, 07 Nov 2011 09:06:32 +0100
> From: Frederik Ramm <frederik at remote.org>
> To: marqqs at gmx.eu
> CC: dev at openstreetmap.org
> Subject: Re: [OSM-dev] Incomplete diffs?
> Hi,
>
> On 11/07/2011 02:24 AM, marqqs at gmx.eu wrote:
> > # normal diff
> > $ zcat 20111103-20111104.osc.gz |grep -c "timestamp=\"2011-11-03T12:"
> > 58968
> >
> > # replication diff
> > $ cat 1103-1104.osc |grep -c "timestamp=\"2011-11-03T12:"
> > 59068
> >
> > And yes, I thought of cumulating the versions in the second file before I
> > started counting with grep.
>
> I think you may have found a bug in Osmosis' --simplify-change
> algorithm. (Or, if you created the above 1103-1104.osc file yourself,
> you have re-implemented a bug already present in Osmosis.)
>
> Both the normal diff and the daily diff are correct as far as I can see,
> but the simplified version that you created - the one with 59068
> elements - is not.
>
> An object created earlier on that particular day and deleted between
> 12:00 and 13:00 will not show up in the normal daily diff:
>
> $ zgrep -A1 -B1 '<node id="1490162262"' 20111103-20111104.osc.gz
> $
>
> It will show up twice in the replication diff, once for creation and
> once for deletion:
>
> $ zgrep -A1 -B1 '<node id="1490162262"' 1103-1104.osc.gz
> <node id="1490162261" version="1" timestamp="2011-11-03T08:09:48Z"
> uid="419929" user="hoti" changeset="9728137" lat="47.4399545"
> lon="16.4376938"/>
> <node id="1490162262" version="1" timestamp="2011-11-03T08:09:48Z"
> uid="547666" user="Igor Kurvanor" changeset="9728123" lat="45.7510611"
> lon="6.2813975"/>
> </create>
> <delete>
> <node id="1490162262" version="2" timestamp="2011-11-03T12:42:36Z"
> uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611"
> lon="6.2813975"/>
> </delete>
> $
>
> Now if such a replication diff is simplified with Osmosis, in my opinion
> it should drop the node altogether, but what it does is it always keeps
> the highest version even if that corresponds to a deletion that
> counteracts a previous creation:
>
> $ osmosis -q --read-xml-change 1103-1104.osc.gz --simc \
>     --write-xml-change - | grep -A1 -B1 '<node id="1490162262"'
> <delete>
> <node id="1490162262" version="2" timestamp="2011-11-03T12:42:36Z"
> uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611"
> lon="6.2813975"/>
> </delete>
> $
>
> Now this is a minor bug, because I don't know of any consumer that will trip
> on a deletion request for a non-existing object, but it is still
> behaviour that I would not have expected. Anyway, it should explain the
> discrepancy you are seeing.
>
> Bye
> Frederik