[OSM-dev] Minute Diffs Broken
Brett Henderson
brett at bretth.com
Tue May 5 00:19:55 BST 2009
Frederik Ramm wrote:
> Hi,
>
> Brett Henderson wrote:
>> Unfortunately the minute diffs appear to be regularly missing data.
>> In the last 8 hours at least 3 changesets have been missed. The ones
>> I've noticed are 1076325, 1076998, 1077469. These have been detected
>> by comparing the normal minute diffs against another minute diff
>> process running half an hour later.
>
> Can you elaborate a bit? I don't quite understand what you mean by
> changesets that have been missed. What exactly are you doing, and in
> what way do the results look wrong?

Okay, that probably wasn't clear. Osmosis doesn't even look at
changesets at the moment. So when I talk about changesets I'm not
referring to any of the data in the changeset table. The only thing I
look at is the changeset_id column on entities, which results in the
"changeset" attribute.

What I've noticed is that when I run minute diffs 5 minutes behind the
API, data is missed compared to minute diffs running 30 minutes behind
the API. Each time data is missing, it belongs to a single large
changeset. Presumably this is because large changesets sometimes take
longer than 5 minutes to process (seems awfully slow but that's what
I'm seeing), so the database transaction is not committed until more
than 5 minutes after the start of processing. By the time the data is
committed and becomes visible to osmosis querying the history table,
osmosis has already moved past the time window containing that data,
and the changes are missed.
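
To make the suspected race concrete, here is a minimal simulation in Python. It is only a sketch of the hypothesis above, not osmosis code: the Row type, the extract_minute_diffs function and all timestamps and ids are made up for illustration, and it assumes each entity's timestamp is fixed when processing starts while the row only becomes visible at commit time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    entity_id: int
    changeset: int
    timestamp: int     # seconds; assumed to be set when processing starts
    committed_at: int  # seconds; when the transaction becomes visible


def extract_minute_diffs(rows, lag, start, end, interval=60):
    """Run one extraction per interval; each run reads the time window
    that ended `lag` seconds before the run and sees only committed rows."""
    seen = set()
    for now in range(start, end + 1, interval):
        window_end = now - lag
        window_start = window_end - interval
        for row in rows:
            visible = row.committed_at <= now
            if visible and window_start <= row.timestamp < window_end:
                seen.add(row.entity_id)
    return seen


# Hypothetical data: both rows carry the same timestamp, but the second
# belongs to a large changeset whose transaction commits 7 minutes later.
rows = [
    Row(entity_id=1, changeset=100, timestamp=600, committed_at=602),
    Row(entity_id=2, changeset=101, timestamp=600, committed_at=1020),
]

five_minutes_behind = extract_minute_diffs(rows, lag=300, start=0, end=7200)
thirty_minutes_behind = extract_minute_diffs(rows, lag=1800, start=0, end=7200)

# Rows that only the slower process picked up.
missed = thirty_minutes_behind - five_minutes_behind
```

With these numbers the 5 minute process reads the window containing timestamp 600 before the second row is committed and never revisits it, so that row ends up only in the 30 minute output. That is the same kind of difference I see when comparing the two real processes.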
>
> - Are you sure that we're all on the same page regarding the meaning
> of changeset columns in the database, especially that the "closed_at"
> date is only fixed once it is in the past - as long as "closed_at" is
> in the future, it can still move forward or backward in time. (I'm not
> even sure I am right on this one but I trust I'll be told by someone
> if not ;-)

I'm not reading any of the changeset table data, so the behaviour of
the closed_at field doesn't affect osmosis. The changeset table is
effectively useless to osmosis processing because changesets aren't
atomic. At some point I'd like to replicate its contents, but to get
accurate results I will have to trigger that off the timestamps on the
entities within it (meaning the changeset metadata may be replicated
several times).
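
As a rough sketch of what triggering off entity timestamps could look like (nothing like this exists in osmosis yet; the changesets_to_replicate function and the sample values are hypothetical), each time window would replicate the metadata of every changeset that has an entity in that window:

```python
from collections import defaultdict


def changesets_to_replicate(entities, interval=60):
    """Group (timestamp, changeset_id) pairs into time windows and list
    the changeset ids whose metadata would be emitted for each window."""
    per_window = defaultdict(set)
    for timestamp, changeset_id in entities:
        per_window[timestamp // interval].add(changeset_id)
    return {window: sorted(ids) for window, ids in sorted(per_window.items())}


# (timestamp in seconds, changeset_id) pairs; values are made up.
entities = [(10, 500), (50, 501), (70, 500), (130, 500)]

plan = changesets_to_replicate(entities)

# Number of windows in which changeset 500 would be replicated.
times_500_replicated = sum(500 in ids for ids in plan.values())
```

Here changeset 500 has entities in three different windows, so its metadata would be written three times. That is the repeated replication mentioned above.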
I hope that answers your questions.
Brett