[OSM-dev] Minute Diffs Broken

Brett Henderson brett at bretth.com
Tue May 5 00:19:55 BST 2009

Frederik Ramm wrote:
> Hi,
> Brett Henderson wrote:
>> Unfortunately the minute diffs appear to be regularly missing data.  
>> In the last 8 hours at least 3 changesets have been missed.  The ones 
>> I've noticed are 1076325, 1076998, 1077469.  These have been detected 
>> by comparing the normal minute diffs against another minute diff 
>> process running half an hour later. 
> Can you elaborate a bit? I don't quite understand what you mean by 
> changesets that have been missed. What exactly are you doing, and in 
> what way do the results look wrong?
Okay, that probably wasn't clear.  Osmosis doesn't look at changesets at 
all at the moment, so when I talk about changesets I'm not referring to 
any of the data in the changeset table.  The only thing I look at is the 
changeset_id column on entities, which populates the "changeset" 
attribute.

What I've noticed is that when I run minute diffs 5 minutes behind the 
API, data is missing compared to minute diffs running 30 minutes behind 
the API.  Each time data is missing, it belongs to a single large 
changeset.  Presumably this is because large changesets sometimes take 
longer than 5 minutes to process (which seems awfully slow, but that's 
what I'm seeing), so the database transaction isn't committed until more 
than 5 minutes after processing started.  Because of this delay, by the 
time the data is committed and becomes visible to osmosis querying the 
history table, osmosis has already moved past the time window containing 
that data, and the changes are missed.
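To make the timing problem concrete, here is a small Python simulation. 
It is not osmosis code: HistoryRow, extract_window and replicate are 
hypothetical names, and it assumes entity timestamps are assigned while 
the upload is being processed, before the transaction commits.  Each 
one-minute window is queried `lag` minutes after it ends and only sees 
rows that were already committed at that point.

```python
from dataclasses import dataclass

# Hypothetical history row (not the real schema): "timestamp" is the entity
# timestamp written during processing, "committed_at" is when the enclosing
# transaction committed and the row became visible to other readers.
@dataclass(frozen=True)
class HistoryRow:
    entity_id: int
    changeset: int
    timestamp: int      # minutes since an arbitrary epoch
    committed_at: int   # minutes since the same epoch

def extract_window(rows, start, end, lag):
    """Rows with start <= timestamp < end that are visible when the query
    for this window runs, i.e. at time end + lag."""
    run_at = end + lag
    return [r for r in rows
            if start <= r.timestamp < end and r.committed_at <= run_at]

def replicate(rows, first, last, lag):
    """Process consecutive one-minute windows, as a minute-diff job would,
    and collect every row that ever gets emitted."""
    seen = set()
    for start in range(first, last):
        seen.update(extract_window(rows, start, start + 1, lag))
    return seen

rows = [
    HistoryRow(1, 100, timestamp=10, committed_at=10),  # small, quick edit
    # Large upload: rows stamped at minute 12, but the transaction only
    # commits at minute 19 (7 minutes of processing).
    HistoryRow(2, 200, timestamp=12, committed_at=19),
    HistoryRow(3, 200, timestamp=12, committed_at=19),
]

five = replicate(rows, 0, 60, lag=5)
thirty = replicate(rows, 0, 60, lag=30)
missed = sorted(r.entity_id for r in thirty - five)
print(missed)  # [2, 3]
```

With a 5 minute lag the window for minute 12 is queried at minute 18, 
before the slow transaction commits, so both rows from changeset 200 are 
never emitted; the 30 minute lag picks them up.  In this model a larger 
lag only helps as long as no transaction takes longer than the lag.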
> - Are you sure that we're all on the same page regarding the meaning 
> of changeset columns in the database, especially that the "closed_at" 
> date is only fixed once it is in the past - as long as "closed_at" is 
> in the future, it can still move forward or backward in time. (I'm not 
> even sure I am right on this one but I trust I'll be told by someone 
> if not ;-)
I'm not reading any of the changeset table data, so the behaviour of the 
closed_at field doesn't affect osmosis.  The changeset table is 
effectively useless to osmosis processing because changesets aren't 
atomic.  At some point I'd like to replicate its contents, but to get 
accurate results I will have to trigger that off the timestamps on the 
entities within it (meaning the changeset metadata may be replicated 
several times).
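The replication approach described above could look roughly like this. 
Again this is a hypothetical Python sketch, not osmosis code: 
EntityRow and changesets_to_replicate are made-up names, and it assumes 
that for each time window, metadata is re-emitted for every changeset 
referenced by an entity changed in that window.

```python
from dataclasses import dataclass

# Hypothetical entity history row with only the fields needed here.
@dataclass(frozen=True)
class EntityRow:
    entity_id: int
    changeset: int
    timestamp: int  # minutes since an arbitrary epoch

def changesets_to_replicate(rows, start, end):
    """Distinct changeset ids referenced by entities changed in
    [start, end); metadata for each would be emitted for this window."""
    return sorted({r.changeset for r in rows if start <= r.timestamp < end})

rows = [
    EntityRow(1, 300, timestamp=5),
    EntityRow(2, 300, timestamp=6),  # same changeset, next minute
    EntityRow(3, 301, timestamp=6),
    EntityRow(4, 300, timestamp=9),  # changeset 300 still open and edited
]

emitted = {}
for start in range(0, 10):
    ids = changesets_to_replicate(rows, start, start + 1)
    if ids:
        emitted[start] = ids
print(emitted)  # {5: [300], 6: [300, 301], 9: [300]}
```

Changeset 300 is edited in three different windows, so its metadata is 
emitted three times.  That duplication is the price of keying off entity 
timestamps instead of the (non-atomic) changeset rows themselves.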

I hope that answers your questions.

