[OSM-dev] Minute Diffs Broken

Brett Henderson brett at bretth.com
Tue May 5 01:00:40 BST 2009

Brett Henderson wrote:
> Frederik Ramm wrote:
>> Hi,
>> Brett Henderson wrote:
>>> Unfortunately the minute diffs appear to be regularly missing data.  
>>> In the last 8 hours at least 3 changesets have been missed.  The 
>>> ones I've noticed are 1076325, 1076998, 1077469.  These have been 
>>> detected by comparing the normal minute diffs against another minute 
>>> diff process running half an hour later. 
>> Can you elaborate a bit? I don't quite understand what you mean by 
>> changesets that have been missed. What exactly are you doing, and in 
>> what way do the results look wrong?
> Okay, that probably wasn't clear.  Osmosis doesn't even look at 
> changesets at the moment.  So when I talk about changesets I'm not 
> specifically referring to any of the data in the changeset table.  The 
> only thing I look at is the changeset_id column on entities which 
> results in the "changeset" attribute.
> What I've noticed is that when I run minute diffs 5 minutes behind the 
> API, data is missed compared to minute diffs running 30 minutes behind 
> the API.  Each time data is missing I've noticed that it belongs to a 
> single large changeset.  Presumably this is because the large 
> changesets sometimes take longer than 5 minutes to process (seems 
> awfully slow but that's what I'm seeing) and therefore the database 
> transaction is not committed until 5 minutes after the start of 
> processing.  This 5 minute delay means that by the time data is 
> committed and becomes visible to osmosis querying the history table, 
> osmosis has moved past the time window containing that data and the 
> changes are missed.
The way osmosis identifies changed records is by querying the history table 
for entities with a timestamp within a time interval.  The time interval 
is an hour long for hourly diffs and a minute long for minute diffs.
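
To make the interval idea concrete, here is a rough sketch in Python.  It is 
not osmosis code; the names diff_windows and in_window are made up for 
illustration.  It assumes each interval excludes its lower bound and includes 
its upper bound, matching the "timestamp > ? AND timestamp <= ?" condition in 
the node query, so that consecutive intervals neither overlap nor leave gaps.

```python
from datetime import datetime, timedelta


def diff_windows(start, interval, count):
    """Yield `count` consecutive (lower, upper] timestamp windows.

    Hypothetical helper: each window starts where the previous one ended.
    """
    lower = start
    for _ in range(count):
        upper = lower + interval
        yield lower, upper
        lower = upper


def in_window(ts, window):
    """True if ts falls in the half-open window (lower, upper]."""
    lower, upper = window
    return lower < ts <= upper


# Three one-minute windows, as a minute diff process might step through them.
windows = list(diff_windows(datetime(2009, 5, 5, 0, 0), timedelta(minutes=1), 3))

# A timestamp exactly on a boundary belongs to exactly one window.
boundary = datetime(2009, 5, 5, 0, 1)
matches = [in_window(boundary, w) for w in windows]
```

Because the windows tile the timeline exactly once, a record is only ever 
picked up by the one run whose window contains its timestamp.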

For example, the node query is:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
       u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
LEFT OUTER JOIN changesets c ON e.changeset_id = c.id
LEFT OUTER JOIN users u ON c.user_id = u.id
WHERE e.timestamp > ? AND e.timestamp <= ?
ORDER BY e.id, e.version

If the history table records don't exist (or aren't committed) when this 
query runs, the records won't be put into the diff file.
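
The effect can be reproduced with a small simulation.  This is only a sketch 
under simplifying assumptions: it uses an in-memory SQLite table with a 
cut-down, hypothetical nodes schema, timestamps are plain integers standing 
for minutes, and a late commit is modelled by inserting the row only after 
the extractor has already read the window containing its timestamp.  The 
extract function is made up for illustration and is not part of osmosis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real history table (assumed columns only).
conn.execute("CREATE TABLE nodes (id INTEGER, version INTEGER, timestamp INTEGER)")


def extract(lower, upper):
    """Return (id, version) rows with lower < timestamp <= upper."""
    return conn.execute(
        "SELECT id, version FROM nodes "
        "WHERE timestamp > ? AND timestamp <= ? ORDER BY id, version",
        (lower, upper),
    ).fetchall()


# Node 1 is timestamped at minute 10 and is visible straight away.
conn.execute("INSERT INTO nodes VALUES (1, 1, 10)")

# The extractor runs 5 minutes behind: at minute 15 it reads window (9, 10].
first = extract(9, 10)

# Node 2 is part of a large upload.  It is also timestamped at minute 10,
# but its transaction only becomes visible at minute 17.
conn.execute("INSERT INTO nodes VALUES (2, 1, 10)")

# The extractor carries on with (10, 11], (11, 12], ... and never rereads
# (9, 10], so node 2 does not appear in any later diff.
later = [row for lo in range(10, 17) for row in extract(lo, lo + 1)]

# A second process running much further behind reads (9, 10] after the
# commit and sees both nodes.
delayed = extract(9, 10)
```

Here first contains only node 1, later is empty, and delayed contains both 
nodes, which is the same kind of difference seen between the 5 minute and 
30 minute diff processes.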
