[OSM-dev] Minute Diffs Broken

Tue May 5 02:54:50 BST 2009

On Mon, 4 May 2009, Greg Troxel wrote:

> With select by time, it would still be buggy.  But if the select was
> "all changesets > X" where X was the highest changeset in the previous
> select, it would work, because there would have to be a total ordering
> of transactions (at least as far as anyone can tell).  So the select of
> highest would have to be in between two others, and the changeset id is
> perhaps an auto-sequence, or else read/increment/write which again would
> force ordering.

Even an auto sequence won't give you total ordering at select time.  The 
problem is that the sequence will give you an ordering of the insert 
statements but based on what has been committed at any points.
Consider the following

txna: INSERT ... - sequence #1
txnb: INSERT ... - sequence #2
txnb: COMMIT
txnc: osmosis polls -- it sees sequence value 2.
txna: commits
txnd: osmois polls -- for values > 2, it will miss #1.

auto-sequences are still assigned at the time the insert statement runs not 
when it commits.

A number of years ago I was wrote a replication system for postgresql 
(dbmirror) that  tried to approximate this problem by getting all the sql 
statements that were part of a single transaction (it logged them with 
triggers to a table along with the xid) and looked at the sequence # on the 
last operation performed.  More modern systems like Slony and SkyTools give 
up on trying to achive a total order and just jump from consistent snapshot 
to consistent snapshot, a much better approach, but in both you need 
triggers and tracking tables.

>
>> If we were operating on a "dirty read" basis then Brett's
>> diffs would not miss any data (but they would contain changes that
>> were part of a transaction that was later rolled back).
>
> Sure, that would be worse :-)
>
Even if you wanted to Postgresql won't allow you to read uncommitted data 
from another transaction.  I think the lowest isolation level it respects is 
READ_COMMITED.

Steve