[OSM-dev] Minute Diffs Broken

Greg Troxel gdt at ir.bbn.com
Wed May 6 00:06:41 BST 2009

Brett Henderson <brett at bretth.com> writes:

>> Given the way the world is, it seems like the minute diffs really should
>> be looking for new transactions, not new changesets.  I can see
>> Frederik's point of only exporting closed changesets, but for that to
>> really make sense I think the main database has to isolate changesets
>> from each other until they are fully committed (meaning either
>> long-running transactions or an API change to have an API operation be
>> open/upload/close) -- trying to add transaction properties on a copy
>> when they aren't there in the original seems like it just won't work.

> Just to be clear, osmosis isn't looking for new changesets or
> transactions, it is just looking for entities that have been modified
> within a specific time period.  It doesn't know what an API changeset
> or database transaction is.  Perhaps it should be looking for
> transactions (although I don't see how that will solve anything yet)
> but that is not currently the case.

Given the use of pgsql transactions, osmosis won't see data from
uncommitted transactions.  So I really meant "changes in the database,
subject to the notion that uncommitted transactions won't be visible."
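
To make the visibility point concrete, here is a minimal sketch in
Python.  It is not osmosis code: Row, extract, and the times are all
hypothetical, and it assumes that a row's timestamp is set when the row
is written rather than when its transaction commits.  Under that
assumption, a time-window extract can skip a row whose transaction was
still open when its window was read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    entity_id: int
    timestamp: int     # time stamped on the row when it was written (seconds)
    committed_at: int  # time the enclosing transaction committed (seconds)


def extract(rows, start, end, run_at):
    """Return rows with start <= timestamp < end that are committed by run_at."""
    return [
        r for r in rows
        if start <= r.timestamp < end and r.committed_at <= run_at
    ]


rows = [
    Row(1, timestamp=10, committed_at=11),   # short transaction
    Row(2, timestamp=50, committed_at=130),  # long-running transaction
]

# Three consecutive one-minute extracts, each run 5 seconds after its window.
first = extract(rows, 0, 60, run_at=65)
second = extract(rows, 60, 120, run_at=125)
third = extract(rows, 120, 180, run_at=185)

# Entity 2 is never picked up: it was invisible when its own window was
# read, and its timestamp falls outside every later window.
seen = {r.entity_id for r in first + second + third}
```

In this sketch seen contains only entity 1, even though entity 2 is
committed and visible by the time the third extract runs.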

> I'm still against the idea of minute diffs being a "collection of
> changesets".  The "collection of uploads" is closer to the mark,
> although uploads are just an API convenience, they have no
> representation in the database and have no meaning to osmosis.  minute
> diffs are really a minimal diff to get from one point in time to
> another.

I agree.  Having looked at the schema, I don't see how osmosis can
extract a diff (meaning some data that can be replayed into a copy of
the database) without support for what is essentially a journal.  Just
looking at nodes, the nodes table doesn't quite seem to be a journal for
the current_nodes table.

> To complicate things slightly further, the full history files
> http://planet.openstreetmap.org/history/
> are similar but contain a full delta from one point in time to
> another and may contain several versions of a single entity.
> So perhaps the term "diffs" is the right one for the existing files
> and "deltas" is the right one for full history files.

I would hope that both have the property that, given a copy of the DB
that was right at the earlier time, applying the delta or diff to that
copy yields a copy of the database as of when the osmosis extract
transaction ran.  Perhaps then the delta has the intermediate steps and
the diff is permitted to collapse them?
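
As a sketch of the property I have in mind (hypothetical names and a toy
data model, not the real schema or the osmosis file format): a delta is
an ordered list of every change, a diff keeps only the last change per
entity, and replaying either onto the same starting copy should give the
same result.

```python
def apply_changes(snapshot, changes):
    """Replay (entity_id, version, value) changes in order onto a copy."""
    db = dict(snapshot)
    for entity_id, version, value in changes:
        if value is None:            # None marks a deletion in this sketch
            db.pop(entity_id, None)
        else:
            db[entity_id] = (version, value)
    return db


def collapse(delta):
    """Keep only the last change per entity, turning a delta into a diff."""
    last = {}
    for change in delta:
        last[change[0]] = change     # later changes overwrite earlier ones
    return list(last.values())


# Copy of the database as of the earlier time: id -> (version, value).
snapshot = {1: (1, "a"), 2: (1, "b")}

# Delta: every intermediate version, in the order it happened.
delta = [(1, 2, "a2"), (3, 1, "c"), (1, 3, "a3"), (2, 2, None)]

# Diff: intermediate version 2 of entity 1 is collapsed away.
diff = collapse(delta)

from_delta = apply_changes(snapshot, delta)
from_diff = apply_changes(snapshot, diff)
```

Here from_delta and from_diff are the same final state; the diff is just
shorter because it drops the intermediate version.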

> The reason I've tended to avoid the word "diffs" is because the planet
> directory also contains diffs between planet files.  These diffs are
> yet another way of describing changes/differences and are truly a
> difference between two planet files.

As in the output of the diff command on two text files which happen to
contain xml, it sounds like.

> If you're not familiar with it already, please check out the API
> schema.  If information isn't stored there, we can't query it.  For
> example, there is no concept of an upload in the database, the only
> grouping feature it has is changesets.
> http://gweb.bretth.com/apidb06-pgsql-latest.sql

Thanks - read and sort of understood - there's a lot in there.