[OSM-dev] Minute Diffs Broken
gdt at ir.bbn.com
Wed May 6 00:06:41 BST 2009
Brett Henderson <brett at bretth.com> writes:
>> Given the way the world is, it seems like the minute diffs really should
>> be looking for new transactions, not new changesets. I can see
>> Frederik's point of only exporting closed changesets, but for that to
>> really make sense I think the main database has to isolate changesets
>> from each other until they are fully committed (meaning either
>> long-running transactions or an API change to have an API operation be
>> open/upload/close) -- trying to add transaction properties on a copy
>> when they aren't there in the original seems like it just won't work.
> Just to be clear, osmosis isn't looking for new changesets or
> transactions, it is just looking for entities that have been modified
> within a specific time period. It doesn't know what an API changeset
> or database transaction is. Perhaps it should be looking for
> transactions (although I don't see how that will solve anything yet)
> but that is not currently the case.
Given the use of pgsql transactions, osmosis won't see data from
uncommitted transactions. So I really meant "changes in the database,
subject to the notion that uncommitted transactions won't be visible."
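To make the visibility point concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not how osmosis or the API database actually works: the Row type, the extract() helper, and the integer times are all hypothetical. It assumes each row carries a timestamp set when it was written, and that an extraction only sees rows whose transaction had committed by the time the extraction ran.

```python
from dataclasses import dataclass


@dataclass
class Row:
    entity_id: int
    timestamp: int    # time recorded on the row when it was written (hypothetical)
    commit_time: int  # time the enclosing transaction committed (hypothetical)


def extract(rows, window_start, window_end, run_time):
    """Return ids of rows with a timestamp in [window_start, window_end)
    that are visible (already committed) when the extraction runs."""
    return [
        r.entity_id
        for r in rows
        if window_start <= r.timestamp < window_end and r.commit_time <= run_time
    ]


rows = [
    Row(1, timestamp=10, commit_time=11),
    # Written during the first window, but committed after that window's
    # extraction has already run.
    Row(2, timestamp=50, commit_time=70),
]

first = extract(rows, 0, 60, run_time=60)
second = extract(rows, 60, 120, run_time=120)
missed = {r.entity_id for r in rows} - set(first) - set(second)
```

In this model, entity 2 is invisible to the first extraction because it is uncommitted, and it falls outside the second extraction's time window, so neither run picks it up.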
> I'm still against the idea of minute diffs being a "collection of
> changesets". The "collection of uploads" is closer to the mark,
> although uploads are just an API convenience, they have no
> representation in the database and have no meaning to osmosis. minute
> diffs are really a minimal diff to get from one point in time to
I agree. Having looked at the schema, I don't see how osmosis can
extract a diff (meaning some data that can be replayed into a copy of
the database) without support for what is essentially a journal. Just
looking at nodes, the nodes table doesn't quite seem to be a journal for
the current_nodes table.
> To complicate things slightly further, the full history files
> are similar but complete a full delta from one point in time to
> another and may contain several versions of a single entity.
> So perhaps the term "diffs" is the right one for the existing files
> and "deltas" is the right one for full history files.
I would hope that both have the property that, given a copy of the DB
that was right at the earlier time, applying the delta or diff to that
copy yields a copy of the database as of when the osmosis extract
transaction ran. Perhaps then the delta has the intermediate steps and
the diff is permitted to collapse them?
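The property above can be sketched in Python. This is only an illustration under assumptions: the (action, id, value) change format and the collapse() and apply() helpers are hypothetical and are not the osmosis change format. Create and modify are both treated as "set the current value", which is what lets a collapsed diff replay cleanly.

```python
def collapse(delta):
    """Keep only the last change per entity id (a 'diff' in the sense above)."""
    latest = {}
    for change in delta:
        latest[change[1]] = change
    return list(latest.values())


def apply(snapshot, changes):
    """Replay changes onto a copy of the snapshot and return the new state."""
    state = dict(snapshot)
    for action, entity_id, value in changes:
        if action == "delete":
            state.pop(entity_id, None)
        else:
            # "create" and "modify" are both treated as setting the value.
            state[entity_id] = value
    return state


snapshot = {1: "a", 2: "b"}

# A full-history delta may contain several versions of one entity.
delta = [
    ("modify", 1, "a2"),
    ("create", 3, "c"),
    ("modify", 1, "a3"),
    ("delete", 2, None),
    ("modify", 3, "c2"),
]

diff = collapse(delta)

via_delta = apply(snapshot, delta)
via_diff = apply(snapshot, diff)
```

Both replays end in the same state; the diff simply carries fewer changes because the intermediate versions of entities 1 and 3 have been collapsed away.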
> The reason I've tended to avoid the word "diffs" is because the planet
> directory also contains diffs between planet files. These diffs are
> yet another way of describing changes/differences and are truly a
> difference between two planet files.
As in the output of the diff command on two text files which happen to
contain XML, it sounds like.
> If you're not familiar with it already, please check out the API
> schema. If information isn't stored there, we can't query it. For
> example, there is no concept of an upload in the database, the only
> grouping feature it has is changesets.
Thanks - read and sort of understood - there's a lot in there.