[OSM-dev] Minute Diffs Broken

Greg Troxel gdt at ir.bbn.com
Tue May 5 13:45:35 BST 2009

  My aim all along has been to provide people with up to date data.  The 
  nice thing about the minute changesets is that they let you have an 
  offline database that exactly matches the API as of 6 minutes ago.  I'd 
  completely agree with you if the API only released data once the 
  changeset was closed but that's not the case.

I am a bit confused by some of the terms being used here.  The basic
issue for me is that we have API operations, which correspond to
database transactions.  Ignoring SERIALIZABLE vs READ COMMITTED, these
operations are quite safe.  These operations are not changesets.

Here are logs from squid, showing me querying data and uploading some changes.

1241488036.615   5688 TCP_MISS/200 38363 GET http://www.openstreetmap.org/api/0.6/map? - DIRECT/ text/xml
1241488089.540    276 TCP_MISS/200 661 GET http://www.openstreetmap.org/api/capabilities - DIRECT/ text/xml
1241488099.552    312 TCP_MISS/200 373 PUT http://www.openstreetmap.org/api/0.6/changeset/create - DIRECT/ text/plain
1241488102.378   2817 TCP_MISS/200 656 POST http://www.openstreetmap.org/api/0.6/changeset/1081606/upload - DIRECT/ text/xml
1241488102.570    163 TCP_MISS/200 366 PUT http://www.openstreetmap.org/api/0.6/changeset/1081606/close - DIRECT/ text/html

So there is one read and then three write database transactions, and
one changeset.  The read is not hard, and the writes all happen close
together in time (this was JOSM), but that won't necessarily be so.
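As a rough illustration of that count, here is a small Python sketch that
tallies the squid lines above.  The function name and the rule that only
URLs under /api/0.6/ count as database operations (so the capabilities
request is skipped) are my assumptions for the example, not anything the
API defines.

```python
import re

# The squid log lines quoted above, unchanged.
LOG = """\
1241488036.615   5688 TCP_MISS/200 38363 GET http://www.openstreetmap.org/api/0.6/map? - DIRECT/ text/xml
1241488089.540    276 TCP_MISS/200 661 GET http://www.openstreetmap.org/api/capabilities - DIRECT/ text/xml
1241488099.552    312 TCP_MISS/200 373 PUT http://www.openstreetmap.org/api/0.6/changeset/create - DIRECT/ text/plain
1241488102.378   2817 TCP_MISS/200 656 POST http://www.openstreetmap.org/api/0.6/changeset/1081606/upload - DIRECT/ text/xml
1241488102.570    163 TCP_MISS/200 366 PUT http://www.openstreetmap.org/api/0.6/changeset/1081606/close - DIRECT/ text/html
"""

CHANGESET_ID = re.compile(r"/changeset/(\d+)/")


def summarize(log_text):
    """Return (reads, writes, changeset ids) seen in squid access log text."""
    reads, writes, changesets = 0, 0, set()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete access log line
        method, url = fields[4], fields[5]
        if "/api/0.6/" not in url:
            continue  # assumption: the capabilities call is not a data operation
        if method == "GET":
            reads += 1
        else:
            writes += 1  # PUT and POST each map to one write operation here
        match = CHANGESET_ID.search(url)
        if match:
            changesets.add(match.group(1))
    return reads, writes, changesets


print(summarize(LOG))  # (1, 3, {'1081606'})
```

The create call carries no id in its URL, so only the upload and close
requests contribute to the set of changeset ids.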

Given the way the world is, it seems like the minute diffs really should
be looking for new transactions, not new changesets.  I can see
Frederik's point of only exporting closed changesets, but for that to
really make sense I think the main database has to isolate changesets
from each other until they are fully committed (meaning either
long-running transactions or an API change to have an API operation be
open/upload/close) -- trying to add transaction properties on a copy
when they aren't there in the original seems like it just won't work.

This is also confusing in wording because in svn a changeset is a
transaction, and it's not just SERIALIZABLE but actually SERIALIZED, so
the word changeset can have a wrong connotation.

I think we have

  uploads == db transactions (perhaps "microchangesets" or "changeset fragments"??)

  changesets == (some group of uploads, with a common id and comment)

  minute diffs == (some collection of uploads)

or maybe we will have

  minute diffs == (some collection of changesets)

but in that case the db created by the minute diff may refer to objects
which are not present, breaking the integrity guarantees that 0.6 got.
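
To make that concern concrete, here is a toy Python model.  Everything in
it is hypothetical: the Upload class, the object ids, and the scenario of a
closed changeset referring to a node uploaded in a changeset that is still
open.  It only shows the shape of the problem, not how the real diffs are
built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upload:
    """One committed upload (hypothetical model, not the real schema)."""
    changeset: int
    creates: tuple    # ids of objects created by this upload
    refers_to: tuple  # ids of objects the created objects reference


def dangling_refs(uploads):
    """Replay uploads in order; return referenced ids that were never created."""
    present, missing = set(), set()
    for up in uploads:
        present.update(up.creates)
        missing.update(r for r in up.refers_to if r not in present)
    return missing


# Changeset 1 is still open but has already committed node/1.
# Changeset 2 is closed and contains a way that uses node/1.
committed = [
    Upload(changeset=1, creates=("node/1",), refers_to=()),
    Upload(changeset=2, creates=("way/10",), refers_to=("node/1",)),
]
closed_changesets = {2}

# Diff built from every committed upload: nothing is missing.
by_upload = dangling_refs(committed)

# Diff built only from closed changesets: way/10 points at an absent node.
by_closed = dangling_refs(
    [u for u in committed if u.changeset in closed_changesets]
)

print(by_upload)  # set()
print(by_closed)  # {'node/1'}
```

The main database accepted both uploads, so it is consistent; the replica
fed only closed changesets is the one that ends up with the dangling
reference.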

I don't have a clue about how the uploads are numbered and how easy it
is to extract all of them, but given that the main DB can have committed
transactions with uploads that are not part of a closed changeset, I
think the minute diff and replicated dbs should have that too.
