[OSM-dev] API 0.6: Changeset Access
Brett Henderson
brett at bretth.com
Wed Jan 28 20:48:18 GMT 2009
80n wrote:
> On Wed, Jan 28, 2009 at 1:06 PM, Brett Henderson <brett at bretth.com>
> wrote:
>
> 80n wrote:
>
>
> Replication of changesets is on my undocumented long term TODO list
> for osmosis (I should add it to trac) but I don't know when I'll be
> able to do it. I had some discussions with Shaun and Matt a while
> back on how this might be done efficiently. Identifying changesets
> for replication is a bit tricky and would probably involve two
> passes, first pass would identify all changesets created in a time
> interval, and the second pass would identify all changesets modified
> (ie. have entities referring to them) in a time interval. Once
> identified they could be read and included in a changeset file just
> like any other entity.
>
>
> Perhaps you only need to worry about exporting the changeset
> headers. The members of a changeset can be derived from the
> elements themselves.
>
> Yes, that is fine, in fact I think that's what we were planning to
> do ... my memory is hazy :-) The header info is the difficult bit
> because it changes over time. From memory (I don't have the
> schema handy) the main issue is the bounding box fields. These
> can be updated over time as new entity updates are added to the
> changeset. I agree that only the header info needs to be
> extracted, but it needs to be included in every changeset that has
> an entity referring to it in case the bounding box was updated
> since the last replication.
>
> Is the bounding box info updated within the same transaction as
> entity updates or can it be updated asynchronously with a separate
> daemon? I remember discussions about bbox updates only occurring
> occasionally (ie. the bbox is made slightly larger than necessary
> to avoid large numbers of writes) but I'm fairly sure they're
> updated synchronously at the same time as the entity causing the
> update, at least I hope so.
>
>
> The bbox can also be derived from the changeset members so that's not
> really essential either. The rest of the header probably won't change
> significantly, if at all, during the life of a changeset.
It's not essential if writing to a database, but it's expensive for the
client to re-calculate. I was hoping that could be avoided because it
drastically increases the time taken to apply changesets. You don't
need to include many of these expensive updates before the time taken to
apply a changeset is greater than the interval it represents (ie. a
minute changeset takes two minutes to apply). So far all updates are
performed by queries operating at the individual entity level and don't
require complex queries across many rows.
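To illustrate why re-deriving the bbox on the client is costly, here is a
minimal sketch in Python. It is not osmosis code: the function name
derive_bbox and the (lon, lat) tuple layout are assumptions made for the
example. The point is that every member node of a changeset has to be
visited, and in a real database ways would presumably first have to be
resolved to their nodes, which is a query across many rows rather than a
single-entity update.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical layout: (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


def derive_bbox(node_coords: Iterable[Tuple[float, float]]) -> Optional[BBox]:
    """Recompute a changeset bbox from the (lon, lat) of every member node.

    Returns None when the changeset has no member nodes.
    """
    bbox: Optional[BBox] = None
    for lon, lat in node_coords:
        if bbox is None:
            # First node: the bbox collapses to a single point.
            bbox = (lon, lat, lon, lat)
        else:
            # Grow the bbox just enough to contain the new node.
            bbox = (min(bbox[0], lon), min(bbox[1], lat),
                    max(bbox[2], lon), max(bbox[3], lat))
    return bbox
```

If the header bbox is shipped with the change file instead, the client
only has to store four numbers per changeset.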
>
> I think you could just query the api directly, the changeset ids will
> be allocated sequentially. As changesets are automatically closed
> after 24 hours, a second query process running 24 hours behind can
> re-query each changeset and be sure that it will no longer change.
This might work. Remember that osmosis changesets are designed to be
able to be extracted at any time and for any time period in the past.
So when you tell it to get all changes between 1 and 2 am on the 2nd of
last month, that's what you expect to see in the changeset. It can't
rely on information gained from previous runs. It would be relatively
simple to include all changesets that were opened during that interval,
and during the same interval 24 hours earlier, which might solve the
issue of changed changesets. Would that be acceptable? It's a kludge
but might be "good enough".
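The kludge could look roughly like the following Python sketch. The
function name select_changesets, the opened_at mapping, and the lookback
parameter are all hypothetical names for the example. It assumes each
changeset's open time is known, and it needs nothing from previous runs.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_changesets(opened_at: Dict[int, datetime],
                      start: datetime,
                      end: datetime,
                      lookback: timedelta = timedelta(hours=24)) -> List[int]:
    """Pick changeset ids opened in [start, end) or in the same
    interval shifted back by `lookback` (24 hours by default)."""
    windows = [(start, end), (start - lookback, end - lookback)]
    return sorted(
        changeset_id
        for changeset_id, opened in opened_at.items()
        if any(lo <= opened < hi for lo, hi in windows)
    )
```

The first window picks up newly opened changesets. The second picks up
changesets that, given the 24 hour auto-close, should no longer change,
so their header can be re-read in its final state.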
As for sequential ids, I can't see how that helps. When you get down to
minute changesets there will be a lot of overlapping changeset ids where
changesets appear then disappear for a few minutes then reappear again.
Each changeset is extracted in isolation from other changesets, so it
can't rely on information gained from previous invocations. Relying on
sequential extractions would greatly limit flexibility. Including all
active changesets might be acceptable for hourly and daily changesets,
but again is a bad idea for minute changesets where most active
changesets will not have updates in that period.
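A small Python sketch of the overlap problem at minute granularity. The
function name changesets_per_minute and the (changeset id, second offset)
input format are made up for the illustration. It groups entity edits by
the minute they fall in and records which changeset ids each minute
extract would refer to.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def changesets_per_minute(edits: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each minute index to the changeset ids that have edits in it.

    `edits` holds (changeset_id, seconds since the start of extraction).
    """
    by_minute: Dict[int, Set[int]] = defaultdict(set)
    for changeset_id, second in edits:
        by_minute[second // 60].add(changeset_id)
    return dict(by_minute)


# Changeset 100 has edits in minute 0 and minute 2, but none in minute 1.
edits = [(100, 5), (101, 30), (101, 70), (100, 130), (102, 150)]
per_minute = changesets_per_minute(edits)
```

Here changeset 100 shows up in minute 0, is absent from minute 1, and
reappears in minute 2, so the ids seen by consecutive minute extracts
overlap and interleave rather than advancing in sequence.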
Brett