[OSM-dev] Changeset And Replication

Tue Oct 21 16:50:19 BST 2008

On Tue, Oct 21, 2008 at 12:39 PM, Brett Henderson <brett at bretth.com> wrote:

> Hi All,
>
> I'm in the process of updating Osmosis to work with API 0.6, or more
> specifically to work with the new MySQL schema.
>
> The biggest change is the introduction of changesets.  I'm interested in
> people's thoughts on how this should be done.
>
> **** Option 1 ****
> My initial plan is not to look at the changeset table at all.  I will
> continue to use the node/way/relation history tables as I do in 0.5 and
> only use the changeset table as a means of joining to the user table.
> When writing updates to a destination MySQL database, I will create a
> changeset per user per replication interval.  In other words if using
> minute changesets, there will be one changeset created per user per
> minute.  Hourly changesets will result in one changeset per user per
> hour.  This should be straightforward to implement.  This will have two
> major limitations:
> 1. Changesets will not align with changesets in the master production
> database.
> 2. The bounding box information on the changesets will all be set to the
> whole planet.  It may be possible to make the bounding boxes accurate
> but it will add a large overhead to processing so I won't provide it in
> the initial release.
>
> **** Option 2 ****
> A possible enhancement is to replicate changesets themselves.  There are
> a number of ways this could be done but the current changeset
> implementation makes all of them difficult in their own way.  I would
> have liked to use changesets themselves as a basis of replication to
> identify what data has been written during a change interval but this is
> not possible because changesets are not guaranteed to be independent of
> (i.e. non-overlapping with) other changesets, cannot be relied upon to be
> closed in a timely fashion (thus having no further updates), and don't
> have a closing timestamp.  The second method I've been leaning towards
> is to introduce a new changeset element type in the changeset file which
> will include all changesets that have been created (but may not be
> closed yet) in the change interval.  This second method has the issue
> that the bounding box information may not be final because more changes
> may yet be written.
>
> **** Problems with Changeset Replication ****
> In short I don't have a way of creating useful changesets in replicated
> databases.  The first option creates artificial changesets without bbox
> information (although they could have bbox information by adding a large
> overhead to initial import), and the second option has problems with
> bbox information due to the bboxes changing after the point of
> replication.  If changesets are not important outside of the main
> database then we can proceed with Option 1.  If replicated changesets
> are considered useful, then I can't see a workable solution for Option 2
> using the current changeset implementation and believe a change in
> design is required.  I'd like to see replicated changesets but the
> usefulness may be outweighed by increased complexity.
>
> **** Possible Fixes ****
> The easiest fix from a replication point of view would be to make
> changesets atomic but this precludes live editors like Potlatch.
> Another option is to introduce a form of locking where records are
> locked until their owning changesets are completed but this adds
> complexity to the current implementation and may block edits if
> changesets are long-lived.
>
> The advantage of either fix is that osmosis knows for sure that a
> changeset is complete and is thus a candidate for replication and that
> the changeset can be applied in isolation from other changesets so long as
> changesets are applied in chronological order.  I'd like to see the
> locking method employed.  This would require a daemon which limits
> the duration of changesets to sensible values (e.g. 5 minutes but
> potentially variable based on changeset activity) and auto-closes
> changesets if the timeout expires.  For extra points the API could avoid
> exposing edited data until changesets are closed.
>

Changesets are not atomic transactions, so I don't see any point in trying
to identify and work with closed changesets.  There's no rollback for a
changeset and incomplete changesets don't break anything.

As I understand it, the changeset bbox is derived data, so I don't see any
need for osmosis to provide it.  The consumer can derive it the same way
the main server does, if it needs it.
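
To illustrate what "derive it" could mean on the consumer side, here is a minimal sketch. It assumes the bbox is simply the smallest rectangle covering the coordinates of every node touched by the changeset; the main server's exact rules are not described in this thread, and the names `BBox` and `derive_bbox` are made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class BBox:
    """Hypothetical bounding box type, in degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def extend(self, lon: float, lat: float) -> "BBox":
        # Return a new box grown just enough to include the given point.
        return BBox(
            min(self.min_lon, lon),
            min(self.min_lat, lat),
            max(self.max_lon, lon),
            max(self.max_lat, lat),
        )


def derive_bbox(points: Iterable[Tuple[float, float]]) -> Optional[BBox]:
    """Derive a bbox from (lon, lat) pairs of the nodes in one changeset.

    Assumption: only node positions contribute. Returns None when the
    changeset has touched no nodes yet.
    """
    bbox: Optional[BBox] = None
    for lon, lat in points:
        bbox = BBox(lon, lat, lon, lat) if bbox is None else bbox.extend(lon, lat)
    return bbox
```

A consumer would feed this the node coordinates it has already replicated for a given changeset id, so nothing extra needs to travel in the change file.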

I'd be quite happy with option 2, but without any bbox info.
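
As a rough sketch of option 2 without bbox info: for each replication interval, pick every changeset created in that interval, open or closed, and carry only its identifying fields. The `Changeset` fields and the function name below are assumptions for illustration, not the actual schema or Osmosis code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Changeset:
    """Hypothetical minimal changeset record: no bbox, no closing time."""
    id: int
    user_id: int
    created_at: datetime


def changesets_for_interval(
    changesets: Iterable[Changeset], start: datetime, end: datetime
) -> List[Changeset]:
    """Return changesets created in [start, end), ordered by id.

    Whether a changeset is still open is deliberately ignored, since an
    incomplete changeset does not break anything for the consumer.
    """
    selected = [c for c in changesets if start <= c.created_at < end]
    return sorted(selected, key=lambda c: c.id)
```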

If you really want to provide bbox info then you'd probably need to provide
a feed of changeset changes.  Each time the bbox is extended by the main
server you'd need to supply a changeset update with the new bbox values.
This is doable, but seems a bit pointless really.
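
For what it's worth, such a feed might look something like the sketch below: walk the stream of edits and emit a changeset update only when an edit actually extends that changeset's bbox. The input shape `(changeset_id, lon, lat)` and the name `bbox_updates` are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical bbox layout: (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


def bbox_updates(
    edits: Iterable[Tuple[int, float, float]]
) -> Iterator[Tuple[int, BBox]]:
    """Yield (changeset_id, new_bbox) each time an edit grows a bbox.

    Edits that fall inside the changeset's current bbox produce no update.
    """
    current: Dict[int, BBox] = {}
    for changeset_id, lon, lat in edits:
        old = current.get(changeset_id)
        if old is None:
            new = (lon, lat, lon, lat)
        else:
            new = (
                min(old[0], lon),
                min(old[1], lat),
                max(old[2], lon),
                max(old[3], lat),
            )
        if new != old:
            current[changeset_id] = new
            yield changeset_id, new
```

Every long-lived changeset keeps generating updates for as long as it grows, which is the extra traffic that makes this seem not worth the trouble.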

80n

>
>
>
> Hopefully the above makes sense.  Any thoughts and feedback appreciated.
>
> Brett
>
>