On Tue, Oct 21, 2008 at 12:39 PM, Brett Henderson <span dir="ltr"><<a href="mailto:brett@bretth.com">brett@bretth.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi All,<br>
<br>
I'm in the process of updating Osmosis to work with API 0.6, or more<br>
specifically to work with the new MySQL schema.<br>
<br>
The biggest change is the introduction of changesets. I'm interested in<br>
people's thoughts on how this should be done.<br>
<br>
**** Option 1 ****<br>
My initial plan is not to look at the changeset table at all. I will<br>
continue to use the node/way/relation history tables as I do in 0.5 and<br>
only use the changeset table as a means of joining to the user table.<br>
When writing updates to a destination MySQL database, I will create a<br>
changeset per user per replication interval. In other words if using<br>
minute changesets, there will be one changeset created per user per<br>
minute. Hourly changesets will result in one changeset per user per<br>
hour. This should be straightforward to implement, but it has two<br>
major limitations:<br>
1. Changesets will not align with changesets in the master production<br>
database.<br>
2. The bounding boxes on the changesets will all be set to the<br>
whole planet. It may be possible to make the bounding boxes accurate,<br>
but doing so would add a large processing overhead, so I won't provide<br>
it in the initial release.<br>
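As a sketch of the bucketing Option 1 describes (hypothetical names, not actual Osmosis code), each edit would be assigned to a synthetic changeset keyed by user and replication interval:<br>

```python
from collections import defaultdict

def assign_changesets(edits, interval_seconds):
    """Group edits into synthetic changesets, one per user per interval.

    edits: iterable of (user_id, unix_timestamp) pairs.
    interval_seconds: 60 for minute changesets, 3600 for hourly ones.
    Returns {(user_id, interval_index): [edits...]}.
    """
    changesets = defaultdict(list)
    for user_id, timestamp in edits:
        interval_index = timestamp // interval_seconds  # which replication interval
        changesets[(user_id, interval_index)].append((user_id, timestamp))
    return dict(changesets)

# Two users editing across two minutes: three synthetic changesets at
# minute granularity, but only two at hourly granularity.
edits = [(1, 10), (1, 70), (2, 15)]
```

Note that the changeset ids produced this way are local to the replicated database and bear no relation to ids in the master database, which is exactly limitation 1 above.<br>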
<br>
**** Option 2 ****<br>
A possible enhancement is to replicate changesets themselves. There are<br>
a number of ways this could be done but the current changeset<br>
implementation makes all of them difficult in their own way. I would<br>
have liked to use changesets themselves as the basis of replication,<br>
identifying what data has been written during a change interval, but<br>
this is not possible because changesets:<br>
1. are not guaranteed to be independent (i.e. non-overlapping) of other<br>
changesets,<br>
2. cannot be relied upon to be closed in a timely fashion (only a closed<br>
changeset is guaranteed to receive no further updates), and<br>
3. don't have a closing timestamp.<br>
The method I've been leaning towards instead is to introduce a new<br>
changeset element type in the changeset file, which would include all<br>
changesets created (but possibly not yet closed) during the change<br>
interval. This method has the issue that the bounding box information<br>
may not be final, because more changes may yet be written.<br>
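A sketch of the selection rule this second method implies (the field names are assumptions, not the actual schema): pick every changeset created during the interval and flag the still-open ones, whose bboxes are provisional:<br>

```python
def changesets_for_interval(changesets, start, end):
    """Select changesets created in the interval [start, end).

    changesets: list of dicts with 'id', 'created_at' (unix time) and
    'closed_at' (unix time, or None while the changeset is still open).
    Open changesets are flagged: their bbox may still grow, so any bbox
    replicated for them is not final.
    """
    selected = []
    for cs in changesets:
        if start <= cs["created_at"] < end:
            selected.append({**cs, "bbox_final": cs["closed_at"] is not None})
    return selected

changesets = [
    {"id": 1, "created_at": 100, "closed_at": 150},
    {"id": 2, "created_at": 110, "closed_at": None},  # still open
    {"id": 3, "created_at": 500, "closed_at": None},  # outside interval
]
```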
<br>
**** Problems with Changeset Replication ****<br>
In short, I don't have a way of creating useful changesets in replicated<br>
databases. The first option creates artificial changesets without bbox<br>
information (although it could provide bbox information at the cost of a<br>
large overhead during the initial import), and the second option has<br>
problems with bbox information because the bboxes can change after the<br>
point of replication. If changesets are not important outside of the<br>
main database, then we can proceed with Option 1. If replicated<br>
changesets are considered useful, then I can't see a workable solution<br>
for Option 2 using the current changeset implementation and believe a<br>
change in design is required. I'd like to see replicated changesets, but<br>
their usefulness may be outweighed by the increased complexity.<br>
<br>
**** Possible Fixes ****<br>
The easiest fix from a replication point of view would be to make<br>
changesets atomic, but this precludes live editors like Potlatch.<br>
Another option is to introduce a form of locking where records remain<br>
locked until their owning changesets are completed, but this adds<br>
complexity to the current implementation and may block edits if<br>
changesets are long-lived.<br>
<br>
The advantage of either fix is that Osmosis knows for sure that a<br>
changeset is complete, and thus a candidate for replication, and that<br>
the changeset can be applied in isolation from other changesets so long<br>
as changesets are applied in chronological order. I'd like to see the<br>
locking method employed; this would require a daemon which limits the<br>
duration of changesets to sensible values (e.g. 5 minutes, but<br>
potentially variable based on changeset activity) and auto-closes a<br>
changeset when its timeout expires. For extra points, the API could<br>
avoid exposing edited data until changesets are closed.<br>
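The auto-close daemon could be little more than a periodic sweep that partitions open changesets by age (a sketch under assumed field names; a real daemon would loop, sleep, and write the close timestamps back to the database):<br>

```python
CHANGESET_TIMEOUT = 300  # the suggested 5-minute cap, in seconds

def expire_changesets(open_changesets, now, timeout=CHANGESET_TIMEOUT):
    """Split open changesets into (still_open, newly_closed) by age.

    open_changesets: list of dicts with 'id' and 'created_at' (unix time).
    Changesets older than the timeout get a 'closed_at' stamp; once
    closed, they become safe replication candidates.
    """
    still_open, newly_closed = [], []
    for cs in open_changesets:
        if now - cs["created_at"] >= timeout:
            newly_closed.append({**cs, "closed_at": now})
        else:
            still_open.append(cs)
    return still_open, newly_closed

open_changesets = [{"id": 1, "created_at": 0}, {"id": 2, "created_at": 900}]
```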
</blockquote><div><br><br>Changesets are not atomic transactions, so I don't see any point in trying to identify and work with closed changesets. There's no rollback for a changeset and incomplete changesets don't break anything.<br>
<br>As I understand it, the changeset bbox is derived data, so I don't see any need for Osmosis to provide it. The consumer can derive it, the same way the main server does, if it needs it.<br><br>I'd be quite happy with option 2, but without any bbox info.<br>
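To illustrate the point that the bbox is derivable: a consumer that sees every node a changeset touched can compute the bbox itself (a minimal sketch):<br>

```python
def derive_bbox(node_coords):
    """Compute (min_lat, min_lon, max_lat, max_lon) from the coordinates
    of every node a changeset touched; None if it touched no nodes.
    """
    coords = list(node_coords)
    if not coords:
        return None
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return (min(lats), min(lons), max(lats), max(lons))
```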
<br>If you really want to provide bbox info then you'd probably need to provide a feed of changeset changes. Each time the bbox is extended by the main server you'd need to supply a changeset update with the new bbox values. This is doable, but seems a bit pointless really.<br>
<br>80n<br><br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<br>
<br>
Hopefully the above makes sense. Any thoughts and feedback appreciated.<br>
<br>
Brett<br>
<br>
<br>
_______________________________________________<br>
dev mailing list<br>
<a href="mailto:dev@openstreetmap.org">dev@openstreetmap.org</a><br>
<a href="http://lists.openstreetmap.org/listinfo/dev" target="_blank">http://lists.openstreetmap.org/listinfo/dev</a><br>
</blockquote></div><br>