On Tue, Oct 21, 2008 at 12:39 PM, Brett Henderson <span dir="ltr"><<a href="mailto:brett@bretth.com">brett@bretth.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi All,<br>
<br>
I'm in the process of updating Osmosis to work with API 0.6, or more<br>
specifically to work with the new MySQL schema.<br>
<br>
The biggest change is the introduction of changesets. I'm interested in<br>
people's thoughts on how this should be done.<br>
<br>
**** Option 1 ****<br>
My initial plan is not to look at the changeset table at all. I will<br>
continue to use the node/way/relation history tables as I do in 0.5 and<br>
only use the changeset table as a means of joining to the user table.<br>
When writing updates to a destination MySQL database, I will create a<br>
changeset per user per replication interval. In other words if using<br>
minute changesets, there will be one changeset created per user per<br>
minute. Hourly changesets will result in one changeset per user per<br>
hour. This should be straightforward to implement, but it has two<br>
major limitations:<br>
1. Changesets will not align with changesets in the master production<br>
database.<br>
2. The bounding boxes on the changesets will all be set to the<br>
whole planet. It may be possible to make the bounding boxes accurate,<br>
but doing so would add a large processing overhead, so I won't provide<br>
it in the initial release.<br>
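As a sketch of the bucketing Option 1 describes (hypothetical names, not actual Osmosis code), each edit would be assigned to a synthetic changeset keyed by user and replication interval:<br>

```python
from collections import defaultdict

def assign_changesets(edits, interval_seconds):
    """Group edits into synthetic changesets, one per user per interval.

    edits: iterable of (user_id, unix_timestamp) pairs.
    interval_seconds: 60 for minute changesets, 3600 for hourly ones.
    Returns {(user_id, interval_index): [edits...]}.
    """
    changesets = defaultdict(list)
    for user_id, timestamp in edits:
        interval_index = timestamp // interval_seconds  # which replication interval
        changesets[(user_id, interval_index)].append((user_id, timestamp))
    return dict(changesets)

# Two users editing across two minutes: three synthetic changesets at
# minute granularity, but only two at hourly granularity.
edits = [(1, 10), (1, 70), (2, 15)]
```

Note that the changeset ids produced this way are local to the replicated database and bear no relation to ids in the master database, which is exactly limitation 1 above.<br>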
<br>
**** Option 2 ****<br>
A possible enhancement is to replicate changesets themselves. There are<br>
a number of ways this could be done but the current changeset<br>
implementation makes all of them difficult in their own way. I would<br>
have liked to use changesets themselves as the basis of replication,<br>
identifying what data has been written during a change interval, but<br>
this is not possible because changesets:<br>
1. are not guaranteed to be independent (i.e. non-overlapping) of other<br>
changesets,<br>
2. cannot be relied upon to be closed in a timely fashion (only a closed<br>
changeset is guaranteed to receive no further updates), and<br>
3. don't have a closing timestamp.<br>
The method I've been leaning towards instead is to introduce a new<br>
changeset element type in the changeset file, which would include all<br>
changesets created (but possibly not yet closed) during the change<br>
interval. This method has the issue that the bounding box information<br>
may not be final, because more changes may yet be written.<br>
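A sketch of the selection rule this second method implies (the field names are assumptions, not the actual schema): pick every changeset created during the interval and flag the still-open ones, whose bboxes are provisional:<br>

```python
def changesets_for_interval(changesets, start, end):
    """Select changesets created in the interval [start, end).

    changesets: list of dicts with 'id', 'created_at' (unix time) and
    'closed_at' (unix time, or None while the changeset is still open).
    Open changesets are flagged: their bbox may still grow, so any bbox
    replicated for them is not final.
    """
    selected = []
    for cs in changesets:
        if start <= cs["created_at"] < end:
            selected.append({**cs, "bbox_final": cs["closed_at"] is not None})
    return selected

changesets = [
    {"id": 1, "created_at": 100, "closed_at": 150},
    {"id": 2, "created_at": 110, "closed_at": None},  # still open
    {"id": 3, "created_at": 500, "closed_at": None},  # outside interval
]
```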
<br>
**** Problems with Changeset Replication ****<br>
In short, I don't have a way of creating useful changesets in replicated<br>
databases. The first option creates artificial changesets without bbox<br>
information (although it could provide bbox information at the cost of a<br>
large overhead during the initial import), and the second option has<br>
problems with bbox information because the bboxes can change after the<br>
point of replication. If changesets are not important outside of the<br>
main database, then we can proceed with Option 1. If replicated<br>
changesets are considered useful, then I can't see a workable solution<br>
for Option 2 using the current changeset implementation and believe a<br>
change in design is required. I'd like to see replicated changesets, but<br>
their usefulness may be outweighed by the increased complexity.<br>
<br>
**** Possible Fixes ****<br>
The easiest fix from a replication point of view would be to make<br>
changesets atomic, but this precludes live editors like Potlatch.<br>
Another option is to introduce a form of locking where records remain<br>
locked until their owning changesets are completed, but this adds<br>
complexity to the current implementation and may block edits if<br>
changesets are long-lived.<br>
<br>
The advantage of either fix is that Osmosis knows for sure that a<br>
changeset is complete, and thus a candidate for replication, and that<br>
the changeset can be applied in isolation from other changesets so long<br>
as changesets are applied in chronological order. I'd like to see the<br>
locking method employed; this would require a daemon which limits the<br>
duration of changesets to sensible values (e.g. 5 minutes, but<br>
potentially variable based on changeset activity) and auto-closes a<br>
changeset when its timeout expires. For extra points, the API could<br>
avoid exposing edited data until changesets are closed.<br>
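The auto-close daemon could be little more than a periodic sweep that partitions open changesets by age (a sketch under assumed field names; a real daemon would loop, sleep, and write the close timestamps back to the database):<br>

```python
CHANGESET_TIMEOUT = 300  # the suggested 5-minute cap, in seconds

def expire_changesets(open_changesets, now, timeout=CHANGESET_TIMEOUT):
    """Split open changesets into (still_open, newly_closed) by age.

    open_changesets: list of dicts with 'id' and 'created_at' (unix time).
    Changesets older than the timeout get a 'closed_at' stamp; once
    closed, they become safe replication candidates.
    """
    still_open, newly_closed = [], []
    for cs in open_changesets:
        if now - cs["created_at"] >= timeout:
            newly_closed.append({**cs, "closed_at": now})
        else:
            still_open.append(cs)
    return still_open, newly_closed

open_changesets = [{"id": 1, "created_at": 0}, {"id": 2, "created_at": 900}]
```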
</blockquote><div><br><br>Changesets are not atomic transactions, so I don't see any point in trying to identify and work with closed changesets. There's no rollback for a changeset and incomplete changesets don't break anything.<br>
<br>As I understand it, the changeset bbox is derived data, so I don't see any need for Osmosis to provide it. The consumer can derive it, the same way the main server does, if it needs it.<br><br>I'd be quite happy with option 2, but without any bbox info.<br>
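To illustrate the point that the bbox is derivable: a consumer that sees every node a changeset touched can compute the bbox itself (a minimal sketch):<br>

```python
def derive_bbox(node_coords):
    """Compute (min_lat, min_lon, max_lat, max_lon) from the coordinates
    of every node a changeset touched; None if it touched no nodes.
    """
    coords = list(node_coords)
    if not coords:
        return None
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return (min(lats), min(lons), max(lats), max(lons))
```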
<br>If you really want to provide bbox info then you'd probably need to provide a feed of changeset changes. Each time the bbox is extended by the main server you'd need to supply a changeset update with the new bbox values. This is doable, but seems a bit pointless really.<br>
<br>80n<br><br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<br>
<br>
Hopefully the above makes sense. Any thoughts and feedback appreciated.<br>
<br>
Brett<br>
<br>
<br>
_______________________________________________<br>
dev mailing list<br>
<a href="mailto:dev@openstreetmap.org">dev@openstreetmap.org</a><br>
<a href="http://lists.openstreetmap.org/listinfo/dev" target="_blank">http://lists.openstreetmap.org/listinfo/dev</a><br>
</blockquote></div><br>