<div class="gmail_quote">On Thu, Mar 11, 2010 at 10:10 PM, Lars Francke <span dir="ltr"><<a href="mailto:lars.francke@gmail.com">lars.francke@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">>> I started working on a streaming XML output plugin for Osmosis. I was<br>
>> intending to take advantage of PuSH/PubSubHub messaging and maybe even XMPP<br>
>> (so that you get a 1-min delayed IM when someone changes something in your<br>
>> bbox).<br>
>> Anyway, TRAPI could use this same plugin to apply updates to their<br>
>> database.<br>
><br>
> I've also spent a fair bit of time thinking about this type of thing. When<br>
> I first started work on the replication diffs I had in mind a server-side<br>
> daemon (using Osmosis internally) that would push changes to all connected<br>
> clients. It would allow a client to connect, specify which replication<br>
> number it was up to, receive all updates in a single stream, then continue<br>
> to receive live changes as they occurred.<br>
<br>
</div>These are things we talked about at the recent FOSSGIS conference. I<br>
planned to do a more detailed write-up but I don't know when I'll get<br>
to that so this is not finished. We too agreed that some kind of<br>
streaming/subscription to changesets would be a good idea. Our focus<br>
though was on the German dev servers so that not everyone would need<br>
to write an .osc parser and download the changeset files etc.<br></blockquote><div><br>Cool, sounds interesting. I'll take a look if you get something written up. <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im"><br>
> But I would really like to see it happen :-)<br>
<br>
</div>I currently use AMQP (RabbitMQ) for message processing and it works<br>
very well. It is very flexible and it'd be easy to extend it with a<br>
PubSubHubBub or XMPP output.<br>
<br>
Mitja (of OpenStreetBugs) proposed just yesterday a filter that<br>
filters changes by the tags/changes involved so it would be very easy<br>
to subscribe to only the events you are interested in. This can be<br>
implemented in just a few lines of code.<br>
<br>
While we only thought about doing this on the German dev servers I've<br>
since gotten multiple requests/questions and suggestions that this<br>
should be integrated into the main OSM site. All that'd be needed<br>
would be a call in the Ruby API that sends a message (asynchronously,<br>
very fast) once a change has been made. This would make the generation<br>
of the diff files a lot easier and everything more flexible. I haven't<br>
yet asked anyone if this would be possible. I know that this isn't<br>
the right topic (although the TRAPI could also use this system) but I<br>
wanted to take the opportunity to inform about our ideas.<br></blockquote></div><br>Thanks for the info. I'll add a few comments (because I can't help myself ;-).<br><br>Most OSM systems tend to have a large number of disorganised and uncontrolled clients. Does this work well with the AMQP paradigm? In other words, does it take administrative overhead to register new subscriptions to a queue? What happens if large numbers of subscriptions are created and then the clients disappear? Is AMQP targeted at a world where the clients are relatively controlled and small in number? It's important to minimise administration overhead where possible.<br>
<br>Clients will experience outages, whether due to a network problem, a server reboot, or just not running a system 24x7. Presumably they need a way to catch up on missed events. There are a few options: 1. The server holds events for clients until they become available again, 2. The client catches up using an out-of-band mechanism (e.g. downloads diffs directly), or 3. The client can request that the server begin sending data from a specific point. I think that only options 1 and 2 are possible using AMQP. Option 1 is not scalable because the server has to hold events for every absent client, and option 2 adds client complexity. Option 3 is what I'd like to see, but I don't think it can be done using a typical reliable messaging system such as AMQP. I hope I'm wrong though.<br>
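<br>To make option 3 a bit more concrete, here is a rough Python sketch. It is not Osmosis code and not an AMQP feature; <code>ChangeLog</code>, <code>Client</code>, <code>read_from</code> and <code>catch_up</code> are names made up purely for illustration, and the log is just an in-memory list. The point is that the server only numbers events and the client says where it wants to resume from.<br>

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChangeLog:
    """Hypothetical server-side, append-only log of changes.

    The server keeps no per-client state; it only numbers events.
    """
    events: List[str] = field(default_factory=list)

    def append(self, change: str) -> int:
        """Store a change and return its sequence number (starting at 1)."""
        self.events.append(change)
        return len(self.events)

    def read_from(self, last_seen: int) -> List[Tuple[int, str]]:
        """Return every (sequence, change) pair published after `last_seen`."""
        return list(enumerate(self.events[last_seen:], start=last_seen + 1))


@dataclass
class Client:
    """Hypothetical client that remembers its own position in the log."""
    last_seen: int = 0
    applied: List[str] = field(default_factory=list)

    def catch_up(self, log: ChangeLog) -> int:
        """Fetch and apply everything after our position; return the count."""
        count = 0
        for seq, change in log.read_from(self.last_seen):
            self.applied.append(change)
            self.last_seen = seq  # the position lives on the client only
            count += 1
        return count


log = ChangeLog()
client = Client()

log.append("node 1 created")
client.catch_up(log)

# The client goes offline here and misses two events.
log.append("way 7 modified")
log.append("node 1 deleted")

# On reconnect it simply asks for everything after its own position.
missed = client.catch_up(log)
```

<br>The server never needs to know that the client went away, which is what makes this different from option 1.<br>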
<br>Something to note about the current replication mechanism is that it
doesn't use any transactional capabilities other than creating files in a
particular order. All replication state tracking is client side where
transactions are actually occurring (e.g. writing to a database, updating
a planet file, etc.), which keeps the server highly scalable and agnostic of client reliability.<br><br>I don't know how you'd hook into the Ruby API effectively and reliably. You can't just wait for changeset closure events because changesets can remain open for long periods of time. You really want to be replicating data as soon as possible after it becomes available to API queries. This may mean receiving notification about every single entity as it is created, modified or deleted from the db, but this will result in huge numbers of events which will be difficult to process in an efficient manner. The current replication solves all of this nicely by querying for data as it is committed to the database. After data is committed by the API, it becomes available to both the API and replication at the same time. The downside with the current mechanism is that it has to poll the db for changes; however, the transaction check query is exceptionally fast, which makes the polling overhead quite low. I also think you'll run into a fair bit of resistance trying to incorporate changes into the Ruby API; it's simpler at least to remain independent where possible. Unless you want to achieve sub-second replication, the current approach could be run with a very short replication interval. The main restriction on replication interval now is downloading large numbers of files from the planet server, not the extraction of data from the database.<br>
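<br>As an illustration of the client-side state tracking I mean, here is a minimal Python sketch. It is not the actual Osmosis implementation: the <code>published</code> dict stands in for the files the server writes in order, and <code>load_state</code>, <code>save_state</code> and <code>poll_once</code> are hypothetical names. A real client would call something like <code>poll_once</code> on a timer.<br>

```python
import json
import os
from typing import Callable, Dict


def load_state(path: str) -> int:
    """Return the last applied sequence number, or 0 if none is recorded."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["sequence"]


def save_state(path: str, sequence: int) -> None:
    """Record the position by writing a temp file and renaming it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"sequence": sequence}, f)
    os.replace(tmp, path)


def poll_once(
    published: Dict[int, str],
    state_path: str,
    apply: Callable[[str], None],
) -> int:
    """Apply every diff published since our recorded position, in order.

    `published` is a stand-in for the server's numbered diff files.
    Returns the number of diffs applied during this poll.
    """
    sequence = load_state(state_path)
    applied = 0
    while sequence + 1 in published:
        apply(published[sequence + 1])
        sequence += 1
        save_state(state_path, sequence)  # only after the diff is applied
        applied += 1
    return applied
```

<br>If the client dies between applying a diff and saving its state, it will re-apply that one diff on restart, so in this sketch applying has to be safe to repeat (or happen in the same transaction as the state update). Either way the server is unaffected.<br>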
<br>I guess something to consider is who the clients of the mechanism are. Somebody wanting to see activity in a geographical area may not care about reliability, and perhaps something like XMPP is appropriate there. But anybody wanting reliable replication (e.g. TRAPI) will need something robust that guarantees delivery and data ordering.<br>
<br>Anyway, it's good to hear that some fresh minds are interested in the problem of changeset distribution. I'm very interested to hear what comes out of it.<br><br>Brett<br><br>