[OSM-dev] TRAPI status

Brett Henderson brett at bretth.com
Fri Mar 12 11:20:56 GMT 2010


On Thu, Mar 11, 2010 at 10:10 PM, Lars Francke <lars.francke at gmail.com> wrote:

> >> I started working on a streaming XML output plugin for Osmosis. I was
> >> intending to take advantage of PuSH/PubSubHub messaging and maybe even
> XMPP
> >> (so that you get a 1-min delayed IM when someone changes something in
> your
> >> bbox).
> >> Anyway, TRAPI could use this same plugin to apply updates to their
> >> database.
> >
> > I've also spent a fair bit of time thinking about this type of thing.
> When
> > I first started work on the replication diffs I had in mind a server-side
> > daemon (using Osmosis internally) that would push changes to all
> connected
> > clients.  It would allow a client to connect, specify which replication
> > number it was up to, receive all updates in a single stream, then
> continue
> > to receive live changes as they occurred.
>
> These are things we talked about at the recent FOSSGIS conference. I
> planned to do a more detailed write-up but I don't know when I'll get
> to that so this is not finished. We too agreed that some kind of
> streaming/subscription to changesets would be a good idea. Our focus
> though was on the German dev servers so that not everyone would need
> to write an .osc parser and download the changeset files etc.
>

Cool, sounds interesting.  I'll take a look if you get something written
up.

>
> > But I would really like to see it happen :-)
>
> I currently use AMQP (RabbitMQ) for message processing and it works
> very well. It is very flexible and it'd be easy to extend it with a
> PubSubHubBub or XMPP output.
>
> Mitja (of OpenStreetBugs) proposed just yesterday a filter that
> filters changes by the tags/changes involved so it would be very easy
> to subscribe to only the events you are interested in. This can be
> implemented in just a few lines of code.
>
> While we only thought about doing this on the German dev servers I've
> since gotten multiple requests/questions and suggestions that this
> should be integrated into the main OSM site. All that'd be needed
> would be a call in the Ruby API that sends a message (asynchronously,
> very fast) once a change has been made. This would make the generation
> of the diff files a lot easier and everything more flexible. I haven't
> yet asked anyone if this would be possible. I know that this isn't
> the right topic (although the TRAPI could also use this system) but I
> wanted to take the opportunity to inform about our ideas.
>

Thanks for the info.  I'll add a few comments (because I can't help myself
;-).

Most OSM systems tend to have a large number of disorganised and
uncontrolled clients.  Does this work well with the AMQP paradigm?  In other
words, does it take administrative overhead to register new subscriptions to
a queue?  What happens if large numbers of subscriptions are created then
the clients disappear?  Is AMQP targeted at a world where the clients are
relatively controlled and small in number?  It's important to minimise
administration overhead where possible.

Clients will experience outages whether that be due to a network problem,
server reboot, or just not running a system 24x7.  Presumably they need a
way to catch up on missed events.  There are a few options: 1. The server
holds events for clients until they become available again, 2. The client
catches up using an out of band mechanism (eg. downloads diffs directly), or
3. The client can request that the server begin sending data from a specific
point.  I think that only options 1 and 2 are possible using AMQP.  1 is not
scalable, and 2 adds additional client complexity.  3 is what I'd like to
see, but I don't think it can be done using a typical reliable messaging
system such as AMQP.  I hope I'm wrong though.
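
As a rough illustration of option 3, the sketch below keeps changes in an
append-only log and lets a client ask for everything after the last sequence
number it has seen.  The ChangeLog class and its method names are
hypothetical and exist only for this example; they are not part of Osmosis,
the OSM API, or any AMQP library.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class ChangeLog:
    """Hypothetical append-only log of changes, numbered from 1."""

    entries: List[str] = field(default_factory=list)

    def append(self, change: str) -> int:
        """Store a change and return the sequence number assigned to it."""
        self.entries.append(change)
        return len(self.entries)

    def read_from(self, last_seen: int) -> Iterator[Tuple[int, str]]:
        """Yield (sequence, change) for every change after last_seen, in order."""
        for seq in range(last_seen + 1, len(self.entries) + 1):
            yield seq, self.entries[seq - 1]


log = ChangeLog()
for change in ["create node 1", "modify way 7", "delete node 1"]:
    log.append(change)

# A client that last saw sequence 1 reconnects and asks for what it missed.
missed = list(log.read_from(1))
```

The server here holds no per-client state at all: the only thing a client
has to remember is one number, which is what keeps this option cheap on the
server side.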

Something to note about the current replication mechanism is that it doesn't
use any transactional capabilities other than creating files in a particular
order.  All replication state tracking is client side where transactions are
actually occurring (eg. writing to a database, updating a planet file, etc)
which keeps the server highly scalable and agnostic of client reliability.
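
A minimal sketch of what client-side state tracking could look like,
assuming the client stores its data in a local SQLite database.  The table
layout and the function names (open_client_db, last_applied, apply_change)
are invented for this example.  The point it tries to show is that the
change and the "how far am I" marker are written in the same transaction, so
a crash can never leave them out of step.

```python
import sqlite3


def open_client_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical client tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changes (seq INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), last_seq INTEGER)"
    )
    conn.execute("INSERT OR IGNORE INTO state (id, last_seq) VALUES (1, 0)")
    conn.commit()
    return conn


def last_applied(conn: sqlite3.Connection) -> int:
    """Return the sequence number of the last change applied locally."""
    return conn.execute("SELECT last_seq FROM state WHERE id = 1").fetchone()[0]


def apply_change(conn: sqlite3.Connection, seq: int, body: str) -> bool:
    """Apply one change and advance the state marker in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        current = last_applied(conn)
        if seq <= current:
            return False  # already applied, so a replayed change is ignored
        if seq != current + 1:
            raise ValueError(f"gap in sequence: have {current}, got {seq}")
        conn.execute("INSERT INTO changes (seq, body) VALUES (?, ?)", (seq, body))
        conn.execute("UPDATE state SET last_seq = ? WHERE id = 1", (seq,))
    return True


conn = open_client_db()
first = apply_change(conn, 1, "create node 1")
replayed = apply_change(conn, 1, "create node 1")
```

With this arrangement the server never needs to know whether a client
succeeded; after any outage the client reads its own marker and asks for
whatever comes next.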

I don't know how you'd hook into the Ruby API effectively and reliably.  You
can't just wait for changeset closure events because changesets can remain
open for long periods of time.  You really want to be replicating data as
soon as possible after it becomes available to API queries.  This may mean
receiving notification about every single entity as it is created, modified
or deleted from the db, but this will result in huge numbers of events which
will be difficult to process in an efficient manner.  The current
replication solves all of this nicely by querying for data as it is
committed to the database.  After data is committed by the API, it becomes
available to both the API and replication at the same time.  The downside
with the current mechanism is that it has to poll the db for changes;
however, the transaction check query is exceptionally fast, which makes the
polling overhead quite low.  I also think you'll run into a fair bit of
resistance trying to incorporate changes into the Ruby API; it's simpler at
least to remain independent where possible.  Unless you want to achieve
sub-second replication, the current approach could be run with a very short
replication interval.  The main restriction on replication interval now is
downloading large numbers of files from the planet server, not the
extraction of data from the database.
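
The polling pattern described above can be sketched roughly as follows.
This is not how Osmosis is actually implemented: latest_txn and extract are
hypothetical callables standing in for the cheap transaction check and the
more expensive data extraction, and the sketch assumes a simple counter that
only ever increases, which glosses over how real database transactions
commit.

```python
import time
from typing import Callable, List, Tuple


def poll_once(
    latest_txn: Callable[[], int],
    extract: Callable[[int, int], List[str]],
    last_txn: int,
) -> Tuple[int, List[str]]:
    """Run the cheap check; only extract data if something new was committed."""
    current = latest_txn()
    if current == last_txn:
        return last_txn, []  # nothing new, so the expensive query is skipped
    return current, extract(last_txn, current)


def poll_forever(
    latest_txn: Callable[[], int],
    extract: Callable[[int, int], List[str]],
    publish: Callable[[List[str]], None],
    interval_seconds: float = 1.0,
) -> None:
    """Poll at a fixed interval and publish each batch of new changes."""
    last_txn = latest_txn()
    while True:
        last_txn, changes = poll_once(latest_txn, extract, last_txn)
        if changes:
            publish(changes)
        time.sleep(interval_seconds)


# Stand-in for the database: a list of committed changes.
committed = ["create node 1", "modify way 7", "delete node 1"]
last, changes = poll_once(
    lambda: len(committed),
    lambda lo, hi: committed[lo:hi],
    1,
)
```

Because an idle poll costs only the cheap check, the interval can be made
short without putting much load on the database.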

I guess something to consider is who the clients of the mechanism are.
Somebody wanting to see activity in a geographical area may not care about
reliability and perhaps something like XMPP is appropriate here.  But
anybody wanting reliable replication (ie. TRAPI) will need something robust
that guarantees delivery and data ordering.

Anyway, it's good to hear that some fresh minds are interested in the
problem of changeset distribution.  I'm very interested to hear what comes
out of it.

Brett