[OSM-dev] Fwd: Streaming Replication

Brett Henderson brett at bretth.com
Sat Oct 13 05:44:44 BST 2012


Oops, I sent my previous email to the wrong list ...

---------- Forwarded message ----------
From: Brett Henderson <brett at bretth.com>
Date: 13 October 2012 15:43
Subject: Streaming Replication
To: osmosis-dev <osmosis-dev at openstreetmap.org>


Hi All,

Those of you who currently use the minute diffs to keep a local database
up to date may be interested to know that a new form of replication has
hit the street.

The current replication system is based on a series of static replication
files that are placed on a web server for clients to download as described
here:
http://wiki.openstreetmap.org/wiki/Planet.osm/diffs#Using_the_replication_diffs

It is a very simple mechanism and works well for the existing daily, hourly
and minutely replication feeds.  Unfortunately it doesn't work well for
sub-minute replication because it becomes far too "chatty".  On the server
side, the current feeds are generated from cron, which also works well down
to one minute intervals, but the overhead of launching a new process and
connecting to the database for every replication interval becomes too
costly for shorter intervals.
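
To put a rough number on "chatty", here is a small back-of-envelope
sketch.  It assumes one HTTP request per client per replication interval,
which is an assumption for illustration only; a real client may make more
than one request each time it polls.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(interval_seconds: int, requests_per_poll: int = 1) -> int:
    """Requests a single client makes per day when polling static files.

    requests_per_poll is an illustrative assumption, not a measured value.
    """
    return (SECONDS_PER_DAY // interval_seconds) * requests_per_poll


# Compare minutely polling with hypothetical 10 second and 1 second polling.
for interval in (60, 10, 1):
    print(interval, requests_per_day(interval))
```

Under that assumption a client goes from 1440 requests a day at one minute
intervals to 8640 at 10 second intervals, before multiplying by the number
of clients.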

To solve this, a new streaming replication mechanism has been developed.
Under the covers the same database queries are utilised, but the process
performing the queries runs continuously and polls the database for changes
at a shorter interval.  It is currently set to poll every 10 seconds, but
the interval can be reduced further if required.  The network transport is also
continuous and holds a single HTTP connection open for the lifetime of
communication between the server and client.  It is all implemented within
the latest version of Osmosis, 0.41.  If you wish to experiment with the
server-side tasks, however, note that several bugs have been fixed in the
latest development version.  Internally it uses the JBoss Netty framework,
which means that it's all event-driven (i.e. it doesn't require a thread
per client) and should theoretically support a large number of concurrent
clients.
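
As a rough illustration of the difference between a cron-launched job and
a long-running poller, here is a minimal sketch.  It is not the Osmosis
implementation (which is Java and built on Netty); the names
poll_for_changes and fetch_changes are hypothetical, and fetch_changes
stands in for whatever database query returns change identifiers newer
than the last one seen.

```python
import time
from typing import Callable, Iterator, List, Optional


def poll_for_changes(
    fetch_changes: Callable[[int], List[int]],
    interval_seconds: float = 10.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[int]]:
    """Yield batches of new change ids from one long-running process.

    fetch_changes(last_seen) is a hypothetical callable returning the ids
    greater than last_seen.  The process, and any database connection that
    fetch_changes holds, is reused across polls instead of being started
    afresh for every interval.
    """
    last_seen = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        batch = fetch_changes(last_seen)
        if batch:
            # Remember the newest id so the next poll only returns later ones.
            last_seen = max(batch)
            yield batch
        polls += 1
        sleep(interval_seconds)
```

Passing sleep and max_polls as parameters is only there so the loop can be
exercised without waiting; a real poller would run until stopped.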

To quickly see this in action, point your browser at this URL and you
should see new replication "state" data become available approximately
every 10 seconds.
http://planet.openstreetmap.org/replication/streaming/replicationState/current/tail
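
The same thing can be done from a script.  The following is a minimal
sketch using only the Python standard library; it prints whatever bytes
arrive on the open connection and makes no attempt to parse the wire
protocol.  The helper names iter_stream and tail_state are hypothetical,
and the sketch assumes the server keeps the HTTP connection open as
described above.

```python
import io
import urllib.request
from typing import Iterator

# URL quoted in this email; whether it is still served is not assumed here.
STATE_TAIL_URL = (
    "http://planet.openstreetmap.org/replication/streaming/"
    "replicationState/current/tail"
)


def iter_stream(response: io.BufferedIOBase, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield data as it arrives until the peer closes the connection."""
    while True:
        # read1() performs at most one underlying read, so it can return
        # fewer than chunk_size bytes rather than waiting to fill the buffer.
        chunk = response.read1(chunk_size)
        if not chunk:
            break
        yield chunk


def tail_state(url: str = STATE_TAIL_URL) -> None:
    """Hypothetical helper: print raw stream data without parsing it."""
    with urllib.request.urlopen(url) as response:
        for chunk in iter_stream(response):
            print(chunk.decode("utf-8", errors="replace"), end="", flush=True)
```

Calling tail_state() would block and keep printing for as long as the
server holds the connection open.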

New Osmosis tasks have been developed to consume this data.  For some basic
instructions to help you get started, refer to this link:
http://wiki.openstreetmap.org/wiki/Osmosis/Replication#Client-side_Streaming

If you don't wish to use Osmosis, some basic documentation on the wire
protocol is available here:
http://wiki.openstreetmap.org/wiki/Osmosis/Replication#Streaming_Replication_Wire_Protocol

This is very much experimental and bugs will undoubtedly be encountered, so
please be wary about trusting it to update your database if you've just
spent two weeks importing a planet file.  However, I'd love to see it get
some usage and would welcome any feedback.  This is not intended for use in
updating a local planet file as the existing daily files are better suited
to that.  For databases that can tolerate a minute delay, the existing
mechanism is very simple and has proven to be fairly reliable.  But if you
really need current access to data, and can cope with the additional
complexity, this should be useful.  The current 10 second delay is not a
lower limit, but is a good starting point for now.

Cheers,
Brett