[osmosis-dev] Pipeline Initialisation

Brett Henderson brett at bretth.com
Fri Dec 9 13:21:34 GMT 2011


I've been experimenting with some changes to the way the Osmosis pipeline
executes.

*Existing Operation*

Currently, the typical interaction between a source task and its sink is as
follows:

   - Zero or more calls to process(xxxx).
   - One call to complete() if processing is successful.
   - One call to release() regardless of success or failure.

This works well enough in most cases.  The main disadvantage of the current
design is that *many* classes have to implement lazy initialisation,
performing their setup on the first call to process.
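
To illustrate the lazy initialisation pattern described above, here is a
minimal sketch.  The Sink interface is simplified to String entities, and
LazySinkSketch, LazyWriter and the log messages are hypothetical names
invented for this example; none of it is the actual Osmosis API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a simplified stand-in for the real task interfaces.
class LazySinkSketch {
    /** Simplified sink contract under the existing lifecycle. */
    interface Sink {
        void process(String entity);
        void complete();
        void release();
    }

    /** A sink that has no startup hook, so it sets itself up on first use. */
    static class LazyWriter implements Sink {
        private boolean initialized;
        final List<String> log = new ArrayList<>();

        private void initializeIfNeeded() {
            if (!initialized) {
                log.add("open");
                initialized = true;
            }
        }

        @Override
        public void process(String entity) {
            initializeIfNeeded();
            log.add("write " + entity);
        }

        @Override
        public void complete() {
            // Still needed here in case no entities arrived at all.
            initializeIfNeeded();
            log.add("flush");
        }

        @Override
        public void release() {
            log.add("close");
        }
    }

    /** Drives the writer through the existing call sequence. */
    static List<String> demo() {
        LazyWriter writer = new LazyWriter();
        try {
            writer.process("node 1");
            writer.process("node 2");
            writer.complete();
        } finally {
            writer.release();
        }
        return writer.log;
    }
}
```

Every entry point has to remember to call initializeIfNeeded(), which is
the boilerplate an explicit startup call would remove.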

*New Operation*

However, there's a new feature I'd like to introduce: passing "header"
information through the pipeline.  This will take the form of a
Map<String, Object> and provide a generic way to pass additional metadata
through the pipeline.  The task interaction would now look like:

   - *One call to initialize(Map<String, Object>) at the start of
   processing.  If startup fails it doesn't have to be called.*
   - Zero or more calls to process(xxxx).
   - One call to complete() if processing is successful.
   - One call to release() regardless of success or failure.
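
The call sequence above can be sketched roughly as follows.  This is an
illustration under assumptions, not the code on the branch: entities are
plain Strings, and InitLifecycleSketch, RecordingSink, runSource and the
"replication.timestamp" key (with its value) are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; a sketch of the proposed call order only.
class InitLifecycleSketch {
    interface Sink {
        void initialize(Map<String, Object> metaData);
        void process(String entity);
        void complete();
        void release();
    }

    /** Records each lifecycle call so the order can be inspected. */
    static class RecordingSink implements Sink {
        final List<String> calls = new ArrayList<>();

        @Override
        public void initialize(Map<String, Object> metaData) {
            calls.add("initialize " + metaData.get("replication.timestamp"));
        }

        @Override
        public void process(String entity) {
            if (entity.isEmpty()) {
                throw new IllegalArgumentException("empty entity");
            }
            calls.add("process " + entity);
        }

        @Override
        public void complete() {
            calls.add("complete");
        }

        @Override
        public void release() {
            calls.add("release");
        }
    }

    /** Drives a sink the way a source task would under the new lifecycle. */
    static void runSource(Sink sink, List<String> entities) {
        try {
            Map<String, Object> metaData = new HashMap<>();
            // Made-up key and value, purely for illustration.
            metaData.put("replication.timestamp", "2011-12-09T13:21:34Z");
            sink.initialize(metaData);
            for (String entity : entities) {
                sink.process(entity);
            }
            sink.complete(); // reached only if nothing above threw
        } finally {
            sink.release(); // always runs, on success or failure
        }
    }
}
```

If process throws, complete() is skipped but release() still runs, which
matches the "regardless of success or failure" rule in the list.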

*Reasons*

This may be used for something as simple as passing additional information
such as replication timestamps, but may also be used by closely related
tasks to exchange more complex objects.  My main driver for doing this
right now is to allow me to decompose the current monolithic tasks used for
replication into smaller tasks.  For example, I can separate the apidb
schema specific code which extracts data by tracking PostgreSQL specific
transaction ids from the code that writes the data into change and state
files.  This allows the apidb code to then feed changes into other tasks
(eg. constant updates streaming over HTTP).  Together with the metatags
that can now be attached to all entities, this makes it possible to pass
all kinds of additional data through the pipeline without extending the
core.
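
As a rough sketch of how closely related tasks might exchange data this
way, the example below has a pass-through task add its own entry to the
metadata before forwarding it downstream.  Only initialize is shown; the
other lifecycle methods are omitted for brevity.  Tagger, Capture and the
"tagger.visited" key are hypothetical, and copying the map rather than
mutating it is my assumption, not something the proposal specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; shows metadata flowing through an intermediate task.
class MetaDataForwardingSketch {
    interface Sink {
        void initialize(Map<String, Object> metaData);
    }

    /** Pass-through task that adds one entry and forwards the rest untouched. */
    static class Tagger implements Sink {
        private final Sink downstream;

        Tagger(Sink downstream) {
            this.downstream = downstream;
        }

        @Override
        public void initialize(Map<String, Object> metaData) {
            // Copy so the upstream task's map is not modified (an assumption).
            Map<String, Object> copy = new HashMap<>(metaData);
            copy.put("tagger.visited", Boolean.TRUE);
            downstream.initialize(copy);
        }
    }

    /** Terminal sink that simply remembers what it received. */
    static class Capture implements Sink {
        Map<String, Object> seen;

        @Override
        public void initialize(Map<String, Object> metaData) {
            seen = metaData;
        }
    }
}
```

Because the values are plain Objects, the receiving task has to know the
key and cast to the expected type, which is the price of not extending
the core.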

The XML tasks already support writing the recently added entity metatags as
additional entity attributes and I'd like them to support this new global
metadata as well by adding new XML attributes to the main <osm> or
<osmChange> elements.
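
One way the global metadata could end up as attributes on the root
element is sketched below.  This is not the existing XML task code:
OsmHeaderSketch, openingTag and escape are hypothetical helpers, the keys
are assumed to already be valid XML attribute names, and values are
written with String.valueOf.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the real XML writer tasks are structured differently.
class OsmHeaderSketch {
    /** Escapes the characters that are unsafe inside a double-quoted attribute. */
    static String escape(String value) {
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds an opening tag such as <osm ...> with one attribute per entry. */
    static String openingTag(String elementName, Map<String, Object> metaData) {
        StringBuilder tag = new StringBuilder("<").append(elementName);
        // TreeMap gives a stable, alphabetical attribute order.
        for (Map.Entry<String, Object> entry : new TreeMap<>(metaData).entrySet()) {
            tag.append(' ')
                    .append(entry.getKey())
                    .append("=\"")
                    .append(escape(String.valueOf(entry.getValue())))
                    .append('"');
        }
        return tag.append('>').toString();
    }
}
```

A reader task would do the reverse, collecting unknown root attributes
into the map it passes to initialize.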

Longer term I'd like to replace the existing Bound class with something
like this.  Bound is currently treated as a normal Entity like nodes, ways
and relations but it is awkward and involves a number of hacks.  Passing
all bound information during pipeline startup would be much cleaner (I
think).  But that isn't a trivial task so will have to wait for another day.

*Code Changes*

The pipeline design hasn't changed much since it was introduced so this is
a fairly significant change.  The code is already implemented and at least
compiles and passes unit tests.

https://github.com/brettch/osmosis/tree/init

All tasks have been updated where necessary to support the new initialize
method, but I haven't updated tasks to take full advantage of it (eg.
eliminate lazy initialization logic).  Unless I hear any major objections
I'll merge it into the master branch at least on my repository and probably
the main openstreetmap/osmosis repository within the next few days.


Let me know if you have any thoughts or suggestions.

Brett