[osmosis-dev] Pipeline Initialisation

Thu Dec 22 10:25:04 GMT 2011

On 10 December 2011 00:21, Brett Henderson <brett at bretth.com> wrote:

> I've been experimenting with some changes to the way the Osmosis pipeline
> executes.
>
> *Existing Operation*
>
> Currently, the typical interaction between a source task and its sink is
> as follows:
>
>    - Zero or more calls to process(xxxx).
>    - One call to complete() if processing is successful.
>    - One call to release() regardless of success or failure.
>
> This works well enough in most cases.  The main disadvantage of the current
> design is that *many* classes have to implement lazy initialisation,
> initialising on the first call to process.
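
For readers unfamiliar with the pattern, here is a minimal sketch of the lazy
initialisation described above.  LazySink, initializeIfNeeded and the
StringBuilder standing in for an expensive resource are all hypothetical names
chosen for illustration; they are not taken from the Osmosis source.

```java
// Hypothetical sink with no explicit startup hook, so it must initialise
// itself lazily the first time it is asked to do any work.
class LazySink {
    private boolean initialized;
    private StringBuilder buffer; // stands in for a file, connection, etc.
    int initCount;                // counts how often setup actually ran

    // Every entry point has to remember to call this guard first.
    private void initializeIfNeeded() {
        if (!initialized) {
            buffer = new StringBuilder();
            initCount++;
            initialized = true;
        }
    }

    void process(String entity) {
        initializeIfNeeded();
        buffer.append(entity).append('\n');
    }

    void complete() {
        // An empty stream never reaches process, so complete needs the guard too.
        initializeIfNeeded();
    }

    String output() {
        return buffer == null ? "" : buffer.toString();
    }
}
```

The guard has to be repeated in every method that might be called first, which
is the boilerplate an explicit startup call would remove.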
>
> *New Operation*
>
> However, there's a new feature I'd like to introduce.  I'd like "header"
> information to be passed through the pipeline.  This will take
> the form of a Map<String, Object> and provide a generic way to pass
> additional metadata through the pipeline.  The task interaction would now
> look like:
>
>    - *One call to initialize(Map<String, Object>) at the start of
>    processing.  If startup fails it doesn't have to be called.*
>    - Zero or more calls to process(xxxx).
>    - One call to complete() if processing is successful.
>    - One call to release() regardless of success or failure.
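
The call sequence above can be sketched roughly as follows.  This is an
illustration only: SketchSink, RecordingSink, SketchSource and the
"replication.timestamp" key are hypothetical names, entities are simplified to
strings, and the real Osmosis interfaces may differ in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sink interface following the four-step lifecycle.
interface SketchSink {
    void initialize(Map<String, Object> metaData);
    void process(String entity);
    void complete();
    void release();
}

// Records each lifecycle call so the ordering can be inspected.
class RecordingSink implements SketchSink {
    final List<String> calls = new ArrayList<>();
    Map<String, Object> received;

    public void initialize(Map<String, Object> metaData) {
        received = metaData;
        calls.add("initialize");
    }

    public void process(String entity) {
        calls.add("process:" + entity);
    }

    public void complete() {
        calls.add("complete");
    }

    public void release() {
        calls.add("release");
    }
}

// Hypothetical source task driving a sink through the lifecycle.
class SketchSource {
    static void run(SketchSink sink, List<String> entities) {
        try {
            Map<String, Object> metaData = new HashMap<>();
            metaData.put("replication.timestamp", "2011-12-22T10:25:04Z");
            sink.initialize(metaData);   // once, at the start of processing
            for (String entity : entities) {
                sink.process(entity);    // zero or more times
            }
            sink.complete();             // only reached if nothing failed
        } finally {
            sink.release();              // always, success or failure
        }
    }
}
```

The try/finally is one way to get "complete on success, release regardless":
an exception thrown from process skips complete but still reaches release.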
>
> *Reasons*
>
> This may be used for something as simple as passing additional information
> such as replication timestamps, but may also be used by closely related
> tasks to exchange more complex objects.  My main driver for doing this
> right now is to allow me to decompose the current monolithic tasks used for
> replication into smaller tasks.  For example, I can separate the apidb
> schema specific code which extracts data by tracking PostgreSQL specific
> transaction ids from the code that writes the data into change and state
> files.  This allows the apidb code to then feed changes into other tasks
> (e.g. constant updates streaming over HTTP).  Along with the metatags now
> able to be attached to all entities, it is now possible to pass all kinds
> of additional data through the pipeline without extending the core.
>
> The XML tasks already support writing the recently added entity metatags
> as additional entity attributes and I'd like them to support this new
> global metadata as well by adding new XML attributes to the main <osm> or
> <osmChange> elements.
>
> Longer term I'd like to replace the existing Bound class with something
> like this.  Bound is currently treated as a normal Entity like nodes, ways
> and relations but it is awkward and involves a number of hacks.  Passing
> all bound information during pipeline startup would be much cleaner (I
> think).  But that isn't a trivial task so will have to wait for another day.
>
> *Code Changes*
>
> The pipeline design hasn't changed much since it was introduced so this is
> a fairly significant change.  The code is already implemented and at least
> compiles and passes unit tests.
>
> https://github.com/brettch/osmosis/tree/init
>
> All tasks have been updated where necessary to support the new initialize
> method, but I haven't updated tasks to take full advantage of it (e.g.
> eliminate lazy initialization logic).  Unless I hear any major objections
> I'll merge it into the master branch at least on my repository and probably
> the main openstreetmap/osmosis repository within the next few days.
>
This change is now merged.  It's a fairly invasive change so let me know if
you notice any issues.

Brett