[osmosis-dev] --used-node performance and a possible way to improve it

Wed Jun 8 10:03:30 BST 2011

On Fri, Jun 03, 2011 at 10:04:40PM +0200, Igor Podolskiy wrote:
> [lots of good thinking]

I have thought about these things quite a lot, too. I agree with Igor that we
are bumping into the limits of what the current Osmosis pipelining model can
do. And it gets even more complicated: for many of the things we want to do,
the best way of doing them depends on the kind of data we are working on.
Working on a small excerpt might need a totally different approach than
working on the whole planet. For the excerpt we might be able to work
completely in memory; for the planet we might need temporary files or multiple
passes over the input.
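
To make the excerpt-versus-planet point concrete, here is a rough sketch of a
"used nodes" filter done both ways. It is written in Python only for brevity
and is neither Osmosis nor Osmium code: the tuple-based object model and the
names used_nodes_in_memory, used_nodes_two_pass and open_input are all made up
for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical minimal object model (an assumption, not a real OSM API):
# ("node", id, ()) or ("way", id, (node refs...)).
OsmObject = Tuple[str, int, Tuple[int, ...]]


def used_nodes_in_memory(objects: Iterable[OsmObject]) -> List[OsmObject]:
    """Single pass: buffer every node and way, then keep the referenced nodes.

    Only reasonable for an excerpt, because all nodes are held in memory.
    """
    nodes = {}
    ways = []
    used = set()
    for obj in objects:
        kind, oid, refs = obj
        if kind == "node":
            nodes[oid] = obj
        elif kind == "way":
            used.update(refs)
            ways.append(obj)
    return [nodes[i] for i in sorted(used) if i in nodes] + ways


def used_nodes_two_pass(
    open_input: Callable[[], Iterable[OsmObject]]
) -> Iterator[OsmObject]:
    """Two passes: only the set of referenced node ids is held in memory.

    open_input is a hypothetical callable that re-opens the input for each pass.
    """
    used = set()
    for kind, _oid, refs in open_input():  # pass 1: collect referenced ids
        if kind == "way":
            used.update(refs)
    for obj in open_input():  # pass 2: emit used nodes and all ways
        kind, oid, _refs = obj
        if kind != "node" or oid in used:
            yield obj
```

Both give the same result on sorted input. What differs is what has to fit in
memory and how often the input is read, and that is exactly the kind of
decision that depends on the data.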

The idea of having some "control plane" that magically does all the right
things is very tempting, but I think it is premature to try to define it. We
don't know enough about the different things people are doing and want to do
with OSM data yet. If we design such a control plane, it would probably end
up a huge monster that can do everything, but nobody can actually work with
it, because of its complexity.

I have been trying a different approach with the Osmium framework that I have
been working on for the last year or so: Don't try to solve every problem in
one gigantic flexible program. Instead have lots of building blocks that a
programmer can use to fit together a program that does exactly what he needs
the way he needs it. There is no "control plane", but a programmer who decides
to use this class and that driver and maybe a class he writes himself deriving
it from some other class from the toolkit.
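
As a very rough illustration of what I mean by building blocks, consider the
sketch below. It is Python for brevity and is not Osmium's actual API: the
names Handler, TagFilter, WayCounter and apply, and the tuple layout of the
objects, are assumptions invented for this example.

```python
class Handler:
    """Base building block: every callback defaults to doing nothing."""

    def node(self, node_id, tags):
        pass

    def way(self, way_id, refs, tags):
        pass


class TagFilter(Handler):
    """Toolkit-style block: forwards only objects that carry a given tag key."""

    def __init__(self, key, next_handler):
        self.key = key
        self.next_handler = next_handler

    def node(self, node_id, tags):
        if self.key in tags:
            self.next_handler.node(node_id, tags)

    def way(self, way_id, refs, tags):
        if self.key in tags:
            self.next_handler.way(way_id, refs, tags)


class WayCounter(Handler):
    """A class the programmer writes himself, derived from the toolkit base."""

    def __init__(self):
        self.count = 0

    def way(self, way_id, refs, tags):
        self.count += 1


def apply(objects, handler):
    """Driver: feeds each object to the handler.

    Objects are hypothetical tuples: ("node", id, tags) or
    ("way", id, refs, tags).
    """
    for obj in objects:
        if obj[0] == "node":
            handler.node(obj[1], obj[2])
        elif obj[0] == "way":
            handler.way(obj[1], obj[2], obj[3])


# The programmer, not a control plane, decides how the blocks fit together.
objects = [
    ("node", 1, {}),
    ("node", 2, {"highway": "crossing"}),
    ("way", 10, (1, 2), {"highway": "residential"}),
    ("way", 11, (1, 2), {"building": "yes"}),
]
counter = WayCounter()
apply(objects, TagFilter("highway", counter))
```

The wiring lives in ordinary program code, so swapping the driver for a
two-pass one, or replacing the filter, is a code change rather than a new
command line option.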

What we lose is the ability for a "normal user" to build this by using a few
command line options (or a GUI). What we gain is a lot of flexibility.

And then, later, when we have a better idea of the typical uses, we can build
this "control plane" on top of the library. Actually we will probably not just
build one control plane but several. And together they will probably solve
95% of all use cases, and for the rest you can still go back to the source code.

I started my own project, Osmium, to go that route, but I don't see why the
same thing couldn't be done with Osmosis: Split Osmosis into two parts. One
is the low-level code for OSM objects, readers, writers, etc., and the other is
the "control plane" code that stitches pipelines together. Leave the
Osmosis application as it is; it solves many, many problems in a very nice
and easy way. But also use the low-level blocks and stick them together in
new ways in new applications that don't use the current pipelining model but
other combinations.

(I also encourage you to take a look at Osmium (https://github.com/joto/osmium),
either to use it or to steal ideas from it. It can read and write XML and PBF,
assemble multipolygons, do several passes over the input, filter data, create
shapefiles, and many other things. :-)

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298