[OSM-dev] Osmosis Command Line Improvements

Mon Nov 26 22:12:35 GMT 2007

Frederik Ramm wrote:
> These make life much easier, thank you! We can soon start deleting all
> sorts of perl scripts from the repository as they've been obsoleted by
> osmosis. It seems that polygon extraction outperforms my perl
> implementation by a factor of 5 to 10 which means that either yours is
> excellent or mine was crap.
>   
Cool.  After thinking about writing my own polygon algorithm, and being
reminded that people have already solved this problem many times before,
I used the built-in Java libraries.  Apparently those guys knew what they
were doing :-)  The java.awt.geom.Area class in JDK1.6 is fantastic
(hence my reluctance to support JDK1.5).  You can add/subtract areas
to/from other areas, using either float or double precision, with very
little code.  My polygon file loader just reads the file from start to
finish, adding and subtracting polygons as it encounters them.  The
library just works.
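
To give a feel for how little code the add/subtract approach needs, here
is a minimal sketch.  It is not the osmosis loader itself: the class
name AreaSketch, the helper methods and the coordinates are all made up
for illustration, and no polygon file parsing is shown.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

// Hypothetical illustration only; not taken from the osmosis source.
final class AreaSketch {

    /** Builds a closed polygon from an array of {x, y} points. */
    static Area polygon(double[][] points) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(points[0][0], points[0][1]);
        for (int i = 1; i < points.length; i++) {
            path.lineTo(points[i][0], points[i][1]);
        }
        path.closePath();
        return new Area(path);
    }

    /** A 10x10 square with a 2x2 hole cut out of the middle. */
    static Area squareWithHole() {
        Area area = new Area();
        // Add the outer ring, then subtract the inner one.
        area.add(polygon(new double[][] {{0, 0}, {10, 0}, {10, 10}, {0, 10}}));
        area.subtract(polygon(new double[][] {{4, 4}, {6, 4}, {6, 6}, {4, 6}}));
        return area;
    }

    public static void main(String[] args) {
        Area area = squareWithHole();
        System.out.println(area.contains(1, 1)); // true: inside the square
        System.out.println(area.contains(5, 5)); // false: inside the hole
    }
}
```

Path2D.Float can be swapped in for Path2D.Double if single precision is
enough.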

I considered wrapping the polygon check inside a bounding box check for 
efficiency but I suspect it does something like that internally anyway.
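
For reference, the wrapping I had in mind would look roughly like the
sketch below.  BoundedArea is a hypothetical name, not something in
osmosis, and if the library already does a bounds check internally the
wrapper buys nothing.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Hypothetical wrapper; osmosis does not necessarily contain this class.
final class BoundedArea {
    private final Area area;
    private final Rectangle2D bounds;

    BoundedArea(Area area) {
        this.area = area;
        // Compute the bounding box once, up front.
        this.bounds = area.getBounds2D();
    }

    /** Cheap rectangle test first, full polygon test only if it passes. */
    boolean contains(double x, double y) {
        return bounds.contains(x, y) && area.contains(x, y);
    }
}
```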

The initial polygon load can be slow.  Loading an Australia polygon
takes about 10 minutes, but that is a huge 2.32 MB polygon which is
hopefully a worst-case scenario.  From memory the Germany polygon was
far quicker.
> I'd like to ask for more default arguments. The "file" option in the
> --bounding-polygon task and the "interval" option in --log-progress
> seem obvious candidates to me.
>   
No problem.  I've added them to the future_features.txt list so that I 
don't forget.  These are simple changes now that I have the framework in 
place and won't take too long.  However, I probably won't get them done 
by this Friday and I'm away for two weeks after that.

I've added a fairly minimal set of default arguments for now because 
it's easier to add these things than to remove them.  Happy to consider 
adding new ones.

My next step is to shorten option names but it's a bigger change so may 
not happen for a little while yet.
> Now the only remaining thing about osmosis that makes it somewhat
> clumsy is that it talks too much. If something goes wrong, you get 20+
> lines of error instead of one nice and concise message about what's
> wrong. And even if nothing goes wrong, you need at least one -q if you
> don't want to know what all those tasks are doing. But then again if
> you deliberately want to see progress info and include -lp, this gets
> eaten by -q. Sigh. Can't have it all I guess ;-)
>   
Yeah, agreed, osmosis logging is poor as you've discovered :-)  Much of
the problem is caused by using the JDK logging framework, which is great
for writing re-usable libraries but not so great for writing command
line applications.  The logging framework is infinitely configurable, but
exposing that configuration in a simple way on the command line is the
challenge.
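
One possible direction, purely as a sketch and not a decision: count the
-q and -v style flags and map the result onto a java.util.logging level,
and install a formatter that prints one line per message.  The class
CliLogging, its methods and the idea of a -v flag are assumptions made
for illustration; none of this exists in osmosis today.

```java
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch; names and flag semantics are assumptions.
final class CliLogging {
    private static final Level[] LEVELS = {
        Level.OFF, Level.SEVERE, Level.WARNING, Level.INFO,
        Level.FINE, Level.FINER, Level.FINEST
    };

    /** INFO is the baseline; each quiet flag steps down, each verbose flag up. */
    static Level levelFor(int verboseCount, int quietCount) {
        int index = 3 + verboseCount - quietCount;
        index = Math.max(0, Math.min(LEVELS.length - 1, index));
        return LEVELS[index];
    }

    /** Prints "LEVEL: message" on a single line, with no stack trace. */
    static final class OneLineFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            return record.getLevel().getName() + ": "
                + formatMessage(record) + System.lineSeparator();
        }
    }

    /** Applies the level and formatter to the root logger and its handlers. */
    static void configure(int verboseCount, int quietCount) {
        Level level = levelFor(verboseCount, quietCount);
        Logger root = Logger.getLogger("");
        root.setLevel(level);
        for (Handler handler : root.getHandlers()) {
            handler.setLevel(level);
            handler.setFormatter(new OneLineFormatter());
        }
    }
}
```

This does nothing about progress output being swallowed by -q; that
would need a separate logger or level for the progress task.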

I've added this to future_features.txt as well, though I'm not sure how
I'm going to solve it yet.
> Am I right in assuming that whenever you use output compression on a
> multi-CPU machine, --buffer is a probably good thing to do?
>   
The simple answer is that in most cases adding a --buffer before the 
final compressed output should improve performance.

It depends on how heavily the previous thread in the pipeline is loaded
(i.e. whether previous tasks on the same thread consume much CPU).  I
should add info to the wiki indicating which tasks are active (run in
their own thread) and which are passive (just respond to input data).
There is a tradeoff between adding more threads to utilise multiple CPUs
and the overhead of synchronising access to data between threads.
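
The general idea of a buffer between two threads can be sketched as
below.  This is a generic bounded-queue example, not the actual --buffer
implementation; BufferSketch, pipeThroughBuffer and the END marker are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Generic producer/consumer sketch; not the osmosis --buffer task itself.
final class BufferSketch {
    // Sentinel object that tells the consumer the stream has ended.
    private static final Object END = new Object();

    /** Passes items through a bounded queue to a consumer on its own thread. */
    static List<Object> pipeThroughBuffer(List<?> items, int capacity) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        List<Object> received = new ArrayList<>();

        // Downstream side: stands in for e.g. a compressing writer.
        Thread consumer = new Thread(() -> {
            try {
                for (Object item = queue.take(); item != END; item = queue.take()) {
                    received.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Upstream side: put() blocks whenever the buffer is full.
            for (Object item : items) {
                queue.put(item);
            }
            queue.put(END);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while buffering", e);
        }
        return received;
    }
}
```

Every put and take has to synchronise on the queue, which is the
overhead side of the tradeoff mentioned above.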

Cheers,
Brett