[OSM-dev] Osmosis Command Line Improvements
Brett Henderson
brett at bretth.com
Mon Nov 26 22:12:35 GMT 2007
Frederik Ramm wrote:
> These make life much easier, thank you! We can soon start deleting all
> sorts of perl scripts from the repository as they've been obsoleted by
> osmosis. It seems that polygon extraction outperforms my perl
> implementation by a factor of 5 to 10 which means that either yours is
> excellent or mine was crap.
>
Cool. After thinking about writing my own polygon algorithm, and being
reminded that people have already solved this problem many times before,
I used the built-in Java libraries. Apparently those guys knew what they
were doing :-) The java.awt.geom.Area class in JDK1.6 is fantastic
(hence my reluctance to support JDK1.5). You can add areas to and
subtract them from other areas, using either float or double precision,
with very little code. My polygon file loader just reads the file from
start to finish, adding and subtracting polygons as it encounters them.
The library just works.
I considered wrapping the polygon check inside a bounding box check for
efficiency but I suspect it does something like that internally anyway.
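
As a rough illustration of the approach described above, here is a
minimal sketch that builds an Area by adding an outer polygon and
subtracting an inner one, then tests points with a bounding box check
in front of the polygon check. It is not the Osmosis code: the class
name, the helper methods and the coordinates are all made up for the
example, and whether the explicit bounding box check actually helps is
untested.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

// Illustrative sketch only; names are hypothetical, not taken from Osmosis.
class PolygonAreaSketch {

    // Builds a closed polygon from {x, y} pairs (x = longitude, y = latitude),
    // using double precision via Path2D.Double.
    static Area polygon(double[][] points) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(points[0][0], points[0][1]);
        for (int i = 1; i < points.length; i++) {
            path.lineTo(points[i][0], points[i][1]);
        }
        path.closePath();
        return new Area(path);
    }

    // Cheap rectangle test first; the exact polygon test runs only if it passes.
    static boolean contains(Area area, Rectangle2D bounds, double lon, double lat) {
        return bounds.contains(lon, lat) && area.contains(lon, lat);
    }

    public static void main(String[] args) {
        Area area = new Area();
        // Add an outer square, then cut a square hole out of its middle.
        area.add(polygon(new double[][] {{0, 0}, {10, 0}, {10, 10}, {0, 10}}));
        area.subtract(polygon(new double[][] {{4, 4}, {6, 4}, {6, 6}, {4, 6}}));
        Rectangle2D bounds = area.getBounds2D();

        System.out.println(contains(area, bounds, 2, 2));   // inside the square
        System.out.println(contains(area, bounds, 5, 5));   // inside the hole
        System.out.println(contains(area, bounds, 20, 20)); // outside the bounds
    }
}
```

The bounding rectangle is computed once after loading, so the extra
check costs only a few comparisons per point.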
The initial polygon load can be slow. Loading an Australia polygon
takes about 10 minutes, but that is a huge 2.32 MB polygon, which is
hopefully a worst-case scenario. From memory, the Germany polygon was
far quicker.
> I'd like to ask for more default arguments. The "file" option in the
> --bounding-polygon task and the "interval" option in --log-progress
> seem obvious candidates to me.
>
No problem. I've added them to the future_features.txt list so that I
don't forget. These are simple changes now that I have the framework in
place and won't take too long. However, I probably won't get them done
by this Friday and I'm away for two weeks after that.
I've added a fairly minimal set of default arguments for now because
it's easier to add these things than to remove them. Happy to consider
adding new ones.
My next step is to shorten option names but it's a bigger change so may
not happen for a little while yet.
> Now the only remaining thing about osmosis that makes it somewhat
> clumsy is that it talks too much. If something goes wrong, you get 20+
> lines of error instead of one nice and concise message about what's
> wrong. And even if nothing goes wrong, you need at least one -q if you
> don't want to know what all those tasks are doing. But then again if
> you deliberately want to see progress info and include -lp, this gets
> eaten by -q. Sigh. Can't have it all I guess ;-)
>
Yeah, agreed, osmosis logging is poor as you've discovered :-) Much of
the problem is caused by using the JDK logging framework, which is great
for writing re-usable libraries but not so great for writing command
line applications. The logging framework is infinitely configurable, but
exposing the configuration in a simple way on the command line is the
challenge.
I've added this to future_features.txt as well, not sure how I'm going
to solve it yet.
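
One possible direction, sketched here purely as an illustration and not
as a decided design, is to count "quieter" and "more verbose" flags on
the command line and map the result onto a java.util.logging level for
the root logger and its handlers. The class name, the method names and
the idea of counting a verbose flag are assumptions made for the
example; only -q appears in the discussion above.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only; names and the flag-counting scheme are hypothetical.
class LogLevelSketch {

    // Ordered from least to most verbose.
    private static final Level[] LEVELS = {
        Level.OFF, Level.SEVERE, Level.WARNING, Level.INFO,
        Level.FINE, Level.FINER, Level.FINEST
    };

    // Index of the level used when no flags are given (INFO).
    private static final int DEFAULT_INDEX = 3;

    // Each quiet flag moves one step towards OFF, each verbose flag one
    // step towards FINEST; the result is clamped to the valid range.
    static Level levelFor(int quietCount, int verboseCount) {
        int index = DEFAULT_INDEX + verboseCount - quietCount;
        index = Math.max(0, Math.min(LEVELS.length - 1, index));
        return LEVELS[index];
    }

    // Applies the level to the root logger and to every handler attached
    // to it, since handlers filter records independently of the logger.
    static void configure(int quietCount, int verboseCount) {
        Level level = levelFor(quietCount, verboseCount);
        Logger root = Logger.getLogger("");
        root.setLevel(level);
        for (Handler handler : root.getHandlers()) {
            handler.setLevel(level);
        }
    }

    public static void main(String[] args) {
        configure(1, 0); // as if a single quiet flag had been given
        Logger log = Logger.getLogger("sketch");
        log.info("task progress");       // suppressed at WARNING
        log.warning("something is off"); // still printed
    }
}
```

This does not solve the case where progress output should survive a
quiet flag; that would need a separate logger or handler for progress
messages.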
> Am I right in assuming that whenever you use output compression on a
> multi-CPU machine, --buffer is probably a good thing to do?
>
The simple answer is that in most cases adding a --buffer before the
final compressed output should improve performance.
It depends on how heavily the previous thread in the pipeline is loaded
(i.e. whether previous tasks on the same thread consume much CPU). I
should add info to the wiki indicating which tasks are active (run in
their own thread) and which are passive (just respond to input data).
There is a tradeoff between adding more threads to utilise multiple CPUs
and the overhead of synchronising access to data between threads.
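
To make the tradeoff concrete, here is a minimal sketch of the general
idea behind a buffer between two pipeline stages: a bounded queue lets
the upstream thread keep producing while a second thread does the
expensive work (compression, in the case discussed). This is a generic
producer/consumer example using java.util.concurrent, not the Osmosis
--buffer implementation; the class name, method name and string items
are invented for the illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only; not how Osmosis implements --buffer.
class BufferSketch {

    // Unique sentinel object marking the end of the stream.
    private static final String END = new String("END");

    // Pushes itemCount items through a bounded buffer to a consumer
    // thread and returns how many items the consumer received.
    static int pump(int itemCount, int capacity) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(capacity);
        int[] consumed = {0};

        // Downstream stage: runs in its own thread, e.g. a compressing writer.
        Thread consumer = new Thread(() -> {
            try {
                for (String item = buffer.take(); item != END; item = buffer.take()) {
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Upstream stage: put() blocks only when the buffer is full.
            for (int i = 0; i < itemCount; i++) {
                buffer.put("entity-" + i);
            }
            buffer.put(END);
            // join() also makes the consumer's count visible to this thread.
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while pumping data", e);
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        System.out.println(pump(1000, 16));
    }
}
```

Every put() and take() involves synchronisation, which is the overhead
mentioned above; the buffer pays off only when the work on each side is
heavy enough to outweigh it.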
Cheers,
Brett