[OSM-dev] Osmosis: Subtiling the planet with nice ways proposal

Tue Jan 15 00:43:40 GMT 2008

Lambertus wrote:
> Hi Brett,
>
> Let's see if I can put this down nice and short: I've had a little chat 
> on IRC today about splitting the planet file into subtiles for my 
> Garmin endeavors. I complained that ways go missing where they cross 
> tile boundaries, because of how the bboxes are extracted from the 
> planet file, and that this is quite a showstopper for the Garmin maps.
>
> Now, this could be solved using the completeWays option, but that is 
> not usable when processing the whole planet file. Also, the ways do 
> not stop at the edges of the tiles. So RichardF (et al.) came up with 
> the idea of extracting large tiles and post-processing them to 
> produce subtiles with ways neatly cut at the edges. This would 
> significantly lower the amount of RAM needed to maintain the list of 
> nodes and ways involved, resulting in a workable solution. We 
> discussed this some more and came to the conclusion 
> (Bobkare/Kleptog) that it might be nice to add this to Osmosis as a 
> new feature. How about that, eh?
>
> Attached is the (long but condensed) IRC log which shows the idea (and 
> some spin-offs) as described above in some more detail (although in 
> bits and pieces).
>
> What d'you reckon? Is this something for Osmosis?
>
> Regards, Lambertus

I'd like to see this problem solved.  As discussed elsewhere, it's 
tricky, and complicated by the size of the data nowadays.

As Karl mentioned in another email, I'm currently working on support for 
"datasets".  Basically this is a way of exposing a database through the 
pipeline instead of forcing tasks to process data in a stream.  It gives 
a task a way of directly accessing individual entities (by id) and 
reading bounding box contents.
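
To make the idea concrete, here is a rough Python sketch of what such a 
dataset might expose to a task.  It is not the Osmosis API: the class and 
method names (InMemoryDataset, get_node, ways_in_bbox and so on) are 
made up for illustration, and a plain in-memory dict stands in for the 
real storage.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Node:
    id: int
    lat: float
    lon: float


@dataclass(frozen=True)
class Way:
    id: int
    node_ids: Tuple[int, ...]


class InMemoryDataset:
    """Hypothetical dataset: random access by id plus bounding box reads."""

    def __init__(self, nodes: Iterable[Node], ways: Iterable[Way]) -> None:
        self._nodes = {n.id: n for n in nodes}
        self._ways = {w.id: w for w in ways}

    def get_node(self, node_id: int) -> Node:
        return self._nodes[node_id]

    def get_way(self, way_id: int) -> Way:
        return self._ways[way_id]

    def nodes_in_bbox(self, left: float, bottom: float,
                      right: float, top: float) -> List[Node]:
        # Linear scan for simplicity; a real store would use a spatial index.
        return [n for n in self._nodes.values()
                if left <= n.lon <= right and bottom <= n.lat <= top]

    def ways_in_bbox(self, left: float, bottom: float,
                     right: float, top: float) -> List[Way]:
        # A way is selected if at least one of its nodes lies in the box.
        inside = {n.id for n in self.nodes_in_bbox(left, bottom, right, top)}
        return [w for w in self._ways.values()
                if any(nid in inside for nid in w.node_ids)]
```

A task could then call ways_in_bbox for a tile and fetch any way nodes 
lying outside the tile with get_node, without having streamed the whole 
planet first.  Note that this simple selection misses a way that crosses 
the box without having a node inside it.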

If I can get a fast dataset implementation working, the tiling tasks 
should become relatively straightforward.
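
For the "ways neatly cut on the edges" part of the proposal, one possible 
approach (an assumption on my part, not something Osmosis does today) is 
to clip each way segment against the tile rectangle with the standard 
Liang-Barsky algorithm.  The Python sketch below treats coordinates as 
plain planar (lon, lat) pairs; clip_segment and clip_way are 
illustrative names only.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]                # (lon, lat)
BBox = Tuple[float, float, float, float]   # (left, bottom, right, top)


def clip_segment(p0: Point, p1: Point,
                 bbox: BBox) -> Optional[Tuple[Point, Point]]:
    """Liang-Barsky: return the part of p0-p1 inside bbox, or None."""
    (x0, y0), (x1, y1) = p0, p1
    left, bottom, right, top = bbox
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - left), (dx, right - x0),
                 (-dy, y0 - bottom), (dy, top - y0)):
        if p == 0:
            if q < 0:
                return None        # parallel to this edge and outside it
        else:
            t = q / p
            if p < 0:              # segment enters through this edge
                if t > t1:
                    return None
                t0 = max(t0, t)
            else:                  # segment leaves through this edge
                if t < t0:
                    return None
                t1 = min(t1, t)
    # Reuse the original endpoints when unclipped to avoid rounding drift.
    a = p0 if t0 == 0.0 else (x0 + t0 * dx, y0 + t0 * dy)
    b = p1 if t1 == 1.0 else (x0 + t1 * dx, y0 + t1 * dy)
    return a, b


def clip_way(points: List[Point], bbox: BBox) -> List[List[Point]]:
    """Cut a way into the pieces that lie inside bbox."""
    pieces: List[List[Point]] = []
    current: List[Point] = []
    for p0, p1 in zip(points, points[1:]):
        clipped = clip_segment(p0, p1, bbox)
        if clipped is None:
            if current:
                pieces.append(current)
                current = []
            continue
        a, b = clipped
        if current and current[-1] == a:
            current.append(b)      # continues the piece in progress
        else:
            if current:
                pieces.append(current)
            current = [a, b]
        if b != p1:                # the way left the tile; close the piece
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces
```

A way that leaves the tile and comes back in yields several pieces, each 
ending exactly on the tile edge.  A real implementation would also have 
to create new nodes at the cut points and carry the way's tags over to 
every piece.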

The biggest problem is the speed of an import.  A decent implementation 
with spatially aware way storage takes a long time to load.  Applying 
changes to an existing store instead of re-importing it is one way of 
alleviating this, but that requires a complex storage mechanism.

I'm working on two storage implementations at the moment: a bdb-backed 
implementation and a custom file-based implementation.  The final 
fallback is to use something like postgis as the storage implementation, 
but I was hoping to avoid that due to the complexity of setting up the 
infrastructure.

bdb is currently faster, and with some effort it should be possible to 
apply changes to it to keep it up to date.  The downside is that it 
consumes an enormous amount of space (approx twice the size of the 
uncompressed planet :-( ).

My custom file-based approach is far more disk efficient (approx half 
the size of the uncompressed planet) but is currently slow.  However, 
I've already discovered some techniques that should speed it up 
considerably (caching, index data locality, and file buffering).  I'll 
never be able to apply changes to it efficiently though; that's a hard 
problem to solve, so it will always need to be built from a complete 
planet.

I'd like to create a postgis implementation for comparison, but 1. I 
don't know postgis and 2. it will take a lot of time.

I need some more time to play with this though; it's proving to be very 
time consuming.  If dataset support turns out to be useless then I'll 
need to have a rethink.  Karl might already have some other ideas given 
that he's spent more time on it.