[OSM-dev] pbf2osm development has started [code to test!]

Thu Sep 30 12:04:29 BST 2010

On Thu, 30 Sep 2010, Scott Crosby wrote:

> I proposed exactly this for the mkgmap splitter. If you're going to do this,
> can I propose a tweak where you can output thousands of files
> simultaneously? The differences are minor: Instead of tracking if a
> node/way/relation was output or was missed with two bitsets, track which
> areas it has been dumped to with two multimaps from the ID to the list of
> areas it was output to or the list of areas in which it was missed.

How would you know a priori whether a node is in a certain area if you
haven't observed its location or traced it?

> As the typical keycount in the multimap is low, you can use the trick I used
> a few weeks ago in the splitter: build a multimap by layering a set of
> individual hash tables, one for the first value of each key, a second for
> the second value of each key (if any), etc. Use sparse hash tables (
> http://code.google.com/p/google-sparsehash/) for the first few layers, and
> std::map for the remaining layers.

Bleggg... C++ :r
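
Language taste aside, the layering trick is easy to illustrate. Below is a
minimal sketch of how I read the description above, not the splitter's
actual code. The class name, the Id/Area types and the default of two hashed
layers are my own assumptions, and std::unordered_map merely stands in for a
sparse hash table.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical layered multimap from element ID to area numbers.
// Layer k stores the (k+1)-th value of each key, so a key with only
// one value costs a single entry in the first layer.
class LayeredMultimap {
public:
    using Id = std::int64_t;     // assumed ID type
    using Area = std::uint16_t;  // assumed area number type

    explicit LayeredMultimap(std::size_t hashed_layers = 2)
        : hashed_(hashed_layers) {}

    // Store `area` in the first layer that has no entry for `id` yet.
    void add(Id id, Area area) {
        for (auto& layer : hashed_) {
            if (layer.emplace(id, area).second) return;
        }
        for (auto& layer : overflow_) {
            if (layer.emplace(id, area).second) return;
        }
        // Every existing layer already holds this key: open a new one.
        overflow_.emplace_back();
        overflow_.back().emplace(id, area);
    }

    // Collect values layer by layer; the first layer without the key
    // ends the list, because add() always fills the lowest free layer.
    std::vector<Area> get(Id id) const {
        std::vector<Area> out;
        for (const auto& layer : hashed_) {
            auto it = layer.find(id);
            if (it == layer.end()) return out;
            out.push_back(it->second);
        }
        for (const auto& layer : overflow_) {
            auto it = layer.find(id);
            if (it == layer.end()) return out;
            out.push_back(it->second);
        }
        return out;
    }

private:
    std::vector<std::unordered_map<Id, Area>> hashed_;  // first few layers
    std::vector<std::map<Id, Area>> overflow_;          // remaining layers
};
```

One such map would record the areas an ID was written to and a second one
the areas in which it was missed. Note that this sketch does not deduplicate:
adding the same area twice for one ID stores it twice.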

>>>      + based on the index
>>
>> The 'open index' that is not implemented should be implemented.
>>
>
> Ok. What kinds of things might we want to index?
>
> BBOX?
>
> Count of nodes/ways/relations in that block?
>
> What else?

I would find it very interesting if different types of data could be
exported separately, in a context-aware way. For example, some data is
landuse; I don't need landuse for routing, so it could be stored in a
completely different part of the pbf. If the format were descriptive
about, say, 'exclusive roads', that would also help the application
using the data decide whether to extract that set or skip it.

I don't think the counts are useful. The MBR (minimum bounding rectangle) is.
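
To make the MBR idea concrete, here is a rough sketch of what a per-block
index could look like. Nothing like this exists in the format as discussed
here: the struct names, the file offset field and the use of integer
nanodegree coordinates are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical minimum bounding rectangle in integer nanodegrees.
// A default-constructed Mbr is empty and intersects nothing real.
struct Mbr {
    std::int64_t min_lat = std::numeric_limits<std::int64_t>::max();
    std::int64_t min_lon = std::numeric_limits<std::int64_t>::max();
    std::int64_t max_lat = std::numeric_limits<std::int64_t>::min();
    std::int64_t max_lon = std::numeric_limits<std::int64_t>::min();

    // Grow the rectangle so that it covers the given point.
    void extend(std::int64_t lat, std::int64_t lon) {
        min_lat = std::min(min_lat, lat);
        max_lat = std::max(max_lat, lat);
        min_lon = std::min(min_lon, lon);
        max_lon = std::max(max_lon, lon);
    }

    // True if the two rectangles overlap or touch.
    bool intersects(const Mbr& o) const {
        return min_lat <= o.max_lat && o.min_lat <= max_lat &&
               min_lon <= o.max_lon && o.min_lon <= max_lon;
    }
};

// Hypothetical index entry: where a block starts and what it covers.
struct BlockEntry {
    std::uint64_t offset;
    Mbr mbr;
};

// Return the offsets of all blocks whose MBR intersects the query area,
// so a reader can skip every other block without decoding it.
std::vector<std::uint64_t> blocks_to_read(const std::vector<BlockEntry>& index,
                                          const Mbr& query) {
    std::vector<std::uint64_t> offsets;
    for (const auto& entry : index) {
        if (entry.mbr.intersects(query)) offsets.push_back(entry.offset);
    }
    return offsets;
}
```

A writer would call extend() for every node it puts into a block and append
one entry per block to the index. A reader extracting an area then only has
to decode the blocks returned by blocks_to_read().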

> Ok, Is XML's gzipped size or parsing speed a bottleneck for storing or
> processing changes? I'd be happy to offer suggestions on the protocol buffer
> architecture.

From what I observe now, the bottleneck actually seems to be protocol
buffers itself, although my output code can still become slightly faster.

Stefan