[OSM-dev] OSM PBF and spatial characteristics of blocks

Andrew Byrd andrew at fastmail.net
Wed Jan 6 14:14:53 UTC 2016


> On 06 Jan 2016, at 14:10, Stadin, Benjamin <Benjamin.Stadin at heidelberg-mobil.com> wrote:
> 
> And about the cell data: I'm considering to just reuse OSM pbf format, without preserving sort and size attributes. When exporting the data from individual grid cells, all data items will be streamed to the output ordered by type and ID. A simple in memory AVL tree should be sufficient (storing id keys and pointers to items as node data, iterating lowest to highest id on output)
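The ordered export described in the quote can be sketched as follows. This is a minimal illustration under stated assumptions, not the quoted author's code. A plain dict plus `sorted()` stands in for the AVL tree. The `Entity` class and its `payload` field are hypothetical placeholders. Entities are keyed by (type, id), where type follows the node, way, relation order. Memory grows with the number of entities in the extract, and that growth is the trade-off discussed in the reply.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

# Conventional OSM ordering: nodes, then ways, then relations.
TYPE_RANK = {"node": 0, "way": 1, "relation": 2}

@dataclass
class Entity:
    kind: str               # "node", "way" or "relation"
    id: int
    payload: object = None  # hypothetical placeholder for tags, coords, refs

def export_sorted(cells: Iterable[Iterable[Entity]]) -> Iterator[Entity]:
    """Buffer every entity from every cell, then emit in (type, id) order.

    A dict plus sorted() stands in for the in-memory AVL tree; memory use
    grows with the number of entities in the extract.
    """
    buffered: Dict[Tuple[int, int], Entity] = {}
    for cell in cells:
        for e in cell:
            # Keying by (type, id) also collapses duplicates, in case the
            # same entity is stored in more than one cell.
            buffered[(TYPE_RANK[e.kind], e.id)] = e
    for key in sorted(buffered):
        yield buffered[key]
```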

We wanted to preserve the conventional entity ordering (nodes, then ways, then relations), but maintaining increasing ID order was not important for us. I preferred a constant-memory export process (i.e. memory consumption does not grow with the geographic size of the extract) that simply iterates over the index cells in order three times, dumping first nodes, then ways, then relations.
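The three-pass export can be sketched as follows. This is a hypothetical illustration, not the actual implementation. Entities are reduced to `(kind, id)` tuples, and `load_cell` is an assumed callback that reads one cell at a time. Each pass visits the cells in the same order and keeps only one entity type. The output therefore follows node, way, relation order, but IDs within a type come out in cell order rather than globally sorted. The sketch does not deduplicate entities that might be stored in more than one cell.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# Hypothetical entity representation: (kind, id).
Entity = Tuple[str, int]

def export_three_pass(cell_ids: Sequence[int],
                      load_cell: Callable[[int], Iterable[Entity]]) -> Iterator[Entity]:
    """Emit nodes, then ways, then relations by scanning the cells three times.

    If load_cell streams one cell at a time, memory use does not grow with
    the size of the extract. IDs are emitted in cell order, not sorted.
    """
    for kind in ("node", "way", "relation"):
        for cell_id in cell_ids:  # same cell order on every pass
            for entity in load_cell(cell_id):
                if entity[0] == kind:
                    yield entity
```

The price of constant memory is reading every cell three times, which is usually cheap compared with holding a large extract in memory.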

If I understand you correctly, you’d use PBF as your internal storage format, with one PBF file per spatial index cell (essentially splitting planet.pbf into one PBF file per tile). I can see the appeal of that simplicity, and I considered this approach myself, but I think PBF would be problematic if you intend to perform random access within those tiles to apply minutely updates. PBF is a data interchange format; to my knowledge it was designed and is used primarily for moving or streaming database dumps or extracts from one site to another. Because its blocks are typically compressed, you’ll end up doing a lot of decompress-filter-modify-rewrite operations on entire tiles. It could work, but it seems awkward and resource-intensive.

I can also imagine running into problems with a 1-to-N geographic PBF splitter. Due to PBF's block-based nature, you might have to keep a prohibitively large number of files open simultaneously during the planet-to-tile splitting step. If planet.pbf must pass through some intermediate representation to allow splitting (essentially a spatially indexed database of some kind), why not keep it in that intermediate representation and perform the spatial splitting on demand?
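If the spatially indexed store is kept as the working representation, an extract request reduces to finding the index cells that cover the requested area. The sketch below assumes a fixed latitude/longitude grid, and the 1-degree `CELL_DEG` is a hypothetical choice. It does not handle bounding boxes that cross the antimeridian.

```python
import math
from typing import List, Tuple

CELL_DEG = 1.0                  # hypothetical cell size in degrees
N_ROWS = int(180 / CELL_DEG)
N_COLS = int(360 / CELL_DEG)

def cell_of(lat: float, lon: float) -> Tuple[int, int]:
    """Return the (row, col) of the grid cell containing a WGS84 coordinate."""
    row = math.floor((lat + 90.0) / CELL_DEG)
    col = math.floor((lon + 180.0) / CELL_DEG)
    # Clamp so that lat = 90 or lon = 180 still lands in the last row/column.
    return min(max(row, 0), N_ROWS - 1), min(max(col, 0), N_COLS - 1)

def cells_for_bbox(min_lat: float, min_lon: float,
                   max_lat: float, max_lon: float) -> List[Tuple[int, int]]:
    """List every cell overlapping the box; an extract only needs to read these."""
    r0, c0 = cell_of(min_lat, min_lon)
    r1, c1 = cell_of(max_lat, max_lon)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

The resulting cell list could then drive a three-pass export like the one described earlier. In that design the split happens at request time and no per-tile PBF files are involved.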

-Andrew
