[OSM-dev] OSM PBF and spatial characteristics of blocks

Wed Jan 6 15:00:37 UTC 2016

For exporting the data of a given area, I think cell-based spatial splitting will outperform, by an order of magnitude, any database solution that handles geometries individually. But I think you are right that PBF is too simple, and about its problems with random access and updatability.

I still think that clever block handling would work well for both extracts and updates.
What do you think about changing vex files to use offset pointers for their data, with a fill factor of, say, 10-20 percent?
The initial cell would then be up to 20% larger on disk, and applying change set data would simply mean unlinking the old offset and appending the new data at the current write offset. When a cell grows too large, a new file would be generated, again with 20% additional space for change sets.
Is any data missing from the vex file format, or does it include everything from OSM PBF?
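
A minimal Python sketch of the offset-pointer idea, under stated assumptions: the class name `CellFile`, the in-memory `bytearray` standing in for an on-disk cell, and the integer `fill_percent` parameter are all hypothetical and are not part of the vex format. Deletions and crash safety are ignored.

```python
class CellFile:
    """Toy cell: a byte buffer plus an id -> (offset, length) table.

    Hypothetical illustration only; not the actual vex layout.
    """

    def __init__(self, items, fill_percent=20):
        self.fill_percent = fill_percent  # assumed slack, e.g. 10-20 percent
        self.generation = 0               # counts how often the cell was rewritten
        self._build(items)

    def _build(self, items):
        # Write all live items back to back, then reserve slack for updates.
        self.buf = bytearray()
        self.offsets = {}
        for item_id, payload in sorted(items.items()):
            self.offsets[item_id] = (len(self.buf), len(payload))
            self.buf += payload
        self.write_pos = len(self.buf)
        self.capacity = self.write_pos + self.write_pos * self.fill_percent // 100
        self.buf += bytes(self.capacity - self.write_pos)

    def get(self, item_id):
        offset, length = self.offsets[item_id]
        return bytes(self.buf[offset:offset + length])

    def update(self, item_id, payload):
        if self.write_pos + len(payload) > self.capacity:
            # Slack exhausted: regenerate the cell with fresh slack.
            live = {i: self.get(i) for i in self.offsets}
            live[item_id] = payload
            self.generation += 1
            self._build(live)
            return
        # Append the new version, then repoint the offset (unlinking the old copy).
        end = self.write_pos + len(payload)
        self.buf[self.write_pos:end] = payload
        self.offsets[item_id] = (self.write_pos, len(payload))
        self.write_pos = end
```

In this sketch an update costs one append and one table write until the slack is used up; only then is the whole cell rewritten, which also reclaims the space held by unlinked versions.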

Ben

Sent from my iPad

> On 06.01.2016 at 15:14, Andrew Byrd <andrew at fastmail.net> wrote:
> 
> 
>> On 06 Jan 2016, at 14:10, Stadin, Benjamin <Benjamin.Stadin at heidelberg-mobil.com> wrote:
>> 
>> And about the cell data: I'm considering just reusing the OSM PBF format, without preserving sort and size attributes. When exporting the data from individual grid cells, all data items will be streamed to the output ordered by type and ID. A simple in-memory AVL tree should be sufficient (storing ID keys and pointers to items as node data, iterating from lowest to highest ID on output).
> 
> We wanted to preserve conventional entity ordering (node, way, relation) but maintaining increasing ID number was not important for us; I preferred a constant-memory export process (i.e. memory consumption does not grow with the geographic size of the extract) that simply iterates over index cells in order three times, dumping first nodes, then ways, then relations.
> 
> If I understand you correctly, you’d use the PBF format as your internal storage format, making one PBF file per spatial index cell (essentially splitting planet.pbf into one PBF file per tile). I can see the appeal of simplicity here, and I considered this approach myself, but I think PBF would be problematic if you intend to perform random access within those tiles to apply minutely updates. PBF is a data interchange format, to my knowledge designed and used primarily for moving or streaming database dumps or extracts from one site to another. You’ll end up doing a lot of decompress-filter-modify-rewrite operations on entire tiles. It could work, but it seems awkward and resource intensive. I can also imagine running into some problems with a 1-to-N geographic PBF splitter. Due to PBF's block-based nature, you might have to keep a prohibitively large number of files open simultaneously during your planet-to-tile splitter step. If the planet.pbf must pass through some intermediate representation to allow splitting (essentially a spatially indexed database of some kind), why not keep it in that intermediate representation and perform the spatial splitting on demand?
> 
> -Andrew
>
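
The two export strategies discussed in the quoted messages can be compared with a rough Python sketch. Everything here is an assumption made for illustration: cells are plain dicts keyed by entity type, `load_cell` is a hypothetical callback that reads one cell, each entity is assumed to live in exactly one cell, and Python's built-in `sorted` stands in for the AVL tree mentioned above.

```python
ENTITY_ORDER = ("node", "way", "relation")


def export_three_pass(cell_ids, load_cell):
    """Three passes over the cells: nodes first, then ways, then relations.

    Only one cell is held at a time, so memory use does not grow with the
    size of the extract. IDs are not globally increasing in the output.
    """
    for entity_type in ENTITY_ORDER:
        for cell_id in cell_ids:
            cell = load_cell(cell_id)
            for entity in cell.get(entity_type, ()):
                yield entity_type, entity


def export_sorted(cell_ids, load_cell):
    """Single pass that collects everything, then emits by type and ID.

    Output is fully ordered, but all entities are held in memory at once.
    """
    by_type = {t: [] for t in ENTITY_ORDER}
    for cell_id in cell_ids:
        cell = load_cell(cell_id)
        for t in ENTITY_ORDER:
            by_type[t].extend(cell.get(t, ()))
    for t in ENTITY_ORDER:
        for entity in sorted(by_type[t], key=lambda e: e["id"]):
            yield t, entity


# Usage with two tiny hypothetical cells:
cells = {
    0: {"node": [{"id": 5}], "way": [{"id": 2}]},
    1: {"node": [{"id": 3}], "relation": [{"id": 9}]},
}
three_pass = list(export_three_pass([0, 1], cells.__getitem__))
fully_sorted = list(export_sorted([0, 1], cells.__getitem__))
```

The trade-off is the one described in the thread: the three-pass variant reads every cell three times but keeps memory constant, while the sorted variant reads each cell once and pays for the ordering with memory proportional to the extract.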