[OSM-dev] OSM PBF and spatial characteristics of blocks

Wed Jan 6 12:41:41 UTC 2016

> On 06 Jan 2016, at 13:13, Andrew Byrd <andrew at fastmail.net> wrote:
> Obviously in the long term we’ll want to improve the index to handle these cases. Both of these limitations should be straightforward to overcome. To index large polygons as areas and you’d either need some kind of multi-level index (rectangle tree or “pyramids") or just accept rasterizing area polygons into all the index cells they overlap (a polygon’s ID appearing repeatedly, in every tile it overlaps).

I should elaborate: 

Using a predefined grid is practical because you don’t need to store the bounds of the index nodes (they’re regular tiles, so their bounding boxes are implicit in the grid definition). You can determine which index cell any element is in by just chopping low-order bits off its projected coordinates to match the index’s constant zoom level.

If you’re rendering map tiles, there’s also the advantage of a 1:1 correspondance between one spatial index node and one output map tile at or above the index’s zoom level, and a simple one-to-many relationship below that zoom level. This is also practical for exporting rectangular PBF regions.

If you want to index objects physically larger than a tile in the index or reflect the fact that objects span tiles, you could repeat the identifier of that object in every tile it touches (what I referred to as rasterizing the geometry), but this approach leads to a lot of repetition of identifiers and de-duplication of identifiers when you query the index.

An alternative approach is a hierarchical index, where each index node is entirely physically contained within a higher-level node. Conveniently the web Mercator map tile system defines a perfectly aligned hierarchy of squares via its zoom levels: it is a quad tree (https://en.wikipedia.org/wiki/Quadtree). You simply index each object at the highest zoom level that still contains the object entirely within one tile, and when you do a query operation from the index, you can easily traverse up the zoom levels without maintaining pointers between them (you can find the coordinates of the next higher tile by just chopping off low order bits).

There are of course many ways to implement spatial indexes, but considering that we’re working with data that often gets rendered or exported in tile-sized chunks, this approach seemed well-suited to me.

I’d of course welcome any insight or suggestions that might improve our implementation.

-Andrew