[Tile-serving] [osm2pgsql-dev/osm2pgsql] Possible improvements for new ram middle (#1466)

Thu May 15 08:00:32 UTC 2025

joto left a comment (osm2pgsql-dev/osm2pgsql#1466)

Here are some additional considerations regarding memory efficiency for the node location store:

The current RAM middle implements the node location store roughly as this: Locations are stored in blocks of 32 nodes. Inside each block the ids of nodes and the x and y coordinates are stored delta encoded and using varints. Overall we need about 8 to 9 bytes per stored node for this, for larger files (planet) it is a bit less, for smaller files a bit more.

This doesn't look that great. The naive implementation (which we use for the on disk flat node files) uses 8 bytes x the highest node id. So for the planet file we are really close to that. Of course for smaller files the RAM middle is much more efficient because it only needs the 8 bytes for actually stored nodes and does not depend on the highest node id.

Can we find a more efficient encoding than the delta/varint encoding? What makes the delta/varint work reasonably well is that we have runs of nodes that are really close to each other, because they were created together for OSM ways. But they are not that close either, so varint might not be the best encoding. And we don't always have these runs of close nodes. And varint encoding for essentially random numbers does not work well.

We should explore other encoding options. One possible idea is to encode the x and y coordinates together, interleaving the bits from the x and y coordinates, like the index in the main OSM database does it. Then we can store the higher order bits that are the same for all (or many) of the nodes in a block separately from the lower order bits. This could be done either with a fixed or a dynamic separation between those higher/lower order bits. Some back-of-the-envelope calculation shows that this might save some space, but we'd need to implement it to see whether that's the case with actual data.

We could also have different implementations for the blocks depending on the node locations. For each block the most efficient encoding could be chosen, similar to how compression engines sometimes work. Of course this adds more fixed overhead to store that information...

-- 
Reply to this email directly or view it on GitHub:
https://github.com/osm2pgsql-dev/osm2pgsql/issues/1466#issuecomment-2882929246
You are receiving this because you are subscribed to this thread.

Message ID: <osm2pgsql-dev/osm2pgsql/issues/1466/2882929246 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20250515/793f2ee6/attachment.htm>