[Tile-serving] [osm2pgsql] Experimental ID tracker implementation. (aa1ff6e)
Matt Amos
notifications at github.com
Wed Apr 29 11:58:10 UTC 2015
It's still deterministic; this change just altered the implementation from being based on a `std::set<osmid_t>` to being effectively based on a `vector<bool>` with some optimisations. I'm not sure whether there are any comments describing this, so please excuse me while I do a bit of a brain dump here:
Background: the ID tracker is used to temporarily (and ephemerally) store whether or not a particular element needs to be re-visited later in the process. In osm2pgsql terms, whether an element is marked as _pending_ or not. Two versions previously, this boolean was stored in the database, as a "pending" column in the per-element-type tables.
The ID-tracker: There is no need to store "pending" bits during a planet load, as all elements are pending. Also, only a small number of elements become pending when loading a diff (hand-wavingly proportional to the size of the diff, not the size of the planet). Storing the pending information in the database has some overhead in terms of the database mechanism, which will persist the data to disk, transactionally, although this isn't required for the ephemeral pending information, and requires an index over the pending column which has some cost. Therefore, it seemed sensible to take the tracking of pending bits out of the database and just track it in memory. The first implementation of this used the standard container `std::set<osmid_t>`, but since this is effectively a binary tree, it has a large overhead in terms of memory and does a lot of pointer chasing. So...
The "experimental" ID-tracker: The way the ID-tracker is used requires the ability to add, remove and iterate over pending IDs in order. Given that the `std::set` implementation worked, but had too high a cost, it seemed sensible to optimise by reducing the height of the tree and collapsing single leaf nodes into blocks of leaves. Each `block` stores a continuous range of bits indicating the pending status of that range of IDs, and these are indexed using the lower order bits of the ID. The upper order bits of the ID are used as the key in the `std::map<osmid_t, block>`, which allows the tracker bitmask to be sparse. As the comment preceding `block` mentions, this was initially just a `vector<bool>` which would have done the job just as well, if not for the cost of the `iterator`s - so instead it was minimally re-implemented as a plain array, with the role of the `iterator` replaced by the `next_set` function.
---
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/commit/aa1ff6e5d07b7a356bc88093d2673fc7929c2133#commitcomment-10961958
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20150429/ec97caa3/attachment.html>
More information about the Tile-serving
mailing list