[Tile-serving] [openstreetmap/osm2pgsql] Tile Expiry Performance (#946)

Michael Reichert notifications at github.com
Wed Aug 14 21:59:25 UTC 2019


> @Nakaner I trust you look at some point into optimising the data structures a bit.

The tile expiry list is a `std::unordered_set<uint64_t>`. See the [comment in the header file](https://github.com/openstreetmap/osm2pgsql/blob/master/expire-tiles.hpp#L175) for an explanation. The only ways I can think of to save memory reduce the accuracy of the tile expiry, either

* by storing the tile IDs on a smaller zoom level than requested or
* by using `uint32_t`

The first leads to a smaller `unordered_set` because one tile on zoom level *z* consists of four tiles on zoom level *z+1*. But the same smaller memory footprint can already be achieved by requesting tile expiry on a lower zoom level only and converting the tile IDs to higher zoom levels with a simple script that processes the tile expiry list. The second option limits the largest zoom level to 16 (two bits per zoom level). However, tile.openstreetmap.org does tile expiry on zoom level 19 (they don't use the tile expiry by Osm2pgsql). I am against limiting tile expiry because I have the impression that it is not the problem.
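To make the two options concrete, here is a minimal sketch. It assumes a quadkey-style ID in which every zoom level contributes two bits (one from x, one from y), which is why a 32-bit integer runs out at zoom level 16. The `Tile` struct and the functions `xy_to_quadkey`, `max_zoom_for_bits` and `expand_tile` are illustrative names for this sketch and not necessarily what Osm2pgsql uses; `expand_tile` shows the kind of post-processing a script could do to turn a low-zoom expiry entry into tiles on a higher zoom level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative tile address (zoom/x/y), not an Osm2pgsql type.
struct Tile {
    uint32_t zoom;
    uint32_t x;
    uint32_t y;
};

// Assumed encoding: bit i of x goes to bit 2i of the ID,
// bit i of y goes to bit 2i+1. Each zoom level needs two bits.
uint64_t xy_to_quadkey(uint32_t x, uint32_t y, uint32_t zoom)
{
    assert(zoom <= 32); // 64 bits hold at most 32 zoom levels
    uint64_t quadkey = 0;
    for (uint32_t i = 0; i < zoom; ++i) {
        quadkey |= (static_cast<uint64_t>(x) & (1ULL << i)) << i;
        quadkey |= (static_cast<uint64_t>(y) & (1ULL << i)) << (i + 1);
    }
    return quadkey;
}

// Largest zoom level whose IDs fit into an integer of the given width.
constexpr uint32_t max_zoom_for_bits(uint32_t bits) { return bits / 2; }

// Expand one tile into all tiles it covers on target_zoom.
// Each additional zoom level multiplies the number of tiles by four.
std::vector<Tile> expand_tile(const Tile &t, uint32_t target_zoom)
{
    std::vector<Tile> result;
    if (target_zoom < t.zoom) {
        return result; // nothing to expand
    }
    const uint32_t shift = target_zoom - t.zoom;
    assert(shift < 16); // keep the result size reasonable in this sketch
    const uint32_t n = 1U << shift; // tiles per side
    result.reserve(static_cast<std::size_t>(n) * n);
    for (uint32_t dx = 0; dx < n; ++dx) {
        for (uint32_t dy = 0; dy < n; ++dy) {
            result.push_back(
                {target_zoom, (t.x << shift) + dx, (t.y << shift) + dy});
        }
    }
    return result;
}
```

With this encoding, `max_zoom_for_bits(32)` is 16 and `max_zoom_for_bits(64)` is 32, and `expand_tile({12, 1, 2}, 13)` yields the four tiles 13/2/4, 13/2/5, 13/3/4 and 13/3/5.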

I have run a few tests over the last few weeks. These are my results. I was working with Osm2pgsql commit 90e17f0e8c793487ca39cbf95501cd9c5daa33e2 (9 June 2019).

The import:

```
node cache: stored: 5249458158(100.00%), storage efficiency: 83.16% (dense blocks: 723375, sparse nodes: 193165767), hit rate: 100.00%
        Command being timed: "/home/michael/osm2pgsql/build/osm2pgsql -s -d expirytest --cache 60000 --hstore --flat-nodes /nvme/michael/osm2pgsqltest/flatnodes-import.flat --number-processes 4 --style /home/michael/git/openstreetmap-carto/openstreetmap-carto.style --tag-transform-script /home/michael/git/openstreetmap-carto/openstreetmap-carto.lua planet-latest.osm.pbf"
        Elapsed (wall clock) time (h:mm:ss or m:ss): 14:10:01
        Maximum resident set size (kbytes): 64444052
```

Applying the diffs (I reimported with the command provided above after applying each diff):

```
Diff of one day, deduplicated, compressed with Gzip:
Osm2pgsql took 1837s overall
node cache: stored: 3592308(100.00%), storage efficiency: 78.40% (dense blocks: 370, sparse nodes: 775373), hit rate: 3.33%
        Command being timed: "/home/michael/osm2pgsql/build/osm2pgsql --append -s -d expirytest --cache 3000 --hstore joined-diff/joined-diff.osc.gz --flat-nodes /nvme/michael/osm2pgsqltest/flatnodes-import.flat --slim --number-processes 4 --style /home/michael/git/openstreetmap-carto/openstreetmap-carto.style --tag-transform-script /home/michael/git/openstreetmap-carto/openstreetmap-carto.lua -e10-16 -o joined-diff/expiries-gzip.txt"
        Elapsed (wall clock) time (h:mm:ss or m:ss): 30:41.98
        Maximum resident set size (kbytes): 50633788

the same, zoom levels 10-12 only:
about 695000 lines in the tile expiry file
        Command being timed: "/home/michael/osm2pgsql/build/osm2pgsql --append -s -d expirytest --cache 3000 --hstore joined-diff/joined-diff.osc.gz --flat-nodes /nvme/michael/osm2pgsqltest/flatnodes-import.flat --slim --number-processes 4 --style /home/michael/git/openstreetmap-carto/openstreetmap-carto.style --tag-transform-script /home/michael/git/openstreetmap-carto/openstreetmap-carto.lua -e10-12 -o joined-diff/expiries-gzip-10-12.txt"
        Elapsed (wall clock) time (h:mm:ss or m:ss): 29:07.44
        Maximum resident set size (kbytes): 49466208
```

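Reducing the maximum zoom level of the tile expiry reduces the number of affected tiles because the set stores tiles on the largest zoom level only (lower zoom levels can be retrieved by bit shifts to the right). However, the influence on the total memory consumption is limited: in the runs above, the maximum resident set size only dropped from 50633788 to 49466208 kbytes (about 2%) when going from zoom levels 10-16 to 10-12.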

The lowest zoom level has no influence on the memory consumption because the tile IDs on all zoom levels below the largest requested zoom level are [created on the fly](https://github.com/openstreetmap/osm2pgsql/blob/master/expire-tiles.hpp#L95) while writing the output file.
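The behaviour described in the last two paragraphs can be sketched as follows, again assuming quadkey IDs with two bits per zoom level. Only the IDs on the maximum zoom level are kept in the set. When producing the output, each ID is shifted right by two bits per level to obtain its ancestor. The names `quadkey_to_xy` and `expired_lines` are illustrative, the `zoom/x/y` line format is an assumption about the output file, and the sorting step is a simplification chosen for this sketch, not necessarily how Osm2pgsql removes duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Reverse of the assumed bit interleaving: even bits form x, odd bits form y.
void quadkey_to_xy(uint64_t quadkey, uint32_t zoom, uint32_t &x, uint32_t &y)
{
    x = 0;
    y = 0;
    for (uint32_t i = 0; i < zoom; ++i) {
        x |= static_cast<uint32_t>((quadkey >> (2 * i)) & 1U) << i;
        y |= static_cast<uint32_t>((quadkey >> (2 * i + 1)) & 1U) << i;
    }
}

// Produce "zoom/x/y" lines for all zoom levels from max_zoom down to
// min_zoom. The set holds IDs on max_zoom only; lower levels are derived.
std::vector<std::string> expired_lines(const std::unordered_set<uint64_t> &dirty,
                                       uint32_t min_zoom, uint32_t max_zoom)
{
    assert(max_zoom < 32); // keeps every shift below 64 bits
    std::vector<std::string> lines;
    if (min_zoom > max_zoom) {
        return lines;
    }
    // A right shift preserves the sort order, so after sorting, IDs that
    // share an ancestor are adjacent and duplicates are easy to skip.
    std::vector<uint64_t> sorted(dirty.begin(), dirty.end());
    std::sort(sorted.begin(), sorted.end());

    for (uint32_t dz = 0; dz <= max_zoom - min_zoom; ++dz) {
        const uint32_t zoom = max_zoom - dz;
        bool first = true;
        uint64_t last = 0;
        for (uint64_t id : sorted) {
            const uint64_t ancestor = id >> (2 * dz); // two bits per level
            if (!first && ancestor == last) {
                continue; // this ancestor was already emitted
            }
            first = false;
            last = ancestor;
            uint32_t x = 0;
            uint32_t y = 0;
            quadkey_to_xy(ancestor, zoom, x, y);
            lines.push_back(std::to_string(zoom) + "/" + std::to_string(x) +
                            "/" + std::to_string(y));
        }
    }
    return lines;
}
```

In this sketch the set never grows when `min_zoom` is lowered; only the number of output lines does, which matches the observation that the lowest zoom level does not affect memory consumption.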

```
Diff of one day, deduplicated, compressed with Gzip, zoom levels 15 and 16 only:
node cache: stored: 3592308(100.00%), storage efficiency: 78.40% (dense blocks: 370, sparse nodes: 775373), hit rate: 3.32%
        Command being timed: "/home/michael/osm2pgsql/build/osm2pgsql --append -s -d expirytest --cache 3000 --hstore joined-diff/joined-diff.osc.gz --flat-nodes /nvme/michael/osm2pgsqltest/flatnodes-import.flat --slim --number-processes 4 --style /home/michael/git/openstreetmap-carto/openstreetmap-carto.style --tag-transform-script /home/michael/git/openstreetmap-carto/openstreetmap-carto.lua -e15-16 -o joined-diff/expiries-gzip-15-16.txt"
        Elapsed (wall clock) time (h:mm:ss or m:ss): 33:40.85
        Maximum resident set size (kbytes): 50682908
```

This proves my statement that the largest requested zoom level decides how much memory the tile expiry list needs.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/issues/946#issuecomment-521435143