[Tile-serving] [openstreetmap/osm2pgsql] Slow --append node processing with osm2pgsql 1.8.1 (7-8* slower than older setup) (Discussion #1971)

Jochen Topf notifications at github.com
Thu Jun 15 07:47:03 UTC 2023


Okay, first to get some stuff out of the way that's probably not the problem:

* You already analyzed the threads thing correctly.
* LuaJIT is stuck on an older version because there are no new releases, but Lua is pretty stable and the differences between versions are small. LuaJIT should still be noticeably faster (about 10% to 15%) than Lua without JIT. But you will only see that speedup on imports, not on updates, because updates are much slower for other reasons; see below.

Now something which might be a problem: you are using quite a large node cache (`-C 64000`, i.e. 64000 MB, roughly 64 GB). Depending on how much memory you have, this could actually be counterproductive. Since you are using a flat node file anyway, I don't think the cache will help. It will probably hurt, because it uses up a lot of memory that could otherwise be used for disk buffers. (The docs aren't very clear about when and how to use the cache option; improving that is on my todo list.) So you should try without `-C`, or even disable the cache entirely with `-C 0`.
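To make the memory trade-off concrete, here is a minimal Python sketch. The helper names are hypothetical; the osm2pgsql options it uses (`--append`, `--slim`, `--flat-nodes`, `-d`, `-C`) are real, but the paths and database name are placeholders, and a real command also needs whatever output/style options the original import used. The memory estimate is deliberately crude: it only subtracts the node cache and ignores PostgreSQL's own memory and everything else on the machine.

```python
from typing import List, Optional


def cache_gb(cache_mb: int) -> float:
    """osm2pgsql's -C/--cache value is given in megabytes; convert roughly to GB."""
    return cache_mb / 1000


def memory_left_for_os_buffers(total_ram_gb: float, cache_mb: int) -> float:
    """Crude estimate of RAM the node cache leaves for the OS page cache.

    Assumption: ignores PostgreSQL shared_buffers, osm2pgsql's other memory
    use and any other processes on the machine.
    """
    return max(total_ram_gb - cache_gb(cache_mb), 0.0)


def append_command(changes: str, flat_nodes: str, database: str,
                   cache_mb: Optional[int] = 0) -> List[str]:
    """Build an osm2pgsql --append command line (not executed here).

    cache_mb=None leaves -C out so osm2pgsql uses its default;
    cache_mb=0 disables the node cache, as suggested above.
    """
    cmd = ["osm2pgsql", "--append", "--slim",
           "--flat-nodes", flat_nodes, "-d", database]
    if cache_mb is not None:
        cmd += ["-C", str(cache_mb)]
    cmd.append(changes)
    return cmd
```

For example, on a hypothetical 128 GB machine `memory_left_for_os_buffers(128, 64000)` leaves about 64 GB for disk buffers, and on a 32 GB machine the 64000 MB cache would in theory claim everything.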

And finally: where are those changes coming from? Your changes look rather large (44 million nodes), probably something like two weeks of OSM changes. If you work with changes that large, you are almost certainly better off updating the data file (with osmium or a similar tool) and then doing a full re-import in non-slim mode every time you want to update. Updates have always been several orders of magnitude slower than imports, so working with updates really only makes sense if you need to keep up to date with minutely or, at most, daily diffs.
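The "update the file, then re-import" cycle could look roughly like the following sketch. `osmium apply-changes` and the osm2pgsql `--create` mode are real; the function name, file names and database name are hypothetical, and a real import would add its own output/style options. The function only builds the command lines, so they can be inspected before running them, for example with `subprocess.run(cmd, check=True)`.

```python
from typing import List


def refresh_commands(old_pbf: str, change_files: List[str], new_pbf: str,
                     database: str) -> List[List[str]]:
    """Return the commands for one 'apply changes to file, then re-import' cycle."""
    # osmium apply-changes accepts one data file followed by one or more change files.
    apply = ["osmium", "apply-changes", old_pbf, *change_files,
             "-o", new_pbf, "--overwrite"]
    # Neither --slim nor --append: a non-slim import keeps no middle tables,
    # so this database cannot be updated with --append later on.
    reimport = ["osm2pgsql", "--create", "-d", database, new_pbf]
    return [apply, reimport]
```

The trade-off is that each refresh rebuilds the whole database, which pays off when the accumulated changes are large, as in this case.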

I tried an import/update with settings similar to yours (but without the cache) and I am getting 3k/s for the nodes. That's not great, but it is in line with what I'd expect and with what you say you had before. Of course these numbers are not really comparable between systems, but we are only talking about the order of magnitude here. What is interesting is where this time goes. One thing I measured was the time it took to COPY blocks of data into the `planet_osm_points` table. You can see this here (time in seconds):

![chart-copy-to-point-table](https://github.com/openstreetmap/osm2pgsql/assets/113756/fea16451-7251-487d-8022-9874368677bb)
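For a sense of scale, a back-of-the-envelope sketch: at a constant 3k nodes/s, 44 million nodes take roughly four hours for the nodes alone. The chart shows the rate is not constant, so this is a lower-bound style estimate only. The second helper is a hypothetical way to collect per-block timings like the ones plotted; `copy_block` stands in for whatever actually performs the COPY into `planet_osm_points` and is an assumption, not an osm2pgsql API.

```python
import time
from typing import Callable, Iterable, List


def eta_hours(objects: int, rate_per_s: float) -> float:
    """Hours needed to process `objects` items at a constant `rate_per_s`."""
    return objects / rate_per_s / 3600


def time_blocks(blocks: Iterable[object],
                copy_block: Callable[[object], None]) -> List[float]:
    """Call the (hypothetical) copy_block on each block and record its duration in seconds."""
    durations = []
    for block in blocks:
        start = time.perf_counter()
        copy_block(block)
        durations.append(time.perf_counter() - start)
    return durations


print(f"{eta_hours(44_000_000, 3000):.2f} h")  # 44e6 / 3000 / 3600
```

Plotting the list returned by `time_blocks` against the block index would give a chart of the same kind as the one above.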

You can clearly see two phases. In the first phase we are probably updating existing nodes, and each block takes longer than the one before. I don't know why that is and need to investigate further, but it supports my argument from above that you want to avoid large changes. If you absolutely have to work with change files, it might be better to feed them in smaller chunks. In the second phase new nodes come in and the speed is okay again.
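"Feeding changes in smaller chunks" could mean applying a series of smaller diffs (for example daily ones) one after another instead of one large file. A minimal sketch, assuming the diffs already exist as separate files in chronological order; the function name and paths are hypothetical, and the real commands would need the same output/style options as the original import. Whether this is actually faster on a given setup is untested here; the text above only says it *might* be better.

```python
import subprocess
from typing import List, Sequence


def append_in_chunks(diff_files: Sequence[str], flat_nodes: str, database: str,
                     dry_run: bool = True) -> List[List[str]]:
    """Apply small change files one by one with osm2pgsql --append.

    diff_files must be in chronological order. With dry_run=True the
    commands are only built and returned, not executed.
    """
    commands = []
    for diff in diff_files:
        cmd = ["osm2pgsql", "--append", "--slim",
               "--flat-nodes", flat_nodes, "-d", database, diff]
        commands.append(cmd)
        if not dry_run:
            # check=True stops at the first failed update, so later diffs
            # are never applied on top of an incomplete state.
            subprocess.run(cmd, check=True)
    return commands
```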

There is definitely a lot of room for improving osm2pgsql here. But updating with large change files is, as I said, not recommended anyway and will always be slower than an import. That's why this use case doesn't have a high priority.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/discussions/1971#discussioncomment-6183037