[Tile-serving] [osm2pgsql-dev/osm2pgsql] parallelize the COPY phase (Discussion #2426)
Tomas Vondra
notifications at github.com
Wed Oct 29 13:36:37 UTC 2025
I don't have great data; it's mostly based on watching "top" while osm2pgsql is running. Most of the time there's just a single backend doing COPY and consuming 100% of a CPU core. Like this:
```
  PID  %CPU  %MEM     TIME+ COMMAND
41997 123.5  10.7  19:15.48 osm2pgsql --drop -c --verbose --log-level debug -k -H localhost -d osm planet-251020.osm.pbf
42007 100.0   0.1   7:09.87 postgres: azureuser osm ::1(41384) COPY
    1   0.0   0.0   0:03.42 /lib/systemd/systemd --system --deserialize=27
    2   0.0   0.0   0:00.02 [kthreadd]
```
I'm sure there are periods when it really is I/O bound, but this is clearly CPU bound. Processing COPY is not exactly free, and most of a perf profile is related to parsing the input, forming tuples, etc. That should parallelize pretty well, I think.
I don't think WAL contention, or contention on the relation, would be a problem. Loading a table from several connections at once is a strategy we often use when generating large amounts of data for testing, and it works great. Of course, that assumes the load does not become I/O bound (particularly on WAL). If the storage can't keep up, you won't get an improvement. But parallelism is meant to help "good" systems that don't have this bottleneck. (I'm testing this on a VM with 400GB of RAM and 6 NVMe drives in RAID0. It really is not I/O bound.)
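To make the strategy above concrete, here is a minimal sketch of the general shape, in Python rather than osm2pgsql's C++. It is not how osm2pgsql works today and not a proposed patch. The names `batches`, `to_copy_text`, `parallel_copy` and `copy_batch` are all hypothetical. The assumption is that `copy_batch` would, in a real loader, send one payload through `COPY ... FROM STDIN` on a connection owned by that worker, so that each stream gets its own Postgres backend to do the parsing and tuple forming. The sketch also skips COPY text escaping entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def to_copy_text(batch):
    """Render rows as tab-separated lines (no escaping; illustration only)."""
    return "".join("\t".join(str(v) for v in row) + "\n" for row in batch)


def parallel_copy(rows, copy_batch, workers=4, batch_size=10_000):
    """Feed batches to `workers` concurrent COPY streams.

    copy_batch is a hypothetical callback: it receives one text payload,
    is assumed to load it over its own database connection, and returns
    the number of rows it loaded.
    """
    payloads = (to_copy_text(b) for b in batches(rows, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: map() submits every batch up front; a real loader would
        # bound the queue so the whole input is never held in memory.
        return sum(pool.map(copy_batch, payloads))


def fake_copy_batch(payload):
    """Stand-in for a real COPY call: just counts the rows in the payload."""
    return payload.count("\n")


if __name__ == "__main__":
    demo_rows = [(i, f"name{i}") for i in range(25)]
    loaded = parallel_copy(demo_rows, fake_copy_batch, workers=3, batch_size=10)
    print(f"loaded {loaded} rows")
```

The threads here only drive the connections. The CPU-heavy work the profile points at happens on the server side, one backend per connection, which is why more streams should help as long as storage keeps up.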
Also, these are bulk WAL writes - large sequential writes, with very few fsyncs. So the system won't wait for the WAL all that much anyway. This is what strace tells me for the COPY backend:
```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.13    0.998593           4    239801       215 recvfrom
  1.78    0.018331           1     13106           pwrite64
  0.37    0.003829           0      5774           lseek
  0.37    0.003827          11       327           pwritev
  0.28    0.002916           1      1919           fallocate
  0.02    0.000219           2        95        66 openat
  0.01    0.000144           0       218           sendto
  0.01    0.000108           0       144           pread64
  0.01    0.000071           0       212           epoll_wait
  0.00    0.000049           1        28           close
  0.00    0.000014           4         3           rt_sigreturn
  0.00    0.000012           4         3           getpid
  0.00    0.000007           2         3           setitimer
  0.00    0.000000           0         1           kill
------ ----------- ----------- --------- --------- ----------------
100.00    1.028120           3    261634       281 total
```
There's not a single fsync. It's all about reading data from the connection and writing pages to disk.
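For anyone who wants to check that reading against the numbers, here is a small sketch that groups the strace summary rows by kind of syscall. The rows are copied from the output above. The grouping into "network reads", "file writes" and "sync calls" is my own assumption about which syscalls matter here, and the function names are made up for this example.

```python
# Rows copied from the strace summary (header, separators and total omitted).
STRACE_SUMMARY = """
97.13 0.998593 4 239801 215 recvfrom
1.78 0.018331 1 13106 pwrite64
0.37 0.003829 0 5774 lseek
0.37 0.003827 11 327 pwritev
0.28 0.002916 1 1919 fallocate
0.02 0.000219 2 95 66 openat
0.01 0.000144 0 218 sendto
0.01 0.000108 0 144 pread64
0.01 0.000071 0 212 epoll_wait
0.00 0.000049 1 28 close
0.00 0.000014 4 3 rt_sigreturn
0.00 0.000012 4 3 getpid
0.00 0.000007 2 3 setitimer
0.00 0.000000 0 1 kill
"""

# Assumed groupings, chosen for this illustration.
NETWORK_READS = {"recvfrom"}
FILE_WRITES = {"pwrite64", "pwritev"}
SYNC_CALLS = {"fsync", "fdatasync", "sync_file_range"}


def parse_rows(text):
    """Return (syscall, seconds, calls) tuples from summary rows."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # The "errors" column is blank for most syscalls, so a row has
        # 5 or 6 fields; the syscall name is always the last one.
        rows.append((parts[-1], float(parts[1]), int(parts[3])))
    return rows


def summarize(rows):
    """Compute the share of syscall time per group and count sync calls."""
    total = sum(seconds for _, seconds, _ in rows)

    def seconds_in(names):
        return sum(seconds for name, seconds, _ in rows if name in names)

    return {
        "total_seconds": total,
        "network_read_share": seconds_in(NETWORK_READS) / total,
        "file_write_share": seconds_in(FILE_WRITES) / total,
        "sync_calls": sum(calls for name, _, calls in rows if name in SYNC_CALLS),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_rows(STRACE_SUMMARY)).items():
        print(f"{key}: {value}")
```

Keep in mind these shares are fractions of time spent inside syscalls, which is only about one second in this sample, not fractions of the backend's total CPU time.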
Still, I may be wrong. I know a thing or two about Postgres, but I'm not all that familiar with OSM or the osm2pgsql code. I only use it to evaluate Postgres improvements, etc. I won't be able to improve osm2pgsql myself (say, by adjusting the code to use a thread pool), but I'll be able to test / evaluate a patch if someone prepares one.
--
Reply to this email directly or view it on GitHub:
https://github.com/osm2pgsql-dev/osm2pgsql/discussions/2426#discussioncomment-14816357