[Tile-serving] [osm2pgsql-dev/osm2pgsql] parallelize the COPY phase (Discussion #2426)

Paul Norman notifications at github.com
Thu Oct 30 07:08:35 UTC 2025


> In the end the bottleneck is probably the I/O, isn't it? And doing more of this in parallel means more contention on the WAL and, if we are writing to the same table in multiple COPYs at once, more contention on that table.

On the Postgres side, multiple connections doing COPY at the same time will generally be faster. WAL contention doesn't come into play at all for the output tables, because they are `UNLOGGED` at that stage and no WAL is written for them. Even for the slim tables, we have `synchronous_commit` off, so very few fsyncs are issued.
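
To make the pattern concrete, here is a minimal sketch (not osm2pgsql's actual import code; the connection string, table name, and columns are made up) of several connections COPYing into one `UNLOGGED` table with `synchronous_commit` off, using psycopg2:

```python
# Minimal sketch of parallel COPY into one UNLOGGED table.
# Hypothetical names throughout: DSN, demo_points, the column list.
import io
import threading
import psycopg2

DSN = "dbname=osm"
NUM_WORKERS = 4

def setup():
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        # UNLOGGED: Postgres writes no WAL for this table, so the
        # parallel COPYs below cannot contend on the WAL.
        cur.execute("""CREATE UNLOGGED TABLE IF NOT EXISTS demo_points
                       (osm_id bigint, name text)""")
    conn.commit()
    conn.close()

def copy_worker(rows):
    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur:
            # Don't wait for the WAL fsync at commit; losing the
            # transaction on a crash is acceptable during an import.
            cur.execute("SET synchronous_commit = off")
            buf = io.StringIO("".join(f"{i}\t{n}\n" for i, n in rows))
            cur.copy_expert("COPY demo_points (osm_id, name) FROM STDIN", buf)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    setup()
    chunks = [[(i, f"node-{i}") for i in range(w, 100_000, NUM_WORKERS)]
              for w in range(NUM_WORKERS)]
    threads = [threading.Thread(target=copy_worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Each worker holds its own connection, so the COPYs run concurrently inside Postgres rather than serializing on one session.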

If it is I/O throughput (MB/s or IOPS) limited, you want it very parallel. Modern NVMe SSDs scale with queue depth: manufacturer spec sheets use queue depths of [128+](https://assets.micron.com/adobe/assets/urn:aaid:aem:a25db53e-784a-411c-ab92-dbcbc8f8b268/renditions/original/as/7500-ssd-tech-prod-spec.pdf) for random workloads, while sequential performance seems to peak at lower depths, with one source quoting 32. Even back in 2015, [Intel was showing the best performance at queue depths >100](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-pcie-nvme-enterprise-ssds-white-paper.pdf).
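
To see the queue-depth effect on your own hardware, here is a toy sketch under stated assumptions: the file path is hypothetical, the file must be much larger than RAM so most reads miss the page cache and actually reach the device, and thread count only approximates device queue depth (a real measurement would use fio with an explicit iodepth):

```python
# Toy random-read benchmark: throughput vs. number of in-flight requests.
# Assumption: TEST_FILE is a pre-created file much larger than RAM, so
# most reads bypass the page cache and reach the device.
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

TEST_FILE = "/mnt/nvme/testfile"  # hypothetical path
BLOCK = 4096                      # 4 KiB random reads
READS_PER_WORKER = 2000

def worker() -> None:
    fd = os.open(TEST_FILE, os.O_RDONLY)
    size = os.fstat(fd).st_size
    try:
        for _ in range(READS_PER_WORKER):
            # Block-aligned random offset within the file.
            off = random.randrange(0, size - BLOCK) // BLOCK * BLOCK
            os.pread(fd, BLOCK, off)
    finally:
        os.close(fd)

for depth in (1, 4, 32, 128):
    start = time.perf_counter()
    # Each worker keeps one synchronous read in flight, so roughly
    # `depth` requests are outstanding at once.
    with ThreadPoolExecutor(max_workers=depth) as pool:
        for _ in range(depth):
            pool.submit(worker)
    elapsed = time.perf_counter() - start
    print(f"~{depth:>3} in flight: {depth * READS_PER_WORKER / elapsed:,.0f} reads/s")
```

On an NVMe drive, the reads/s figure should keep climbing well past a depth of 32, which is the behavior the spec sheets above are measuring.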



-- 
Reply to this email directly or view it on GitHub:
https://github.com/osm2pgsql-dev/osm2pgsql/discussions/2426#discussioncomment-14823273