[Tile-serving] [osm2pgsql-dev/osm2pgsql] Some research on middle performance (Issue #2110)
Jochen Topf
notifications at github.com
Thu Jan 2 19:17:31 UTC 2025
I did some further tests:
All numbers reported here are for the ways table only. The simple COPY took 57 minutes. Tests were done using PostgreSQL 15. Supposedly there are some improvements in COPY performance in Pg 16, so we should also test with Pg 16 and/or 17.
## Using binary format
COPY can be done with the text format (that we currently use) or a binary format. Using binary format the COPY time drops to 40 minutes, so we are about 30% faster. The binary format is not well documented and the documentation says it might change, but it hasn't changed since PostgreSQL 7 as far as I can see, so it is unlikely that this will be a problem. More so because we are only writing into the database, not reading, so we have more control over the format than if we have to parse the format the database generates.
This should also save use some time on the osm2pgsql side, because generating the binary format is probably faster than the text format. For geometries which don't have to be hex encoded any more, the number of bytes transfered will be only half, that should also help. I can really see no downside, we should consider switching.
## Parallel COPYs
I tried simulating parallel COPYs by splitting a COPY file into 2 (or 4) pieces and doing 2 (or 4) copies simultaneously. This is with the text format. The import times were 33 minutes (or 23 minutes for 4 COPYs). It is hard to tell how well this would work in practice, but it is definitly worth a shot. The CPU usage for the postgres process doing the COPY went from 100% with one COPY to something like 70%, so it looks like we are not CPU-bound any more but either I/O bound or hitting some limits on locks or so.
This might make the data format on disk not so efficient though, maybe creating the index will take longer or usage of the resulting table is slightly slower.
## With FREEZE
Using COPY FREEZE the import takes 49 minutes, using FREEZE and binary format we are at 32 minutes, almost halfing the current time.
Unfortunately using COPY FREEZE does not work together with parallel COPYs, because you need to create the table in the same transaction that you do the COPY FREEZE in. I tried using [snapshot synchronization](https://www.postgresql.org/docs/17/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION) to overcome this, but it didn't work. Maybe I didn't do it right, but there is probably something in there that prevents this from working, my transactions always got rolled back when I tried doing this.
Using COPY FREEZE would be simpler to implement that having multiple connections for parallel copies, so this is still something to consider.
## UNLOGGED TABLE
Somewhere on the Internet I found the suggestion to create the table as UNLOGGED TABLE, then do the import and the ALTER the table to LOGGED. This does not help. The COPY took about the same time and the ALTER TABLE took quite some time, so this is a dead end.
--
Reply to this email directly or view it on GitHub:
https://github.com/osm2pgsql-dev/osm2pgsql/issues/2110#issuecomment-2568252878
You are receiving this because you are subscribed to this thread.
Message ID: <osm2pgsql-dev/osm2pgsql/issues/2110/2568252878 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20250102/6c3b968e/attachment.htm>
More information about the Tile-serving
mailing list