[Tile-serving] [osm2pgsql] large postgres shared_buffers slows down import (#163)

jeffjanes notifications at github.com
Sat Aug 30 06:53:43 UTC 2014


Setting shared_buffers is a very load-dependent (not to mention hardware and OS dependent) exercise, and probably for those reasons advice on it is often somewhat controversial as well.

While https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server could probably use some improvement (and I'm surprised that the recommendation to limit at 2 to 8 GB is not already in there, as it is so common), I think advice specific to loading data with osm2pgsql would be best suited to a site dedicated to osm2pgsql, as its workload is certainly not a typical one for PostgreSQL.

Anyway, the problem seems clear to me now.  If you are running PostgreSQL in minimal logging mode, when you use COPY to load data into a table which was created in the same transaction as the COPY, it uses an optimization where it doesn't WAL log the insertion, instead at the end of the COPY it calls heap_sync.  Usually this is an excellent trade-off, because COPY is usually used for large datasets and avoiding the WAL logging is very helpful.  

But osm2pgsql does this weird thing where, for populating polygons, it calls a separate COPY for each row, all in the same transaction.  Now for each row it has to scour the entire shared_buffers, and that obviously is not a good trade off.

There might be some way to improve this in PostgreSQL in some future version, but there are also several work-arounds in osm2pgsql.  As you found you could just lower the size of shared_buffers, which makes a lot of sense because that memory can probably be better used by other things (osm2pgsql itself, the file system cache, or maintenance_work_mem).  Or the creation of the polygon table could be committed before it starts getting used for COPY.  Or osm2pgsql could use INSERT rather than COPY.  Or it could just keep the COPY open and feed it data, rather than opening and closing it each time (but I assume it does it that way because it also wants to do queries on the same connection, and you can't interleave them).  Or you could increase the level of wal_level.

---
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/issues/163#issuecomment-53950808
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20140829/4c993b46/attachment.html>


More information about the Tile-serving mailing list