[Tile-serving] [openstreetmap/osm2pgsql] High connection volume to Postgres (Discussion #1650)
Ryan Lambert
notifications at github.com
Fri Apr 15 02:45:45 UTC 2022
I started digging into this with the plan of adding pgbouncer, but didn't get very far. My first step was to understand what the connection load looked like on Postgres. Monitoring `pg_stat_activity` I found it holds open one idle connection per table for the bulk of the time osm2pgsql is running.
With the higher count of tables I use, this results in a sustained 40 idle connections. These connections are used at the very beginning and in the final steps of processing.

This chart doesn't show that each of those idle connection starts by running a handful of (typically) fast queries. The data for the chart was sampled every 5 seconds, but these queries all run in a fraction of a second so don't get captured. The first set of queries on each of the 40 connections all run within ~2 seconds. Here are the related Postgres logs from one PID at the startup (00:58:15).
```
2022-04-15 00:58:15 UTC [1995713]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=172.17.0.4 LOG: connection received: host=172.17.0.4 port=38638
2022-04-15 00:58:15 UTC [1995713]: [2-1] user=mytest,db=pgosm_remote,app=[unknown],client=172.17.0.4 LOG: connection authorized: user=mytest database=pgosm_remote application_name=pgosm-flex
2022-04-15 00:58:15 UTC [1995713]: [3-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: SET synchronous_commit = off
2022-04-15 00:58:15 UTC [1995713]: [4-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.206 ms
2022-04-15 00:58:15 UTC [1995713]: [5-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: SET client_min_messages = WARNING
2022-04-15 00:58:15 UTC [1995713]: [6-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.031 ms
2022-04-15 00:58:15 UTC [1995713]: [7-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: DROP TABLE IF EXISTS "osm"."road_line" CASCADE
2022-04-15 00:58:15 UTC [1995713]: [8-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 44.888 ms
2022-04-15 00:58:15 UTC [1995713]: [9-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: DROP TABLE IF EXISTS "osm"."road_line_tmp"
2022-04-15 00:58:15 UTC [1995713]: [10-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.147 ms
2022-04-15 00:58:15 UTC [1995713]: [11-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: RESET client_min_messages
2022-04-15 00:58:15 UTC [1995713]: [12-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.037 ms
2022-04-15 00:58:15 UTC [1995713]: [13-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: CREATE UNLOGGED TABLE IF NOT EXISTS "osm"."road_line" ("osm_id" int8 NOT NULL,"osm_type" text NOT NULL,"name" text ,"ref" text ,"maxspeed" int4 ,"oneway" int2 ,"layer" int4 NOT NULL,"tunnel" text ,"bridge" text ,"major" boolean NOT NULL,"route_foot" boolean ,"route_cycle" boolean ,"route_motor" boolean ,"access" text ,"geom" Geometry(LINESTRING, 3857) ) WITH (autovacuum_enabled = off)
2022-04-15 00:58:15 UTC [1995713]: [14-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 9.084 ms
2022-04-15 00:58:15 UTC [1995713]: [15-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: CREATE OR REPLACE FUNCTION "osm"."road_line_osm2pgsql_valid"()
2022-04-15 00:58:15 UTC [1995713]: [16-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.830 ms
2022-04-15 00:58:15 UTC [1995713]: [17-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: CREATE TRIGGER "road_line_osm2pgsql_valid" BEFORE INSERT OR UPDATE ON "osm"."road_line" FOR EACH ROW EXECUTE PROCEDURE "osm"."road_line_osm2pgsql_valid"();
2022-04-15 00:58:15 UTC [1995713]: [18-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.264 ms
2022-04-15 00:58:15 UTC [1995713]: [19-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: PREPARE get_wkb(bigint) AS SELECT "geom" FROM "osm"."road_line" WHERE "osm_id" = $1
2022-04-15 00:58:15 UTC [1995713]: [20-1]
```
After these initial queries only 1 connection is used to load data in to Postgres, while the other 40 (in my case) connections are held open.
The next time a query is ran from this connection is 8.5 minutes later (01:06:50) at the very end of processing.
```
2022-04-15 01:06:50 UTC [1995713]: [21-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: DROP TRIGGER "road_line_osm2pgsql_valid" ON "osm"."road_line";
2022-04-15 01:06:50 UTC [1995713]: [22-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.688 ms
2022-04-15 01:06:50 UTC [1995713]: [23-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: DROP FUNCTION IF EXISTS "osm"."road_line_osm2pgsql_valid" ();
2022-04-15 01:06:50 UTC [1995713]: [24-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.135 ms
2022-04-15 01:06:50 UTC [1995713]: [25-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: SET client_min_messages = WARNING
2022-04-15 01:06:50 UTC [1995713]: [26-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.045 ms
2022-04-15 01:06:50 UTC [1995713]: [27-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: CREATE TABLE IF NOT EXISTS "osm"."road_line_tmp" ("osm_id" int8 NOT NULL,"osm_type" text NOT NULL,"name" text ,"ref" text ,"maxspeed" int4 ,"oneway" int2 ,"layer" int4 NOT NULL,"tunnel" text ,"bridge" text ,"major" boolean NOT NULL,"route_foot" boolean ,"route_cycle" boolean ,"route_motor" boolean ,"access" text ,"geom" Geometry(LINESTRING, 3857) )
2022-04-15 01:06:50 UTC [1995713]: [28-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 1.669 ms
2022-04-15 01:06:50 UTC [1995713]: [29-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: SELECT regexp_split_to_table(extversion, '\.') FROM pg_extension WHERE extname='postgis'
2022-04-15 01:06:50 UTC [1995713]: [30-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.676 ms
2022-04-15 01:06:50 UTC [1995713]: [31-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: INSERT INTO "osm"."road_line_tmp" ("osm_id","osm_type","name","ref","maxspeed","oneway","layer","tunnel","bridge","major","route_foot","route_cycle","route_motor","access","geom") SELECT "osm_id","osm_type","name","ref","maxspeed","oneway","layer","tunnel","bridge","major","route_foot","route_cycle","route_motor","access","geom" FROM "osm"."road_line" ORDER BY geom
2022-04-15 01:06:55 UTC [1995713]: [32-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp1995713.0", size 230801408
2022-04-15 01:06:55 UTC [1995713]: [33-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 STATEMENT: INSERT INTO "osm"."road_line_tmp" ("osm_id","osm_type","name","ref","maxspeed","oneway","layer","tunnel","bridge","major","route_foot","route_cycle","route_motor","access","geom") SELECT "osm_id","osm_type","name","ref","maxspeed","oneway","layer","tunnel","bridge","major","route_foot","route_cycle","route_motor","access","geom" FROM "osm"."road_line" ORDER BY geom
2022-04-15 01:06:55 UTC [1995713]: [34-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 4953.350 ms
2022-04-15 01:06:55 UTC [1995713]: [35-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: DROP TABLE "osm"."road_line"
2022-04-15 01:06:55 UTC [1995713]: [36-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 52.978 ms
2022-04-15 01:06:55 UTC [1995713]: [37-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: ALTER TABLE "osm"."road_line_tmp" RENAME TO "road_line"
2022-04-15 01:06:55 UTC [1995713]: [38-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 0.322 ms
2022-04-15 01:06:55 UTC [1995713]: [39-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: CREATE INDEX ON "osm"."road_line" USING GIST ("geom") WITH (fillfactor = 100)
2022-04-15 01:07:02 UTC [1995713]: [40-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 7530.942 ms
2022-04-15 01:07:02 UTC [1995713]: [41-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: statement: ANALYZE "osm"."road_line"
2022-04-15 01:07:03 UTC [1995713]: [42-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: duration: 376.104 ms
2022-04-15 01:07:03 UTC [1995713]: [43-1] user=mytest,db=pgosm_remote,app=pgosm-flex,client=172.17.0.4 LOG: disconnection: session time: 0:08:47.882 user=mytest database=pgosm_remote host=172.17.0.4 port=38638
```
The way these connections are held open prevents pgbouncer from doing its job. The default `session` mode of pgbouncer only returns connections to the pool when the application closes the connection. Since osm2pgsql holds the connections open (`idle`) they are not returned to the pool. This makes pgbouncer's session mode unusable for this, trying to run with a pool results in the first N connections holding all available connections in an idle state. Using the transaction mode doesn't appear to work either. I'm not sure if prepared statements are the only issue, but I'm pretty sure that alone would cause problems.
Closing the connection after the initial drop table / create table / etc and opening a new connection for the post-processing would be best from the Postgres perspective. That would allow pgbouncer to cycle the connections in the pool.
If there is a trick to getting pgbouncer to work with this, instructions in the docs would be helpful. I'd be happy to work up a PR with the instructions if someone can point me in the right direction.
--
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/discussions/1650#discussioncomment-2570706
You are receiving this because you are subscribed to this thread.
Message ID: <openstreetmap/osm2pgsql/repo-discussions/1650/comments/2570706 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20220414/4418eeb9/attachment.htm>
More information about the Tile-serving
mailing list