[Tile-serving] [openstreetmap/osm2pgsql] Confusing middle pgsql processing. (Issue #2049)

Marcos Dione notifications at github.com
Tue Aug 22 13:54:59 UTC 2023


<!-- Please don't use screenshots. Copy and paste the *text* output here if that's needed for context. -->

## What version of osm2pgsql are you using?

`osm2pgsql version 1.8.0`

## What operating system and PostgreSQL/PostGIS version are you using?

Linux Debian Stable, PG Database version: `15.3 (Debian 15.3-0+deb12u1)`, PostGIS version: `3.3`

## Tell us something about your system

Small server, 8GiB RAM, 1TB consumer SATA SSD.

## What did you do exactly?

`osm2pgsql --verbose --database gis --cache 0 --number-processes 4 --slim --flat-nodes $(pwd)/nodes.cache --hstore --multi-geometry --style $osm_carto/openstreetmap-carto.style --tag-transform-script $osm_carto/openstreetmap-carto.lua europe-latest.osm.pbf`

## What did you expect to happen?

The logs about the middle processing are a little bit confusing:

```
2023-08-19 20:49:52  [1] Clustering table 'planet_osm_polygon' by geometry...
2023-08-19 20:49:52  [2] Clustering table 'planet_osm_line' by geometry...
2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...
2023-08-19 20:49:52  [4] Clustering table 'planet_osm_point' by geometry...

2023-08-19 20:49:52  [1] Using native order for clustering table 'planet_osm_polygon'
2023-08-19 20:49:52  [2] Using native order for clustering table 'planet_osm_line'
2023-08-19 20:49:52  [3] Using native order for clustering table 'planet_osm_roads'
2023-08-19 20:49:52  [4] Using native order for clustering table 'planet_osm_point'

2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...
2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...
2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...
2023-08-19 22:57:47  [3] Done task in 7674389ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [3] Done task in 1ms.
2023-08-19 22:57:47  [3] Starting task...

2023-08-19 22:57:47  [0] Done postprocessing on table 'planet_osm_nodes' in 0s

2023-08-19 22:57:47  [3] Building index on table 'planet_osm_ways'

2023-08-19 23:32:06  [4] Creating geometry index on table 'planet_osm_point'...
2023-08-20 00:13:30  [4] Creating osm_id index on table 'planet_osm_point'...
2023-08-20 00:20:35  [4] Analyzing table 'planet_osm_point'...
2023-08-20 00:20:40  [4] Done task in 12647156ms.
2023-08-20 00:20:40  [4] Starting task...

2023-08-20 00:20:40  [4] Building index on table 'planet_osm_rels'
2023-08-20 02:03:11  [4] Done task in 6151838ms.

2023-08-20 03:17:24  [2] Creating geometry index on table 'planet_osm_line'...
2023-08-20 03:54:40  [2] Creating osm_id index on table 'planet_osm_line'...
2023-08-20 04:02:57  [2] Analyzing table 'planet_osm_line'...
2023-08-20 04:03:01  [2] Done task in 25988218ms.

2023-08-20 05:26:21  [1] Creating geometry index on table 'planet_osm_polygon'...
2023-08-20 06:17:31  [1] Creating osm_id index on table 'planet_osm_polygon'...
2023-08-20 06:30:46  [1] Analyzing table 'planet_osm_polygon'...
2023-08-20 06:30:47  [1] Done task in 34854542ms.

2023-08-20 10:48:18  [3] Done task in 42630605ms.

2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_rels' in 6151s (1h 42m 31s)
```

I separated it in sections by worker. The messages about `Starting task` and `Done task` don't mention which task it is. Also, do I understand correctly that not all workers started working at the same time? If I'm right, why is that? If I Isolate worker 3's logs:

```
2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...
2023-08-19 20:49:52  [3] Using native order for clustering table 'planet_osm_roads'
2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...
2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...
2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...
2023-08-19 22:57:47  [3] Done task in 7674389ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [3] Done task in 1ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [3] Building index on table 'planet_osm_ways'
2023-08-20 10:48:18  [3] Done task in 42630605ms.
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)
```

At 20:49 starts processing `roads`. ~2h later it's creating 2 indexes and analyzing it. Launches 2 tasks, but finishes one first (22:57), a long one? Which one? And then another, but one seems to still be running.

And suddenly it's creating indexes on `ways`. This seems to finish the next day and it's properly logged.

I also see the messages about 'Done postprocessing' for `nodes` (22:57), `ways` and `rels` (next day, 10:48), but not for `roads`. I __do__ get all 4 `All postprocessing' messages.

Could tasks be more verbose about what exactly they're doing? I didn't try `--log-level=debug` because this is a 49h+ process and I really don't want to start it all over :)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/issues/2049
You are receiving this because you are subscribed to this thread.

Message ID: <openstreetmap/osm2pgsql/issues/2049 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20230822/4a674e51/attachment.htm>


More information about the Tile-serving mailing list