[OSM-dev] Postgres 8.4/PostGIS 1.4 performance

Frederik Ramm frederik at remote.org
Tue Sep 15 10:42:46 BST 2009


Hi,

(taking this back onto dev)

Jon Burgess wrote:
> It sounds odd that they would be quite so wrong. The osm2pgsql code
> performs an ANALYZE as part of the processing. 

Something must be wrong there then. The osm2pgsql import with 8.3 has 
now completed, again needing 1800 minutes compared to the 2000 needed by 
the 8.4 version. The pg_class table, again, listed the nodes as having 
relpages=0. Then I did a simple "analyze" which took a minute or two, 
and afterwards I got sensible results:

For 8.3 with osm2pgsql 0.67, slim mode and -l:

          relname          |    size    | reltuples | relpages
--------------------------+------------+-----------+----------
  planet_osm_ways_nodes    | 20 GB      |  32636734 |  2577003
  planet_osm_nodes         | 18 GB      | 420728864 |  2333748
  planet_osm_nodes_pkey    | 9007 MB    | 420728864 |  1152902
  planet_osm_line          | 7105 MB    |  27390248 |   909401
  planet_osm_ways          | 6373 MB    |  32636734 |   815762
  planet_osm_line_index    | 1613 MB    |  27390248 |   206460
  planet_osm_polygon       | 1248 MB    |   4276819 |   159700
  planet_osm_ways_pkey     | 1002 MB    |  32636734 |   128192
  planet_osm_roads         | 800 MB     |   2496160 |   102457
  planet_osm_point         | 615 MB     |   7954021 |    78728
  planet_osm_line_pkey     | 587 MB     |  27390248 |    75093
  planet_osm_point_index   | 464 MB     |   7954021 |    59423
  planet_osm_polygon_index | 260 MB     |   4276819 |    33337
  planet_osm_point_pkey    | 171 MB     |   7954021 |    21825
  planet_osm_roads_index   | 151 MB     |   2496160 |    19373
  planet_osm_ways_idx      | 113 MB     |         0 |    14499
  planet_osm_polygon_pkey  | 92 MB      |   4276819 |    11735
  planet_osm_rels_parts    | 85 MB      |    171435 |    10904
  planet_osm_rels          | 84 MB      |    171435 |    10727
  planet_osm_roads_pkey    | 54 MB      |   2496160 |     6929
  planet_osm_rels_pkey     | 3784 kB    |    171435 |      473
  planet_osm_rels_idx      | 8192 bytes |         0 |        1
(22 rows)

For 8.4 with the same parameters (this time after "analyze"):

          relname          |    size    | reltuples | relpages
--------------------------+------------+-----------+----------
  planet_osm_ways_nodes    | 18 GB      |  32352920 |  2406878
  planet_osm_nodes         | 18 GB      | 420052384 |  2333748
  planet_osm_nodes_pkey    | 9007 MB    | 420052384 |  1152902
  planet_osm_line          | 7104 MB    |  27426062 |   909316
  planet_osm_ways          | 6364 MB    |  32352920 |   814552
  planet_osm_line_index    | 1607 MB    |  27426062 |   205720
  planet_osm_polygon       | 1247 MB    |   4286884 |   159647
  planet_osm_ways_pkey     | 1001 MB    |  32352920 |   128190
  planet_osm_roads         | 800 MB     |   2525946 |   102446
  planet_osm_point         | 615 MB     |   7954690 |    78728
  planet_osm_line_pkey     | 587 MB     |  27426062 |    75093
  planet_osm_point_index   | 464 MB     |   7954690 |    59423
  planet_osm_polygon_index | 260 MB     |   4286884 |    33337
  planet_osm_point_pkey    | 171 MB     |   7954690 |    21825
  planet_osm_roads_index   | 151 MB     |   2525946 |    19373
  planet_osm_ways_idx      | 113 MB     |         0 |    14499
  planet_osm_polygon_pkey  | 92 MB      |   4286884 |    11735
  planet_osm_rels          | 84 MB      |    171433 |    10727
  planet_osm_rels_parts    | 67 MB      |    171433 |     8545
  planet_osm_roads_pkey    | 54 MB      |   2525946 |     6929
  planet_osm_rels_pkey     | 3784 kB    |    171433 |      473
  planet_osm_rels_idx      | 8192 bytes |         0 |        1
(22 rows)

Now the two tables create more questions than they answer. Why does the 
node table under 8.4 seem to have less entries than under 8.3 when both 
have been populated with the same planet file?

osm=# select max(id) from planet_osm_nodes;
     max
-----------
  480684210

same for both databases.

osm=# select count(*) from planet_osm_nodes;
    count
-----------
  420480680

also the same for both. Which means that 8.3 over- and 8.4 
under-estimates the actual number in the pg_class listing.

Maybe the 18 GB versus 20 GB is just an estimate as well because I 
cannot seem to find a difference in node storage size:

osm=# select id,pg_column_size(planet_osm_nodes.*) from planet_osm_nodes 
where  id in (1,2,3,4,5,49786136,49786237,50052568,51308246,52779935);
     id    | pg_column_size
----------+----------------
         2 |            109
         3 |             36
         4 |             36
         5 |             36
  49786136 |            236
  49786237 |            240
  50052568 |            276
  51308246 |            208
  52779935 |            204

Same for 8.3 and 8.4.

osm=# select sum(pg_column_size(planet_osm_nodes.*)) from 
planet_osm_nodes where id > 400000000;
     sum
------------
  2622633489

Same too.

But then the disk usage for 8.3:

# du  8.3/main
328	8.3/main/global
4	8.3/main/pg_xlog/archive_status
688976	8.3/main/pg_xlog
12	8.3/main/pg_clog
12	8.3/main/pg_subtrans
4	8.3/main/pg_twophase
12	8.3/main/pg_multixact/members
12	8.3/main/pg_multixact/offsets
28	8.3/main/pg_multixact
4352	8.3/main/base/1
4352	8.3/main/base/11510
4448	8.3/main/base/11511
73569488	8.3/main/base/16385
4	8.3/main/base/pgsql_tmp
73582648	8.3/main/base
4	8.3/main/pg_tblspc
74272028	8.3/main

and 8.4:

# du  8.4/main
488	8.4/main/global
4	8.4/main/pg_xlog/archive_status
688976	8.4/main/pg_xlog
12	8.4/main/pg_clog
12	8.4/main/pg_subtrans
4	8.4/main/pg_twophase
12	8.4/main/pg_multixact/members
12	8.4/main/pg_multixact/offsets
28	8.4/main/pg_multixact
5484	8.4/main/base/1
5484	8.4/main/base/11563
5592	8.4/main/base/11564
72235152	8.4/main/base/16384
4	8.4/main/base/pgsql_tmp
72251720	8.4/main/base
4	8.4/main/pg_tblspc
4	8.4/main/pg_stat_tmp
72941260	8.4/main

There seems to be some truth about the 18 GB vs 20 GB figure; 8.4 seems 
to save a few GB somewhere, and my suspicion is that for some reason 
this saving costs some time during import, making for the 2000 vs 1800 
minutes.

Bye
Frederik




More information about the dev mailing list