[OSM-dev] osm2pgsql diff imports benchmarks

Lynn W. Deffenbaugh (Mr) ldeffenb at homeside.to
Mon Mar 19 11:37:14 GMT 2012

On 3/18/2012 4:07 PM, Kai Krueger wrote:
> sylvain letuffe wrote
>> It turned out that the index on the "pending" field wasn't used at all and
>> led
>> to a full scan of the planet_osm_ways table, increasing noticeably the
>> diff
>> import.
> I think I have seen that one before as well. If I remember correctly, it was
> sufficient to run a simple analyze on the table. No need to do a re-index.
> It probably thinks a too large proportion of the ways are "pending" and
> therefore decides it is best to do a seq scan. Possibly because before the
> "going over pending ways" stage about 50% of ways are pending during the
> import. Osm2pgsql should be doing a analyze at the end of the import though,
> so I am not sure why this is happening.

I have a planet database that has been imported and has been attempting 
to catch up for a while now (3 days to go) and it's been showing extreme 
data throughput spikes when entering the "pending ways" phase, so I 
decided to check this out.  An analyze command show that a sequential 
scan is, in fact, being done:

> gis=> explain select id from planet_osm_ways where pending;
>  Seq Scan on planet_osm_ways  (cost=0.00..5253263.74 rows=65592587 
> width=4)
>    Filter: pending

I'm not sure why, but I got an error attempting to "analyze verbose 
planet_osm_ways" as www-data, but it seems to execute as the postgres user:

> gis=> analyze verbose planet_osm_ways;
> WARNING:  skipping "planet_osm_ways" --- only table or database owner 
> can analyze it

The analyze output really didn't tell me much:

> gis=# analyze verbose planet_osm_ways;
> INFO:  analyzing "public.planet_osm_ways"
> INFO:  "planet_osm_ways": scanned 30000 of 3941416 pages, containing 
> 983808 live rows and 589826 dead rows; 30000 rows in sample, 131170601 
> estimated total rows

But a subsequent explain certainly looks promising:

> gis=> explain select id from planet_osm_ways where pending;
>  Index Scan using planet_osm_ways_idx on planet_osm_ways  
> (cost=0.00..916513.78 rows=43724 width=4)

Now to wait for my next 12 hour catchup chunk to get to the pending ways 
phase,probably in about 6 hours or so.

Lynn (D) - KJ4ERJ

PS.  If anyone gets this far, are those "dead rows" a function of my 
running with autovaccuum off?  And should I be doing a periodic vaccuum 
to clean out accumulated cruft?  I was thinking that not too much in the 
osm planet would be deleting, but maybe my assumption is incorrect?

More information about the dev mailing list