[OSM-dev] Duplicate data from Tiger import

Jon Burgess jburgess at uklinux.net
Tue Nov 28 01:21:03 GMT 2006


On Sun, 2006-11-26 at 22:24 +0000, Jon Burgess wrote:
> I noticed that Mapnik was taking much longer than normal to process some
> areas of the US and have found that there are instances where the same
> data is duplicated over 100 times. For example see:
> 
> http://www.openstreetmap.org/api/0.3/map?bbox=-84.316874,39.16047,-84.315683,39.161368
> 
> If this is displayed in JOSM there are only 5 distinct nodes and yet the
> raw XML shows that each of the nodes, segments and ways is duplicated
> 102 times. 
> 
> 
> I don't know whether this is a problem with the original tiger data or
> the import process, but it looks like something needs to be done to
> remove the redundant data. 
> 
> 	Jon
> 

Today I tried devising an enhanced osm2pgsql.c which would exclude
duplicate ways while generating the SQL. I've got something which seems
to work and indicates that around 60% of all nodes and ways in the
planet-061112 are duplicates. 

Once the duplicate entries are removed, the number of rows in planet_osm
drops from 3.6 million to 1.5 million, which should improve the Mapnik
rendering time.

I don't want to provide a copy of the code just yet. I'd still like to
make some improvements to it, such as reducing the 2GB memory usage. I
also want to leave Mapnik running overnight on the database to make sure
the output still looks reasonable (and to measure the effect on
rendering times).


	Jon
