[OSM-dev] Duplicate data from Tiger import
jburgess at uklinux.net
Tue Nov 28 01:21:03 GMT 2006
On Sun, 2006-11-26 at 22:24 +0000, Jon Burgess wrote:
> I noticed that Mapnik was taking much longer than normal to process some
> areas of the US and have found that there are instances where the same
> data is duplicated over 100 times. For example see:
> If this is displayed in JOSM there are only 5 distinct nodes and yet the
> raw XML shows that each of the nodes, segments and ways is duplicated
> 102 times.
> I don't know whether this is a problem with the original tiger data or
> the import process, but it looks like something needs to be done to
> remove the redundant data.
Today I tried writing a modified osm2pgsql.c that excludes duplicate
ways while generating the SQL. I've got something that seems to work,
and it indicates that around 60% of all nodes and ways in
planet-061112 are duplicates.
Once the duplicate entries are removed, the number of rows in planet_osm
drops from 3.6 million to 1.5 million, which should improve Mapnik
rendering times.
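As a quick consistency check, the drop in planet_osm rows works out as follows:

```latex
\frac{3.6 - 1.5}{3.6} = \frac{2.1}{3.6} \approx 0.583
```

That is roughly 58% of rows removed, in line with the "around 60%" figure. Only a rough match should be expected, because that figure counts nodes and ways while this one counts planet_osm rows.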
I don't want to release the code just yet. I'd still like to make some
improvements to it, such as reducing its 2GB memory usage. I also want
to leave Mapnik running overnight on the database to make sure the
output still looks reasonable (and to measure the effect on …).