[Imports] Review of proposed Spanish cadastre import (Avila.osm)

Paul Norman penorman at mac.com
Sun Mar 25 06:18:35 UTC 2012


As this is such a large file, I will first be listing some general concerns,
then going through specific concerns for each object type. I will be
referencing specific examples by either lat,lon pairs or by IDs from the
.osm file.

For reference, the file has approximately 60k relations, 148k ways and
406k nodes. The database currently has about 80 relations, 2.4k ways and 31k
nodes for the same area.
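
For anyone who wants to reproduce counts like these on their own copy of the
file, here is a minimal sketch. It assumes a standard OSM XML (.osm) file; the
function name count_osm_elements and the tiny sample document are made up for
illustration and are not taken from Avila.osm.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_osm_elements(source):
    """Count node, way and relation elements in an OSM XML file.

    `source` may be a filename or a binary file-like object.
    """
    counts = Counter()
    # iterparse streams the document instead of loading it all at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            counts[elem.tag] += 1
            # Drop child elements (nd, member, tag) we no longer need.
            elem.clear()
    return counts


# Hypothetical four-object sample, only to show the function running.
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<osm version='0.6'>
  <node id='-1' lat='40.6615' lon='-4.6895' />
  <node id='-2' lat='40.6616' lon='-4.6896' />
  <way id='-3'><nd ref='-1' /><nd ref='-2' /></way>
  <relation id='-4'>
    <member type='way' ref='-3' role='outer' />
    <tag k='type' v='multipolygon' />
  </relation>
</osm>"""

counts = count_osm_elements(io.BytesIO(SAMPLE.encode("utf-8")))
print(counts["node"], counts["way"], counts["relation"])
```

Passing a path such as "Avila.osm" instead of the in-memory sample should give
the totals for the real file.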

Wikipedia indicates that the population of Avila is 58k.

As this one file would increase the number of relations in the database by
5% (and the number of multipolygons by 10%), have you spoken with the
sysadmins to see what the impact would be on the OSM servers?
(http://wiki.openstreetmap.org/wiki/Import/Guidelines#Keep_server_resources_in_mind)

Most data consumers take longer to process relations than ways or nodes.

If one town of 58k would increase the number of multipolygons by 10%, I
could easily see the full import doubling the total number. What impact
would this have on processing times for the various tools?

Overlap:

Frequently one multipolygon overlaps another. An example of this is
found at (40.6615, -4.6895). Here one MP (-1289355) covers the entire lot
with landuse=farmyard while a smaller one (-1269673) covers the non-building
part of the lot, overlapping with the first. The top four are tagged
identically.

Breaking lots into too small areas:

http://maps.paulnorman.ca/imports/review/avila1.png illustrates one example
of this problem. The selected MP covers a lot, and inside it are 5 MPs
tagged landuse=residential that overlap with it. Why so many?

The small block of 5 houses below has 31 landuse=residential multipolygons.
It should only have one for the entire block.

Multipolygons with no styles on them

Object Types:

landuse=farmyard seems to be applied to areas that are not farmyards. For
example, the buildings at (40.6605, -4.6921) are tagged as being farmyard,
but they're in the middle of a city block with no farms in the area.

power=tower is being used for what appear to be power=pole objects.

When I proposed my Surrey address import to the list, the consensus was that
addr:country was not needed with working boundary relations.

catastro:surface, catastro:surface:built, catastro:surface:overground: What
are these tags indicating?

natural=tree: A lot of these nodes are located in the middle of roads or in
other absurd places, such as the middle of a sports field. The imagery
agrees with the other data and indicates the trees are wrong.

addr:*=*: There are already address nodes in the city. How do you propose to
conflate these with the data you want to import?
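
To make the conflation question concrete, here is a minimal sketch of one
possible first pass. It is an illustration only, not a description of how the
import currently works. It assumes both data sets have already been reduced to
plain tag dictionaries and that a match on normalised addr:street plus
addr:housenumber is good enough to flag a likely duplicate. The function names
and the sample addresses are hypothetical.

```python
def address_key(tags):
    """Return a comparable (street, housenumber) key, or None if incomplete."""
    street = tags.get("addr:street", "").strip().lower()
    number = tags.get("addr:housenumber", "").strip().lower()
    if not street or not number:
        return None
    return (street, number)


def split_by_existing(existing, incoming):
    """Split incoming address tags into (new, likely_duplicates).

    An incoming address counts as a likely duplicate when its key matches
    an address that is already present in `existing`.
    """
    existing_keys = {address_key(tags) for tags in existing} - {None}
    new, duplicates = [], []
    for tags in incoming:
        key = address_key(tags)
        if key is not None and key in existing_keys:
            duplicates.append(tags)
        else:
            new.append(tags)
    return new, duplicates


# Made-up sample data, not taken from the database or from Avila.osm.
existing = [{"addr:street": "Calle Mayor", "addr:housenumber": "12"}]
incoming = [
    {"addr:street": "calle mayor ", "addr:housenumber": "12"},
    {"addr:street": "Calle Mayor", "addr:housenumber": "14"},
]

new, duplicates = split_by_existing(existing, incoming)
print(len(new), len(duplicates))
```

A tag-only match like this ignores position, so anything it flags would still
need a manual check against the existing nodes before being dropped or merged.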

Conclusions:

There are many significant issues with this data that render it suitable
only as a background layer for tracing and not for a direct import. The
issues of excessively splitting areas up that are present with most property
lot data sources are even worse here, using an absurd number of objects to
represent what a mapper tracing would use only 1-2 for.

Looking at this data increases my conviction that the French and American
communities have been right in rejecting parcel lot data. It is just not
suited for OSM.

I routinely work with large .osm files in JOSM, including gigabyte-sized
ones. The slowness I found when working with this file surprised me.
Although no worse than the gigabyte files I sometimes open, it was slower
than I expected.



