[Imports] Spanish Cadastre ELEMTEX

Thu Jun 7 08:34:10 UTC 2012

2012/6/7 Paul Norman <penorman at mac.com>:
>> From: Javier Sánchez
>> Subject: [Imports] Spanish Cadastre ELEMTEX
>>
>> The ELEMTEX layer contains text labels about unpopulated places, many
>> small populated places and points of interest (like police stations,
>> post offices, hospitals, schools, etc), but they are not categorized.
>> These data are extracted with an option of Cat2Osm. The result consists
>> of nodes with the tags name=*, source=cadastre and source:date=*. The
>> job basically consists of manually checking the nodes, assigning the
>> tags that best describe each element based on its name and local
>> knowledge, dropping those that cannot be classified, correcting other
>> errors such as spelling, and conflating with existing OSM data.
>>
>> As an example, two files are attached. The first is the output
>> generated by Cat2Osm for a municipality [3]. The second contains the
>> manually reviewed data [4].
>
> I have a few comments, some on the raw data, some on the edited data. This
> is much improved and dealing with a smaller subset of data makes it much
> easier to review.
>
> - Some nodes have only name, source and source:date tags while others have
> those and place=locality

Yes. Most of the nodes correspond to unpopulated places and are
tentatively tagged place=locality. After manual review, some of them
may be reassigned to populated places, usually small ones such as
isolated dwellings and hamlets. If a node doesn't have place=locality,
it is most probably a POI.

> - Some have absurd names (e.g. name=-I+I and name=P-1, P-2, etc). These
> could be dealt with manually but it would be worth seeing how common they
> are and dealing with them in the conversion

Yes. They correspond to parcel numbers (for example, in an industrial
estate) and are candidates for deletion. An automatic process could be
added to the program to detect and filter them. I will suggest this to
the programmers.
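
A minimal sketch of such a filter, in Python for illustration only. This
is not Cat2Osm code: the function name and both patterns are assumptions
derived solely from the examples quoted above (P-1, P-2, -I+I). It only
flags names for manual review, because very short real names would match
as well.

```python
import re

# Assumed pattern: up to two letters, an optional separator, digits,
# and an optional trailing letter (e.g. "P-1", "P-2", "12", "3A").
PARCEL_CODE = re.compile(r"[A-Za-z]{0,2}[-\s]?\d+[A-Za-z]?")
# A run of three or more letters (any alphabet), taken as a sign of a real name.
WORD = re.compile(r"[^\W\d_]{3,}")


def looks_like_parcel_label(name):
    """Return True if the label looks like a parcel code, not a place name."""
    name = name.strip()
    if not name:
        return True
    if PARCEL_CODE.fullmatch(name):
        return True
    # No word-like run of letters at all, e.g. "-I+I".
    return WORD.search(name) is None


# Usage: split labels into suspects and probable names.
labels = ["P-1", "-I+I", "GASOLINERA"]
suspects = [n for n in labels if looks_like_parcel_label(n)]
```

Running this over a whole province would also show how common these
labels are before deciding whether to drop them in the conversion.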

> - Some of the names could be translated to tagging, e.g. name=GASOLINERA
> could be turned into amenity=fuel. Again, this would depend how common they
> are and how consistent they are.

Yes, again. The program could make some of these decisions as well.
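
As a rough illustration of that idea, here is a sketch in Python (not
Cat2Osm code). Only GASOLINERA -> amenity=fuel comes from the comment
above; the other keywords in the table and the function name are
assumptions, and a real table would have to be built from how often and
how consistently each word appears in the data.

```python
# Keyword table. Only the GASOLINERA entry comes from the discussion;
# the remaining entries are hypothetical examples.
KEYWORD_TAGS = {
    "GASOLINERA": {"amenity": "fuel"},
    "HOSPITAL": {"amenity": "hospital"},      # assumed
    "COLEGIO": {"amenity": "school"},         # assumed
    "CORREOS": {"amenity": "post_office"},    # assumed
}


def suggest_tags(tags):
    """Return a copy of tags with tags suggested by words in the name."""
    words = tags.get("name", "").upper().split()
    suggested = dict(tags)
    for keyword, extra in KEYWORD_TAGS.items():
        if keyword in words:
            for key, value in extra.items():
                # Never overwrite a tag that is already present.
                suggested.setdefault(key, value)
    return suggested


# Usage with a node as produced by the ELEMTEX extraction.
node = {"name": "GASOLINERA", "source": "cadastre"}
result = suggest_tags(node)
```

The suggestions would still need the manual check described earlier,
since a label only hints at what the feature is.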

> - All I noticed with the processed file is that there were some ways
> with only source, source:date and name tags.

Mea culpa. I'm embarrassed; this is my mistake. There is one way
corresponding to a school and one node corresponding to a cemetery
without physical tags.
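
A check like the following could catch this kind of omission in a
reviewed file before upload. It is a sketch in Python using only the
standard library; the function name and the sample data are made up for
illustration, and it assumes the file is plain OSM XML.

```python
import xml.etree.ElementTree as ET

# Tags that say nothing about what the element is.
NON_DESCRIPTIVE = {"name", "source", "source:date"}


def find_untyped(osm_xml):
    """List (type, id) of tagged elements that carry no descriptive tag."""
    root = ET.fromstring(osm_xml)
    flagged = []
    for elem in root:
        if elem.tag not in ("node", "way", "relation"):
            continue
        keys = {tag.get("k") for tag in elem.findall("tag")}
        # Untagged nodes (plain way vertices) are skipped.
        if keys and keys <= NON_DESCRIPTIVE:
            flagged.append((elem.tag, elem.get("id")))
    return flagged


# Made-up sample: one forgotten node, one vertex, one correctly tagged way.
SAMPLE = """<osm version="0.6">
  <node id="-1" lat="40.0" lon="-3.0">
    <tag k="name" v="CEMENTERIO"/>
    <tag k="source" v="cadastre"/>
  </node>
  <node id="-2" lat="40.1" lon="-3.1"/>
  <way id="-3">
    <nd ref="-2"/>
    <tag k="name" v="COLEGIO"/>
    <tag k="amenity" v="school"/>
  </way>
</osm>"""

flagged = find_untyped(SAMPLE)
```

On the raw Cat2Osm output every label node would be flagged, so the
check is only meaningful on the manually reviewed file.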

> CanVec has convinced me that taking this approach to imports requires
> some sort of QA plan, or some importer will not check any of their work
> and will cause duplicates, disconnected data, bad data, etc.

And my mistakes above demonstrate this.

> The problem is not that there aren't tools for checking your own work, the
> problem is that some people will not use them and cause significant damage.
> I can't suggest a solution for this, and I think it is not as significant
> for this particular data set, but it is definitely an issue when you start
> getting into others like roads and landuse.

Suggestions are welcome. I propose publishing the processed OSM files
and asking on the Spanish list (or here, for that matter) for a review
by a second person prior to upload.

> If you find a good solution for this problem, please document it and let
> everyone know as it is a big problem with imports done by multiple people
> who may have varying standards.

I will correct these points. Thank you very much for everything.

Javier