[Talk-se] Place-name import from Lantmäteriet's GSD-Terrängkartan

Thu Jan 16 17:18:51 UTC 2020

Hi!
I have extracted the place names currently missing from the OSM map of Sweden from Lantmäteriet's open data, dated January 2020. There are roughly 95 thousand new nodes with names and "place=*" tags, which I hope to upload to OSM in due course.

Such a large amount of new data requires following certain procedures and preparing certain documents. I hope to get your feedback and possibly your help with validation, uploading, and any other tasks that come up.

The import plan for the project is on the OSM wiki [1]. It describes the origin, licence, and format of the data. It then describes how the original files are processed, how new points are filtered against the existing OSM database, how place names are cleaned and compared, which scripts and programs are used at each step, and so on. Finally, it lists the problems that remain to be solved during manual processing.

Excerpts from the most important sections of the import plan are attached at the bottom. Here is also a small part of the full dataset if you want to see what it will look like: [2] [3]. Further links to Lantmäteriet's documentation, the scripts I developed, all OSM files, spreadsheets, and so on are on the import plan page.

Thanks!

[1]  https://wiki.openstreetmap.org/wiki/Import/Catalogue/Lantm%C3%A4teriet_GSD-Terr%C3%A4ngkartans_ortnamnsimport
[2]  https://drive.google.com/open?id=1np1TEDlEBWx1kt-u7A4Z_ZpkMOwOp80l
[3]  https://drive.google.com/open?id=1pERx-U4rdOjhXmePoSxcbKRZsr-preh8

Excerpts from the import plan follow.

===Goal===
To improve the completeness of OSM toponymic data for the territory of Sweden
using an official map supplied by the Swedish mapping, cadastral and land
registration authority.
This import considers OSM data representable as nodes tagged with the usual
key/value pairs: "place=city", "place=town", "place=village", "place=hamlet",
"place=isolated_dwelling", and "place=locality". However, it is not planned
(though not fully excluded either) to add or modify any nodes with "city" and
"town" values. They are expected to be fully mapped already.

==== Data processing diagram ====
See the diagram below. The conflation stage is described later in more detail.
+-------------------+        +------------------+
|                   |        |                  |
|Lantmäteriet's SHP |        |Geofabrik country |
|files              |        |extract           |
|                   |        |                  |
+---------+---------+        +--------+---------+
          |                           |
          |ogr2osm                    |osmconvert
          |                           |osmfilter
          v                           v
 +--------+---------+         +-------+---------+
 |                  |         |                 |
 |OSM file with     |         |OSM file with    |
 |settlements       |         |settlements      |
 |                  |         |                 |
 +---------+--------+         +-------+---------+
           |                          |
           |                          |
           |     conflate-places.py   |
           +<--------------------------
           v
  +--------+--------+
  |                 |
  |OSM file with    |
  |only ready nodes |
  |                 |
  +--------+--------+
           |
           | Manual corrections
           |
           v
 Upload to OSM via JOSM
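
Both branches of the diagram end in an OSM file that contains only settlement nodes. As a rough illustration of what such a file holds, here is a minimal Python sketch that reads OSM XML and keeps the nodes carrying one of the "place=*" values listed under Goal. It is not part of the actual toolchain (which uses ogr2osm, osmconvert and osmfilter); the function name read_place_nodes and the sample node "Exempelby" are made up for this example.

```python
import xml.etree.ElementTree as ET

# place=* values considered by the import plan.
PLACE_VALUES = {
    "city", "town", "village", "hamlet", "isolated_dwelling", "locality",
}

def read_place_nodes(osm_xml):
    """Return a list of dicts for nodes tagged with a considered place=* value."""
    root = ET.fromstring(osm_xml)
    nodes = []
    for node in root.iter("node"):
        # Collect the node's tags into a plain dict.
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags.get("place") in PLACE_VALUES:
            nodes.append({
                "id": int(node.get("id")),
                "lat": float(node.get("lat")),
                "lon": float(node.get("lon")),
                "tags": tags,
            })
    return nodes

# Hypothetical sample data, not taken from the real dataset.
SAMPLE = """<osm version="0.6">
  <node id="-1" lat="59.3293" lon="18.0686">
    <tag k="place" v="hamlet"/>
    <tag k="name" v="Exempelby"/>
  </node>
  <node id="-2" lat="59.4000" lon="18.1000">
    <tag k="amenity" v="cafe"/>
  </node>
</osm>"""

if __name__ == "__main__":
    for n in read_place_nodes(SAMPLE):
        print(n["id"], n["tags"]["name"], n["lat"], n["lon"])
```

For a full country extract a streaming parser would be a better fit than loading the whole document at once; the sketch only shows the shape of the data that the conflation stage works on.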

The algorithm operates on a set of old nodes tagged with "place=*"
(from the OSM extract, around 68 000 nodes for the country) and a set of new
nodes (from the SHP extract). It produces ready nodes, a strict subset of the
new nodes. No old nodes are modified in any way during the process. This means
that existing data has absolute priority, even in cases where it is likely of
lower quality than the new data.
The sequence of steps is as follows.
1. Create a spatial index of the old nodes to allow fast spatial lookup.
2. For every new node, validate and correct the "name" tag.
3. For each new node, find the old nodes close enough to it to be duplicate candidates.
4. For each candidate node, compare its name against the name of the current new node.
   The comparison is fuzzy to allow for the text variation typical of names.
   Alternative names of the old node are also checked if present.
5. If a name match is found, the current new node is marked as "duplicate" and
   is excluded from further analysis and from the results.
6. An OSM file with the ready data is generated.
7. The OSM file is optionally split into smaller tiles to ease and speed up
   visual validation.
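
Steps 1 to 5 can be sketched in Python roughly as follows. This is an illustration only and not the code of conflate-places.py: the grid index, the 1000 m search radius, the 0.8 similarity threshold, the use of difflib, the set of alternative-name keys, and the whitespace-only name cleanup are all assumptions chosen for the example, and the sample place names are made up.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher

CELL_DEG = 0.01        # assumed grid cell size, degrees
MAX_DIST_M = 1000.0    # assumed "close enough" radius, metres
MIN_SIMILARITY = 0.8   # assumed fuzzy-match threshold
M_PER_DEG = 111_000.0  # rough metres per degree of latitude

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def cell_of(lat, lon):
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def build_index(old_nodes):
    """Step 1: bucket old nodes by grid cell."""
    index = defaultdict(list)
    for node in old_nodes:
        index[cell_of(node["lat"], node["lon"])].append(node)
    return index

def candidates(index, node):
    """Step 3: old nodes within MAX_DIST_M of the given node."""
    lat, lon = node["lat"], node["lon"]
    cos_lat = max(math.cos(math.radians(lat)), 0.01)
    # Cells to scan per axis; the extra cell is a safety margin.
    lat_reach = math.ceil(MAX_DIST_M / (M_PER_DEG * CELL_DEG)) + 1
    lon_reach = math.ceil(MAX_DIST_M / (M_PER_DEG * cos_lat * CELL_DEG)) + 1
    ci, cj = cell_of(lat, lon)
    found = []
    for i in range(ci - lat_reach, ci + lat_reach + 1):
        for j in range(cj - lon_reach, cj + lon_reach + 1):
            for old in index.get((i, j), ()):
                if distance_m(lat, lon, old["lat"], old["lon"]) <= MAX_DIST_M:
                    found.append(old)
    return found

def clean_name(name):
    """Step 2 stand-in: only trims and collapses whitespace."""
    return " ".join(name.split())

def names_match(a, b):
    """Step 4: case-insensitive fuzzy comparison."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= MIN_SIMILARITY

def old_names(old):
    """Main and alternative names of an old node (key choice is an assumption)."""
    names = []
    for key in ("name", "alt_name", "old_name", "loc_name"):
        names.extend(part.strip() for part in old["tags"].get(key, "").split(";"))
    return [n for n in names if n]

def conflate(old_nodes, new_nodes):
    """Steps 1-5: return the new nodes that have no nearby old node with a matching name."""
    index = build_index(old_nodes)
    ready = []
    for new in new_nodes:
        name = clean_name(new["tags"]["name"])
        duplicate = any(
            names_match(name, old_name)
            for old in candidates(index, new)
            for old_name in old_names(old)
        )
        if not duplicate:
            ready.append({**new, "tags": {**new["tags"], "name": name}})
    return ready

# Made-up sample data.
OLD = [{"lat": 59.000, "lon": 17.000, "tags": {"place": "hamlet", "name": "Norrby"}}]
NEW = [
    {"lat": 59.001, "lon": 17.001, "tags": {"place": "hamlet", "name": "Norrby "}},
    {"lat": 59.001, "lon": 17.001, "tags": {"place": "hamlet", "name": "Solvik"}},
    {"lat": 60.000, "lon": 17.000, "tags": {"place": "hamlet", "name": "Norrby"}},
]

if __name__ == "__main__":
    for node in conflate(OLD, NEW):
        print(node["tags"]["name"], node["lat"], node["lon"])
```

In the sample, the first new node is dropped as a duplicate of the nearby old "Norrby", while the other two are kept: one has a different name and the other is far away. The old nodes are only read, never changed, which matches the rule that existing data has absolute priority.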

===Expected issues and their risk assessment===

So far, the most problematic issues seem to be "A duplicate of
existing node is added" and "A new node is added with incorrect position".
Discovering and fixing such problems is expected to account for most of
the required manual editing.

With best regards,
Grigory Rechistov