[Imports] Municipality of Örebro, Sweden

Karl Wettin karl.wettin at kodapan.se
Thu Oct 30 19:45:23 UTC 2014


The municipality of Örebro, Sweden, releases its GIS data as CC0.

This data can be harvested and post-processed to produce a couple of hundred thousand nodes, grouped into a handful of classes by the tags they carry:

name, place=hamlet
addr:city, addr:place, addr:street, addr:housenumber
name, addr:city, addr:place, highway=road
name, amenity=school, isced:level
name, amenity=social_facility, social_facility=assisted_living, social_facility:for
name, leisure=park
etc.

The output is one osm.xml file per class.
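
To give a rough idea of what one such per-class file could contain, here is a minimal sketch that renders a single node as OSM XML. It is not taken from the harvester: the class name OsmXmlSketch, the generator string and the example tag values are made up for illustration, and the negative id simply follows the usual convention for objects that do not exist in OSM yet.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of import-orebro-harvester.
class OsmXmlSketch {

    /** Escapes the characters that are unsafe inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders one node; a negative id marks an object that is not in OSM yet. */
    static String node(long id, double lat, double lon, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT,
                "  <node id=\"%d\" lat=\"%.7f\" lon=\"%.7f\">\n", id, lat, lon));
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            sb.append("    <tag k=\"").append(escape(tag.getKey()))
              .append("\" v=\"").append(escape(tag.getValue())).append("\"/>\n");
        }
        return sb.append("  </node>\n").toString();
    }

    /** Wraps already rendered nodes in an osm root element. */
    static String document(String nodes) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<osm version=\"0.6\" generator=\"sketch\">\n" + nodes + "</osm>\n";
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("name", "Example school"); // made-up value
        tags.put("amenity", "school");
        System.out.print(document(node(-1, 59.27, 15.21, tags)));
    }
}
```

Keeping one tag map per node means each class (schools, parks, addresses and so on) only differs in which keys it fills in.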

https://github.com/OpenStreetMap-Sverige/import-orebro-osm-xml/tree/master/osm.xml
https://github.com/OpenStreetMap-Sverige/import-orebro-osm-xml/archive/master.zip

The harvester also attempts to find existing OSM duplicates within a radius of 500 to 5000 meters. This seems to work really well, but could probably be improved by allowing a small Levenshtein distance and by normalizing whitespace and punctuation (\p{Punct}). I am not sure how much that would help, though; everything looks pretty great when inspected manually.
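
The possible improvement mentioned here could look roughly like the sketch below: normalize both names, then accept a match if the two points are within the search radius and the names are within a small edit distance. This is an assumption about how such a check might be written, not the code in Orebro.java; the class name DuplicateSketch, the method names and the maxEdits parameter are all hypothetical, and the distance is a plain haversine approximation.

```java
import java.util.Locale;

// Hypothetical sketch, not the matching logic used by the harvester.
class DuplicateSketch {

    /** Lower-cases, drops ASCII punctuation (\p{Punct}) and collapses whitespace. */
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}", "")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Great-circle distance in meters on a sphere of radius 6371 km. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371000.0 * Math.asin(Math.sqrt(h));
    }

    /** True if the points are within radiusMeters and the names within maxEdits edits. */
    static boolean isLikelyDuplicate(String nameA, double latA, double lonA,
                                     String nameB, double latB, double lonB,
                                     double radiusMeters, int maxEdits) {
        if (haversineMeters(latA, lonA, latB, lonB) > radiusMeters) {
            return false;
        }
        return levenshtein(normalize(nameA), normalize(nameB)) <= maxEdits;
    }
}
```

With maxEdits set to 0 this degrades to an exact match on the normalized names, so the tolerance can be raised gradually while checking how many false positives appear.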

Duplicates from the source data are written to a common osm.xml (rather than to their individual per-class osm.xml files), and the matching duplicates from OSM are written (with recursed children) to yet another osm.xml file.



Script:

https://github.com/OpenStreetMap-Sverige/import-orebro-harvester/blob/master/src/main/java/se/kodapan/osm/orebro/Orebro.java

See line 335 and below for exactly which classes there are and how the duplicate detection mechanism works. (And sorry for all the Swedish-language comments and names.)



We are now considering the workflow. 

The consensus on #osm.se at irc.oftc.net is along the lines of "the data looks great, let's just commit it and then get started working on it as usual". The reaction on #osm has been quite the opposite: "make sure any work is in the one single commit of the import account".

I've been considering asking everyone who can help with the manual burden of checking all points to do it on GitHub, add any new things to OSM in a per-user changes.osm.xml to avoid inverted identity conflicts, and then do a merge before we commit it. If I understand everything correctly, that would satisfy the people I spoke with on #osm.

That might be too much to ask of the users, though, and the data is really clean. We would really like to just push it in as it is and start working with it in the database as normal, using a task project.



		karl