[Imports] Update: HOT Somalia import

Schuyler Erle schuyler at nocat.net
Wed Aug 17 08:32:28 UTC 2011

Today, HOT received permission from UN OCHA to use its populated places dataset, so we have removed the populated places, localities, and admin areas from the GNS file we were going to import. Combined with the proximity filtering described in my previous email to the list, this cut down the set of PoIs to 4,790 man-made and physical features.

Based on feedback from the OSM IRC channel, I removed nearly all of the GNS-specific tags from the import, leaving only gns:ufi (the unique identifier within GNS), gns:modify_date, and gns:type (the DSG feature type code from GNS). I left gns:type in as is, in case anyone wants to go through later and fix or amend my tag mapping choices. I will document all of this on the OSM wiki before I call this project done.
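For illustration only, the tag pruning described above could be sketched in Python as below. This is not the script used for the import: the function name strip_gns_tags, the extra tag gns:example_field, and all the sample values are hypothetical. Only the three retained keys come from the message.

```python
# The three GNS tags that were kept in the import.
KEPT_GNS_TAGS = {"gns:ufi", "gns:modify_date", "gns:type"}

def strip_gns_tags(tags):
    """Drop every gns:* tag except the kept ones; leave other tags untouched."""
    return {
        key: value
        for key, value in tags.items()
        if not key.startswith("gns:") or key in KEPT_GNS_TAGS
    }

# Hypothetical feature: the values and gns:example_field are made up.
feature_tags = {
    "name": "Example Well",
    "man_made": "water_well",
    "gns:ufi": "0000000",
    "gns:modify_date": "2011-01-01",
    "gns:type": "WLL",
    "gns:example_field": "dropped",
}
pruned = strip_gns_tags(feature_tags)
```

Keeping gns:type next to the mapped OSM tag means a later pass can compare the two and correct a mapping without going back to the GNS file.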

I uploaded the GNS subset as Changeset #9044787: http://www.openstreetmap.org/browse/changeset/9044787

I am now working on generating the import for the OCHA populated places data for Somalia. An example record looks like this:

<node id="-3" visible="true" lat="9.581149" lon="44.060915">
  <tag k="source" v="GTZ - 2002" />
  <tag k="pcode" v="NC-3810-L13-015" />
  <tag k="place" v="suburb" />
  <tag k="name" v="Cayncal" />
</node>

<node id="-4" visible="true" lat="2.664540" lon="45.617550">
  <tag k="source" v="UNDP - 1997" />
  <tag k="pcode" v="NA-3807-J14-001" />
  <tag k="place" v="village" />
  <tag k="name" v="Dalaash" />
</node>

The source string comes directly from the OCHA dataset. The "pcode" is a well-known UN convention for uniquely identifying places. About 25% of places have an alt_name tag as well. I will document all of this on the wiki.
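As a rough sketch of how one such record might be turned into a node element, here is a Python example using the standard library's xml.etree.ElementTree. The function name place_to_node and the dictionary layout are assumptions made for illustration, not the actual conversion code. The sample values are taken from the second example record.

```python
import xml.etree.ElementTree as ET

def place_to_node(node_id, lat, lon, tags):
    """Build an OSM <node> element; tags with empty values are skipped."""
    node = ET.Element("node", {
        "id": str(node_id),      # negative ids mark objects not yet uploaded
        "visible": "true",
        "lat": f"{lat:.6f}",
        "lon": f"{lon:.6f}",
    })
    for key, value in tags.items():
        if value:
            ET.SubElement(node, "tag", {"k": key, "v": value})
    return node

node = place_to_node(-4, 2.664540, 45.617550, {
    "source": "UNDP - 1997",
    "pcode": "NA-3807-J14-001",
    "place": "village",
    "name": "Dalaash",
    "alt_name": None,            # only some places carry an alt_name
})
xml_text = ET.tostring(node, encoding="unicode")
```

Skipping empty values keeps alt_name out of the output for places that do not have one.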

Deduplication is the tricky part. I am checking the OCHA populated places against existing OSM points, using the same criteria referred to before, and *additionally* excluding any populated place whose name has at least 85% trigram similarity to an OSM area within 5 km. I'm afraid that some duplicates may still slip through. I'll do some more work and report on my findings in the morning.
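A minimal sketch of the name-plus-distance check, in Python, might look like the following. It rests on several assumptions: similarity is computed as Jaccard overlap of padded character trigrams (the message does not say which trigram measure was used), distance is great-circle distance between points, and existing OSM features are represented as plain dictionaries. All function names here are hypothetical.

```python
import math

def trigrams(name):
    """Set of character trigrams of a lowercased, space-padded name."""
    cleaned = name.lower().strip()
    if not cleaned:
        return set()
    padded = "  " + cleaned + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))

def is_probable_duplicate(candidate, existing, min_similarity=0.85, max_km=5.0):
    """True if any existing feature is within max_km and has a similar name."""
    for feature in existing:
        distance = haversine_km(candidate["lat"], candidate["lon"],
                                feature["lat"], feature["lon"])
        if distance <= max_km and trigram_similarity(
                candidate["name"], feature["name"]) >= min_similarity:
            return True
    return False

candidate = {"name": "Dalaash", "lat": 2.664540, "lon": 45.617550}
existing = [{"name": "dalaash", "lat": 2.6650, "lon": 45.6180}]
duplicate = is_probable_duplicate(candidate, existing)
```

Under this measure a one-letter spelling variant of a short name, such as "Dalash" for "Dalaash", scores well below 0.85. That is one way duplicates could slip through a fixed threshold.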

If anyone has any commentary about the second phase of this import, please let me know. Thanks!

