[Imports] Adding pcodes to villages in Liberia, Sierra Leone and Guinea

Andrew Buck andrew.r.buck at gmail.com
Wed Sep 24 17:54:12 UTC 2014


Hello everyone.  As you are probably aware, HOT (the Humanitarian
OpenStreetMap Team) has been working to map the areas affected by
Ebola in West Africa and to help humanitarian organizations make
better use of OSM data in their efforts there.  Because OSM has the
best dataset of settlements (towns, villages, etc.) in the area,
several prominent groups have chosen to standardize on OSM as their
official source for place names and locations.

Many place names in Africa (and elsewhere) have several different
spellings, since the local languages do not use the Latin alphabet.
To deal with this, these organizations commonly establish a
standardized set of place codes (pcodes for short), which are used to
refer to places in datasets and communications.  Pcodes work much like
zip codes in the US, or postal codes elsewhere in the world.  The list
of pcodes is generally maintained by the UN and is used by almost
every large humanitarian organization to communicate place
information, because a fixed code avoids confusion between, for
example, multiple places with the same name (just as postal codes do).

For the three countries in question, no pcodes had yet been generated,
so it was decided to create them as follows: export the OSM dataset,
generate a unique pcode for each village in it, and have these
organizations adopt that as the official pcode for each location.
This was done over the past week, and we now have the results in a
CSV file with columns for the OSM id of the place, the version of the
place at export, its lat/lon, all of its associated name tags, and
finally a pcode column filled in by the people generating the codes.
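
To make the export format concrete, here is a minimal Python sketch of
loading such a CSV into per-place records keyed by OSM id.  The column
names (osm_id, version, lat, lon, pcode, plus name columns such as
name, name:en or alt_name) are assumptions for illustration; the real
file's headers may differ.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class PlaceRow:
    osm_id: int
    version: int   # object version at export time
    lat: float
    lon: float
    names: dict[str, str] = field(default_factory=dict)  # non-empty name tags only
    pcode: str = ""


def is_name_column(col: str) -> bool:
    # Hypothetical rule: name, name:xx and *_name columns (alt_name, old_name, ...)
    # are treated as name tags.
    return col == "name" or col.startswith("name:") or col.endswith("_name")


def load_export(text: str) -> dict[int, PlaceRow]:
    """Parse the exported CSV text into PlaceRow records keyed by OSM id."""
    rows: dict[int, PlaceRow] = {}
    for rec in csv.DictReader(io.StringIO(text)):
        row = PlaceRow(
            osm_id=int(rec["osm_id"]),
            version=int(rec["version"]),
            lat=float(rec["lat"]),
            lon=float(rec["lon"]),
            names={k: v for k, v in rec.items() if is_name_column(k) and v},
            pcode=rec["pcode"].strip(),
        )
        rows[row.osm_id] = row
    return rows
```

For a file on disk, the text can be read with
open(path, newline="", encoding="utf-8").read() before being passed to
load_export.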

Since these newly generated pcodes are now the official pcodes for
these places, we plan to import them back into OSM as a pcode=* tag on
each place.  This makes the dataset much easier to use in the future,
and it also lets us re-export and generate pcodes for newly added
villages that do not yet have one (since OSM is always growing).  The
exact format of pcodes varies from country to country; in some
countries the pcode is chosen to be identical to the existing postal
code.  For these three countries, since there was no existing system,
the codes are formatted as the three-letter country code, a two-digit
number indicating the significance of the place, and then a running
five-digit number counting up from 1 that identifies the specific
place.  For example, the pcode for Monrovia, the capital of Liberia,
is LBR0400001.
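
The layout described above can be checked mechanically.  A small
Python sketch, assuming every code is exactly three uppercase letters
(e.g. LBR for Liberia) followed by two significance digits and five
running-number digits; the function names are hypothetical:

```python
import re

# Three-letter country code, two-digit significance, five-digit running number.
PCODE_RE = re.compile(r"([A-Z]{3})([0-9]{2})([0-9]{5})")


def format_pcode(country: str, significance: int, seq: int) -> str:
    """Build a pcode such as LBR0400001 from its three parts."""
    if not re.fullmatch(r"[A-Z]{3}", country):
        raise ValueError(f"country code must be three uppercase letters: {country!r}")
    if not 0 <= significance <= 99:
        raise ValueError(f"significance must fit in two digits: {significance}")
    if not 1 <= seq <= 99999:
        raise ValueError(f"running number must be 1..99999: {seq}")
    return f"{country}{significance:02d}{seq:05d}"


def parse_pcode(code: str) -> tuple[str, int, int]:
    """Split a pcode back into (country, significance, running number)."""
    m = PCODE_RE.fullmatch(code)
    if m is None:
        raise ValueError(f"not a valid pcode: {code!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))
```

With this, format_pcode("LBR", 4, 1) gives "LBR0400001", and
parse_pcode splits it back into ("LBR", 4, 1).  The check only
validates the shape of a code; it cannot tell whether a given code was
actually issued.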

Since the codes are newly generated and used OSM as their source, the
import should be very easy to do, and there is no risk of the data
being "wrong": the codes were generated from exactly these objects, so
they are by definition correct at this point.  There is also no
licensing issue, since everything in the file except the pcode is OSM
data to begin with, and we have explicit permission to use the pcodes,
as importing them was the plan from the start.

Since the dataset contains both the id/version pair and the lat/lon,
there are a couple of ways the pcodes could be re-imported.  The
easiest would be to use JOSM and the conflation plugin with a very
small match distance (around 5 meters), and then manually conflate
the handful of nodes that may have moved in the last week.  The other
possibility, which is more robust because it matches objects by id
rather than by position, is to write a simple script that adds the tag
to each object based on its id/version pair and produces a list, to be
processed manually, of any objects whose current version is newer than
the one at export time.  I think either method will work, but the
script is a bit more robust, and we already have someone interested in
writing it.
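
The script-based approach could look roughly like the following
Python sketch.  It assumes every place is a node, that the exported
rows are already loaded, and that the current version and position of
each node have been fetched separately (for example from a fresh
extract); that fetching step and the names used here are hypothetical.
Nodes whose version is unchanged are tagged automatically; everything
else goes on the manual review list, annotated with how far the node
has moved, which is the distance a conflation match threshold would
look at.

```python
import math
from dataclasses import dataclass


@dataclass
class ExportedPlace:
    osm_id: int    # node id (assumes every place is a node)
    version: int   # node version at export time
    lat: float
    lon: float
    pcode: str


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371008.8  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def plan_pcode_updates(
    exported: list[ExportedPlace],
    current: dict[int, tuple[int, float, float]],
) -> tuple[list[tuple[int, str]], list[tuple[int, str]]]:
    """Split exported rows into automatic tag additions and a manual review list.

    `current` maps node id -> (version, lat, lon) as the node exists in OSM now;
    ids missing from it are treated as deleted.
    """
    auto: list[tuple[int, str]] = []    # (node id, pcode) safe to tag as-is
    manual: list[tuple[int, str]] = []  # (node id, reason) needing a human look
    for place in exported:
        now = current.get(place.osm_id)
        if now is None:
            manual.append((place.osm_id, "node missing or deleted since export"))
            continue
        version, lat, lon = now
        if version == place.version:
            # Untouched since export: the pcode was generated for exactly this object.
            auto.append((place.osm_id, place.pcode))
        else:
            # Edited since export: report the version change and how far it moved.
            moved = haversine_m(place.lat, place.lon, lat, lon)
            manual.append(
                (place.osm_id, f"v{place.version} -> v{version}, moved {moved:.1f} m")
            )
    return auto, manual
```

Only the auto list would be written back as pcode=* tags; the manual
list is the review queue the script approach calls for.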

Let me know if you foresee any potential problems with either of these
two methods, and how you think they could be addressed.