[OSM-dev] What country is something in?

Roland Olbricht roland.olbricht at gmx.de
Tue Nov 11 10:57:17 GMT 2008


> Hi recently I've uploaded all the borders for Italy as written here:
> http://wiki.openstreetmap.org/index.php/Italian_Borders

Thank you, that's great. I've just checked the data and there's not even a 
single bug in the data.

> now I would like to do the opposite, so being able to obtain a list of
> vertexes that define a border between the various Italian regions
> (20), provinces (104) and municipalities (8100).
>
> I did try to hack your RUNME file a little bit, but I was not
> successful in getting the data I needed.
>
> I would like to get a file, one for each of the above entities, with
> data formatted for osmosis use. So I could make available regional and
> provincial data/planets for later debugging and analysis.
>
> Is there a way with you program to do this?

Not until yesterday. But the presence of the data and the request are a good 
motivation. It took a day to tune the given software to something useful for 
this task. I've put the newest version on
http://wmaz.math.uni-wuppertal.de/olbricht/osm/osm-boundaries-source.tgz

I have preferred the timeliness of this release over good documentation. 
So I hope that, for a first try, the following instructions will help:


For the impatient:
1) edit the file "country-patches.not_osm" and add the pivot nodes
2) run "./RUNME_subnational ../osm/non-us.osm.bz2 4 region 36.6 47.2 6.6 18.6"
   or "./RUNME_subnational ../osm/non-us.osm.bz2 6 province 36.6 47.2 6.6 18.6"
3) if everything is satisfactory, please upload the pivot nodes

ad 1)
Please add pivot nodes for each region and each province to the file 
"country-patches.not_osm". There is an example for the "Aosta Valley" in the 
file. The node's id "addon-0" is just a dummy: some numbering is necessary, 
but it can be retired after step 3. The values for lat(itude) and 
lon(gitude) are the essential data: each node must lie somewhere inside the 
region it stands for. The node must be tagged with key "place" and value 
"region" or "province" (or whatever you have passed as the third parameter 
to RUNME_subnational), because report-results uses this tag to recognize the 
node as relevant. It should also carry a tag with key "name", because 
report-results uses its value as the target filename.
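
For illustration, a pivot node of this kind could be generated with the 
following Python sketch. It assumes that "country-patches.not_osm" holds 
ordinary OSM XML node elements; please check the Aosta Valley example in the 
file for the exact layout. The helper names, the id "addon-1" and the 
coordinates (roughly the town of Aosta) are made up for this example and are 
not part of the tarball.

```python
import xml.etree.ElementTree as ET


def pivot_node(node_id, lat, lon, place, name):
    """Build one OSM XML <node> element carrying the two tags from step 1."""
    node = ET.Element(
        "node", {"id": node_id, "lat": f"{lat:.4f}", "lon": f"{lon:.4f}"}
    )
    # "place" must match the third parameter given to RUNME_subnational.
    ET.SubElement(node, "tag", {"k": "place", "v": place})
    # "name" is the value report-results uses as the target filename.
    ET.SubElement(node, "tag", {"k": "name", "v": name})
    return node


def pivot_node_xml(node_id, lat, lon, place, name):
    """Serialize the pivot node to a string (hypothetical helper)."""
    return ET.tostring(pivot_node(node_id, lat, lon, place, name), encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only: dummy id, approximate coordinates.
    print(pivot_node_xml("addon-1", 45.74, 7.32, "region", "Aosta Valley"))
```

The printed element can then be pasted next to the existing example, with a 
different dummy id for each node.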

ad 2)
This works similarly to the RUNME script, with some small changes:
- the second parameter controls which ways to use as boundaries: "2" will take 
only those ways tagged as "admin_level" "2", the value "4" will take ways 
tagged as "admin_level" "2" or "admin_level" "4", and so on for even numbers 
up to 10.
- the third parameter controls which kind of pivot nodes are taken into 
account: It specifies the value of the tag with key "place". For nations, 
this is usually "country".
- the subsequent parameters specify a bounding box with southern and northern 
latitude, then western and eastern longitude. The values given above should 
be a reasonable approximation for Italy. You may omit the bounding box, but 
you might then end up with way more data than expected, and you might get 
into trouble if two areas in different countries have the same name.
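
To make the meaning of the parameters concrete, here is a small Python 
sketch of how they can be read. It is not the code of RUNME_subnational; the 
function names are invented for this example, and treating the bounding box 
as inclusive at its edges is an assumption.

```python
def accepted_admin_levels(max_level):
    """Even admin_level values from 2 up to max_level (second parameter)."""
    if max_level % 2 != 0 or not 2 <= max_level <= 10:
        raise ValueError("expected an even level between 2 and 10")
    return list(range(2, max_level + 1, 2))


def in_bbox(lat, lon, south, north, west, east):
    """True if a point lies in the box, using the script's argument order.

    Inclusive edges are an assumption made for this sketch.
    """
    return south <= lat <= north and west <= lon <= east


def runme_args(planet, max_level, place, bbox=None):
    """Assemble the argument list for ./RUNME_subnational; bbox is optional."""
    args = ["./RUNME_subnational", planet, str(max_level), place]
    if bbox is not None:
        args += [str(value) for value in bbox]
    return args


if __name__ == "__main__":
    italy = (36.6, 47.2, 6.6, 18.6)  # south, north, west, east
    print(accepted_admin_levels(6))
    print(in_bbox(45.74, 7.32, *italy))
    print(" ".join(runme_args("../osm/non-us.osm.bz2", 6, "province", italy)))
```

The resulting list could be handed to subprocess.run if you want to drive 
both runs from one script.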

ad 3)
Please upload the pivot nodes you have written for the regions or provinces to 
OSM. It's a good deal of work to write them and they would be useful for 
other mappers as well. Also, they get ordinary id-values that way.


Background concerning the data:

The software can determine the components delimited by the boundary ways 
automatically, but it can't guess the names of the areas. A first glance at 
tagwatch has shown that there are a couple of approaches in the current OSM 
database to represent this data:
- A node tagged with "place" something and coordinates somewhere in the 
respective area. This data is almost consistently present for nations, so it 
formed the basis for this software. It would be useful to extend this 
representation to regions and provinces as well, but this has not happened 
yet.
- A tag whatever:left|right with the name of whatever. This is handy for a 
renderer, which can write the name directly on the border, but it's not 
useful for most other types of software: these entries are often missing, or 
broken by typos, wrong entries or name variants. Software using this data 
would therefore have to do a lot of guesswork about which variant is right. 
Thus, this software does not support this representation.
- The most recent approach stores the tags not in a pivot node but rather in a 
relation that contains as members the ways that form the boundary. Given 
that a database is excellent at performing joins, this solution provides all 
the data of the two above at almost the same speed. And it is the only 
solution that allows storing information about exclaves and enclaves in a 
natural way. So I personally would prefer to do some automatic conversion to 
that representation in the future. But this has been a controversial point 
of discussion. Anyway, the data in the nodes could be easily converted, so 
no information is lost if you store the data in pivot nodes now.
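
As a rough idea of what such a conversion could look like, the Python sketch 
below copies the tags of a pivot node into a relation whose members are the 
boundary ways. It is only an assumption of how one might do it: the function 
name and the ids are invented, the list of way ids is taken as given (in 
practice it would have to come from the computed components), and any extra 
tags a boundary relation should carry are left out because that convention 
is still under discussion.

```python
import xml.etree.ElementTree as ET


def node_to_relation(pivot, way_ids, relation_id):
    """Copy a pivot node's tags into a relation listing the boundary ways."""
    relation = ET.Element("relation", {"id": str(relation_id)})
    for way_id in way_ids:
        # Empty role: which roles to use is not settled in this sketch.
        ET.SubElement(
            relation, "member", {"type": "way", "ref": str(way_id), "role": ""}
        )
    for tag in pivot.findall("tag"):
        ET.SubElement(relation, "tag", {"k": tag.get("k"), "v": tag.get("v")})
    return relation


if __name__ == "__main__":
    # Hypothetical pivot node and way ids, for illustration only.
    pivot = ET.fromstring(
        '<node id="addon-0" lat="45.74" lon="7.32">'
        '<tag k="place" v="region"/><tag k="name" v="Aosta Valley"/></node>'
    )
    relation = node_to_relation(pivot, [101, 102, 103], relation_id=-1)
    print(ET.tostring(relation, encoding="unicode"))
```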

Writing the pivot nodes by hand should be quick for the regions. For the 
provinces this work will already become tedious, but it remains a matter of 
at most one evening. For the municipalities, this is probably too much work. 
So if you already have the municipalities in another data representation, 
feel free to ask me or anyone else for an automatic conversion script.

Tagwatch has also shown that the tag "admin_level" is rarely used with odd 
numbers as values. Thus, those ways are ignored here. You can change it in 
the script at lines 12-19 if there is an intended usage of these ways.

Cheers,
Roland



