[OSM-talk] capitals

David Earl david at frankieandshadow.com
Wed Jun 13 17:54:49 BST 2007


Iván Sánchez Ortega wrote:
> On Wednesday, 13 June 2007, Sebastian Spaeth wrote:
>>> [...] "capital_of=Texas". [...]
>> so what do you do if there happens to be a municipality of Texas and a
>> galactic empire called Texas? This is way too ambiguous,
> 
> You don't have to leave the Earth to find such occurrences. For example, I
> know of:
> 
> Granada, a province in Spain, Europe
> Granada, a department in Nicaragua, Central America
> Granada, a county in Mississippi, USA, North America
> etc etc etc

We had this discussion a while back in the context of is_in:
http://lists.openstreetmap.org/pipermail/talk/2007-May/014125.html (and 
the rest of a long thread).

The main argument for linking by id is integrity, but the downside is
complexity - you can't just use the tag value on its own, you have to go
to a database to fetch something useful. More importantly, we would have to
substantially upgrade all our tools, whereas the looser textual linkage
means only the tools that want to make use of the linkages need to be
changed. The same applies to users: it is easy to type Granada (but
equally easy to get it wrong), whereas finding the relevant node is a much
more fiddly process, and that node may not be in the scope of the data
you're working on at the time.
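To make the trade-off concrete, here is a minimal sketch contrasting the two kinds of link. It assumes a hypothetical `capital_of` tag (the key from the quoted message) and a plain dict standing in for the OSM database; the node ids and the id-valued form of the tag are invented for illustration only.

```python
# Hypothetical in-memory stand-in for the OSM database: node id -> tags.
NODES = {
    101: {"place": "city", "name": "Austin"},
    202: {"place": "state", "name": "Texas"},
}


def describe_textual(tags):
    """A textual link is usable as-is: the value is already human-readable."""
    return tags.get("capital_of")


def describe_by_id(tags, db):
    """An id link is only meaningful after a lookup in the data store."""
    ref = tags.get("capital_of")
    if ref is None:
        return None
    target = db.get(int(ref))
    # The referenced node may not be in the data you currently have loaded.
    return None if target is None else target.get("name")


# Same city tagged both ways (hypothetical tag values).
austin_text = {"place": "city", "name": "Austin", "capital_of": "Texas"}
austin_id = {"place": "city", "name": "Austin", "capital_of": "202"}

print(describe_textual(austin_text))     # Texas
print(describe_by_id(austin_id, NODES))  # Texas
print(describe_by_id(austin_id, {}))     # None: target not loaded
```

The id form only works when the lookup table is at hand, which is the extra step every tool (and every mapper) would have to take.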

Simplicity of the data structures was a starting point for OSM, as Steve
explained in the blog post he circulated this week.

I think that in nearly all cases you can disambiguate names by
proximity: in the above example, the lat/lon will make it clear which
Granada is relevant. A counter-example is one that Ben Laenen pointed out
in that previous thread: there are Limburg provinces on either side of the
Belgian/Dutch border, which are hard to tell apart by proximity. But I think
they will be just as hard to tell apart in any context where humans read
the data, so perhaps the names could be disambiguated in these very few
cases ("Limburg (NL)"), making it clear which one is meant both to a
human reader and in data processing.
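A minimal sketch of disambiguation by proximity, assuming each candidate place is available with a single representative lat/lon (the coordinates below are rough approximations, for illustration only). The `resolve` function and its `ambiguity_ratio` threshold are hypothetical: if the runner-up is nearly as close as the nearest match, as in the Limburg case, it gives up rather than guess.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Candidate places sharing a name; coordinates are approximate.
PLACES = [
    {"name": "Granada", "lat": 37.18, "lon": -3.60},   # Spain
    {"name": "Granada", "lat": 11.93, "lon": -85.96},  # Nicaragua
    {"name": "Limburg (NL)", "lat": 51.2, "lon": 5.9},
    {"name": "Limburg (BE)", "lat": 51.0, "lon": 5.4},
]

# The same two provinces without disambiguated names.
LIMBURG_PLAIN = [
    {"name": "Limburg", "lat": 51.2, "lon": 5.9},  # Netherlands side
    {"name": "Limburg", "lat": 51.0, "lon": 5.4},  # Belgium side
]


def resolve(name, lat, lon, places=PLACES, ambiguity_ratio=1.5):
    """Return the nearest place called `name`, or None if there is none
    or if the second-nearest match is too close for proximity to decide."""
    scored = sorted(
        ((haversine_km(lat, lon, p["lat"], p["lon"]), p)
         for p in places if p["name"] == name),
        key=lambda pair: pair[0],  # sort by distance only
    )
    if not scored:
        return None
    if len(scored) > 1 and scored[1][0] < ambiguity_ratio * scored[0][0]:
        return None  # e.g. a village near the Belgian/Dutch border
    return scored[0][1]


print(resolve("Granada", 37.2, -3.6)["lat"])                  # 37.18
print(resolve("Limburg", 51.1, 5.65, places=LIMBURG_PLAIN))   # None
print(resolve("Limburg (NL)", 51.1, 5.65)["name"])            # Limburg (NL)
```

With the explicit "(NL)" suffix there is only one candidate, so the lookup no longer depends on geometry at all.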

The language variants problem needn't be a big deal either: as long
as Brussels has 'name=Bruxelles; name:en=Brussels', whatever is trying
to navigate the hierarchy will be able to find it in either form, so the
creator doesn't have to worry about that, only about spelling it
correctly.
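A minimal sketch of that lookup, assuming a feature is a plain dict of tags and that a textual reference should match `name` or any `name:<lang>` variant exactly. The helper names `names_of` and `find_by_name` are hypothetical, not part of any OSM tool.

```python
def names_of(tags):
    """All spellings a feature answers to: `name` plus every `name:<lang>`."""
    return {v for k, v in tags.items() if k == "name" or k.startswith("name:")}


def find_by_name(text, features):
    """Features whose name, in any language variant, equals `text` exactly."""
    return [f for f in features if text in names_of(f)]


BRUSSELS = {"place": "city", "name": "Bruxelles", "name:en": "Brussels"}

print(find_by_name("Brussels", [BRUSSELS]) == [BRUSSELS])   # True
print(find_by_name("Bruxelles", [BRUSSELS]) == [BRUSSELS])  # True
print(find_by_name("Brussles", [BRUSSELS]))                 # [] (misspelt)
```

Either language form finds the city, while a misspelling finds nothing, which is why correct spelling is the only thing left for the tagger to worry about.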

In summary, the text form can be done now, is easy to create, and can be
written from memory rather than found by a database search, so it is
likely to encourage people to take part in the appropriate tagging. If
the links are harder to create, and have to wait for lots of software to
be written, people won't participate.

Finally, consider how both our wiki and Wikipedia cross-reference their
pages by name, for essentially the same reasons.

David