[OSM-talk] address interpolation

Fri Oct 2 10:14:19 BST 2009

> The problem is as follows:
>
> You see an interpolation "25a to 25c". How do you know that this means
> "25a, 25b, 25c"? You know by removing the number and then starting with
> the "a" go through code points adding one until you reach "c". Easy.
> This will work for all alphabets that are laid out in alphabetical
> order in Unicode, and they probably all are. (But that's an assumption on
> my part :-)

Ouch. Unicode code point order has no meaning in the real world, and only 
really works for English (and even then not properly for subtle cases such 
as ligatures, not that these would ever be used in this kind of address).
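
To make that concrete, here is a minimal Python sketch of the code point 
walk described in the quoted text. The function name is made up for 
illustration, and Spanish is just one example of an alphabet whose order 
does not match code point order.

```python
def interpolate_by_code_point(start: str, end: str) -> list[str]:
    """Naive suffix interpolation: step through Unicode code points."""
    return [chr(cp) for cp in range(ord(start), ord(end) + 1)]


# Works for plain ASCII letters.
print(interpolate_by_code_point("a", "c"))  # ['a', 'b', 'c']

# The Spanish alphabet puts n-tilde (U+00F1) between n and o,
# but a code point walk from n to o never visits it.
print("\u00f1" in interpolate_by_code_point("n", "o"))  # False

# Walking from a to n-tilde instead sweeps up punctuation and control
# characters along the way, none of which are house-number suffixes.
swept = interpolate_by_code_point("a", "\u00f1")
print(len(swept), "{" in swept)  # 145 True
```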

You need to know the lexical ordering, which means you need to know the 
language. Sometimes you can guess from the character, and two characters 
make it easier than one, but the problem doesn't go away with two - the 
"null" variant isn't central to this problem.

There's also a cultural assumption about how you might do this in other 
countries. I've no idea how Chinese addresses are normally formulated - 
whether they even use digits, and whether those digits are Arabic 
numerals - let alone what these exceptional cases might be. But IF you 
know it is Chinese and IF the scheme fits, with digits + Chinese 
character, then the null case still works. (Chinese characters still have 
a lexical ordering, which I believe has to do with the number of strokes, 
but any relationship to Unicode order is purely coincidental.)

So I'm coming round to the view that alphabetic should explicitly mean 
only
   n nA nB ... nZ
where you can start and end at any point in the sequence, and not even 
try to deal with other characters from other alphabets (not even other 
Latin ones). Any other sequence from other cultures needs its own 
interpolation style or additional qualifying tag to identify it, just as 
we'd tag an email with the encoding.
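
A rough Python sketch of that restricted rule, under some assumptions of 
mine: the function name is hypothetical, the number part is ASCII digits 
only, the suffix is at most one unaccented letter A-Z (or a-z, as in the 
"25a to 25c" example), and both ends share the same number. Anything else 
is rejected rather than guessed at.

```python
import re
import string

# ASCII digits followed by at most one unaccented Latin letter.
_HOUSE = re.compile(r"([0-9]+)([A-Za-z]?)")


def interpolate_alphabetic(start: str, end: str) -> list[str]:
    """Expand n, nA, nB ... nZ between two house numbers (inclusive)."""
    m_start = _HOUSE.fullmatch(start)
    m_end = _HOUSE.fullmatch(end)
    if not m_start or not m_end:
        raise ValueError("not a digits + A-Z house number")
    if m_start.group(1) != m_end.group(1):
        raise ValueError("start and end have different numbers")

    number = m_start.group(1)
    s1, s2 = m_start.group(2), m_end.group(2)
    combined = s1 + s2
    if combined and not (combined.islower() or combined.isupper()):
        raise ValueError("mixed-case suffixes")

    letters = string.ascii_lowercase if combined.islower() else string.ascii_uppercase
    # The empty string is the "null" variant: plain n comes before nA.
    sequence = [""] + list(letters)
    i, j = sequence.index(s1), sequence.index(s2)
    if i > j:
        raise ValueError("start comes after end")
    return [number + suffix for suffix in sequence[i : j + 1]]


print(interpolate_alphabetic("25a", "25c"))  # ['25a', '25b', '25c']
print(interpolate_alphabetic("25", "25B"))   # ['25', '25A', '25B']
```

A suffix outside A-Z (an accented letter, say) raises ValueError here, 
which is the point: such a sequence would need its own interpolation style.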

David