[OSM-talk] OSM at home
osm at edavies.nildram.co.uk
Thu Nov 2 13:02:30 GMT 2006
David Groom wrote:
> I just want to give a quick reminder of the work done
> by Oliver on OSM at home http://almien.co.uk/OSM/Places/
Cute site and a good hint to remember to put in a node to name
a village or town - it's too easy to focus on roads, pubs, post
boxes, etc, and forget the big picture.
It does rather mess up regarding character sets though. It
looks to me like it's taking UTF-8 input text but interpreting
the bytes as if they were Windows-1252. The output is then
encoded as US-ASCII with HTML character references, except that
the Windows-1252 byte values which are not in ISO-8859-1 are
let through untouched - or something.
See, for example, "BÃ.Â¼chenbach"
with the "." standing for the byte 0x83 (which in Windows-1252
is the Florin currency symbol (U+0192 LATIN SMALL LETTER F WITH
HOOK), a character which is not actually present in ISO-8859-1 (*)
which is the nominal encoding of the page). I assume that the
two sequences "Ã." and "Â¼" are each supposed to be a single
two-byte UTF-8 character.
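
For what it's worth, the string above can be reproduced in Python
by applying the "UTF-8 bytes read as Windows-1252" mistake twice,
which suggests the text may have been mis-decoded more than once.
This is only a sketch under that assumption: the function names
are made up for illustration, and it says nothing about how the
site is actually implemented.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_misread(text: str) -> str:
    """Reverse one round of the misreading above."""
    return text.encode("cp1252").decode("utf-8")


original = "B\u00fcchenbach"        # the intended place name
once = misread_as_cp1252(original)  # one round of damage
twice = misread_as_cp1252(once)     # a second round on top of the first

print(once)
print(twice)

# Undoing the misreading twice recovers the intended name.
print(undo_misread(undo_misread(twice)) == original)
```

After the second round the character following "Ã" is U+0192,
which is what the byte 0x83 means in Windows-1252.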
The sooner that everybody just uses Unicode (and keeps track of
which encoding of it they're using) the better, so we can all
forget this sort of nonsense.
(*) Yes, alright; 0x83 is in ISO-8859-1 but it's a rather obscure
control code, not anything that's likely to crop up in the name
of a town or even a very small village.
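
A quick way to check the footnote, as a sketch using only the
Python standard library: decode the single byte 0x83 under each
encoding and see what comes out. The variable names are just for
illustration.

```python
import unicodedata

raw = b"\x83"

as_cp1252 = raw.decode("cp1252")      # Windows-1252 reading
as_latin1 = raw.decode("iso-8859-1")  # ISO-8859-1 reading

# Windows-1252 maps 0x83 to a printable letter.
print(f"U+{ord(as_cp1252):04X}", unicodedata.name(as_cp1252))

# ISO-8859-1 maps 0x83 to a code point in the control category (Cc).
print(f"U+{ord(as_latin1):04X}", unicodedata.category(as_latin1))
```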