[OSM-dev] planet.osm - fix

David Sheldon dave-osm at earth.li
Tue Aug 15 15:24:41 BST 2006


On Tue, Aug 15, 2006 at 04:12:37PM +0200, Jonas Svensson wrote:
> On Tue, 15 Aug 2006, David Sheldon wrote:
> 
> > I vote for ASCII, and anything that cannot be represented in ASCII being
> > represented by entities in the &#..; form. This way there should be no
> > transport problems. Just people who don't understand the XML
> > specification trying to write XML parsers.
> 
> No matter what, the server should not export things like
> "Stra&#xDF;e" or "Genter Straße" since these are latin-1 or
> broken utf-8 when we expect UTF-8. Even if html-escaped or not.
> I suggest we look into that instead of arguing if we want HTML-escaped
> entities or not.

ARGHGHGHHGHGHGH

ARE YOU NOT READING A THING I SAY?

Stra&#xDF;e IS ASCII.

S is in ASCII
t is in ASCII
r is in ASCII
a is in ASCII
& is in ASCII
# is in ASCII
x is in ASCII
D is in ASCII
F is in ASCII
; is in ASCII
e is in ASCII


If you are reading an XML document, your XML processor MUST interpret
&#xDF; as Unicode code point U+00DF (ß, a double S thingy).
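
As a quick illustration of that rule, here is a minimal Python sketch using the standard library's XML parser. The element name `tag` and its `k`/`v` attributes are made up for the example and are not taken from any real planet.osm dump.

```python
import xml.etree.ElementTree as ET

# A pure-ASCII XML document. The attribute value carries the
# numeric character reference &#xDF; instead of a raw non-ASCII byte.
doc = b'<?xml version="1.0" encoding="US-ASCII"?><tag k="name" v="Stra&#xDF;e"/>'

# Every byte on the wire is 7-bit clean.
is_seven_bit = all(byte < 0x80 for byte in doc)

# The parser resolves the reference to the code point U+00DF.
elem = ET.fromstring(doc)
name = elem.get("v")
code_point = ord(name[4])
```

After parsing, `name` holds the six-character string with U+00DF in fifth position, even though no byte above 0x7F was ever transmitted.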

This is part of XML. I don't want anyone to use the word HTML anywhere.
These are XML numeric character references, the way that you represent in
XML characters that are not part of the character set that you are using
for your document.

Stra&#xDF;e is the most portable way of representing what we want in
XML. It is the same bytes in every ASCII-compatible character set, even
7-bit ASCII.
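
On the export side, a sketch of how such output could be produced in Python. This is only an assumption about one possible approach, not how the server actually does it, and `to_ascii_xml` is a hypothetical helper name. Note that Python's `xmlcharrefreplace` handler emits the decimal form (`&#223;`), which an XML processor treats identically to `&#xDF;`.

```python
from xml.sax.saxutils import escape

def to_ascii_xml(text: str) -> bytes:
    """Return text as 7-bit ASCII suitable for XML character data."""
    # Escape &, < and > first, then turn every non-ASCII character
    # into a decimal numeric character reference.
    return escape(text).encode("ascii", errors="xmlcharrefreplace")

encoded = to_ascii_xml("Stra\u00dfe")

# The same bytes decode to the same text under several ASCII-compatible
# encodings, so a mislabelled charset cannot corrupt them.
decoded_views = {
    enc: encoded.decode(enc)
    for enc in ("ascii", "latin-1", "utf-8", "cp1252")
}
```

For attribute values, `xml.sax.saxutils.quoteattr` would be the analogous starting point, since quotes also need handling there.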

It is correct for the server to export Stra&#xDF;e. This is exactly what
we want it to export.

David
-- 
"Not *the* Jane Harrington? Jane 'bury me in a Y-shaped coffin' Harrington?"
                        -- Edmund Blackadder.



