[OSM-dev] planet.osm - fix
Michael Strecke
MStrecke at gmx.de
Tue Aug 15 15:43:34 BST 2006
David Shaldon wrote:
>> I vote for ASCII, and anything that cannot be represented in ASCII being
>> represented by entities in the &#..; form.
Jonas Svensson wrote:
> when we expect UTF-8. Even if html-escaped or not.
> I suggest we look into that instead of arguing if we want HTML-escaped
> entities or not.
According to the docs provided by David, the "HTML-escaped" characters
&#..; are numeric references to ISO 10646 (UCS) characters, the character
set behind UTF-16/32 (and only that set). That adds yet another encoding
scheme to a simple street name :-)
http://www.w3.org/TR/2004/REC-xml-20040204/#sec-references
http://de.wikipedia.org/wiki/ISO_10646
http://de.wikipedia.org/wiki/UTF-16
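For illustration, a minimal Python sketch of the &#..; form David proposes. The street name is only a sample value, and the variable names are made up for this example. It suggests that the number in the reference is the character's code point, which need not match the bytes used by the file's own encoding.

```python
# Sample street name; \u00df is the German sharp s, written as an
# escape so the snippet does not depend on the source file encoding.
name = "Stra\u00dfe"

# Encode to pure ASCII; anything outside ASCII becomes a numeric
# character reference of the form &#NNN;
ascii_bytes = name.encode("ascii", "xmlcharrefreplace")
print(ascii_bytes)  # b'Stra&#223;e'

# The number in the reference is the code point of the character ...
code_point = ord("\u00df")
print(code_point)  # 223

# ... while the UTF-8 representation of the same character is two bytes.
utf8_bytes = "\u00df".encode("utf-8")
print(utf8_bytes)  # b'\xc3\x9f'
```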
>> Just people who don't understand the XML
>> specification trying to write XML parsers.
I haven't yet tested whether the built-in Python XML parser understands
UCS character references. I will do that shortly.
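Such a test could look roughly like the sketch below. It assumes the standard library module xml.etree.ElementTree as the "built-in" parser, and the document layout (an osm element with a tag child) is a made-up sample, not the real planet.osm schema.

```python
import xml.etree.ElementTree as ET

# Pure-ASCII sample document; the non-ASCII character is written as a
# numeric character reference. The element layout is hypothetical.
doc = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b'<osm><tag k="name" v="Stra&#223;e"/></osm>'
)

root = ET.fromstring(doc)
value = root.find("tag").get("v")

# The parser resolves &#223; to the single character U+00DF.
print(value == "Stra\u00dfe")  # True
print(len(value))  # 6
```

If the comparison prints True, the parser resolves the reference on its own and the application never sees the escaped form.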
> No matter what, the server should not export things like
> "Straße" or "Genter Straße" since these are latin-1 or
> broken utf-8 when we expect UTF-8.
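As an illustration of the two failure modes Jonas mentions, here is a small Python sketch. The sample name and the variable names are assumptions made for this example; it does not reproduce what the server actually does.

```python
# Sample street name; \u00df is the German sharp s.
name = "Stra\u00dfe"

utf8_bytes = name.encode("utf-8")      # b'Stra\xc3\x9fe'
latin1_bytes = name.encode("latin-1")  # b'Stra\xdfe'

# Case 1: latin-1 bytes handed to a consumer that expects UTF-8.
# The lone byte 0xDF is not a valid UTF-8 sequence, so decoding fails.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)  # False

# Case 2: "broken" UTF-8 through double encoding. The UTF-8 bytes are
# misread as latin-1 and then encoded to UTF-8 a second time.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")
print(double_encoded)  # b'Stra\xc3\x83\xc2\x9fe'

# The result is valid UTF-8, but it no longer decodes to the original name.
print(double_encoded.decode("utf-8") == name)  # False
```

The second case is the harder one to detect, because the data decodes without an error and only looks wrong to a human reader.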
At some point a decision has to be made about which character encoding
to use. Or has it been decided already?
Michael