[OSM-dev] planet.osm - fix

Michael Strecke MStrecke at gmx.de
Tue Aug 15 15:43:34 BST 2006


David Shaldon wrote:
>> I vote for ASCII, and anything that cannot be represented in ASCII being
>> represented by entities in the &#..; form.
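For reference, David's ASCII-plus-references proposal is easy to produce in Python. The sketch below is illustrative only (to_ascii_xml is a hypothetical helper, not anything in the OSM code). It escapes XML's special characters with xml.sax.saxutils.escape, then lets the standard "xmlcharrefreplace" codec error handler turn every non-ASCII character into a decimal &#..; reference:

```python
from xml.sax.saxutils import escape

def to_ascii_xml(value: str) -> str:
    """Hypothetical helper: make an attribute value pure ASCII."""
    # Escape &, < and > (and " for use inside a double-quoted attribute).
    escaped = escape(value, {'"': "&quot;"})
    # Any character outside ASCII becomes a decimal reference such as &#223;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_xml("Genter Straße"))  # Genter Stra&#223;e
```

The result is plain ASCII, at the cost of each non-ASCII character growing to several bytes.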

Jonas Svensson wrote:
> when we expect UTF-8. Even if html-escaped or not.
> I suggest we look into that instead of arguing if we want HTML-escaped
> entities or not.

According to the docs provided by David, the "HTML-escaped" characters
&#..; are numeric references to ISO/IEC 10646 (UCS) code points, the
same numbers that UTF-16/32 encode, whatever the encoding of the
document itself. Just another encoding scheme added to a simple street
name :-)

http://www.w3.org/TR/2004/REC-xml-20040204/#sec-references
http://de.wikipedia.org/wiki/ISO_10646
http://de.wikipedia.org/wiki/UTF-16
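To make the distinction concrete: per the XML spec section linked above, the digits in &#223; or &#xDF; are a code point in ISO/IEC 10646, not bytes in some encoding. A small sketch (parse_char_ref is a hypothetical helper that handles only a single, well-formed reference) shows one code point, U+00DF (ß), turning into different byte sequences depending on the encoding:

```python
def parse_char_ref(ref: str) -> str:
    """Hypothetical helper: return the character named by &#NNN; or &#xHH;."""
    if not (ref.startswith("&#") and ref.endswith(";")):
        raise ValueError(f"not a numeric character reference: {ref!r}")
    body = ref[2:-1]  # strip the leading "&#" and the trailing ";"
    # XML allows only a lowercase "x" before hexadecimal digits.
    code_point = int(body[1:], 16) if body.startswith("x") else int(body, 10)
    return chr(code_point)

char = parse_char_ref("&#223;")
print(char, f"U+{ord(char):04X}")  # ß U+00DF
# The same code point, written out in different byte encodings:
for encoding in ("latin-1", "utf-8", "utf-16-be", "utf-32-be"):
    print(f"{encoding:<10} {char.encode(encoding).hex()}")
```

The loop prints df, c39f, 00df and 000000df respectively; the reference &#223; is the same in all four cases.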

>> Just people who don't understand the XML
>> specification trying to write XML parsers.

I haven't yet tested whether the built-in Python XML parser understands
these UCS references; I will do that shortly.
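A minimal sketch of such a test, assuming Python 3.9+ and the standard xml.etree.ElementTree module. The <osm>/<tag k v> layout is only a stand-in loosely modelled on planet.osm, and tag_values is a hypothetical helper. If the parser resolves references as the spec requires, the escaped and the literal UTF-8 documents should yield identical strings:

```python
import xml.etree.ElementTree as ET

def tag_values(doc: bytes) -> list[str]:
    """Hypothetical helper: collect the v attribute of every <tag> element."""
    root = ET.fromstring(doc)
    return [tag.get("v") for tag in root.iter("tag")]

# Decimal and hexadecimal references, document itself pure ASCII.
escaped = (b'<?xml version="1.0" encoding="UTF-8"?>'
           b'<osm><tag k="name" v="Genter Stra&#223;e"/>'
           b'<tag k="name" v="Genter Stra&#xDF;e"/></osm>')
# The same name written directly as UTF-8 bytes.
literal = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<osm><tag k="name" v="Genter Straße"/></osm>').encode("utf-8")

print(tag_values(escaped))
print(tag_values(literal))
```

If both lines print the same names, a consumer using a conforming parser cannot tell the two spellings apart; the choice only matters for tools that read the bytes without a real XML parser.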

> No matter what, the server should not export things like
> "Straße" or "Genter Straße" since these are latin-1 or
> broken utf-8 when we expect UTF-8. 
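The two failure modes Jonas describes can be told apart mechanically. A rough heuristic sketch, where classify_bytes is hypothetical and the mis-decoding step is assumed to be latin-1 (Windows-1252 mojibake would need cp1252 instead): latin-1 bytes are usually not valid UTF-8 at all, while double-encoded UTF-8 is valid UTF-8 that turns into different text when its characters are re-read as latin-1 bytes.

```python
def classify_bytes(raw: bytes) -> str:
    """Hypothetical heuristic: label bytes as UTF-8, latin-1 or double-encoded."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # e.g. b"Stra\xdfe": 0xDF followed by "e" is not a valid UTF-8 sequence
        return "not UTF-8 (probably latin-1)"
    try:
        # Undo one suspected round of "UTF-8 read as latin-1, re-encoded as UTF-8".
        undone = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "UTF-8"
    return "double-encoded UTF-8" if undone != text else "UTF-8"

name = "Genter Straße"
samples = {
    "correct": name.encode("utf-8"),
    "latin-1": name.encode("latin-1"),
    "double": name.encode("utf-8").decode("latin-1").encode("utf-8"),
}
for label, raw in samples.items():
    print(label, raw, classify_bytes(raw))
```

It is only a heuristic: genuine text that happens to contain sequences such as "Ã©" would be misreported as double-encoded.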

At some point we have to decide which character encoding to use. Or has
that been decided already?


Michael



