[OSM-dev] Re: Re: [OSM-talk] planet.dump

Erik Johansson erjohan at gmail.com
Thu Aug 3 09:05:34 BST 2006


On 8/3/06, Jonas Svensson <jonass at lysator.liu.se> wrote:
> On Wed, 2 Aug 2006, Raphael Jacquot wrote:
>
> > Jonas Svensson wrote:
> > > Has there been any discussion on how to handle international names and
> > > character encodings? Also things like writing direction (left-to-right,
> > > right-to-left and others)?
> > >
> > > I notice that the MapFeatures-page mentions International name, local name
> > > and regional name so there must have been some thinking on this subject.
> >
> >
> > for starters, the whole thing should be UTF-8
>
> Yes, wouldn't it be good to change the API to require strings (like names)
> to be UTF-8 when sent to the server/database? If possible also change the
> server to validate strings to be valid UTF-8.

Valid UTF-8 isn't enough. For example, some time ago someone[1] complained
about the encoding in planet.osm. They gave the example of
"Älvsjövägen" and said it looked horrible in the dump: it was UTF-8
encoded, but with "&"-entities in it.

That was perfectly valid UTF-8, but perhaps not the thing you want to
have in the DB. And I don't see how you can make sure applications
handle that correctly, because someone will always write a small
one-line script and make a mess.

So the server should:
1. accept only valid UTF-8 chars,
2. remove all &number;-entities,
3. fail the insertion if the resulting string differs from the input.
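
The three steps above could be sketched roughly like this in Python. This is
only an illustration, not the actual server code: the function name
validate_name is made up, and it assumes that "&number;-entities" means
numeric character references such as &#196; (decimal) or &#xC4; (hex).

```python
import re

# Assumption: "&number;-entities" are numeric character references,
# either decimal (&#196;) or hexadecimal (&#xC4;).
NUMERIC_ENTITY = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def validate_name(raw: bytes) -> str:
    """Hypothetical check for a string sent to the server."""
    # Step 1: only accept valid UTF-8 (strict decoding raises on bad bytes).
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("string is not valid UTF-8") from exc

    # Step 2: remove all numeric entities.
    stripped = NUMERIC_ENTITY.sub("", text)

    # Step 3: if the string changed, it contained entities, so fail.
    if stripped != text:
        raise ValueError("string contains &number;-entities")
    return text
```

With this sketch, validate_name("Älvsjövägen".encode("utf-8")) returns the
name unchanged, while validate_name(b"&#196;lvsj&#246;v&#228;gen") raises
ValueError, even though those bytes are perfectly valid UTF-8.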



[1]http://lists.openstreetmap.org/pipermail/talk/2006-April/003203.html
-- 
/Erik



