[OSM-dev] UTF-8 Failure...

Andrew McCarthy me at andrewmccarthy.ie
Mon Aug 18 09:34:33 BST 2008


On Mon, Aug 18, 2008 at 10:09:01AM +0200, spaetz wrote:
> As my UTF-8 knowledge is next to nonexistent, I would appreciate it if people could check whether this really is a UTF-8 error, or whether our UTF-8 checker is wrong.
> The UTF-8 checker in the t@h client aborts when rendering
>     ./tilesGen.pl --Layers=caption xy 35 21 6
> with a UTF-8 error in line 262. The node in question is:
>     <tag k='loc_name' v='Banska Stiavnica,Banska Stiavnica a Banska Bela,Banskà Ãtiavnica,Banskà Ãtiavnica a Banskà Bel##.'/>

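The problem is in the last word of this line. "Bel" is fine, but the
next two bytes, C3 2E, aren't valid UTF-8. A lead byte in the range
C0–DF starts a two-byte sequence, so the second byte must be a
continuation byte in the range 80–BF; 2E is a plain ASCII full stop.
(Strictly, C0 and C1 never appear in valid UTF-8 at all, because they
could only produce overlong encodings of one-byte characters.)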
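To check a byte string against that rule yourself, here is a minimal
Python sketch. It is not the t@h client's checker, and the function
name first_invalid_utf8 is hypothetical. It follows the byte ranges in
RFC 3629, including the tighter second-byte ranges after E0, ED, F0 and
F4 that the two-byte rule above doesn't cover. The sample bytes are
only the tail described above ("Bel" followed by C3 2E), not the full
tag value.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None.

    Hypothetical helper following RFC 3629: the lead byte says how many
    continuation bytes (normally 80-BF) must follow it.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:              # ASCII: one byte on its own
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:     # lead byte of a 2-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:   # lead byte of a 3-byte sequence
            need = 2
        elif 0xF0 <= b <= 0xF4:   # lead byte of a 4-byte sequence
            need = 3
        else:                     # stray 80-BF, overlong C0/C1, or F5-FF
            return i, f"invalid lead byte {b:02X}"
        # Second-byte limits that rule out overlong forms, UTF-16
        # surrogates (ED A0-BF) and code points above U+10FFFF.
        lo, hi = 0x80, 0xBF
        if b == 0xE0:
            lo = 0xA0
        elif b == 0xED:
            hi = 0x9F
        elif b == 0xF0:
            lo = 0x90
        elif b == 0xF4:
            hi = 0x8F
        for k in range(1, need + 1):
            if i + k >= n:
                return i, "truncated sequence"
            c = data[i + k]
            if not lo <= c <= hi:
                return i, (f"byte {c:02X} after lead byte {b:02X} "
                           "is not a valid continuation byte")
            lo, hi = 0x80, 0xBF   # later bytes use the normal range
        i += need + 1
    return None


sample = b"Banska Bel\xc3."       # "Bel" followed by C3 2E
print(first_invalid_utf8(sample))
# (10, 'byte 2E after lead byte C3 is not a valid continuation byte')

# Cross-check with Python's own strict decoder, which reports the same offset.
try:
    sample.decode("utf-8")
except UnicodeDecodeError as e:
    print(e.start, e.reason)
```

Correctly encoded text such as "Banská Štiavnica".encode("utf-8")
passes, since á is C3 A1 and Š is C5 A0, both lead bytes followed by a
byte in 80–BF.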

If you have it on your system, the man page for UTF-8(7) is a really
good reference for this. Very short, and surprisingly readable.

