[OSM-dev] UTF8 problem with last night's daily .osc

David Earl david at frankieandshadow.com
Fri Aug 29 10:46:36 BST 2008


On 29/08/2008 09:03, Frederik Ramm wrote:
> Hi,
> 
>>     seems like we're having UTF8 trouble (again):
>>
>> % zcat 20080828-20080829.osc.gz  | xmlstarlet val -
>>
>> -:1330038: parser error : Input is not proper UTF-8, indicate encoding !
>> Bytes: 0xC3 0x22 0x2F 0x3E
>> d ein spezielles Kinderland. Besonderer Schwerpunkt am Mittersteig sind 
>> Antiquit
> 
> Closer inspection reveals that this is a tag value that has been 
> truncated at character #255, which happens to be in the middle of a 
> UTF-8 sequence. Ouch! Who truncates tags to 255 characters?

I bet they are VARCHAR columns in the database, with a maximum length 
of 255.
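
The bytes in the error fit that: 0xC3 is the lead byte of a two-byte 
UTF-8 sequence (for example "ä" is 0xC3 0xA4), and here it is followed 
directly by the closing `"/>` of the attribute. A minimal sketch of the 
failure and one way to avoid it, assuming the limit is applied to bytes 
rather than characters. The helper name truncate_utf8 is hypothetical 
and is not taken from any OSM code:

```python
def truncate_utf8(text: str, max_bytes: int = 255) -> str:
    """Truncate text so its UTF-8 encoding fits in max_bytes
    without cutting a multi-byte character in half.
    (Hypothetical helper, for illustration only.)"""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (bit pattern 10xxxxxx), the cut falls inside a
    # character, so step back to the start of that character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")


# 254 ASCII bytes followed by a two-byte character: 256 bytes in total.
value = "x" * 254 + "ä"

# A plain byte cut at 255 keeps the lead byte 0xC3 but drops 0xA4.
naive = value.encode("utf-8")[:255]
try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False

# The boundary-aware cut drops the whole character instead.
safe = truncate_utf8(value)
```

With a plain byte cut the result ends in a lone 0xC3 and no longer 
decodes as UTF-8; the boundary-aware version gives up one more byte and 
stays valid.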

David




