[OSM-dev] UTF8 problem with last night's daily .osc
david at frankieandshadow.com
Fri Aug 29 10:46:36 BST 2008
On 29/08/2008 09:03, Frederik Ramm wrote:
>> seems like we're having UTF8 trouble (again):
>> % zcat 20080828-20080829.osc.gz | xmlstarlet val -
>> -:1330038: parser error : Input is not proper UTF-8, indicate encoding !
>> Bytes: 0xC3 0x22 0x2F 0x3E
>> d ein spezielles Kinderland. Besonderer Schwerpunkt am Mittersteig sind
> Closer inspection reveals that this is a tag value that has been
> truncated at character #255, which happens to be in the MIDST of a
> UTF-8 sequence. Ouch! Who truncates tags to 255 characters?
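
To illustrate the failure described above: the reported bytes 0xC3 0x22 show a two-byte lead byte (0xC3) followed directly by the closing quote, i.e. the continuation byte was cut off. Below is a minimal Python sketch of a truncation that avoids this, assuming the 255 limit is applied to bytes rather than characters. The function name truncate_utf8 and the default limit are hypothetical and only for illustration; this is not the code used by the OSM server.

```python
def truncate_utf8(text: str, max_bytes: int = 255) -> str:
    """Truncate text so its UTF-8 encoding fits in max_bytes
    without splitting a multi-byte character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    cut = max_bytes
    # encoded[cut] is the first byte that would be dropped. If it is a
    # continuation byte (bit pattern 10xxxxxx), the cut falls inside a
    # character, so move back to the start of that character.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut].decode("utf-8")


# A value whose 255th byte is the lead byte of a two-byte character.
value = "a" * 254 + "ü" + "b"

# Naive byte truncation leaves a dangling 0xC3, as in the broken .osc file.
naive = value.encode("utf-8")[:255]
try:
    naive.decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

safe = truncate_utf8(value)
```

Here the naive slice ends in a lone 0xC3 and fails to decode, while truncate_utf8 drops the whole character and returns 254 bytes of valid UTF-8.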
I bet they are VARCHARs in the database, and the max length for these is