[OSM-dev] UTF8 problem with last night's daily .osc
frederik at remote.org
Fri Aug 29 09:03:20 BST 2008
> seems like we're having UTF8 trouble (again):
> % zcat 20080828-20080829.osc.gz | xmlstarlet val -
> -:1330038: parser error : Input is not proper UTF-8, indicate encoding !
> Bytes: 0xC3 0x22 0x2F 0x3E
> d ein spezielles Kinderland. Besonderer Schwerpunkt am Mittersteig sind
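Closer inspection reveals that this is a tag value that has been
truncated at character #255, which happens to be in the MIDST of a
UTF-8 sequence: 0xC3 is the lead byte of a two-byte character, but it is
followed directly by 0x22 0x2F 0x3E, i.e. the closing quote, slash and
angle bracket of the element. Ouch! Who truncates tags to 255 characters?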
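For illustration, here is a minimal sketch of a length limit that cannot produce this kind of damage. It assumes the limit is applied to encoded bytes, which is the case where a sequence can be split (for example, in code that treats each byte as a character). The function name truncate_utf8 and the sample value are hypothetical; this is not the code that produced the diff. The idea is to back the cut up to the start of the character it would otherwise split.

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8 and keep at most max_bytes bytes,
    never cutting a multi-byte sequence in half (hypothetical helper)."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # UTF-8 continuation bytes look like 10xxxxxx (0x80..0xBF). If the first
    # byte we would drop is one, the cut falls inside a character, so move
    # the cut back to that character's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


value = "a" * 254 + "ä"              # "ä" is 0xC3 0xA4, so 256 bytes in total
naive = value.encode("utf-8")[:255]   # ends with a dangling 0xC3 lead byte
safe = truncate_utf8(value, 255)      # drops the whole "ä" instead
print(naive[-1:], len(safe))
```

The naive slice ends in the same lone 0xC3 that xmlstarlet complained about. The safe version returns a shorter but still valid string, so the limit may be undershot by up to three bytes.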