[OSM-dev] UTF8 problem with last night's daily .osc

Frederik Ramm frederik at remote.org
Fri Aug 29 09:03:20 BST 2008


>     seems like we're having UTF8 trouble (again):
> % zcat 20080828-20080829.osc.gz  | xmlstarlet val -
> -:1330038: parser error : Input is not proper UTF-8, indicate encoding !
> Bytes: 0xC3 0x22 0x2F 0x3E
> d ein spezielles Kinderland. Besonderer Schwerpunkt am Mittersteig sind 
> Antiquit

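Closer inspection reveals that this is a tag value that has been 
truncated at position 255, which happens to be in the middle of a 
UTF-8 sequence: 0xC3 is the lead byte of a two-byte character 
(presumably the "ä" in "Antiquitäten"), but it is followed directly 
by the closing quote 0x22 instead of a continuation byte. Ouch! Who 
truncates tags to 255 characters?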
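
For illustration, here is a minimal sketch of how a truncation could 
avoid splitting a multi-byte character. It assumes the 255 limit is 
applied to the UTF-8 encoded bytes; the function name truncate_utf8 
and the sample value are made up for this example and are not taken 
from any OSM code.

```python
def truncate_utf8(text: str, max_bytes: int = 255) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. Continuation
    # bytes have the bit pattern 10xxxxxx, so if we see one here the cut
    # would land inside a character; step back to its lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")


# Hypothetical tag value: 254 ASCII bytes, then "ä" (0xC3 0xA4) at the limit.
value = "A" * 254 + "äten"

# Naive byte truncation keeps the lead byte 0xC3 and drops 0xA4.
naive = value.encode("utf-8")[:255]
try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False

safe = truncate_utf8(value)
```

The naive slice ends in a dangling 0xC3, which is the same kind of 
input the parser complained about, while the boundary-aware version 
simply drops the whole character.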

