[OSM-dev] Mixed character encoding in planet.osm - plan for fixing it

raphael Jacquot sxpert at sxpert.org
Wed Nov 8 11:23:27 GMT 2006


Dave wrote:
> 
>> these for instance are good examples :
>>
>>    <node id="100478" lat="58.4189071655273" lon="15.5100708007812" 
>> timestamp="2006-08-19T11:10:07+01:00">
>>      <tag k="name" v="Kärnabrunns gatan" />
>>      <tag k="highway" v="secondary" />
>>    </node>
>>    <node id="100479" lat="58.4187088012695" lon="15.5110473632812" 
>> timestamp="2006-08-19T11:10:16+01:00">
>>      <tag k="name" v="Gamla Ledbergsvägen" />
>>      <tag k="highway" v="secondary" />
>>    </node>
>>    <node id="100480" lat="58.4186553955078" lon="15.5122394561768" 
>> timestamp="2006-08-19T11:10:13+01:00">
>>      <tag k="name" v="Gamla Ledbergsvägen" />
>>      <tag k="highway" v="secondary" />
>>    </node>
>>   
> Err... am I missing something? These are perfectly good UTF-8. I just 
> pulled them from the DB and they're great.
> They seem to be fine in the latest planet dump as well.
> 
> I think the bug may be at your end if you're seeing anything other than the 
> "ä"s ("latin small letter a with diaeresis" (umlaut), UTF-8 C3A4) I do.
> 

Yeah, yeah, those are good.
However, further down the file there are other tags whose v value is not 
valid UTF-8, as xmllint --noout --stream <> shows...
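
For anyone who wants to locate the offending lines without xmllint, something 
like the rough Python sketch below might help. It is only an illustration: the 
function name find_invalid_utf8_lines and the two-line sample are made up here 
and are not taken from the planet dump. It assumes the file can be checked one 
line at a time, which is safe for UTF-8 because the newline byte never occurs 
inside a multi-byte sequence.

```python
import io


def find_invalid_utf8_lines(stream):
    """Yield (line_number, raw_bytes) for every line that is not valid UTF-8.

    `stream` must be a binary file-like object (opened with mode "rb").
    """
    for lineno, raw in enumerate(stream, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            yield lineno, raw


# Hypothetical sample, not real planet.osm data: the first tag is encoded as
# UTF-8 (the "ä" becomes bytes C3 A4), the second as Latin-1 (a lone byte E4).
sample = (
    '<tag k="name" v="Gamla Ledbergsv\u00e4gen" />\n'.encode("utf-8")
    + '<tag k="name" v="Gamla Ledbergsv\u00e4gen" />\n'.encode("latin-1")
)

bad = list(find_invalid_utf8_lines(io.BytesIO(sample)))
for lineno, raw in bad:
    print(lineno, raw)
```

On a real dump one would pass open(path, "rb") instead of the BytesIO sample. 
This only flags byte sequences that are not UTF-8 at all; it cannot detect 
text that was double-encoded but still decodes cleanly.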



