[OSM-dev] UTF-8 problems in informationfreeway?

Sat Dec 22 07:55:05 GMT 2007

Brett Henderson wrote:
> Stefan Baebler wrote:
>   
>> On Dec 21, 2007 7:41 AM, Brett Henderson <brett at bretth.com> wrote:
>>   
>>     
>>> I'm a dufus, I may have found the problem.  I didn't have the production
>>> encoding hack enabled on the hourly diffs, I've enabled it now.
>>> Presumably this means that most non ascii characters were being mangled.
>>>     
>>>       
>> s*it happens :)
>>
>> Perhaps writing some of the commandline parameters (all sans
>> passwords) into the header of xml might be a good idea to know how
>> data was obtained. Of course this would only reveal last osmosis
>> operation on the given dataset, but better this than no info at all.
>> When opening such file as input stream with osmosis this info could be
>> shown (either by default or in verbose mode).
>>   
>>     
> Perhaps, but it's not very simple.  I'm not sure how you'd pick the 
> relevant options to include without just dumping the entire command line 
> into the file.  That leads to problems figuring out which parts of the 
> command line should be masked out and getting access to the original 
> command line in the first place.  It could be messy.
>   
IMHO only passwords need to be masked; otherwise the whole command line 
can be put in there.
It would help answer the most common questions: "what's in this 
file?", "how was the data filtered?", "how do I run osmosis to get such 
a file?"
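
A minimal sketch of the masking idea, in Python rather than in osmosis itself. The password= key, the user= argument and the mask_command_line name are assumptions made for illustration; this is not existing osmosis behaviour.

```python
import re

# Assumption: secrets are passed as key=value arguments whose key is "password".
_SECRET_ARG = re.compile(r"^(password)=.*$", re.IGNORECASE)


def mask_command_line(args):
    """Return a copy of args with the value of every password=... argument hidden."""
    return [_SECRET_ARG.sub(r"\1=***", arg) for arg in args]


if __name__ == "__main__":
    # Hypothetical invocation; only host=db.osm.org comes from this thread.
    example = ["--read-mysql", "host=db.osm.org", "user=osm", "password=secret",
               "--write-xml", "file=planet-2007-12-19.osm.bz2"]
    print(" ".join(mask_command_line(example)))
```

Everything except the password value passes through untouched, so the masked string could still be written into the file header as is.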
> However my main issue is that I'm very hesitant to add kludges to 
> support features of limited value.  Given the generic nature of osmosis 
> it is hard to add metadata that is useful.  A similar one is that 
> osmosis isn't currently supporting the bounds xml element because it is 
> difficult to do so in a meaningful way.
>   
There are also external files (polygons, DB config files...) whose 
contents are not included in the command line (the file name is 
mentioned, which gives some idea though).
To go further, the whole process could somehow be preserved by nesting 
input streams into operations, then wrapping that into the output 
stream, preserving the original ... :)

<out cmd="--write-xml file=slovenia-2007-12-20.osm.bz2">
<applychange>
<in cmd="--read-xml file=slovenia-2007-12-19.osm.bz2">
<out cmd="--write-xml file=slovenia-2007-12-19.osm.bz2">
<bounds cmd="--bounds ......">
<out cmd="--write-xml file=planet-2007-12-19.osm.bz2">
<in cmd="--read-mysql host=db.osm.org"/>
</out>
</bounds>
</out>
</in>
<in cmd="--read-xml-change file=planet-daily-20071219-20071220.osc.bz2"/>
</applychange>
</out>
... :)
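
A rough sketch of how such a nested history could be assembled, using Python's xml.etree.ElementTree. The step helper and the history variable are hypothetical names; the element and attribute names just follow the mock-up above, and none of this is an existing osmosis feature.

```python
import xml.etree.ElementTree as ET


def step(tag, cmd=None, *inputs):
    """Create one history element and nest the steps it consumed inside it."""
    attrs = {"cmd": cmd} if cmd is not None else {}
    element = ET.Element(tag, attrs)
    element.extend(inputs)
    return element


# Only the last stage of the mock-up: a change file applied to an extract.
history = step(
    "out", "--write-xml file=slovenia-2007-12-20.osm.bz2",
    step(
        "applychange", None,
        step("in", "--read-xml file=slovenia-2007-12-19.osm.bz2"),
        step("in", "--read-xml-change file=planet-daily-20071219-20071220.osc.bz2"),
    ),
)

print(ET.tostring(history, encoding="unicode"))
```

Each writer would wrap whatever history it found in its own inputs, so the chain grows by one level per run.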
>>   
>>     
>>> Let me know if you see it occurring on any new hourly diffs.  Daily
>>> diffs already had the prod encoding hack enabled so if they contain UTF8
>>> issues please let me know.    
>>>       
>> I will check my problematic nodes tonight and report if there are any problems.
>>     
The latest change 
(http://planet.openstreetmap.org/hourly/2007122205-2007122206.osc.gz ) 
contains my problematic node, with extra UTF-8 characters in the note 
tag, and it looks perfect:
<node id="29161753" timestamp="2007-12-22T05:59:49Z" user="StefanB" 
lat="46.1356895" lon="14.7445634">
<tag k="created_by" v="JOSM"/>
<tag k="name" v="Moravče"/>
<tag k="is_in" v="Slovenia, Europe"/>
<tag k="place" v="town"/>
<tag k="note" v="Testing 34 random UTF-8 characters:ČčŽžŠšĐđĆć€ÄäËëÖöÜüŁłßÇç÷×¤§ÉéÁáÂâ"/>
</node>
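
For anyone who wants to repeat the check on a diff instead of eyeballing it, here is a small sketch that parses a node and counts the characters after the colon in the note tag. NODE_XML is a cut-down copy of the node above and note_characters is a hypothetical helper name; it assumes the file really is UTF-8 encoded.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the node above, encoded the way the diff file should be.
NODE_XML = (
    '<node id="29161753" lat="46.1356895" lon="14.7445634">'
    '<tag k="name" v="Moravče"/>'
    '<tag k="note" v="Testing 34 random UTF-8 '
    'characters:ČčŽžŠšĐđĆć€ÄäËëÖöÜüŁłßÇç÷×¤§ÉéÁáÂâ"/>'
    '</node>'
).encode("utf-8")


def note_characters(xml_bytes):
    """Return the text after the first colon in the note tag, or None if absent."""
    node = ET.fromstring(xml_bytes)
    for tag in node.iter("tag"):
        if tag.get("k") == "note":
            return tag.get("v").split(":", 1)[1]
    return None


chars = note_characters(NODE_XML)
# An intact round trip keeps the character count the note itself announces.
print(len(chars))
```

If the characters had been mangled on the way, the count would most likely no longer match the number stated in the note.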

However, osmxapi doesn't respond to
http://osmxapi.hypercube.telascience.org/api/0.5/node%5bplace=town%5d%5bbbox=14.5,46.1,14.8,46.2%5d
Oops, did this kill it, or did it die for some other reason?

Stefan