[OSM-dev] UTF-8 problems in informationfreeway?

Brett Henderson brett at bretth.com
Fri Dec 21 06:41:26 GMT 2007


I'm a doofus; I may have found the problem.  I didn't have the production 
encoding hack enabled on the hourly diffs.  I've enabled it now.  
Presumably this means that most non-ASCII characters were being mangled.
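
As an illustration of this kind of mangling, here is a minimal Python 
sketch (not the Osmosis code).  It assumes the wrong step treats the UTF-8 
bytes as windows-1252, which is a guess based on the "windows encoding" 
remark quoted further down; the function names are made up for the example.

```python
def double_encode(text: str) -> bytes:
    """Simulate the mangling: UTF-8 bytes misread as cp1252, then re-encoded."""
    once = text.encode("utf-8")        # correct UTF-8 bytes
    misread = once.decode("cp1252")    # assumption: bytes wrongly read as windows-1252
    return misread.encode("utf-8")     # encoded a second time


def repair(stored: bytes) -> str:
    """Undo the double encoding by reversing the two steps above."""
    return stored.decode("utf-8").encode("cp1252").decode("utf-8")


original = "\u00ef"                    # the character 0xef (i with diaeresis)
stored = double_encode(original)
print(stored.hex(" "))                 # bytes as they would sit in the database
print(repair(stored) == original)
```

Under that assumption the character 0xef ends up stored as c3 83 c2 af, 
and the repair step recovers it.  Python's strict cp1252 codec has no 
mapping for bytes 0x81, 0x8d, 0x8f, 0x90 and 0x9d, so this sketch raises 
an error for characters whose UTF-8 form contains one of them; a real fix 
would need to handle those cases explicitly.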

Let me know if you see it occurring on any new hourly diffs.  Daily 
diffs already had the prod encoding hack enabled, so if they contain UTF-8 
issues please let me know.

It is easy for me to re-generate the hourly diffs if necessary; I just 
have to modify the timestamp file and it will go back in time and 
re-generate up to the current time.  If anybody wishes me to do this, let 
me know.

Brett Henderson wrote:
> Yes, that's it.  I thought I'd already covered all cases but 
> apparently I was wrong.  My home ADSL is back up again so hopefully 
> I'll get a chance to check it out soon.
>
> If you see any problems with the current implementation please send a 
> patch.  Without production db access it might be difficult though.
>
> Stefan Baebler wrote:
>> So, the solution is to just provide a patch with more cases for 
>> escaping in
>> http://trac.openstreetmap.org/browser/applications/utils/osmosis/src/com/bretth/osmosis/core/xml/common/ProductionDbDataDecoder.java 
>>
>> http://trac.openstreetmap.org/browser/applications/utils/osmosis/src/com/bretth/osmosis/core/xml/common/ProductionDbDataEncoder.java 
>>
>> and hope they work fine?
>>
>> It would of course be better in the long run to fix the main DB, but I'm
>> not sure what all that would involve. Probably a lot.
>>
>> Stefan
>>
>> On Dec 19, 2007 10:36 PM, Brett Henderson <brett at bretth.com> wrote:
>>  
>>> Hi All,
>>>
>>> I've lost my home ADSL (won't line sync, tried two modems, tried different
>>> leads, doesn't seem to be my end) so I'm mostly offline.  As a result I'm
>>> unlikely to get onto this issue in the short term.  With Christmas
>>> approaching I'm bracing myself for a long'ish outage.
>>>
>>> If anybody wishes to take a look, the hacked character encoding class is
>>> named ProductionDbCharset and has two related classes named
>>> ProductionDbDataEncoder and ProductionDbDataDecoder.
>>>
>>> The classes are instantiated within BaseXmlWriter which is extended by the
>>> XmlWriter class for writing osm files and XmlChangeWriter for osc files.
>>> The hack works by just passing the doubly encoded data through the osmosis
>>> pipeline then fixing it before writing to xml.
>>>
>>> Not sure how easy it will be to fix without access to a doubly encoded
>>> database though.
>>>
>>> Brett
>>>
>>>
>>>
>>> On 12/20/07, Martijn van Oosterhout < kleptog at gmail.com> wrote:
>>>    
>>>>
>>>> On Dec 18, 2007 1:04 PM, Stefan Baebler < stefan.baebler at gmail.com> 
>>>> wrote:
>>>>      
>>>>> I somehow assumed utf8 would be the default choice by now. Also
>>>>> http://wiki.openstreetmap.org/index.php/Database_schema
>>>>> mentions utf8 explicitly for every table individually.
>>>>>
>>>>> Why does main api work nicely then?
>>>>> Why are full planet dumps ok?
>>>>>         
>>>> There's an encoding issue: the encoding the ruby server thinks it is
>>>> using is different from what the database encoding actually is. The
>>>> net result is that the data is encoded *twice*. For example (not
>>>> actual codes, just examples):
>>>>
>>>> Original char: character 0xef
>>>> Encoded as: 0xc3 0xaf
>>>> Stored as: 0xc0 0xc3 0xc0 0xbf
>>>>
>>>>      
>>>>> And more importantly:
>>>>> How can same magic be used to get properly utf8 encoded hourly
>>>>> changes (.osc)?
>>>    
>>>> Osmosis is in Java which is smart enough to not let you do stupid
>>>> things like getting the database connection encoding wrong. It's just a
>>>> question of fixing the de-double-encoding-hack in osmosis. It doesn't
>>>> help that it's a *windows* encoding in the first step.
>>>>
>>>> Have a nice day,
>>>> -- 
>>>> Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
>>>>
>>>> _______________________________________________
>>>>
>>>> dev mailing list
>>>> dev at openstreetmap.org
>>>> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>>>>
>>>>       
>
>




