[OSM-dev] UTF-8 problems in informationfreeway?

Martijn van Oosterhout kleptog at gmail.com
Wed Dec 19 17:02:34 GMT 2007


On Dec 18, 2007 1:04 PM, Stefan Baebler <stefan.baebler at gmail.com> wrote:
> I somehow assumed utf8 would be the default choice by now. Also
> http://wiki.openstreetmap.org/index.php/Database_schema
> mentions utf8 explicitly for every table individually.
>
> Why does main api work nicely then?
> Why are full planet dumps ok?

There's an encoding mismatch: the encoding the Ruby server assumes for
the database differs from the encoding the database actually uses. The
net result is that the data is encoded *twice*. For example (not actual
codes, just examples):

Original char: character 0xef
Encoded as: 0xc3 0xaf
Stored as: 0xc0 0xc3 0xc0 0xbf
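
To see real byte values rather than made-up ones, here is a minimal
Python sketch of the double encoding. It assumes the wrong charset is
latin-1; the function name double_encode is hypothetical and this is
not the actual server code.

```python
def double_encode(text: str, wrong_charset: str = "latin-1") -> bytes:
    """Simulate storing UTF-8 bytes through a connection that assumes
    the wrong charset (latin-1 here, which is only an assumption)."""
    utf8_bytes = text.encode("utf-8")          # correct UTF-8 encoding
    misread = utf8_bytes.decode(wrong_charset) # each byte taken as one char
    return misread.encode("utf-8")             # encoded a second time


original = "\u00ef"  # LATIN SMALL LETTER I WITH DIAERESIS
print(original.encode("utf-8").hex(" "))   # c3 af
print(double_encode(original).hex(" "))    # c3 83 c2 af
```

Each byte of the correct UTF-8 sequence gets treated as a separate
character and encoded again, so two bytes become four.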

> And more importantly:
> How can same magic be used to get properly utf8 encoded hourly changes (.osc)?

Osmosis is written in Java, which is smart enough not to let you do
stupid things like getting the database connection encoding wrong. It's
just a question of fixing the de-double-encoding hack in osmosis. It
doesn't help that the first step uses a *Windows* encoding.
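
For illustration only, a de-double-encoding step could look roughly
like the Python sketch below. This is not the osmosis code; the
function name undo_double_encoding is hypothetical, and cp1252 as the
Windows encoding is an assumption on my part.

```python
def undo_double_encoding(stored: bytes, wrong_charset: str = "cp1252") -> str:
    """Reverse one round of double encoding. cp1252 is an assumed
    stand-in for the Windows encoding; the real one may differ."""
    misread = stored.decode("utf-8")
    try:
        # Map each misread character back to the single byte it came from,
        # then decode those bytes as the UTF-8 they originally were.
        return misread.encode(wrong_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not reversible): keep the text as read.
        return misread


# Build a double-encoded sample the same way the bug would produce it.
stored = "\u20ac".encode("utf-8").decode("cp1252").encode("utf-8")
print(stored.hex(" "))                # c3 a2 e2 80 9a c2 ac
print(undo_double_encoding(stored))   # the euro sign again
```

The euro sign shows why the Windows part matters: byte 0x82 becomes
U+201A under cp1252, which latin-1 cannot encode, so undoing with
latin-1 would not recover the original.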

Have a nice day,
-- 
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/



