[OSM-dev] UTF-8 problems in informationfreeway?
stefan.baebler at gmail.com
Thu Dec 20 08:41:59 GMT 2007
So, the solution is to just provide a patch with more cases for escaping in
and hope they work fine?
It would of course be better in the long run to fix the main DB, but I'm
not sure what all that would entail. Probably a lot.
On Dec 19, 2007 10:36 PM, Brett Henderson <brett at bretth.com> wrote:
> Hi All,
> I've lost my home ADSL (won't line sync, tried two modems, tried different
> leads, doesn't seem to be my end) so I'm mostly offline. As a result I'm
> unlikely to get onto this issue in the short term. With Christmas
> approaching I'm bracing myself for a long'ish outage.
> If anybody wishes to take a look, the hacked character encoding class is
> named ProductionDbCharset and has two related classes named
> ProductionDbDataEncoder and ProductionDbDataDecoder.
> The classes are instantiated within BaseXmlWriter which is extended by the
> XmlWriter class for writing osm files and XmlChangeWriter for osc files.
> The hack works by just passing the doubly encoded data through the osmosis
> pipeline then fixing it before writing to xml.
> Not sure how easy it will be to fix without access to a doubly encoded
> database though.
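
The "pass it through, fix it before writing" approach described above can be sketched as follows. This is only an illustration in Python, not the actual ProductionDbCharset code (which is Java and not shown in this thread). The function name undo_double_encoding, the cp1252 default, and the fallback behaviour are all assumptions made for the example.

```python
# Hypothetical helper, not part of Osmosis.
# Assumes the stored bytes were produced by: UTF-8 encode, misread as
# wrong_charset (assumed cp1252 here), then UTF-8 encode again.
def undo_double_encoding(stored: bytes, wrong_charset: str = "cp1252") -> str:
    # First decode: yields the "misread" characters, not the real text.
    misread = stored.decode("utf-8")
    try:
        # Map the misread characters back to the bytes they came from,
        # then decode those bytes as the UTF-8 they originally were.
        return misread.encode(wrong_charset).decode("utf-8")
    except UnicodeError:
        # Assumed fallback: the value was not doubly encoded after all,
        # so keep the singly decoded text unchanged.
        return misread


if __name__ == "__main__":
    stored = b"\xc3\x83\xc2\xaf"  # doubly encoded U+00EF
    fixed = undo_double_encoding(stored)
    print(fixed.encode("utf-8").hex())
```

A fix of this kind only works if the reverse mapping covers every byte the database can contain, which is presumably why more cases keep turning up.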
> On 12/20/07, Martijn van Oosterhout < kleptog at gmail.com> wrote:
> > On Dec 18, 2007 1:04 PM, Stefan Baebler < stefan.baebler at gmail.com> wrote:
> > > I somehow assumed utf8 would be the default choice by now. Also
> > > http://wiki.openstreetmap.org/index.php/Database_schema
> > > mentions utf8 explicitly for every table individually.
> > >
> > > Why does main api work nicely then?
> > > Why are full planet dumps ok?
> > There's an encoding issue: the encoding the Ruby server assumes is
> > different from what the database encoding actually is. The net result
> > is that the data is encoded *twice*. For example (not actual codes,
> > just examples):
> > Original char: character 0xef
> > Encoded as: 0xc3 0xaf
> > Stored as: 0xc0 0xc3 0xc0 0xbf
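
The byte values quoted above are explicitly not the real codes. As a rough sketch of what double encoding produces in practice, the following Python snippet encodes a character to UTF-8, misreads those bytes in a Windows code page, and encodes the result to UTF-8 again. The helper name double_encode and the choice of cp1252 as the misread charset are assumptions for illustration, not something confirmed in this thread.

```python
# Hypothetical helper, not taken from Osmosis or the OSM server code.
# Assumption: the bytes are misread as cp1252 between the two encodes.
def double_encode(text: str, wrong_charset: str = "cp1252") -> bytes:
    utf8_once = text.encode("utf-8")           # correct first encoding
    misread = utf8_once.decode(wrong_charset)  # bytes wrongly treated as cp1252 text
    return misread.encode("utf-8")             # second, unwanted encoding


def as_hex(data: bytes) -> str:
    return " ".join(f"0x{b:02x}" for b in data)


if __name__ == "__main__":
    original = "\u00ef"  # character 0xef
    print("Encoded as:", as_hex(original.encode("utf-8")))  # 0xc3 0xaf
    print("Stored as: ", as_hex(double_encode(original)))   # 0xc3 0x83 0xc2 0xaf
```

Note that Python's cp1252 codec leaves bytes 0x81, 0x8d, 0x8f, 0x90 and 0x9d undefined, so the misread step raises UnicodeDecodeError for some inputs. That is one way a Windows code page makes the round trip messier than plain Latin-1 would.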
> > > And more importantly:
> > > How can the same magic be used to get properly utf8 encoded hourly changes?
> > Osmosis is in Java which is smart enough to not let you do stupid
> > things like getting the database connection encoding wrong. It's just a
> > question of fixing the de-double-encoding-hack in osmosis. It doesn't
> > help that it's a *windows* encoding in the first step.
> > Have a nice day,
> > --
> > Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
> > _______________________________________________
> > dev mailing list
> > dev at openstreetmap.org
> > http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev