[OSM-dev] Mixed character encoding in planet.osm - plan for fixing it

Erik Johansson erjohan at gmail.com
Mon Nov 6 13:50:21 GMT 2006


On 11/5/06, Jonas Svensson <jonass at lysator.liu.se> wrote:
> On 4 Nov 2006 at 12:28, Ralf Zimmermann wrote:
>
> > Over the last weeks, several people have found out that the character
> > encoding in the planet.osm files is not fully valid UTF-8.
> >
> > I would like to clean up this mess.
> > Let's start with some thoughts on the data storage, input and output.
>
> I have done some testing with the new planet-061105.osm.bz2 and
> some other tools. To me it seems like the data in the database now
> is correct but the data exported to the planet dump is broken. For
> example if I look at node 2385021, it is broken in the dump but is
> correct if you download it in your browser using the api or
> download and look at it using JOSM. The broken name for that node
> in the dump is "Handelsh�jskole Syd" (byte F8, which would be
> correct if the encoding were ISO Latin-1); JOSM and others show
> "Handelshøjskole Syd" (bytes C3 B8), which is correct UTF-8.
> Please tell me if this analysis is faulty.

Yes, the API download gives UTF-8 and the planet dump gives Latin-1.
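
To make the byte-level difference concrete, here is a minimal Python sketch. It is not the planet dump code, and the helper name repair_mixed_encoding is made up for illustration. It assumes that any byte sequence that is not valid UTF-8 is a stray Latin-1 byte, which matches the node 2385021 example but may not hold for all of the data.

```python
# "ø" is one byte in Latin-1 but two bytes in UTF-8.
assert "ø".encode("latin-1") == b"\xf8"
assert "ø".encode("utf-8") == b"\xc3\xb8"


def repair_mixed_encoding(raw: bytes) -> str:
    """Decode bytes that are mostly UTF-8 but contain stray Latin-1 bytes.

    Hypothetical helper: assumes every invalid UTF-8 sequence is Latin-1.
    """
    parts = []
    pos = 0
    while pos < len(raw):
        try:
            # The rest of the input is valid UTF-8: decode it and stop.
            parts.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets within the slice raw[pos:].
            parts.append(raw[pos:pos + err.start].decode("utf-8"))
            bad = raw[pos + err.start:pos + err.end]
            # Latin-1 maps every byte to a code point, so this cannot fail.
            parts.append(bad.decode("latin-1"))
            pos += err.end
    return "".join(parts)


# The name as it appears in the dump, with the Latin-1 byte F8.
broken = b"Handelsh\xf8jskole Syd"
fixed = repair_mixed_encoding(broken)
```

Valid UTF-8 input passes through unchanged, so something like this could in principle be run over a dump that mixes both encodings. It cannot tell apart text that was encoded twice, so treat it only as a way to inspect the problem.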

-- 
/Erik

