[OSM-dev] Non-UTF-8 German Umlauts in planet.osm

Robert (Jamie) Munro rjmunro at arjam.net
Fri Mar 16 00:09:36 GMT 2007


Jan-Benedict Glaw wrote:
> On Thu, 2007-03-15 20:33:50 +0000, Artem Pavlenko <artem at mapnik.org> wrote:
>> On 15 Mar 2007, at 20:27, Jan-Benedict Glaw wrote:
>>> Current planet.osm has a sharp-s in (probably) ISO-8859-1{,5}, which
>>> breaks the PostGIS import:
> [...]
>>> Any chance to report (and in case of tags: drop) non-UTF-8 stuff
>>> during planet.osm generation?
>> You need to UTF8 sanitize planet first:
>> UTF8sanitize < planet.osm > planet-utf8.osm
> jbglaw at nini:~/planet.osm$ head -1 planet-070314.osm
> <?xml version="1.0" encoding="UTF-8"?>
> Will do.  ...and I'll try to fix the non-UTF-8 codes manually. And we
> shouldn't state that it's UTF-8 if it really isn't.

I think that it is UTF-8; it's just that some of it is a bit broken. In
other words, to say that it was another encoding would be more wrong.
Not specifying an encoding would also be wrong.
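If the file really is UTF-8 with a few broken bytes, the fix is to repair
those bytes rather than relabel the file. The sketch below is NOT the
UTF8sanitize program quoted above, whose behaviour isn't described in this
thread. It is a hypothetical Python example that assumes every byte which
fails to decode as UTF-8 is a stray ISO-8859-1 character, such as the
sharp-s (0xDF) from the original report. It also lists the offending byte
offsets, which covers the "report" half of the original question. The
names `latin1-fallback`, `sanitize` and `find_invalid_utf8` are made up
for illustration.

```python
import codecs


def _latin1_fallback(err):
    """Hypothetical decode error handler: reinterpret ONE bad byte as ISO-8859-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.start + 1]
    # Return the replacement text and the offset where decoding resumes.
    return bad.decode("latin-1"), err.start + 1


# Assumption: registering a global handler under this made-up name is acceptable.
codecs.register_error("latin1-fallback", _latin1_fallback)


def sanitize(data: bytes) -> str:
    """Decode as UTF-8, guessing that any undecodable byte is Latin-1."""
    return data.decode("utf-8", errors="latin1-fallback")


def find_invalid_utf8(data: bytes) -> list[int]:
    """Return the byte offsets at which strict UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while True:
        try:
            data[pos:].decode("utf-8")
            return offsets
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add pos back.
            offsets.append(pos + err.start)
            pos += err.start + 1


raw = b"Stra\xdfe / Stra\xc3\x9fe"  # Latin-1 sharp-s, then UTF-8 sharp-s
print(find_invalid_utf8(raw))       # [4]
print(sanitize(raw))                # Straße / Straße
```

Writing the result back with `.encode("utf-8")` would make the existing
`encoding="UTF-8"` declaration true again. Loading a whole planet file
into memory is impractical. A streaming variant could feed chunks through
`codecs.getincrementaldecoder("utf-8")(errors="latin1-fallback")`. The
Latin-1 guess is only a heuristic. It cannot tell ISO-8859-1 from
ISO-8859-15, which differ at a few code points such as 0xA4, although
the sharp-s is 0xDF in both.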

It's one of the million things that shouldn't happen, because the back-end
database shouldn't allow it. We're all waiting for the Rails port to be
finished so that the API can be fixed.
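"The back end shouldn't let it" could mean a strict check when data is
written. The following is a hypothetical Python sketch, not code from the
OSM API or the Rails port. The function name `decode_tags_strict` and the
idea that tags arrive as raw byte strings are assumptions made for
illustration.

```python
def _strict(raw: bytes, what: str) -> str:
    """Decode strictly as UTF-8; turn a decode failure into a readable error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        raise ValueError(
            f"{what} {raw!r} is not valid UTF-8 ({err.reason} at byte {err.start})"
        ) from None


def decode_tags_strict(raw_tags: dict[bytes, bytes]) -> dict[str, str]:
    """Hypothetical write-time gate: refuse any tag key/value that isn't UTF-8."""
    return {
        _strict(key, "tag key"): _strict(value, "tag value")
        for key, value in raw_tags.items()
    }


print(decode_tags_strict({b"name": "Straße".encode("utf-8")}))  # {'name': 'Straße'}
try:
    decode_tags_strict({b"name": b"Stra\xdfe"})
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting bad input at upload time would mean broken bytes never reach the
database, so a later planet.osm dump would not need sanitizing at all.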

Robert (Jamie) Munro