[OSM-dev] Latest planet.osm contains incorrect data

Petter Reinholdtsen pere at hungry.com
Thu Dec 7 09:34:29 GMT 2006


[Christopher Schmidt]
> Right. The data which is now 'wrong' in the dump is actually correct in
> the db. Data which is right in the dump is in the db as iso-8859-1, but
> the current planet exporter converts *everything* from latin-1 to
> utf-8... even if it was already right. (oops.)

I guess the intended behavior is to store UTF-8 in the database and
export UTF-8 in the dump.  For that to work properly, all the database
entries stored as ISO-8859-1 will need to be converted.  It should be
possible to do this semi-automatically, by looking for all strings in
the database which are not valid UTF-8 and converting these to UTF-8
as if they were ISO-8859-1.  I suspect the Java applet needs to be
modified to handle string encoding correctly as well, but I have not
verified this.
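
A minimal sketch of that check, in Python, assuming each string can be
read from the database as raw bytes.  The function name to_utf8 is
made up for illustration and is not part of any existing OSM code.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8; otherwise treat it
    as ISO-8859-1 and re-encode it as UTF-8.

    Hypothetical helper: the name and the bytes-in/bytes-out interface
    are assumptions, not taken from the planet exporter.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Every byte value is defined in ISO-8859-1, so this decode
        # cannot fail.
        return raw.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    name = "Troms\u00f8"
    print(to_utf8(name.encode("iso-8859-1")) == name.encode("utf-8"))
    print(to_utf8(name.encode("utf-8")) == name.encode("utf-8"))
```

This is only semi-automatic because some ISO-8859-1 byte sequences
happen to be valid UTF-8 too (the bytes C3 B8, for example), and a
check like this leaves them untouched, so such entries would still
need a manual look.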

Friendly,
-- 
Petter Reinholdtsen




