[OSM-dev] Latest planet.osm contains incorrect data
Petter Reinholdtsen
pere at hungry.com
Thu Dec 7 09:34:29 GMT 2006
[Christopher Schmidt]
> Right. The data which is now 'wrong' in the dump is actually correct in
> the db. Data which is right in the dump is in the db as iso-8859-1, but
> the current planet exporter converts *everything* from latin-1 to
> utf-8... even if it was already right. (oops.)
I guess the intended behavior is to store UTF-8 in the database and
export UTF-8 in the dump. For that to work properly, all the database
entries stored as ISO-8859-1 will need to be converted. It should be
possible to do this semi-automatically, by looking for all strings in
the database which are not valid UTF-8 and converting these to UTF-8
as if they were ISO-8859-1. I suspect the Java applet needs to be
modified to handle string encoding correctly as well, but I have not
verified this.
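A rough sketch of the detection and conversion step, in Python. The
function name repair_to_utf8 is made up for illustration, and how the
raw bytes are read from and written back to the database is left out,
as I have not looked at the schema:

```python
def repair_to_utf8(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8; otherwise treat it
    as ISO-8859-1 and re-encode it as UTF-8.

    Hypothetical helper, not part of any existing OSM tool.
    """
    try:
        # Strict decoding raises on any byte sequence that is not UTF-8.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Every byte value maps to a character in ISO-8859-1,
        # so this decode cannot fail.
        return raw.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    for name in ("Tromsø".encode("utf-8"), "Tromsø".encode("iso-8859-1")):
        print(repair_to_utf8(name).decode("utf-8"))
```

This is only semi-automatic because a few ISO-8859-1 byte sequences
happen to be valid UTF-8 as well (for example 0xC3 0xB8), and such
strings would be left untouched, so a manual check of the result is
still a good idea.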
Friendly,
--
Petter Reinholdtsen