[OSM-talk] A large part of London Missing?
Ævar Arnfjörð Bjarmason
avarab at gmail.com
Tue Dec 9 13:06:21 GMT 2008
On Mon, Dec 8, 2008 at 5:30 PM, Ed Loach <ed at loach.me.uk> wrote:
> Can anyone work out how to find out what possible UTF8 errors there
> might be? Or would you be able to tell some other way if this were
> the problem?
You should try using decode("UTF-8") instead of decode("utf8") in your
checking routine. UTF-8 and utf8 are not equivalent under Encode: the
former is stricter. See
http://perldoc.perl.org/Encode.html#UTF-8-vs.-utf8-vs.-UTF8
    encode("utf8",  "\x{FFFF_FFFF}", 1); # okay
    encode("UTF-8", "\x{FFFF_FFFF}", 1); # croaks
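To check which codec a given name resolves to, something like the following sketch should work. It only uses find_encoding() from Encode. According to the Encode documentation linked above, the hyphenated spellings resolve to "utf-8-strict" and the unhyphenated ones to the lax "utf8".

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Print the canonical codec name behind each spelling.
# Names are looked up case-insensitively; only the hyphen matters.
for my $label ("UTF-8", "utf-8", "utf8", "UTF8") {
    printf "%-6s => %s\n", $label, find_encoding($label)->name;
}
```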
It would also be useful to log these errors. They suggest invalid
byte sequences in the OSM dataset, which would be best fixed at
their source.
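A minimal sketch of such a check with logging, assuming the data arrives as a list of raw byte strings. The helper name find_invalid_utf8 and the sample records are made up for illustration and are not part of any OSM tool.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns [record_number, error_message] pairs for
# every record whose bytes are not strictly valid UTF-8.
sub find_invalid_utf8 {
    my (@records) = @_;
    my @errors;
    my $n = 0;
    for my $bytes (@records) {
        $n++;
        my $copy = $bytes;    # decode() with a CHECK value may modify its input
        # Strict "UTF-8" plus FB_CROAK makes decode() die on bad input.
        my $ok = eval { decode("UTF-8", $copy, Encode::FB_CROAK()); 1 };
        next if $ok;
        chomp(my $msg = $@);
        push @errors, [$n, $msg];
    }
    return @errors;
}

# Made-up sample data: the second record contains 0xFF, which is never
# valid in UTF-8.
my @sample = ("caf\xC3\xA9", "bad\xFF", "plain ascii");
for my $err (find_invalid_utf8(@sample)) {
    warn "record $err->[0]: $err->[1]\n";
}
```

Logging the record number alongside Encode's message should make it easier to trace each bad sequence back to where it entered the dataset.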