[OSM-dev] 04to05.pl doesn't like Puerto Rico
Milo van der Linden
mlinden at zeelandnet.nl
Thu Jan 17 17:53:35 GMT 2008
Welcome to the Spanish part of the world!
Things will even get nicer when you get to Russia or Greece ;-)
My guess is that the original file was not UTF-8. You may have
converted it, but somewhere in the conversion process the ê, ç and
other accented characters got replaced.
I had the same problem with the major cities of the world and ended up
loading everything into PostGIS, forced to UTF-8, and then looking
up the crazy characters and replacing them by hand... sigh...
So I hope you run into a real solution, because I would like to know too!
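
For what it's worth, here is a rough sketch of how one might locate the
offending lines and re-encode the file before feeding it to the script.
It is only a guess at a workaround: it assumes the source data is
ISO-8859-1 (nothing in this thread confirms that), and the function
names and the sample tag are made up for illustration.

```python
def find_invalid_utf8(raw: bytes):
    """Yield (line_number, line_bytes) for every line that is not valid UTF-8."""
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number, line


def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return raw unchanged if it is already valid UTF-8.

    Otherwise decode it with `fallback` (an ASSUMED source encoding)
    and re-encode the result as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample: 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 here.
    sample = b'<tag k="name" v="San Jos\xe9"/>\n'
    for number, line in find_invalid_utf8(sample):
        print(number, line)
    print(to_utf8(sample).decode("utf-8"), end="")
```

If the data really is ISO-8859-1, another option might be to declare
that in the XML prolog (encoding="ISO-8859-1") instead of converting,
since a parser assumes UTF-8 when no encoding is declared.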
Dave Hansen wrote:
> Well, I've done virtually the entire US's TIGER data with the script,
> with no issues, but it finally choked on Puerto Rico.
>
> It gets this:
>
> not well-formed (invalid token) at line 330, column 38, byte 14569
> at /usr/local/lib/perl/5.8.8/XML/Parser.pm line 187
>
> when running on this file:
>
> http://dev.openstreetmap.org/~daveh/tiger.files/counties/PR/Adjuntas.osm
>
> I think it's the crazy characters in tags like this:
>
> <tag k="name" v="Carr Sillo de Calder�n"/>
> <tag k="tiger:name_base" v="Carr Sillo de Calder�n"/>
>
> Being a stupid American, I have no real knowledge of character sets and
> that fun. Any idea what the right way to fix this is?
>
> -- Dave