[OSM-dev] minute diff utf8 broken 200907211259-200907211300.osc.gz

Martijn van Oosterhout kleptog at gmail.com
Sat Aug 1 21:55:54 BST 2009


2009/7/31 Etienne Chové <chove at crans.org>:
> Hi,
>
> I still have a problem with utf8 in the minute diffs (same cause as
> [1]). What should I do?
>  - update osm2pgsql ? (I'm using SVN version 0.66-16239M)
>  - correct diff manually ?
>
> I didn't find the solution in the mail archive (maybe I didn't spend
> enough time looking for it).

The problem is (as I understand it) that while XML uses Unicode, not
all Unicode characters are valid in XML. In particular, any control
character (ASCII < 32) other than tab, newline and carriage return is
forbidden. What has happened here is that some control character has
slipped in.
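
To check whether a given diff really contains such a character, something
like the following could be used. It is only a sketch of the rule described
above (everything below 0x20 except tab, newline and carriage return); the
names FORBIDDEN and find_forbidden are made up for illustration and are not
part of osm2pgsql or Osmosis.

```python
import re

# XML 1.0 allows tab (0x09), newline (0x0A) and carriage return (0x0D);
# every other code point below 0x20 is forbidden.
FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_forbidden(text):
    """Return (offset, code point) pairs for each forbidden control character."""
    return [(m.start(), ord(m.group())) for m in FORBIDDEN.finditer(text)]


if __name__ == "__main__":
    # Hypothetical tag value with a stray backspace (0x08) in it.
    sample = '<tag k="name" v="Caf\x08e"/>\n'
    for offset, code in find_forbidden(sample):
        print("forbidden character 0x%02x at offset %d" % (code, offset))
```

Running this over the decompressed text of a diff should point at the
offending spot, assuming the cause is indeed a stray control character.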

I noticed because the Java XML parser in Osmosis barfed on it too. I
think recent versions of Osmosis hack around it.

A few options spring to mind:
- Fix the diff generator to strip control characters
- Strip the characters yourself with a sed script
- Fix osm2pgsql to strip control characters
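
As a rough illustration of the second option (written in Python rather than
sed), the stripping could look like the sketch below. It works on raw bytes,
which is safe for UTF-8 because bytes below 0x20 never occur inside a
multi-byte sequence. The function names and file paths are hypothetical, and
it does not handle escaped character references such as &#8;.

```python
import gzip

# Bytes below 0x20 other than tab, newline and carriage return.
FORBIDDEN_BYTES = bytes(b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D))


def strip_control_bytes(data):
    """Return data with the forbidden control bytes deleted."""
    return data.translate(None, FORBIDDEN_BYTES)


def clean_diff(src_path, dst_path):
    """Copy a gzipped diff to dst_path, dropping forbidden control bytes."""
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        for line in src:
            dst.write(strip_control_bytes(line))
```

For example, clean_diff("in.osc.gz", "out.osc.gz") would write a cleaned
copy that could then be fed to osm2pgsql in place of the original.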

Have a nice day,
-- 
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/



