[OSM-dev] minute diff utf8 broken 200907211259-200907211300.osc.gz
Martijn van Oosterhout
kleptog at gmail.com
Sat Aug 1 21:55:54 BST 2009
2009/7/31 Etienne Chové <chove at crans.org>:
> Hi,
>
> I still have some problems with utf8 in minute diffs (same cause as
> [1]). What should I do?
> - update osm2pgsql ? (I'm using SVN version 0.66-16239M)
> - correct diff manually ?
>
> I didn't find the solution in the mail archive (maybe I didn't spend
> enough time looking for it).
The problem (as I understand it) is that while XML uses Unicode, not
all Unicode characters are valid in XML. In particular, any control
character (ASCII < 32) other than tab, newline and carriage return is
forbidden. What has happened here is that some control character has
slipped in.
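To see which characters are the culprits, something along these lines
might help. It is only a rough sketch in Python: the name find_forbidden
is made up for illustration, and it only checks the control-character
range described above, not every rule in the XML spec.

```python
import re

# Control characters below 0x20 that XML 1.0 does not allow: everything
# except tab (0x09), newline (0x0A) and carriage return (0x0D).
FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def find_forbidden(text):
    """Return (offset, code point) pairs for each forbidden control character.

    Hypothetical helper for illustration; not part of osm2pgsql or Osmosis.
    """
    return [(m.start(), ord(m.group())) for m in FORBIDDEN.finditer(text)]
```

Running it over the decoded text of a diff should give the offsets of
any stray control characters, which makes it easier to track down the
offending tag value.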
I noticed because the Java XML parser in Osmosis barfed on it too. I
think recent versions of Osmosis hack around it.
A few options spring to mind:
- Fix the diff generator to strip control characters
- Strip the characters yourself with a sed script.
- Fix osm2pgsql to strip control characters
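For the "strip it yourself" option, here is a minimal sketch in Python
rather than sed. The function names and the output file name are my own
assumptions, not anything from osm2pgsql or Osmosis. It works on raw
bytes, which is safe for UTF-8 because multi-byte sequences never
contain bytes below 0x80.

```python
import gzip

# Bytes 0x00-0x1F that XML 1.0 forbids, i.e. everything below 0x20
# except tab (0x09), newline (0x0A) and carriage return (0x0D).
DELETE = bytes(b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D))

def strip_control_bytes(data: bytes) -> bytes:
    """Remove forbidden control bytes; all other bytes pass through unchanged."""
    return data.translate(None, DELETE)

def clean_diff(src_path, dst_path):
    """Copy a gzipped diff line by line, dropping forbidden control bytes.

    Hypothetical helper: both paths are supplied by the caller.
    """
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        for line in src:
            dst.write(strip_control_bytes(line))
```

You would call it as, for example,
clean_diff("200907211259-200907211300.osc.gz", "cleaned.osc.gz") and then
feed the cleaned file to osm2pgsql. Note that this silently drops the
characters, so the affected tag values will differ slightly from what is
in the database.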
Have a nice day,
--
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/