[OSM-dev] Odd data in daily diffs (potlatch related?)

Martijn van Oosterhout kleptog at gmail.com
Tue Apr 1 07:37:04 BST 2008


On Tue, Apr 1, 2008 at 12:53 AM, Jon Burgess <jburgess777 at googlemail.com> wrote:
>  That does not work, the character is invalid even when encoded as an
>  entity:
>
>  $ xmllint -noout tmp.osm
>  tmp.osm:4: parser error : xmlParseCharRef: invalid xmlChar value 27
>     <tag k="type" v=""/>

Wow. Right. I looked up the XML definition and got:
http://www.w3.org/TR/REC-xml/#charsets
In particular:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
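The production maps directly onto a predicate over Unicode code points. Here is a minimal Python sketch. The function name `is_xml_char` is hypothetical and not part of any OSM tool. The value 27 in the xmllint error above is 0x1B (ESC), and it falls outside every allowed range.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point `cp` matches the XML 1.0 `Char` production."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # everything else from space upwards...
        or 0xE000 <= cp <= 0xFFFD       # ...skipping the surrogate block
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )

print(is_xml_char(0x1B))  # False (ESC, the character xmllint complained about)
print(is_xml_char(0x09))  # True  (tab is explicitly allowed)
```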

So *any* character below 32 (0x20) that's not a tab, line feed or
carriage return is illegal in XML, no matter how you encode it. Since we
have decided that XML is the standard format for transport of OSM data,
this means we must ban those characters now and forever in all layers and
complain loudly to any program generating such characters...
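One way to "complain loudly" is to validate tag values before they are written out, instead of letting the XML parser on the receiving end fail. The following is a minimal Python sketch that assumes tag values arrive as Python strings. `check_tag_value` is a hypothetical helper, not an existing OSM API. The sketch also uses the standard library's expat-based parser to show that the raw character and a character reference such as `&#27;` are rejected alike.

```python
import re
import xml.etree.ElementTree as ET

# Code points excluded by the XML 1.0 Char production. Python strings cannot
# hold anything above U+10FFFF, so these are the only gaps left to catch.
_ILLEGAL = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]")

def check_tag_value(value: str) -> None:
    """Raise ValueError naming every code point in `value` that XML 1.0 forbids."""
    bad = sorted({ord(m.group()) for m in _ILLEGAL.finditer(value)})
    if bad:
        raise ValueError(
            "illegal XML characters: " + ", ".join(f"U+{cp:04X}" for cp in bad)
        )

# Escaping does not help: the raw ESC and the &#27; reference both fail to parse.
for doc in ('<tag k="type" v="\x1b"/>', '<tag k="type" v="&#27;"/>'):
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        print("rejected:", err)

try:
    check_tag_value("bad\x1bvalue")
except ValueError as err:
    print(err)  # illegal XML characters: U+001B
```

Rejecting the value is the "complain loudly" option. Silently stripping the characters would hide the bug in whichever program produced them.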

So Brett, it looks like you were doing the right thing after all, but the
failure mode could have been clearer.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
