[OSM-dev] pbf2osm development has started [code to test!]
osm at inbox.org
Thu Sep 30 00:47:50 BST 2010
On Wed, Sep 29, 2010 at 6:47 PM, Stefan de Konink <stefan at konink.de> wrote:
> On Thu, 30 Sep 2010, Frederik Ramm wrote:
>> Speaking of "polished": The program currently produces invalid XML because
>> " and & are not escaped, leading to lines like
> Yes, Roeland pointed that out as well yesterday. We have discussed an escape
> table. Maybe first parsing the entire string table, alternatively doing it
> for each instance.
In addition to " and &, you need to escape <. planet.c also escapes >.
It uses the predefined entity references for each (&quot;, &amp;, &lt;,
and &gt;). planet.c also escapes carriage return, line feed, and tab,
as &#13;, &#10;, and &#9;. AFAICT it is legal to include these unescaped
(though it would be nice to escape at least line feeds to make it
easier on fast, non-XML-compliant parsers).
Now, finally, there are characters in the db which cannot be
represented in XML 1.0 (but can be represented in XML 1.1). Most
significantly, control characters (ASCII less than 32) other than
carriage return, line feed, and tab. Some versions of planet.c
convert these into ?. Some versions omit them completely. At least
one version converts them into &#ASCII;, where ASCII is the ASCII
code. I actually like the last version the best, though it is invalid
in XML 1.0 (valid in XML 1.1). Personally I'd recommend producing XML
1.1, at least as an option, in order to include these characters.
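
As a sketch of how the three behaviours could sit behind one option,
something like the following might work. The enum and function names
are invented for this example and do not come from any version of
planet.c. Note that output using CTRL_CHAR_REF would have to declare
version="1.1" in its XML declaration, and that byte 0 is rejected
outright because no XML version can represent it.

```c
#include <stddef.h>
#include <stdio.h>

/* The three treatments of control characters that XML 1.0 cannot
 * represent. The names are illustrative only. */
enum ctrl_policy {
    CTRL_REPLACE,   /* write '?' in place of the character     */
    CTRL_OMIT,      /* drop the character                      */
    CTRL_CHAR_REF   /* write &#N; (well-formed in XML 1.1 only) */
};

/* True for bytes below 32 other than tab, line feed, carriage return. */
static int is_restricted_control(unsigned char c)
{
    return c < 32 && c != '\t' && c != '\n' && c != '\r';
}

/* Formats the output for one byte into `out` (capacity `cap`).
 * Returns the number of characters produced, snprintf-style,
 * or -1 for byte 0, which cannot be represented at all. */
int emit_byte(unsigned char c, enum ctrl_policy policy,
              char *out, size_t cap)
{
    if (c == 0)
        return -1;
    if (!is_restricted_control(c))
        return snprintf(out, cap, "%c", c);

    switch (policy) {
    case CTRL_REPLACE:
        return snprintf(out, cap, "?");
    case CTRL_OMIT:
        return snprintf(out, cap, "%s", "");
    case CTRL_CHAR_REF:
        return snprintf(out, cap, "&#%u;", (unsigned)c);
    }
    return -1;
}
```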
I don't believe there are any null characters in the database. These
could not be represented in either XML 1.0 or XML 1.1.