[OSM-dev] Semicolon

Tom Hughes tom at compton.nu
Mon Nov 19 15:10:19 GMT 2007


In message <53cf5b6f0711190633g4d99a1bbk7b9fbae3be2b0b72 at mail.gmail.com>
        Stefan Baebler <stefan.baebler at gmail.com> wrote:

> Hi!
>
> In a discussion with BrettH about Osmosis handling semicolons in
> nodes' tags it struck me that tags of nodes
> are kept in a text field in the nodes table (separated with semicolons),
> while tags of ways are kept normalized - in a separate table.
>
> Current escaping is sort of random, see node 100325036 for example.

The current escaping is not random at all - there is no escaping!

Well, that's not quite true - Potlatch uses a (broken) form of escaping,
but as Potlatch is the only thing that undoes that escaping on read, it
is not hugely helpful.

> in a planet's extract the node is written as
>  <node id="100325036" timestamp="2007-11-06T20:38:08Z"
> lat="46.9372873" lon="15.4481424">
>    <tag k="name" v="Kasten"/>
>    <tag k="place" v="hamlet"/>
>    <tag k="created_by" v="Potlatch 0.4c"/>
>    <tag k="is_in" v="Wundschuh"/>
>    <tag k=";;Austria" v="bulk_upload.pl-f0deb1fc-2237-4d40-ae4d-3dd108453350"/>
>  </node>
>
> However, the API serves it differently, missing the last tag (with
> semicolons) above completely.
> http://www.openstreetmap.org/api/0.5/node/100325036 only gives:
>  <node id="100325036" lat="46.9372873" lon="15.4481424"
> user="atrejuvienna" visible="true"
> timestamp="2007-11-06T20:38:08+00:00">
>    <tag k="name" v="Kasten"/>
>    <tag k="place" v="hamlet"/>
>    <tag k="created_by" v="Potlatch 0.4c"/>
>    <tag k="is_in" v="Wundschuh"/>
>  </node>
>
> In a dump made by osmosis the strange tag was written a bit nicer, as:
>    <tag k="Austria" v="bulk_upload.pl-f0deb1fc-2237-4d40-ae4d-3dd108453350"/>
> but it might be that the author actually wanted something completely different, e.g.:
>    <tag k="is_in" v="Wundschuh;;;Austria"/>

What you're looking at is what happens to Potlatch's escaping when
it is processed by tools that don't understand it (or any other form
of escaping).
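
As a rough illustration of the general failure mode, here is a minimal
Python sketch. It assumes a made-up storage format ("k=v" pairs joined
with ";" and no escaping) and hypothetical helper names join_tags and
split_tags; it is not the real nodes table layout, and it does not
reproduce Potlatch's escaping or the exact output seen for node 100325036.

```python
def join_tags(tags):
    # Hypothetical writer: "k=v" pairs joined by ";" with no escaping at all.
    return ";".join(f"{k}={v}" for k, v in tags)


def split_tags(field):
    # Hypothetical naive reader: split on every ";", then on the first "=".
    result = []
    for part in field.split(";"):
        if not part:
            continue  # skip empty pieces produced by consecutive semicolons
        key, _, value = part.partition("=")
        result.append((key, value))
    return result


original = [("name", "Kasten"), ("is_in", "Wundschuh;Austria")]
stored = join_tags(original)
recovered = split_tags(stored)

print(stored)
print(recovered)
```

Under these assumptions the semicolon inside the is_in value is treated
as a tag separator, so the value is cut short and a spurious "Austria"
key appears, which is the same family of damage as in the dumps quoted
above.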

> While escaping is doable, I believe that we should look into moving
> nodes' tags into a separate table, both for avoiding escaping, uniform
> handling of tags of ways and nodes, and perhaps even better indexing.

Personally, I agree that it should be moved out - any volunteers to
do the job?
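
For anyone thinking about it, a minimal sketch of the idea in Python
follows. It uses an in-memory SQLite database purely for illustration
(not the production database), and the table name node_tags, its columns
and the index name are assumptions, not an agreed schema.

```python
import sqlite3

# In-memory SQLite database, used here only to illustrate the layout.
conn = sqlite3.connect(":memory:")

# Hypothetical normalized table: one row per (node id, key, value).
conn.execute(
    "CREATE TABLE node_tags ("
    " id INTEGER NOT NULL,"
    " k TEXT NOT NULL,"
    " v TEXT NOT NULL)"
)
conn.execute("CREATE INDEX node_tags_id_idx ON node_tags (id)")

node_id = 100325036
tags = [
    ("name", "Kasten"),
    ("place", "hamlet"),
    ("is_in", "Wundschuh;;;Austria"),  # semicolons need no escaping here
]

# Parameterized inserts keep keys and values as opaque strings.
conn.executemany(
    "INSERT INTO node_tags (id, k, v) VALUES (?, ?, ?)",
    [(node_id, k, v) for k, v in tags],
)

rows = conn.execute(
    "SELECT k, v FROM node_tags WHERE id = ? ORDER BY rowid",
    (node_id,),
).fetchall()

print(rows)
```

With one row per tag, a value containing semicolons comes back exactly
as it went in, and no reader needs to know about any escaping scheme.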

Indexing is a double-edged sword - we can get better indexing, but
the downside is that the tags table has to be a MyISAM table.

Tom

-- 
Tom Hughes (tom at compton.nu)
http://www.compton.nu/



