[OSM-dev] Semicolon

Stefan Baebler stefan.baebler at gmail.com
Mon Nov 19 14:33:09 GMT 2007


Hi!

In a discussion with BrettH about how Osmosis handles semicolons in
nodes' tags, it struck me that nodes' tags are kept in a single text
field in the nodes table (separated by semicolons), while ways' tags
are kept normalized, in a separate table.

The current escaping is inconsistent; see node 100325036 for example.

In a planet extract the node is written as:
 <node id="100325036" timestamp="2007-11-06T20:38:08Z"
lat="46.9372873" lon="15.4481424">
   <tag k="name" v="Kasten"/>
   <tag k="place" v="hamlet"/>
   <tag k="created_by" v="Potlatch 0.4c"/>
   <tag k="is_in" v="Wundschuh"/>
   <tag k=";;Austria" v="bulk_upload.pl-f0deb1fc-2237-4d40-ae4d-3dd108453350"/>
 </node>

However, the API serves it differently, omitting the last tag (the one
with semicolons) completely.
http://www.openstreetmap.org/api/0.5/node/100325036 only gives:
 <node id="100325036" lat="46.9372873" lon="15.4481424"
user="atrejuvienna" visible="true"
timestamp="2007-11-06T20:38:08+00:00">
   <tag k="name" v="Kasten"/>
   <tag k="place" v="hamlet"/>
   <tag k="created_by" v="Potlatch 0.4c"/>
   <tag k="is_in" v="Wundschuh"/>
 </node>

In a dump made by Osmosis the strange tag was written a bit more nicely, as:
   <tag k="Austria" v="bulk_upload.pl-f0deb1fc-2237-4d40-ae4d-3dd108453350"/>
but it might be that the author actually wanted something completely different, e.g.:
   <tag k="is_in" v="Wundschuh;;;Austria"/>
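
To make the ambiguity concrete, here is a minimal Python sketch. It
assumes the text field holds "key=value" pairs joined with ";" and that
a reader simply splits on ";" and then on the first "=". Both the exact
storage format and the function names (naive_join, naive_parse) are my
assumptions for illustration, not the actual API, planet dump or Osmosis
code.

```python
def naive_join(tags):
    """Serialize tags as 'k=v;k=v' without any escaping (assumed format)."""
    return ";".join(f"{key}={value}" for key, value in tags.items())


def naive_parse(field):
    """Split on ';' and then on the first '=' (hypothetical reader)."""
    tags = {}
    for pair in field.split(";"):
        if not pair:
            continue  # empty pieces produced by ';;' are silently dropped
        key, _, value = pair.partition("=")
        tags[key] = value
    return tags


original = {"is_in": "Wundschuh;;;Austria"}
stored = naive_join(original)    # 'is_in=Wundschuh;;;Austria'
recovered = naive_parse(stored)  # no longer equal to `original`
```

With this reader the value is cut at the first semicolon and "Austria"
turns into a key of its own, which looks a lot like what the Osmosis
dump shows. A reader that treats the empty pieces differently would
give yet another result, which would explain why the three outputs
above disagree.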

While escaping is doable, I believe that we should look into moving
nodes' tags into a separate table: to avoid escaping, to handle tags
of ways and nodes uniformly, and perhaps even to get better indexing.
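
For reference, the escaping scheme from Dave's patch (quoted below) can
be sketched in a few lines of Python. This is only my reading of the
three rules in his mail (';' becomes '\s', '=' becomes '\e', '\' becomes
'\\'), not the patch itself, which is Ruby. The function names and the
"key=value;key=value" layout of the field are assumptions.

```python
import re

# Escape table taken from the rules in the quoted mail.
_ESCAPES = {"\\": "\\\\", ";": "\\s", "=": "\\e"}
_UNESCAPES = {escaped: raw for raw, escaped in _ESCAPES.items()}


def escape(text):
    """Replace each '\\', ';' and '=' with its two-character escape."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape(text):
    """Undo escape(); scans left to right so '\\\\s' decodes to '\\s'."""
    return re.sub(
        r"\\(.)",
        lambda m: _UNESCAPES.get(m.group(0), m.group(0)),
        text,
    )


def encode_tags(tags):
    """Join escaped pairs; no raw ';' or '=' can remain inside a pair."""
    return ";".join(f"{escape(k)}={escape(v)}" for k, v in tags.items())


def decode_tags(field):
    """Split on ';' and '=' first, then unescape each piece."""
    tags = {}
    for pair in field.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            tags[unescape(key)] = unescape(value)
    return tags


tags = {
    "somekey1": "opt1;opt2",
    "somekey2": "opt1=opt2",
    "somekey3": "opt1\\opt2",
    "is_in": "Wundschuh;;;Austria",
}
assert decode_tags(encode_tags(tags)) == tags
```

The round trip works because an escaped string never contains a raw ';'
or '=', so splitting before unescaping is safe. It only holds if every
writer and reader of the field applies the same rules, and a separate
table would remove that requirement.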

Stefan

On Nov 13, 2007 4:21 PM, Dave Stubbs <osm.list at randomjunk.co.uk> wrote:
> On 13/11/2007, Tom Hughes <tom at compton.nu> wrote:
> > In message <a4c775140711130535n512c9cbat432d47f7aeb27597 at mail.gmail.com>
> >         Dave Stubbs <osm.list at randomjunk.co.uk> wrote:
> >
> > > Patch attached.
> >
> > Thanks.
> >
> > > What it does:
> > >  - escapes all ';' with '\s'
> > >  - escapes all '=' with '\e'
> > >  - escapes all '\' with '\\'
> > >  - unescapes for requests
> > >  - works for API and Potlatch
> >
> > I can't say I'm a great fan of that escaping scheme, but I'll have
> > a think and see if I can come up with anything better.
>
> it lets you split on ; and = before you have to do anything else. And
> it's unambiguous. I'm not sure what else an escaping scheme should do?
> Of course it might be better not to have escaping at all, and do tags
> properly as for ways/relations...
>
> >
> > > So basically anything that doesn't use the API (either node interface,
> > > or amf interface) will fail to properly escape or unescape the ; and
> > > =.
> >
> > Do we have any idea what the performance impact on the map API call
> > is? That's an awful lot of (slow) ruby string manipulation that we're
> > adding to it...
>
> On my box it doesn't appear to be noticeable.
>
> I ran a test to insert 200 new nodes* using lwp-request, a bash for
> loop and time. I ran it 4 times, the first one being straight after
> starting the server in production mode.
>
> the real time for the old code was: 17.948 17.825 17.887 17.881
> for the new code: 18.316 17.897 17.876 17.825
>
> Obviously YMMV as on my box the db will be comparatively rubbish.
>
> Dave
>
> * The node defined the following tags:
> <tag k="somekey1" v="opt1;opt2"/>
> <tag k="somekey2" v="opt1=opt2"/>
> <tag k="somekey3" v="opt1\opt2"/>
> <tag k="amenity" v="fuel"/>
> <tag k="highway" v="roundabout"/>
> <tag k="source" v="thing:9899809597994"/>
> <tag k="thing:id" v="XX:9899809597994"/>
>
>
>



