[OSM-talk] tagged segments (was Re: API 0.5 is on the way)

Frederik Ramm frederik at remote.org
Wed Sep 12 00:57:29 BST 2007


Hi,

> > I think we should also work on tagged segments before the migration.
> 
> While not strictly necessary, it would of course tidy up things. I'll  
> write a script to run through the current planet file and analyze  
> segments 

[...]

The problem is smaller than I thought. Assuming that the tags
"created_by", "tiger:county", "tiger:upload_uuid", "converted_by", as
well as a "source" tag with the value "PGS", are "deletable", i.e. can
be dropped from a segment without much ado, we get the following
picture:

Segments total:              22,713,423
of these, untagged:         - 2,700,748 (no action required)
                            -----------
remaining problems:          20,012,674
of these, only tagged with
"deletable" tags:           -16,333,852 (drop the tags)
                            -----------
remaining problems:           3,678,822
of these, only tagged with
same tags as their ways:    - 3,534,141 (drop the tags)
of these, unwayed:          -    53,330 (create two-node ways)
                            -----------
remaining problems:              91,351 

The remaining 91,351 segments would make it into the 0.5 database
as two-node-ways running in parallel to another way that uses the same
nodes.
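
The breakdown above can be sketched roughly as follows. This is not the
script I ran, just a minimal illustration; the function names and the
input shape (one tag dict per segment plus the tag dicts of the ways
using it) are assumptions, and so is the exact reading of "same tags as
their ways" (every non-deletable segment tag is found with the same
value on every way that uses the segment).

```python
from collections import Counter

# Keys the message treats as "deletable"; source=PGS is handled separately.
DELETABLE_KEYS = {"created_by", "tiger:county", "tiger:upload_uuid", "converted_by"}


def strip_deletable(tags):
    """Return the tags that survive after dropping the "deletable" ones."""
    return {
        k: v
        for k, v in tags.items()
        if k not in DELETABLE_KEYS and not (k == "source" and v == "PGS")
    }


def classify(segment_tags, way_tags_list):
    """Assign one segment to a bucket of the breakdown.

    way_tags_list holds one tag dict per way that uses the segment
    (an assumed input shape; the planet file does not come in this form).
    """
    if not segment_tags:
        return "untagged"
    remaining = strip_deletable(segment_tags)
    if not remaining:
        return "deletable_only"
    if not way_tags_list:
        return "unwayed"
    # Assumption: "same tags as their ways" means every remaining segment
    # tag appears with the same value on every way using the segment.
    if all(way.get(k) == v for way in way_tags_list for k, v in remaining.items()):
        return "same_as_way"
    return "remaining"


def summarize(segments):
    """Count buckets for an iterable of (segment_tags, way_tags_list) pairs."""
    return Counter(classify(tags, ways) for tags, ways in segments)
```

Only the "remaining" bucket would end up as parallel two-node ways; the
other buckets are either untouched or lose their tags.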

If you are interested, you can retrieve the full lists:

53,330 unwayed segments at http://openstreetmap.gryph.de/unwayed.txt.gz
91,351 tagged segments at  http://openstreetmap.gryph.de/tagged.txt.gz

Each file contains one line per segment, with segment id, segment tags,
and, for segments that belong to a way, also the way id and way tags.

Everything in that file is at least unusual, often probably a mistake.
For example, we have ways tagged "waterway=riverbank" containing
segments that are tagged "natural=coastline"; we have about 24k segments
tagged "width=4" (there must have been a tool doing this at some point
in time?), and other funny things. 

The current implementation of our 0.4->0.5 conversion only drops
created_by and not the tiger:county, tiger:upload_uuid, or source tags,
and AFAIK doesn't have the "drop tags that are on the way as well"
step; from the numbers above it is obvious that it would really make
sense to ignore the tags on the 3.5m segments that are tagged exactly
like the ways they belong to.

For the remaining 91k segments, I think we can proceed as suggested
and convert them into two-node ways; maybe adding an extra tag to them
that marks them for inspection? It would also be possible to try and
avoid that extra way and instead promote the segment's tags onto the
way it belongs to (unless that way is already tagged with the same
key).
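
To make the two options concrete, here is a rough sketch of how a single
leftover segment could be handled. It is only an illustration under
stated assumptions: the function name is made up, the marker tag
"segment_review" is a hypothetical name and not an agreed convention,
and the segment tags are assumed to have had the "deletable" tags
stripped already.

```python
def convert_remaining_segment(segment_tags, way_tags, marker_key="segment_review"):
    """Decide what to do with a tagged segment whose tags differ from its way.

    Returns (new_way_tags, extra_way_tags). extra_way_tags is None when the
    segment's tags could be promoted, otherwise the tags for a two-node way.
    marker_key is a made-up tag name, not an agreed OSM convention.
    """
    if any(key in way_tags for key in segment_tags):
        # Key collision: keep the way untouched and emit a flagged two-node way.
        extra = dict(segment_tags)
        extra[marker_key] = "yes"
        return dict(way_tags), extra
    # No collision: promote the segment's tags onto the way.
    merged = dict(way_tags)
    merged.update(segment_tags)
    return merged, None
```

Note that promoting a tag applies it to the whole way, not just to the
part the segment covered, so the marked two-node way is the more
conservative of the two choices.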

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'




