[Talk-us] Blocking of user "WorstFixer" for removing ele=0 etc

Toby Murray toby.murray at gmail.com
Tue May 15 23:21:06 BST 2012


On Tue, May 15, 2012 at 4:24 PM, Alan Mintz
<Alan_Mintz+OSM at earthlink.net> wrote:
> At 2012-05-13 02:49, Frederik Ramm wrote:
>>
>> Removing ele=0 from objects is, in my opinion, totally unnecessary;
>
>
> And maybe incorrect, as ele=0 means we know the elevation is 0, while no ele
> tag means we do not know the elevation.

I just did a check on the data. It turns out most of the nodes with ele=0
are probably correct: most of them are GNIS imports and are coastal
features. The exception is Arizona, where there are 300 GNIS nodes with
ele=0, most of which are probably not at sea level.

On the other hand, the most popular ele=* tag is ele=0.0, which is
overwhelmingly found on ways (42,000 of them). This seems to come from
some bad imports where an undefined elevation in the source, or some
other misunderstanding, was translated to 0.0 in the OSM upload. One of
the bigger offenders seems to be an NHD import in western Colorado with
over 13,000 ways tagged with ele=0.0. Yes, that's western Colorado, where
the Rockies are. Nowhere near sea level... In this case I would actually
argue that it should maybe be cleaned up as import maintenance.
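A minimal Python sketch of this kind of check, assuming the data is available as OSM XML (the check described above may well have used a different method, such as a database query). It tallies ele values per element type while keeping the raw strings apart, so ele=0 on nodes and ele=0.0 on ways show up separately, and it also counts anything that parses as zero. The helper names tally_ele and count_zero_ele are hypothetical, and the sample data is made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up OSM XML fragment for illustration only; it is NOT real data.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="33.0" lon="-117.0"><tag k="ele" v="0"/></node>
  <node id="2" lat="34.5" lon="-111.9"><tag k="ele" v="0"/></node>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="ele" v="0.0"/></way>
  <way id="11"><nd ref="1"/><nd ref="2"/><tag k="ele" v="0.0"/></way>
  <way id="12"><nd ref="1"/><nd ref="2"/><tag k="ele" v="1720"/></way>
</osm>"""


def tally_ele(osm_xml: str) -> Counter:
    """Count (element type, raw ele string) pairs; "0" and "0.0" stay distinct."""
    counts = Counter()
    root = ET.fromstring(osm_xml)
    for elem in root:  # direct children of <osm>: node, way, relation
        for tag in elem.findall("tag"):
            if tag.get("k") == "ele":
                counts[(elem.tag, tag.get("v"))] += 1
    return counts


def count_zero_ele(osm_xml: str) -> int:
    """Count elements whose ele value parses as zero, however it is written."""
    total = 0
    for (_, value), n in tally_ele(osm_xml).items():
        try:
            if float(value) == 0.0:
                total += n
        except ValueError:
            pass  # non-numeric values such as "12 m" are skipped here
    return total


if __name__ == "__main__":
    for (kind, value), n in tally_ele(SAMPLE_OSM).most_common():
        print(kind, value, n)
    print("numerically zero:", count_zero_ele(SAMPLE_OSM))
```

Keeping the raw string in the key is deliberate: it is what separates the GNIS-style ele=0 nodes from the ele=0.0 ways left by imports.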


>>  like created_by, over which WorstFixer made a similar fuss, such
>> information could be removed where an object is touched for some other
>> reason but I don't see why it would have to be mass-removed.
>
>
> The reason for this may not be obvious to some. I assume it's because we
> store history of all objects, and it's a waste of space, not to mention
> bandwidth and processing resources to push the changes out to the mirrors,
> for almost no benefit. I just add "created_by=''" to my JOSM presets (or
> maybe it does this automatically now) so I clean it up when performing other
> edits.
>
>>  Even so, a mass-removal would be ok if proposed, discussed, and accepted
>> by the community like we expect everyone to; it's not ok to just do it on
>> your own and see if someone notices.
>
>
> Yes. Having said all that, OSMTI says there are 23 million nodes (33% of the
> total) with created_by tags! This seemed surprisingly high to me.

Err, last time I checked we had over 1 billion nodes. So 2%, not 33%.
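The correction is simple arithmetic. A quick Python check, taking the quoted 23 million and treating "over 1 billion" as exactly 1 billion, so the result is an upper bound on the real share (share_percent is just an illustrative helper):

```python
def share_percent(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


created_by_nodes = 23_000_000   # OSMTI figure quoted above
total_nodes = 1_000_000_000     # "over 1 billion", so the real share is a bit lower

print(f"{share_percent(created_by_nodes, total_nodes):.1f}%")  # prints 2.3%
```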

>
> I retrieved nodes from 300 random 0.1x0.1 degree bboxes. Of those, only 37
> returned any nodes at all**. All but 6 of those areas had no "created_by"
> tags on their nodes. Of those, only 2 were significant in percentage*, both
> in Norway.
>
> #137 had 1558 nodes, 801 of which (51%) have created_by tags. BLTR: 68.137 13.766 68.237 13.866
> #264 had 2297 nodes, 1946 of which (85%) have created_by tags. BLTR: 60.787 4.900 60.887 5.000
>
> In #137, they are mostly tagged:
>    <tag k="created_by" v="JOSM"/> (TI says this makes up 63% of the values)
>
> In #264, they are mostly tagged:
>    <tag k="created_by" v="almien_coastlines"/> (TI says this makes up 10% of the values)
>    <tag k="source" v="PGS(could be inacurately)"/>
>
>
> My questions are:
>
> 1. Would removing the created_by from 33% of the nodes in the database save
> significant storage space, dump size, backup time, etc.?
>
> 2. Is it possible to remove these in bulk from the database without having
> to keep the history, push those diffs to mirrors, etc.? Do the mirrors
> occasionally start fresh from a new dump? Or can they run the same bulk
> purge? Or do I overestimate the necessity of doing it this way (and we can
> just clean it up with the regular tools and processes)?
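The sampling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the script that was actually used: boxes are drawn uniformly in latitude/longitude (which is not uniform by area, so many boxes land over open ocean or empty areas), each box is converted from the BLTR order above to the left,bottom,right,top order that the OSM API 0.6 map call expects for its bbox parameter, and a returned OSM XML document is scanned for created_by tags on nodes. random_bbox, api_bbox_param and count_created_by are hypothetical names, the sample response is made up, and fetching the data is left out.

```python
import random
import xml.etree.ElementTree as ET


def random_bbox(rng: random.Random, size: float = 0.1):
    """Random (bottom, left, top, right) box of size x size degrees, in BLTR order.
    Assumption: uniform in lat/lon, which is not uniform by area."""
    bottom = rng.uniform(-90.0, 90.0 - size)
    left = rng.uniform(-180.0, 180.0 - size)
    return (bottom, left, bottom + size, left + size)


def api_bbox_param(bltr) -> str:
    """Reorder BLTR into the left,bottom,right,top order of the API's bbox parameter."""
    bottom, left, top, right = bltr
    return f"{left},{bottom},{right},{top}"


def count_created_by(osm_xml: str):
    """Return (node count, nodes carrying a created_by tag) for an OSM XML document."""
    root = ET.fromstring(osm_xml)
    nodes = root.findall("node")
    tagged = sum(
        1 for node in nodes
        if any(tag.get("k") == "created_by" for tag in node.findall("tag"))
    )
    return len(nodes), tagged


# Made-up response for illustration only; it is NOT real data.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="60.80" lon="4.95"><tag k="created_by" v="JOSM"/></node>
  <node id="2" lat="60.81" lon="4.96"><tag k="created_by" v="almien_coastlines"/></node>
  <node id="3" lat="60.82" lon="4.97"/>
</osm>"""

if __name__ == "__main__":
    rng = random.Random(42)
    boxes = [random_bbox(rng) for _ in range(300)]
    print("first box:", api_bbox_param(boxes[0]))
    total, tagged = count_created_by(SAMPLE)
    print(f"{tagged} of {total} nodes have created_by")
```

Summing per-box (total, tagged) pairs over the boxes that return any nodes gives the kind of per-area percentages quoted above; a sample this small says little about the global share.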

Not even the license change bot is going to completely delete history,
and I think it is going to be the biggest automated change in the
history of the project. It will cause some parts of the history to be
hidden from public view, but they will continue to exist in the
database. Makes me wonder... how many created_by tags are going to be
nuked by the license change bot? :)

Toby


