[OSM-talk] Worldwide non-surveyed tag edits

Wed Jun 11 23:20:36 UTC 2014

> Date: Wed, 11 Jun 2014 23:28:06 +0200
> From: roland.olbricht at gmx.de
> To: rovastar at hotmail.com; talk at openstreetmap.org
> Subject: Re: [OSM-talk] Worldwide non-surveyed tag edits

> I'm glad that you got into discussion. 

No problem.

> Just some examples:
> 
> We have streets with housenumbers 3, 5, 9, 7, 11, 13
> Is it an obvious mistake? It's on purpose, because the housenumbers 
> sometimes are in that order on the ground.
> 
> We have in Germany cities with a street named "Cäcilienstraße" and 
> others with a street named "Cecilienstraße" (both with exactly the same 
> pronunciation, and both variants of the same surname).
.....

I am not sure how these examples apply to the cases I mentioned. Names are especially problematic and truly need some time and effort.
Luckily, in most cases we have established tag values, so the problem is not as open-ended as it is with names. I know we can tag anything, but often there are established tags that are suitable.

> On the other hand, a mechanical change of data can be performed as easily 
> during postprocessing as in the database. This is known in programming 
> as "don't store an information when it is easier to recompute it".

I see that, but the key point is that it has to be easier to do.

Currently it would be very difficult to do.

Let's pick one simple, established tag and value combination.

highway = residential
This denotes a residential road.
How many possible misspellings of this would you say there are that you would consider a typo or mistake?

Designing an architecture that postprocesses every possible obvious misspelling (a commercial spellchecker might miss some, so you might have to write a new one) would, I hope you agree, add unnecessary overhead. Now multiply that by the thousands of popular established tag and value combinations, and any modern computing solution becomes infeasible.
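
To make the discussion concrete, here is a minimal sketch of what such a postprocessing step might look like for a single key, using fuzzy matching from Python's standard library. The list of known values, the function name normalize_highway and the cutoff of 0.85 are all assumptions chosen for illustration, not an established ruleset. It covers one key only; the concern above is about repeating and maintaining this for thousands of combinations.

```python
from difflib import get_close_matches

# Hypothetical, incomplete list of established values for the "highway" key.
KNOWN_HIGHWAY_VALUES = [
    "residential", "service", "unclassified", "tertiary",
    "secondary", "primary", "footway", "track",
]


def normalize_highway(value, cutoff=0.85):
    """Return the closest established value, or the input unchanged.

    The cutoff is an assumed similarity threshold (0.0 to 1.0); a lower
    value catches more typos but also rewrites more intentional values.
    """
    if value in KNOWN_HIGHWAY_VALUES:
        return value
    matches = get_close_matches(value, KNOWN_HIGHWAY_VALUES, n=1, cutoff=cutoff)
    return matches[0] if matches else value


if __name__ == "__main__":
    for raw in ["residential", "residentail", "path"]:
        print(raw, "->", normalize_highway(raw))
```

Values that are not close to anything in the list are passed through untouched, so deliberately unusual tagging is left alone, although a similarity threshold alone cannot tell a typo from an intentional near-match.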

I trust you understand this.

So the old-school solution is to fix the data at the source, following the principle I would call "Garbage In, Garbage Out". It may not be as fashionable as postprocessing, but it is often the best solution.

Now we do this by hand ourselves, fixing the tags as we see them. If your local road, tagged highway = residential, suddenly disappeared from the rendering on openstreetmap.org because someone had changed it by mistake to highway = residentail, you would probably correct it. However, if that happened across town you might not notice at all.

Often, cleaning the data does not require walking down the road or checking aerial imagery.

> You may earn real fame if you have a good filtering ruleset that 
> flatirons all suspect data. If you publish this as a postprocessing 
> script, it is useful. If you apply that to flatiron the database, in 99% 
> justified cases and 1% on otherwise on purpose crafted data, then you 
> will earn shame instead, because that same script could be perceived as 
> doing vandalism.

I am not really interested in the fame, but if I do manage to create a solution to the problem I stated above, I will let you know and let you share the fame with me.

There is a balance here between the 99% and the 1%. Some would say that even 99.9999% is not good enough and that a single mistake would make it not worth it, but for most it is about balance.

Some of us want to fix this bad data, and one way is to do these "mechanical edits"; others are happy to leave the bad data as it is.

I hope there is a desire to reach a balance, but I think it is unlikely.

> Date: Wed, 11 Jun 2014 23:28:06 +0200
> From: roland.olbricht at gmx.de
> To: rovastar at hotmail.com; talk at openstreetmap.org
> Subject: Re: [OSM-talk] Worldwide non-surveyed tag edits
> 
> Dear John,
> 
> I'm glad that you got into discussion. The OpenStreetMap community has 
> some consensus that looks like outright nonsense from a computer scientist's 
> or programmer's usual point of view. So it is helpful to explain every now 
> and then what is common sense, checking whether those decisions are 
> still valid.
> 
> > Consistent data is useful and typos and mistakes are common place.
> > Unifying these so they are machine readable so they are useful is, in
> > fact, useful.
> 
> Just some examples:
> 
> We have streets with housenumbers 3, 5, 9, 7, 11, 13
> Is it an obvious mistake? It's on purpose, because the housenumbers 
> sometimes are in that order on the ground.
> 
> We have in Germany cities with a street named "Cäcilienstraße" and 
> others with a street named "Cecilienstraße" (both with exactly the same 
> pronunciation, and both variants of the same surname).
> 
> The literal translation of connecting way into German is 
> "Verbindungsweg". This is also the official name of a living street in 
> Siegburg, Germany.
> 
> By contrast, for good reason not connected in the database are these roads:
> http://blog.openstreetmap.de/blog/2013/05/wochennotiz-nr-147/
> 
> There was an automatic bot changing road names ending in "...strasse" to 
> "...straße" (means "... street" in German, second is the standard 
> spelling). This did fail both in Switzerland (where "...strasse" is the 
> authorized spelling) and on the name "Gleistrasse", which means "railway 
> track right of way" and only coincidentally contains the substring 
> "strasse".
> 
> There are probably more examples. They don't leave much space for 
> "obvious corrections" that are without doubt justified. That's why the 
> rule exists that mechanical edits are accepted unless somebody complains:
> 
> If nobody complains then the edit was a posteriori a correction of the 
> obvious. We have no a priori criterion for "obvious correction".
> 
> > The "rules" for mechanical edit are frankly ridiculous. Have you read them.
> 
> Our most valuable resource is not data but people who curate their share 
> of data. Changing data in a way that might be considered harmful or is 
> unintentionally outright wrong may shy away those who keep the data current.
> 
> The sometimes rude feedback was identified as a probable cause for 
> OpenStreetMap having few contributing women.
> 
> So correcting those obvious errors requires communication with the 
> mappers (male, female, or else) who have made these errors, in a way 
> that always at first encourages them to carry on mapping (hopefully with 
> less mistakes).
> 
> On the other hand, a mechanical change of data can be performed as easily 
> during postprocessing as in the database. This is known in programming 
> as "don't store an information when it is easier to recompute it".
> 
> You may earn real fame if you have a good filtering ruleset that 
> flatirons all suspect data. If you publish this as a postprocessing 
> script, it is useful. If you apply that to flatiron the database, in 99% 
> justified cases and 1% on otherwise on purpose crafted data, then you 
> will earn shame instead, because that same script could be perceived as 
> doing vandalism.
> 
> It's potentially feasible to postprocess data. It's hard to collect 
> data. So please don't make collecting data harder. Please make rather 
> postprocessing data easier.
> 
> Best regards,
> 
> Roland
> 
