[OSM-talk] Potential bot tasks relating to Wikidata errors

Mateusz Konieczny matkoniecz at tutanota.com
Wed Jul 13 12:01:59 UTC 2022




10 Jul 2022, 16:14 from andy at pigsonthewing.org.uk:

> On Sun, 10 Jul 2022 at 13:57, Mateusz Konieczny via talk
> <talk at openstreetmap.org> wrote:
>
>> Based on my own similar efforts: Wikidata has large-scale issues
>> with its classification and I would recommend manual tool-assisted changes.
>>
>
> Relevant example?
>
A small sample:
- USS Niagara museum ship is classified as "group of humans"
- all objects marked as canals are classified as "non-physical entity"
- University of San Francisco is classified as an action

Overall, the Wikidata classification system does not allow one to
reliably answer questions such as "is this an event" or "is it a physical object"
or "is it a ship or a group of humans" or "is it a physical or non-physical entity".
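
For anyone who wants to reproduce such checks, here is a minimal sketch of how a question like "is item X, directly or indirectly, an instance of class Y" is usually posed to the Wikidata Query Service. P31 (instance of) and P279 (subclass of) are the real Wikidata properties; the function name is hypothetical, and Q42 (Douglas Adams) and Q5 (human) are used only as a well-known pair, not as one of the broken cases above.

```python
import re

# Wikidata item IDs look like "Q" followed by a positive integer.
_QID = re.compile(r"Q[1-9][0-9]*")


def build_indirect_class_query(item_qid: str, class_qid: str) -> str:
    """Build a SPARQL ASK query: is the item an instance of the class,
    either directly (P31) or through any chain of subclass-of (P279) links?

    Hypothetical helper for illustration; it only builds the query text
    and does not contact any server.
    """
    for qid in (item_qid, class_qid):
        if not _QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    # "wdt:P31/wdt:P279*" is a property path: one instance-of step,
    # followed by zero or more subclass-of steps.
    return f"ASK {{ wd:{item_qid} wdt:P31/wdt:P279* wd:{class_qid} . }}"


query = build_indirect_class_query("Q42", "Q5")
```

The point of the complaint above is that a "true" answer from such a query cannot be trusted without a human looking at the chain of classes behind it.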

>> If anyone is interested in a listing of broken Wikidata classifications and
>> is willing to deal with this quagmire, I would also be happy to share examples.
>>
>
> Please post them at:
>
> https://www.wikidata.org/wiki/Wikidata:Project_chat
>
See 
https://www.wikidata.org/wiki/Wikidata:Project_chat#USS_Niagara_museum_ship_is_classified_as_%22group_of_humans%22
https://www.wikidata.org/wiki/Wikidata:Project_chat#canal_classified_as_%22non-physical_entity%22
https://www.wikidata.org/wiki/Wikidata:Project_chat#University_of_San_Francisco_classified_as_an_action

(These discussions are active but so far have not resulted in Wikidata being fixed;
they will likely be archived at https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2022/07 )

And to be clear, these are not isolated cases: marking an object as a museum on Wikidata
indirectly classifies it as a "group of humans", every single object classified as a
canal is also indirectly classified as a "non-physical entity", and what I reported there
as a test is only a small sample of such problems.
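
To make "indirectly classified" concrete, here is a minimal sketch of how one bad subclass-of link leaks into every item below it. The graph is a toy: the intermediate links are invented for illustration and are NOT the actual chain on Wikidata, and the function and variable names are hypothetical.

```python
from collections import deque

# Toy data only: invented links, not copied from Wikidata.
INSTANCE_OF = {"example ship": ["museum ship"]}
SUBCLASS_OF = {
    "museum ship": ["ship", "museum"],
    "museum": ["organization"],
    "organization": ["group of humans"],  # the single questionable link
}


def indirect_classes(item, instance_of, subclass_of):
    """Map every class reachable from `item` to the chain that leads to it.

    Follows one instance-of step, then subclass-of steps breadth-first,
    so each recorded chain is a shortest one.
    """
    paths = {}
    queue = deque()
    for cls in instance_of.get(item, []):
        if cls not in paths:
            paths[cls] = [item, cls]
            queue.append(cls)
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, []):
            if parent not in paths:  # also guards against cycles
                paths[parent] = paths[current] + [parent]
                queue.append(parent)
    return paths


paths = indirect_classes("example ship", INSTANCE_OF, SUBCLASS_OF)
chain = " -> ".join(paths["group of humans"])
```

Printing `chain` shows the route by which the item ends up in the unwanted class, which is the kind of output a human reviewer needs before accepting or rejecting a classification.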

In the past I made the mistake of trying to repair such bogus classifications,
but other people on Wikidata kept reverting those edits or introducing new
bogus classification claims.

Overall: Wikidata has a very low-quality classification system, and fully automatic
edits cannot rely on it being in any way reliable. I also recommend against
spending time on fixing Wikidata.

Classifying the world is complex and hard (see OSM tagging discussions), but
Wikidata is an example of a really dysfunctional system. What is worse, its
dysfunctions are not clearly evident, yet they make this dataset extremely unreliable.

If I sound irritated, that is because I am irritated.
I spent a substantial amount of time attempting to use it and it turned out to be
a waste of time. I strongly advise against using it for classifying objects.

Unless you like dealing with a system that classifies dry lakes, aqueducts, countries,
wind farms, dunes, trees, information boards, the Hollywood Sign, geoglyphs, cemeteries
and expressways as events.

And train lines as transactions.

All these examples are real. I fixed them on Wikidata, but many others remain
undetected or were broken again.

Note: Wikidata may (or may not) be usable for info attached directly to a given object.
But indirect classification is an utter failure.

I am against any bot editing OSM that would rely on this.
Human-verified edits based on this may be useful (and I am doing them!)


>> Note that blindly changing all graves to buried:wikidata would be
>> wrong, as some graves are symbolic.
>>
>
> Then they are not graves.
>
Maybe technically not, but "symbolic grave" appears to be in use.
Add to that people who are not native speakers of English...