[Talk-us] tool to detect invalid wikipedia/wikidata tags

Mateusz Konieczny matkoniecz at tutanota.com
Wed Feb 17 15:38:16 UTC 2021


Oh, this one was caused by Cebuano Wikipedia and Wikidata. I fixed it.

Cebuano Wikipedia is generated by bots from various databases and has
basically no human-created articles. Wikidata bots in turn created millions
of duplicates with separate Wikidata ids, for cases like:
- mountain as described in English Wikipedia
https://www.wikidata.org/w/index.php?title=Q96379884&oldid=1358126289

- mountain as described in Cebuano bot-generated Wikipedia
https://www.wikidata.org/w/index.php?title=Q49033378&oldid=1343663546

I am not sure how someone ended up adding the English Wikipedia article
together with the Cebuano Wikidata id, but I fixed it by editing Wikidata and
deleting the local cache for this entry, so it should disappear on the next
rerun (the links above lead to old versions).
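
To illustrate the kind of mismatch described above, here is a minimal sketch
in Python. It is not the validator's actual code. The function names, the
`SITELINKS` table, the item ids and the article titles are all made up for the
example; a real check would fetch sitelinks from Wikidata instead. It assumes
that the site key is the language code followed by "wiki" (for example
"enwiki" or "cebwiki") and compares titles literally.

```python
from typing import Dict, Optional, Tuple


def parse_wikipedia_tag(value: str) -> Optional[Tuple[str, str]]:
    """Split an OSM wikipedia tag such as 'en:Some title' into (language, title)."""
    lang, sep, title = value.partition(":")
    lang, title = lang.strip(), title.strip().replace("_", " ")
    if not sep or not lang or not title:
        return None  # not in the 'language:title' form
    return lang, title


def find_mismatch(
    tags: Dict[str, str], sitelinks: Dict[str, Dict[str, str]]
) -> Optional[str]:
    """Return a description of the problem, or None if both tags agree.

    `sitelinks` maps a Wikidata id to {site key: article title}. Here it is a
    hand-written stand-in (an assumption) for data that comes from Wikidata.
    """
    wikipedia = tags.get("wikipedia")
    wikidata = tags.get("wikidata")
    if wikipedia is None or wikidata is None:
        return None  # nothing to compare
    parsed = parse_wikipedia_tag(wikipedia)
    if parsed is None:
        return "malformed wikipedia tag: " + wikipedia
    lang, title = parsed
    site = lang.replace("-", "_") + "wiki"  # assumed site key convention
    linked = sitelinks.get(wikidata, {}).get(site)
    if linked is None:
        return f"{wikidata} has no {site} article, but wikipedia={wikipedia}"
    if linked != title:
        return f"{wikidata} links to {site} article {linked!r}, but wikipedia={wikipedia}"
    return None


# Made-up data: one item per Wikipedia edition, as with the duplicates above.
SITELINKS = {
    "Q_english_item": {"enwiki": "Example Mountain"},
    "Q_cebuano_item": {"cebwiki": "Example Mountain (bukid)"},
}
broken = {
    "natural": "peak",
    "wikipedia": "en:Example Mountain",
    "wikidata": "Q_cebuano_item",
}
fixed = dict(broken, wikidata="Q_english_item")

print(find_mismatch(broken, SITELINKS))  # describes the conflict
print(find_mismatch(fixed, SITELINKS))   # no conflict found
```

In this sketch the English article plus the Cebuano item is reported because
the Cebuano item has no English sitelink, which is the situation that the
duplicate items make possible.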

Basically, this is an example of the consequences of bots running without permission.

Feb 17, 2021, 15:34 by daveswarthout at gmail.com:

> I have resolved most of these errors and left a Comment on your Note about Fort Wainwright.
> I'll recheck the issue about Nunatak, the one I reported on yesterday.  But it'll have to wait until tomorrow.
>
> Thanks again Mateusz
>
> On Wed, Feb 17, 2021 at 7:25 PM Mateusz Konieczny via Talk-us <talk-us at openstreetmap.org> wrote:
>
>> I reran it for Alaska[1]
>>
>> https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/Alaska,%20USA.html
>> right now lists multiple issues, and from a quick check most of them seem valid
>> (including some that were previously not displayed at all)
>>
>> Which one is unclear/incorrect?
>>
>> BTW, is it possible that 
>> https://www.openstreetmap.org/relation/7367269
>> https://www.openstreetmap.org/relation/7397686
>> are one entity (or that the smaller one is actually a separate
>> object)?
>>
>> I created https://www.openstreetmap.org/note/2542156
>> after trying the official website (down for me), an archived copy at the
>> Internet Archive and the Wikipedia article, and failing to find info that
>> would allow verifying it.
>>
>> Maybe someone here knows where it can be checked?
>>
>> [1] and for some other places, but the full run is still going on - I will
>> contact people who sent messages once their data is ready
>>
>> Feb 16, 2021, 03:02 by daveswarthout at gmail.com:
>>
>>> Thanks for the information, Mateusz
>>>
>>> I corrected all but one of the errors in Alaska. There was one that looked okay. When you run the tool again, it will show up. I'm not sure why the tag was marked as erroneous. 
>>>
>>> AlaskaDave
>>>
>>> On Tue, Feb 16, 2021 at 2:29 AM Mateusz Konieczny via Talk-us <talk-us at openstreetmap.org> wrote:
>>>
>>>> https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/
>>>>
>>>> Please let me know (
>>>> https://www.openstreetmap.org/message/new/Mateusz%20Konieczny is 100%
>>>> fine and will not spam the mailing list) if you also want your area
>>>> processed, something is broken, or anything at all is unclear.
>>>>
>>>> For a start: if you are not interested in wikipedia/wikidata tags you can
>>>> likely ignore this; there are also more interesting/important parts of
>>>> OSM mapping.
>>>>
>>>> But if you are interested in linking to Wikipedia, as it allows getting
>>>> descriptions and illustrations, detecting interesting objects and so on,
>>>> then this tool may be interesting for you.
>>>>
>>>> Right now Alabama, Alaska, Arizona, Arkansas, Colorado and California are
>>>> processed in the USA, with 206 errors found that are worth human time
>>>> (ones not worth human time and fixable with a bot are skipped).
>>>>
>>>> California and the already completed Rhode Island are run based on
>>>> requests; the remaining USA areas are run for reasons no longer clear to
>>>> me. I thought about disabling them, but maybe they will be useful for
>>>> some of you.
>>>>
>>>> This tool is an unexpected result of creating a detector of interesting
>>>> places based on OSM Data and Wikipedia. It turned out to require a
>>>> filter to avoid invalid links.
>>>>
>>>> As detected links can often be fixed, and it is better to remove invalid
>>>> links than to keep them, I am sharing these reports.
>>>>
>>>> This tool is an outgrowth of a validation checker in a script intended to
>>>> run on small datasets, so not the entire world is processed. Let me know
>>>> if you want more areas. If the existing ones are useful I will notice it,
>>>> as the error count will start going down :)
>>>>
>>>> _______________________________________________
>>>> Talk-us mailing list
>>>> Talk-us at openstreetmap.org
>>>> https://lists.openstreetmap.org/listinfo/talk-us
>>>>
>>>
>>>
>>> -- 
>>> Dave Swarthout
>>> Homer, Alaska
>>> Chiang Mai, Thailand
>>> Travel Blog at http://dswarthout.blogspot.com
>>>
>>
>
>
> -- 
> Dave Swarthout
> Homer, Alaska
> Chiang Mai, Thailand
> Travel Blog at http://dswarthout.blogspot.com
>
