[OSM-talk] Proposal to run an automated bot edit that will remove tracking parameters

Mateusz Konieczny matkoniecz at tutanota.com
Thu Feb 4 08:25:29 UTC 2021


No, as documented in the title it will remove solely tracking parameters.

Is there substantial use of shortener links in tags that should be expanded
rather than removed?
That is, more than say 200 links? Below that, manual processing is definitely
faster than implementing a bot.

If yes - which ones?


Feb 4, 2021, 07:37 by e.marascalchi at gmail.com:

> Will this also take care of link shorteners (bit.ly, goo.gl ...), expanding the links and replacing them with the correct ones?
>
> On Thu, 4 Feb 2021, 1:03 AM Oliver Simmons <oliversimmo at gmail.com> wrote:
>
>> Brilliant then 👍
>> I have zero negatives towards this in that case.
>>
>> On Wed, 3 Feb 2021, 22:24 Mateusz Konieczny via talk <talk at openstreetmap.org> wrote:
>>
>>> It is definitely not removing all parameters just because one is tracking.
>>>
>>> Most cases are left for manual review, so they will not be handled by the automatic edit,
>>> but for example see https://www.openstreetmap.org/node/2887372516 that has
>>>
>>> website = https://www.hammer-zuhause.de/maerkte/storeDetail?utm_campaign=googlemaps&utm_medium=organic&storeCode=0214&utm_source=uberall&utm_content=06366_K%C3%B6then_(Anhalt)
>>>
>>> that will be turned into:
>>>
>>> website = https://www.hammer-zuhause.de/maerkte/storeDetail?storeCode=0214
>>>
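
A minimal sketch of the behaviour described above, not the actual bot code: it drops only the query parameters whose names are on a tracking list and leaves everything else in the URL alone. The function name strip_tracking and the short TRACKING_KEYS subset are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative subset of the tracking parameter names mentioned in this thread.
TRACKING_KEYS = {
    "fbclid", "gclid", "utm_source", "utm_medium",
    "utm_term", "utm_content", "utm_campaign",
}


def strip_tracking(url: str) -> str:
    """Remove tracking parameters from the query, keeping all other parts."""
    parts = urlsplit(url)
    # Work on the raw "key=value" pairs so that kept values are not re-encoded.
    kept = [
        pair
        for pair in parts.query.split("&")
        if pair and pair.split("=", 1)[0] not in TRACKING_KEYS
    ]
    # An empty query makes urlunsplit omit the "?" entirely.
    return urlunsplit(parts._replace(query="&".join(kept)))
```

Applied to the hammer-zuhause.de URL above, this keeps storeCode=0214 and drops the four utm_* parameters.
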
>>>
>>> Feb 3, 2021, 23:07 by oliversimmo at gmail.com:
>>>
>>>> (sorry if you received this twice, forgot to do "reply to all" :/ )
>>>>
>>>> Just asking for clarification that this is only removing URL query sections recognised as tracking, and not the entire URL query.
>>>> The query is often used for, well, a query for a dynamic page.
>>>>
>>>> e.g.
>>>> https://www.example.com/search?tracking=yes&q=my%20search&tracking_id=12345
>>>> Should become
>>>> https://www.example.com/search?q=my%20search
>>>> NOT
>>>> https://www.example.com/search
>>>>
>>>>
>>>> On Tue, 2 Feb 2021, 21:21 Mateusz Konieczny via talk <talk at openstreetmap.org> wrote:
>>>>
>>>>> tl;dr:
>>>>>
>>>>> I propose to run an automated bot edit that will remove tracking parameters, turning tags
>>>>> such as
>>>>> website=http://paris.intersquat.org/les-lieux/le-satellite/?fbclid=de58e340d6aa79a584552a2055042d004b9b19454bc0d7a6046fc81fc90f51
>>>>> into
>>>>> website=http://paris.intersquat.org/les-lieux/le-satellite/
>>>>>
>>>>>
>>>>> I have already done this before, see: https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/remove_tracking_parameters
>>>>> This edit will clean links added or edited since then and purge more tracking parameters.
>>>>>
>>>>> If anything goes wrong, I will fix it.
>>>>> I have experience with automated edits, see
>>>>> https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account
>>>>>
>>>>> changes listing: https://gist.github.com/matkoniecz/1d20caa198ec2d4001d95adf09123a8a
>>>>> (based on the current OSM database; if OSM data changes, the actual edit will be different.
>>>>> Feel free to make a backup of this linked file - it may be deleted some time after the edit)
>>>>>
>>>>> source code and other documentation (apart from the source code, it duplicates this posting):
>>>>> https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/remove_tracking_parameters2/
>>>>>
>>>>> --------------------------------------
>>>>> full details:
>>>>>
>>>>> I propose to run a scripted edit - it was already run before but this will
>>>>> remove more tracking parameters.
>>>>>
>>>>> URLs often have unnecessary parts, some added for tracking purposes
>>>>> by FB, Google and others.
>>>>> These tracking parameters should never appear in any OSM tags.
>>>>>
>>>>> It means that it is beneficial to turn the tag
>>>>> website=http://paris.intersquat.org/les-lieux/le-satellite/?fbclid=de58e340d6aa79a584552a2055042d004b9b19454bc0d7a6046fc81fc90f51
>>>>> into
>>>>> website=http://paris.intersquat.org/les-lieux/le-satellite/
>>>>> and it is worth doing it as an edit.
>>>>>
>>>>> These URLs can often be fixed by an automated script, freeing
>>>>> human time for something more productive.
>>>>>
>>>>> A human-made edit would also change "last edited by"
>>>>> (and, unlike a marked bot edit, such edits could not be filtered out).
>>>>> There are better ways to spot areas requiring fixes, and we are not lacking
>>>>> places with QA indicators that manual review is needed.
>>>>>
>>>>> Usually tracking links are added by clueless people who just searched for
>>>>> a website and copied it from FB/Google.
>>>>>
>>>>> There are rare cases of links created specifically to track OSM users,
>>>>> see for example
>>>>> * https://www.openstreetmap.org/way/754704241/history
>>>>> ** https://www.cronauerlaw.com/?utm_source=openstreetmap
>>>>> * https://www.openstreetmap.org/node/1063808111/history
>>>>> ** http://www.travelerscoffee.ru?utm_campaign=geo&utm_source=openstreetmap&utm_medium=link
>>>>> * https://www.openstreetmap.org/node/6817678019/history
>>>>> ** https://www.resotainer.fr/agence-bonneuil-sur-marne?utm_source=open-street-map&utm_medium=recherche-locale&utm_content=openstreetmap&utm_campaign=open-street-map-garde-meubles-bonneuil-sur-marne
>>>>> * https://www.openstreetmap.org/node/1684317522
>>>>> ** http://www.travelerscoffee.ru?utm_campaign=geo&utm_source=openstreetmap&utm_medium=link
>>>>>
>>>>> In general, I have not noticed a correlation between the presence of tracking links
>>>>> and additional issues that would not be detected automatically.
>>>>>
>>>>> Therefore automatic removal of tracking parameters does not cause a loss of
>>>>> useful indicators of areas that should be reviewed.
>>>>> The Osmose and JOSM validators and StreetComplete offer many better indicators,
>>>>> and we are not in danger of running out of places where human intervention is clearly needed.
>>>>>
>>>>> Automatic removal would allow me and others to spend time on something more useful
>>>>> than reviewing all cases where tracking is clearly present and confirming them one by one.
>>>>>
>>>>> The proposed bot edit would clean links where all parameters in use are
>>>>> tracking parameters and may be removed.
>>>>>
>>>>> I am manually reviewing the more complicated cases, also to catch
>>>>> tracking parameters that are currently unknown.
>>>>>
>>>>> Anchors (#section) will be preserved.
>>>>>
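
A rough sketch of how that split between automatic and manual handling could look, reading the rule above literally. This is an assumption about the decision logic, not the real script (which may be more permissive, since a mixed URL is cleaned elsewhere in this thread). The name plan_edit, the three action labels and the short TRACKING_KEYS subset are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative subset of the tracking parameter names listed in this thread.
TRACKING_KEYS = {"fbclid", "gclid", "utm_source", "utm_medium", "utm_campaign"}


def plan_edit(url: str) -> tuple[str, str]:
    """Return (action, resulting_url) where action is 'keep', 'auto' or 'manual'."""
    parts = urlsplit(url)
    keys = [pair.split("=", 1)[0] for pair in parts.query.split("&") if pair]
    if not any(key in TRACKING_KEYS for key in keys):
        # Nothing recognised as tracking: leave the tag untouched.
        return ("keep", url)
    if all(key in TRACKING_KEYS for key in keys):
        # Every parameter is tracking: drop the whole query.
        # The fragment (#section) stays in place because only the query is replaced.
        return ("auto", urlunsplit(parts._replace(query="")))
    # Mixed tracking and non-tracking parameters: hand over to a human.
    return ("manual", url)
```

For instance, a link ending in ?fbclid=abc#section would come back as an 'auto' edit ending in #section.
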
>>>>> The code is tested: I used it in manual review mode and for a fully automated edit run
>>>>> that removed tracking parameters from over 1000 objects - see
>>>>> https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/remove_tracking_parameters
>>>>>
>>>>> I have experience with automated edits, see
>>>>> https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account
>>>>>
>>>>> Yes, editing an element will cause it to be marked as edited and change its "last edited" date.
>>>>> The effect will be exactly the same for a bot edit and a manual edit
>>>>> (which would be necessary if this automated edit proposal is rejected).
>>>>> Note that bot edits marked as automatic can be filtered out.
>>>>>
>>>>> The following are considered tracking parameters and would be removed:
>>>>>
>>>>> fbclid, gclid, campaign_ref, mc_id, utm_source, utm_medium, utm_term,
>>>>> utm_content, utm_campaign, utm_id, gclsrc, dclid, wt.tsrc, WT.tsrc,
>>>>> zanpid, yclid, utm_campain, trkCampaign, mkt_tok, sc_campaign, sc_channel,
>>>>> sc_content, sc_medium, sc_outcome, sc_geo, sc_country, mbid, cmpid,
>>>>> campaign_id, Campaign, fb_action_ids, fb_action_types, fb_ref, fb_source,
>>>>> gs_l, _hsenc, igshid, CampIDMin, CampIDMaj, campaign,
>>>>> campaignid, campaignId, adid, adgroupid, refr, referrer, cm_mmc, lw_cmp,
>>>>> CLID, ReferralSource, SourceID, trkid, adjust_creative, partner_slug, y_source,
>>>>> oppartnerid, padid, otppartnerid, ref_device_id, utm_kxconfid, SEO_id,
>>>>> originalReferrer, spMailingID, hsCtaTracking
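
For anyone who wants to check their own tags against this list, here is a small sketch that holds the names above as a set and reports which of them occur in a URL. Matching is exact and case-sensitive, which is my inference from the list carrying both campaign and Campaign (and wt.tsrc and WT.tsrc); the names TRACKING_PARAMETERS and tracking_keys_in are made up for this example.

```python
from urllib.parse import urlsplit

# Parameter names copied from the list above (duplicates collapse in the set).
TRACKING_PARAMETERS = frozenset("""
fbclid gclid campaign_ref mc_id utm_source utm_medium utm_term
utm_content utm_campaign utm_id gclsrc dclid wt.tsrc WT.tsrc
zanpid yclid utm_campain trkCampaign mkt_tok sc_campaign sc_channel
sc_content sc_medium sc_outcome sc_geo sc_country mbid cmpid
campaign_id Campaign fb_action_ids fb_action_types fb_ref fb_source
gs_l _hsenc igshid CampIDMin CampIDMaj campaign
campaignid campaignId adid adgroupid refr referrer cm_mmc lw_cmp
CLID ReferralSource SourceID trkid adjust_creative partner_slug y_source
oppartnerid padid otppartnerid ref_device_id utm_kxconfid SEO_id
originalReferrer spMailingID hsCtaTracking
""".split())


def tracking_keys_in(url: str) -> list[str]:
    """List the query parameter names in url that are on the tracking list."""
    query = urlsplit(url).query
    keys = [pair.split("=", 1)[0] for pair in query.split("&") if pair]
    # Exact, case-sensitive comparison against the listed names.
    return [key for key in keys if key in TRACKING_PARAMETERS]
```

With the travelerscoffee.ru link quoted earlier, this reports utm_campaign, utm_source and utm_medium, in that order.
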
>>>>> _______________________________________________
>>>>> talk mailing list
>>>>> talk at openstreetmap.org
>>>>> https://lists.openstreetmap.org/listinfo/talk
>>>>>
