[Imports] Worldwide fuel stations import, down to 56k objects
ilya at zverev.info
Thu Mar 15 10:07:58 UTC 2018
Thank you for examining the fuel station dataset. Your feedback helped me make it better. I have updated the conflation script, the data processing profile, and sped up the validation website. Please have another look.
The new GeoJSON with changes is here: https://transfer.sh/RpeP2/fuel.json (38 MB).
Now to answer your questions.
I have updated the wiki page for NavAds imports, explaining the process and the source:
> Slow website
I spent a day on that; the browsing page now loads in 10 seconds instead of two minutes.
> Country statistics
In total this import will add or update fuel stations in 40 countries:
* add 11k, update 9.4k in the US
* add 1.5k, update 4.7k in France
* add 190, update 5.2k in Germany
* add 611, update 1.9k in Switzerland
* add 222, update 2k in the UK
And so on. Yes, some countries have just a couple of new stations versus hundreds of updates, such as Czechia, Luxembourg and Russia.
For modified fuel stations, the changes are mostly new tags. On 20% of modified objects (7679), some existing tag values are updated, mostly:
* "opening_hours" (in France, thousands of fuel stations have "24/7" corrected to their actual opening hours)
* "phone" (both number updates and E.164 formatting)
* "brand" (mostly fixing names or changing capitalization: ARAL → Aral and AVIA → Avia, the latter requested on talk-gb)
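To make the "new tags versus updated tags" split concrete, here is a minimal Python sketch that classifies the differences between an existing OSM object and its conflated version. It is not taken from the conflation script: the function names are hypothetical, it assumes the import never deletes tags, and the sample tag values (including the ref:navads id) are made up.

```python
from __future__ import annotations


def classify_changes(old_tags: dict[str, str], new_tags: dict[str, str]) -> dict[str, list[str]]:
    """Compare an existing OSM object's tags with the conflated result.

    "added" lists keys that did not exist before; "updated" lists keys whose
    existing value was replaced. Keys present only in old_tags are ignored,
    since this sketch assumes the import never deletes tags.
    """
    added = sorted(k for k in new_tags if k not in old_tags)
    updated = sorted(k for k in new_tags if k in old_tags and old_tags[k] != new_tags[k])
    return {"added": added, "updated": updated}


def share_with_updates(pairs: list[tuple[dict[str, str], dict[str, str]]]) -> float:
    """Fraction of modified objects where at least one existing value changed."""
    if not pairs:
        return 0.0
    hits = sum(1 for old, new in pairs if classify_changes(old, new)["updated"])
    return hits / len(pairs)


old = {"amenity": "fuel", "brand": "ARAL", "opening_hours": "24/7"}
new = {
    "amenity": "fuel",
    "brand": "Aral",
    "opening_hours": "Mo-Sa 06:00-22:00",
    "phone": "+49 30 1234567",
    "ref:navads": "12345",  # made-up identifier
}
print(classify_changes(old, new))
# {'added': ['phone', 'ref:navads'], 'updated': ['brand', 'opening_hours']}
```

Counting objects with a non-empty "updated" list over all modified objects gives a share like the 20% quoted above.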
> Splitting the import into many smaller imports
I am still against that idea, not least because it would be much harder to manage: there are 12 countries with more than 900 changed objects each, and that many separate imports is too much for a single person to handle.
That said, I'd be thankful if mappers from France, South Africa, Turkey, Spain, the Netherlands and all German-speaking countries posted about the import in their local communities.
> "website" tag contains generic URL, sometimes plain wrong
I've decided not to import this tag. It adds little value: too many of the links point to company websites rather than to specific stations.
> Which other tags are affected?
Only four tags are affected: "brand", "phone", "opening_hours" and "addr:postcode". In addition, "ref:navads" is added to every managed station.
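Limiting the import to four tags plus "ref:navads" amounts to a whitelist merge. A minimal sketch, assuming tags are plain string dictionaries and that a non-empty source value overwrites the OSM value; merge_tags and IMPORTED_TAGS are hypothetical names, not the conflation script's API.

```python
from __future__ import annotations

# Tags the import is allowed to touch; "website" is deliberately absent.
IMPORTED_TAGS = ("brand", "phone", "opening_hours", "addr:postcode")


def merge_tags(osm_tags: dict[str, str], source_tags: dict[str, str], navads_id: str) -> dict[str, str]:
    """Return a copy of osm_tags with whitelisted source values applied."""
    merged = dict(osm_tags)  # every other existing OSM tag is kept unchanged
    for key in IMPORTED_TAGS:
        value = source_tags.get(key)
        if value:  # skip missing or empty source values
            merged[key] = value
    merged["ref:navads"] = navads_id  # marks the station as managed by the import
    return merged


result = merge_tags(
    {"amenity": "fuel", "website": "https://old.example"},
    {"brand": "Avia", "website": "https://avia.example", "phone": ""},
    "12345",  # made-up NavAds id
)
print(result)
```

Because "website" is not in the whitelist, the source URL is ignored, while a website already mapped in OSM is left alone.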
> Duplicate points, seen as adding new stations on top of existing ones.
I improved the conflation script, which found and removed 1342 of these. 1092 were exact duplicates; 260 were just too close to each other and shared some matching tags. I have also reported these back to NavAds.
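The two kinds of duplicates can be illustrated with a distance check plus a tag comparison. This is only a sketch of the idea: the 1 m and 50 m thresholds, the keys compared in MATCH_KEYS and the point dictionary layout are all assumptions, not the values the conflation script uses.

```python
from __future__ import annotations

import math

EARTH_RADIUS_M = 6_371_000.0
MATCH_KEYS = ("brand", "name", "phone")  # assumed keys used for "some matching tags"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def duplicate_kind(a: dict, b: dict, exact_m: float = 1.0, near_m: float = 50.0) -> str | None:
    """Return "exact", "near" or None for a pair of candidate points."""
    d = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    # Exact duplicate: (almost) the same position and identical tags.
    if d <= exact_m and a["tags"] == b["tags"]:
        return "exact"
    # Near duplicate: close together and sharing at least one identifying tag.
    shared = sum(
        1 for k in MATCH_KEYS
        if k in a["tags"] and a["tags"][k] == b["tags"].get(k)
    )
    if d <= near_m and shared > 0:
        return "near"
    return None
```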
> Closed fuel stations being added
I have enabled validation only for these. Please mark them as absent, so that I can easily report them to NavAds after the import. I don't expect even 1% of fuel stations to be validated, so validation is only meant for marking closed stations as absent. You can use the "edit this" link in browsing mode, or draw a bounding box around your region in the profile and validate all POIs inside it.
I have already marked all missing fuel stations reported in this thread. Note that validation is still slow: you'll have to wait about 10 seconds after each submission.
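Selecting POIs by bounding box for validation, and collecting the ones marked absent for a report to NavAds, could look like the sketch below. The BBox type, the "status" field and the function names are hypothetical; the validation website's real data model is not described in this message.

```python
from __future__ import annotations

from typing import NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple rectangle test; boxes crossing the antimeridian are not handled.
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def select_for_review(pois: list[dict], bbox: BBox) -> list[dict]:
    """POIs inside the box, to be checked one by one by a local mapper."""
    return [p for p in pois if bbox.contains(p["lat"], p["lon"])]


def absent_ids(pois: list[dict]) -> list[str]:
    """Ids of POIs a validator marked absent (hypothetical "status" field)."""
    return [p["id"] for p in pois if p.get("status") == "absent"]
```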
> Reinstated "disused:amenity=fuel" in the UK
Fixed that: disused fuel stations now stay disused.
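One way to keep disused stations disused is to leave any OSM object whose lifecycle-prefixed amenity is fuel untouched before merging source tags. A minimal sketch under that assumption: only "disused:" comes from this thread, "abandoned:" is included purely as an illustrative extra OSM lifecycle prefix, and the function names are hypothetical.

```python
from __future__ import annotations

# "disused:" is the case from the thread; "abandoned:" is an illustrative extra.
LIFECYCLE_PREFIXES = ("disused:", "abandoned:")


def is_retired_fuel(tags: dict[str, str]) -> bool:
    """True if the OSM object records a fuel station that is no longer operating."""
    return any(tags.get(prefix + "amenity") == "fuel" for prefix in LIFECYCLE_PREFIXES)


def apply_source(osm_tags: dict[str, str], source_tags: dict[str, str]) -> dict[str, str]:
    """Merge source tags, but leave retired fuel stations exactly as mapped."""
    if is_retired_fuel(osm_tags):
        return dict(osm_tags)  # never reinstate amenity=fuel
    merged = dict(osm_tags)
    merged.update(source_tags)
    return merged
```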
> Capitalization of "Avia" in the UK
Fixed that as well.
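Brand capitalization fixes like these are safest as an explicit lookup table. A minimal sketch, assuming the two corrections mentioned in this thread are the table's only entries; BRAND_FIXES and normalize_brand are hypothetical names.

```python
from __future__ import annotations

# Explicit corrections keyed by the upper-cased value; entries come from this thread.
BRAND_FIXES = {"ARAL": "Aral", "AVIA": "Avia"}


def normalize_brand(value: str) -> str:
    """Apply a known capitalization fix, otherwise return the trimmed value unchanged."""
    cleaned = value.strip()
    return BRAND_FIXES.get(cleaned.upper(), cleaned)
```

A table is used rather than str.title() because title-casing would also turn brands that really are upper case, such as "BP", into "Bp".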
> Shops that were categorized as fuel stations
Simon reported a Coop store added as a fuel station, and I noticed a similar issue with a Carrefour Express. I added a filter by brand, which removed 268 points from the source dataset.
Alas, I cannot decide with certainty whether a POI is a fuel station with a shop, or a shop with a fuel station nearby. Please report such occurrences, so I can filter more brands.
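The brand filter for shops can be sketched as a case-insensitive lookup against a set of excluded brands. The set below only holds the two brands reported in this thread, and split_by_brand is a hypothetical helper; the actual filter list is not part of this message.

```python
from __future__ import annotations

# Brands that mark shops rather than fuel stations (only the two reported here).
EXCLUDED_BRANDS = {"coop", "carrefour express"}


def split_by_brand(points: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (kept, dropped) source points, matching brands case-insensitively."""
    kept: list[dict] = []
    dropped: list[dict] = []
    for point in points:
        brand = point.get("brand", "").strip().casefold()
        if brand in EXCLUDED_BRANDS:
            dropped.append(point)
        else:
            kept.append(point)
    return kept, dropped
```

Returning the dropped points as well makes it easy to count them and to send the list back to the data provider.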