[Talk-ca] Canadian OSM POI quality
Paul Norman
penorman at mac.com
Mon Jun 9 04:57:15 UTC 2014
I was curious how complete OpenStreetMap shop data was, so I decided to
do an analysis for some Canadian chains.
The results were mixed. Starting with a Canada extract, I processed the
data into PostGIS and ran queries against name, brand and franchise for
objects where amenity, office or shop was not null. Brands which have recently
changed names (e.g. Zellers/Target) were avoided.
OSM completeness varied from 7% to 81%, with no overwhelming trends. The
four major fast food and restaurant chains considered ranged from 33%
to 51%. Shops opening and closing change the accuracy of these results,
and the accuracy of the external sources for number of shops may be
variable.
Method
======
The Geofabrik extract for Canada was imported with osm2pgsql with a
custom .style file containing name, operator, brand, franchise, amenity,
building, office and shop columns. The last four columns caused an object
to be placed in the polygon table. After import, the tables were filtered
to remove rows where there was not an amenity, office, or shop tag.
Appropriate indexes were added. A view was created combining the two
tables and giving lower-case versions of the name, operator, brand and
franchise tags.
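The filtering and lower-casing step can be sketched in Python on plain dictionaries rather than PostGIS tables. This is only an illustration of the logic, not the SQL that was actually run: the function name build_shops, the loperator key and the sample rows are assumptions made for the example.

```python
# Tags that qualify an object as a shop-like POI (from the Method text).
KEY_TAGS = ("amenity", "office", "shop")
# Tags that get a lower-case copy (lname, loperator, lbrand, lfranchise).
TEXT_TAGS = ("name", "operator", "brand", "franchise")


def build_shops(points, polygons):
    """Combine both tables, drop rows without a key tag, add lower-case copies."""
    shops = []
    for row in list(points) + list(polygons):
        # Skip rows with no amenity, office or shop tag at all.
        if all(row.get(tag) is None for tag in KEY_TAGS):
            continue
        shop = dict(row)
        for tag in TEXT_TAGS:
            value = row.get(tag)
            shop["l" + tag] = value.lower() if value is not None else None
        shops.append(shop)
    return shops


# Hypothetical sample rows, not taken from the real extract.
points = [{"name": "Tim Hortons", "amenity": "cafe"}, {"name": "Some Park"}]
polygons = [
    {"name": "Walmart", "brand": "Walmart", "shop": "supermarket", "building": "yes"},
    {"building": "yes"},
]
shops = build_shops(points, polygons)
```

Only the two rows carrying an amenity or shop tag survive; the park and the bare building are dropped.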
Queries of the following form were run:
    SELECT COUNT(*) FROM shops
    WHERE lname LIKE :'name' OR lbrand LIKE :'name'
       OR lfranchise LIKE :'name';
psql substituted the search pattern for :'name', for example 'mcdonald%'
for McDonald's. The queries used were intended to catch all possible
shops even if it resulted in false positives. Brand selection was not
done in any systematic manner.
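The query form can be tried without a PostGIS database. The sketch below swaps in Python's built-in sqlite3 module for psql and PostgreSQL, with a sqlite3 named parameter (:name) standing in for the psql variable. The rows are made up, so the count means nothing beyond showing how the pattern matches and how a false positive gets in. The function name count_matches is an assumption.

```python
import sqlite3


def count_matches(conn, pattern):
    """Count rows whose lower-cased name, brand or franchise matches a LIKE pattern."""
    sql = (
        "SELECT COUNT(*) FROM shops "
        "WHERE lname LIKE :name OR lbrand LIKE :name OR lfranchise LIKE :name"
    )
    return conn.execute(sql, {"name": pattern}).fetchone()[0]


# Hypothetical sample rows, not taken from the real extract.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shops (lname TEXT, lbrand TEXT, lfranchise TEXT)")
conn.executemany(
    "INSERT INTO shops VALUES (?, ?, ?)",
    [
        ("mcdonald's", None, None),
        ("mcdonalds", "mcdonald's", None),
        ("restaurant", "mcdonald's", None),   # matched through the brand column
        ("mcdonald park cafe", None, None),   # a false positive
        ("subway", None, None),
    ],
)
mcdonalds_count = count_matches(conn, "mcdonald%")
```

The prefix pattern counts four of the five rows, including the unrelated cafe, which is the kind of false positive the broad queries accept.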
Public sources were used for the true number of shops of a particular
chain, generally Wikipedia or public data aggregators.
Results
=======
                     OSM   True  Completeness
Tim Horton's        1480   4304   34%
Subway               849   2563   33%
McDonald's           722   1417   51%
Starbucks            592   1363   43%
  Both OSM and Google use both "Starbucks" and "Starbucks Coffee".
A & W                292    800   37%
Domino's              67    383   17%
Wendy's              224    369   61%
Burger King          150    281   53%
East Side Mario's     46     85   54%
Milestones            32     44   73%
Chili's               10     16   63%
Sears                105   1570    7%
Rona                 122    500   24%
The Bay               50    421   12%
Walmart              265    382   69%
Home Depot            95    180   53%
Canadian Tire        400    491   81%
  May be double-counting automotive centers.
Chapters              47    233   20%
Sleep Country         22    179   12%
London Drugs          54     78   69%
  May be double-counting some stores with a pharmacy inside.
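The completeness column is the OSM count divided by the externally sourced count, rounded to the nearest percent. A short Python sketch using a few rows from the table follows. The function name is my own choice, and rounding halves up is an assumption that happens to reproduce the figures shown.

```python
def completeness_percent(osm_count, true_count):
    """Return osm_count / true_count as a whole percentage, rounding halves up."""
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    # Integer arithmetic avoids float rounding surprises at exact halves.
    return (200 * osm_count + true_count) // (2 * true_count)


# (OSM count, external count) pairs copied from the results table.
counts = {
    "Tim Horton's": (1480, 4304),
    "Chili's": (10, 16),
    "Sears": (105, 1570),
    "Canadian Tire": (400, 491),
}
percentages = {
    chain: completeness_percent(osm, true) for chain, (osm, true) in counts.items()
}
```

Chili's is the one row that lands on an exact half (62.5%), which is why the rounding rule matters there.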
Remarks
=======
It took significantly longer to find the true number of stores than
to get results from the OSM data. Part of this is my greater familiarity
with OSM tools, but a large part is that with OSM it is not necessary to
track down many different sources to get store counts.
Although no urban/rural analysis was performed, it is generally expected
that OSM is more complete in populated urban areas than low-density rural
areas, and completeness in these urban areas is often more important
for many uses.
No proprietary data sources were available for comparison, but it should
not be assumed that they are any more complete, nor that their name or
similar tagging is any more consistent. As an example, Google's data was
observed to use both "Starbucks" and "Starbucks Coffee" for the coffee
chain, sometimes having both for what was really the same location.
The tools used to generate counts could easily be used to extract the
shop data to work with.
Improving the data
==================
Inconsistent tagging was observed with some shops, such as variability
between amenity=restaurant and amenity=fast_food. This should reflect
differences between locations but may not. Inconsistent names were also
observed, such as "Walmart", "Wal-mart", and "Wal Mart". These issues
are not as significant as the large percentage of missing shops.
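Name variants such as the Walmart ones can be grouped by comparing a normalized key instead of the raw tag. The following Python sketch is one possible approach and not something that was run for this analysis; the function names are hypothetical. It lower-cases the name and drops everything except letters and digits, so it would not merge "Starbucks" with "Starbucks Coffee".

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Lower-case a name and strip spaces, hyphens and other punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_variants(names):
    """Map each normalized key to the sorted list of distinct spellings seen."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_name(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items()}


variants = group_variants(["Walmart", "Wal-mart", "Wal Mart", "Rona"])
```

All three Walmart spellings end up under the key "walmart", while "Rona" stays on its own.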