[Talk-us-massachusetts] A simple check for addresses before the import, iteration #2
Yury Yatsynovich
yury.yatsynovich at gmail.com
Fri Aug 10 16:04:33 UTC 2018
Greetings!
I've modified my code so that it now does fuzzy matching between OSM
streets and MassGIS addresses and marks as problematic only those MassGIS
points that do not pass this fuzzy match.
Details on the steps implemented for fuzzy matches:
1) the code expands abbreviations in OSM streets' names like "Str", "Ln",
etc. to "Street", "Lane", etc.
2) the street-type words at the end of the streets' names (like "Street",
"Road", "Lane") are dropped. So "Sunset Street" and "Sunset Drive" both turn
into just "Sunset"
3) the code converts OSM and MassGIS street names to upper case.
4) the code removes symbols like ".", "'", "," and blanks
5) the code treats sufficiently similar strings (a threshold of about 90%
similarity) as the same
E.g., if OSM has "New Miller's Street" while MassGIS has nearby address
points with "NEW MILLER ROAD", the steps above will convert the
streets' names into "NEWMILLERS" and "NEWMILLER" and treat them as the
same. For more details, please see
https://github.com/yyatsyn/MassGIS-address-import/blob/master/import_addresses_fuzzy_match_names_work_in_progress.py
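For anyone who wants to try the idea without reading the whole script, here
is a minimal sketch of the five steps. It is not the code from the repository:
the abbreviation table, the list of street types, the function names and the
use of difflib.SequenceMatcher are all my assumptions for illustration, and I
read the 90% figure as a minimum similarity ratio. It also normalizes both
names the same way and only looks at the last word of each name.

```python
from difflib import SequenceMatcher

# Hypothetical tables for illustration; the real script may use other lists.
ABBREVIATIONS = {"STR": "STREET", "ST": "STREET", "LN": "LANE",
                 "RD": "ROAD", "DR": "DRIVE", "AVE": "AVENUE"}
STREET_TYPES = {"STREET", "LANE", "ROAD", "DRIVE", "AVENUE"}
THRESHOLD = 0.9  # assumed reading of "90% similarity"


def normalize(name):
    """Reduce a street name to a bare upper-case key, e.g. 'NEWMILLERS'."""
    words = name.upper().split()          # step 3: upper case
    if words:
        last = words[-1].rstrip(".")
        last = ABBREVIATIONS.get(last, last)   # step 1: expand abbreviation
        words[-1] = last
        # step 2: drop the trailing street type (keep one-word names intact)
        if len(words) > 1 and last in STREET_TYPES:
            words.pop()
    key = "".join(words)                  # step 4: remove blanks...
    for symbol in ".',":                  # ...and punctuation
        key = key.replace(symbol, "")
    return key


def same_street(osm_name, massgis_name):
    """Step 5: treat names as the same if their keys are similar enough."""
    a, b = normalize(osm_name), normalize(massgis_name)
    return SequenceMatcher(None, a, b).ratio() >= THRESHOLD


print(normalize("New Miller's Street"), normalize("NEW MILLER ROAD"))
print(same_street("New Miller's Street", "NEW MILLER ROAD"))
```

With these assumptions the "New Miller's Street" / "NEW MILLER ROAD" pair is
accepted as the same street, while a name with an extra trailing word such as
"Main Street Extension" is not matched to "Main Street", because only a street
type in the last position is dropped.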
The resulting files are in the folder:
https://mega.nz/#F!79Ny3KKL!JemAt7yZKSUctrza8QU4Tg
The fuzzy match shows that there are not that many severe problems: around
300 points and 400 buildings with addresses in OSM need some attention
(compared to 1K and 2K when using exact matches for streets' names). In
addition, maybe 5-10 streets per town are found to need corrections after
being compared to MassGIS (mostly those are streets without names or
with some extra words, like "Main Street Extension" or "East Main Street" vs
"Main Street").
I would suggest that we add/correct the names of the streets (350 towns, 5-10
streets in each town, which sounds doable for manual edits), re-run the fuzzy
matching code, and then individually inspect whatever MassGIS points are
still marked as problematic.
Any feedback is more than welcome!
--
Yury Yatsynovich