[Talk-us-massachusetts] A simple check for addresses before the import, iteration #2
Yury Yatsynovich
yury.yatsynovich at gmail.com
Sun Aug 12 04:26:28 UTC 2018
Hi Jason,
great job!
I've been going through the towns in Berkshire, correcting errors in OSM and
recording the OBJECTID of each MassGIS point that looks like an error in the
Google Docs spreadsheet that I created and shared earlier:
https://docs.google.com/spreadsheets/d/1BRMv2iwsg7ZMUiVwtP9JUD5xO8s98ucfVY_1F3DJDfc/edit?usp=sharing
That will make it easy to collect all such points by ID later and exclude
them from the MassGIS data before the import.
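
A minimal sketch of how that exclusion could work, assuming the spreadsheet
is exported as CSV with a column named OBJECTID and that each address point
is a dict carrying the same key. The function names and the sample IDs are
hypothetical and only illustrate the idea:

```python
import csv
import io


def load_excluded_ids(csv_file, column="OBJECTID"):
    """Return the set of IDs listed in the given column of a CSV file object."""
    ids = set()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if value:
            ids.add(value)
    return ids


def drop_excluded(points, excluded_ids, column="OBJECTID"):
    """Keep only the address points whose ID is not in excluded_ids."""
    return [p for p in points if str(p[column]) not in excluded_ids]


# Made-up sample data, for illustration only.
sheet = io.StringIO("OBJECTID,comment\n101,no such street\n205,conservation land\n")
points = [
    {"OBJECTID": 101, "street": "CRORY LANE"},
    {"OBJECTID": 150, "street": "MAIN STREET"},
]
kept = drop_excluded(points, load_excluded_ids(sheet))
print(kept)
```

IDs are compared as strings so that it does not matter whether the source
stores them as numbers or as text.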
On Sat, Aug 11, 2018 at 9:01 PM Jason Remillard <remillard.jason at gmail.com>
wrote:
> Hi,
>
> 155 points in Littleton.
>
> The 3 SANDAS POINT points don't match the road name but seem correct.
> The 8 CRORY LANE addresses are wrong. There is no Crory Lane; the points
> are on conservation land.
> The 1 LONGFELLOW DRIVE address is on conservation land, so it is wrong.
> The WHITE HORSE ROAD address seems to be correct, yet doesn't match any
> roads.
> The 2 WESTVIEW ROAD addresses seem not to be developed yet; they are paper
> addresses.
> The COTTAGE WAY addresses seem to be correct, but the road wasn't
> developed.
> The BOATHOUSE WAY addresses seem to be correct, but the road wasn't
> developed.
>
> The rest of the points were errors in OSM, mostly missing roads and roads
> that had the wrong name.
>
> Except for VINT LANE (too new), the other points should be fixed in OSM.
>
> Jason
>
>
> On Fri, Aug 10, 2018 at 12:05 PM Yury Yatsynovich <
> yury.yatsynovich at gmail.com> wrote:
>
>> Greetings!
>> I've modified my code so that it now does fuzzy matching between OSM
>> streets and MassGIS addresses and marks as problematic only those MassGIS
>> points that do not pass this fuzzy match.
>>
>> Details on the steps implemented for fuzzy matches:
>> 1) the code expands abbreviations in OSM streets' names like "Str", "Ln",
>> etc. to "Street", "Lane", etc.
>> 2) the status parts at the end of the streets' names (like "Street",
>> "Road", "Lane") are dropped. So "Sunset Street" and "Sunset Drive" turn
>> into just "Sunset"
>> 3) the code converts OSM and MassGIS street names to upper case.
>> 4) the code removes symbols like ".", "'", "," and blanks
>> 5) the code treats strings that are at least 90% similar as the same
>>
>> E.g., if OSM has "New Miller's Street" while MassGIS has nearby address
>> points with "NEW MILLER ROAD", the steps above convert the street names
>> into "NEWMILLERS" and "NEWMILLER" and treat them as the same. For more
>> details, please see
>> https://github.com/yyatsyn/MassGIS-address-import/blob/master/import_addresses_fuzzy_match_names_work_in_progress.py
>> .
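
The quoted steps can be sketched roughly as follows. This is a simplified
illustration, not the script from the repository: the abbreviation table and
the list of street types are hypothetical, only the last word of a name is
expanded, and difflib.SequenceMatcher stands in for whatever similarity
measure the real code uses.

```python
from difflib import SequenceMatcher

# Hypothetical tables; the real script may use different lists.
ABBREVIATIONS = {"STR": "STREET", "ST": "STREET", "LN": "LANE",
                 "RD": "ROAD", "DR": "DRIVE", "AVE": "AVENUE"}
STREET_TYPES = {"STREET", "ROAD", "LANE", "DRIVE", "AVENUE", "WAY"}


def normalize(name):
    """Apply steps 1-4: expand, drop the street type, upper-case, strip symbols."""
    cleaned = name.upper().replace(".", "").replace(",", "").replace("'", "")
    words = cleaned.split()
    # Expand an abbreviated street type at the end of the name.
    if words and words[-1] in ABBREVIATIONS:
        words[-1] = ABBREVIATIONS[words[-1]]
    # Drop the trailing street type, but never the only word.
    if len(words) > 1 and words[-1] in STREET_TYPES:
        words = words[:-1]
    return "".join(words)


def same_street(osm_name, massgis_name, threshold=0.9):
    """Step 5: treat names as equal if their similarity reaches the threshold."""
    ratio = SequenceMatcher(None, normalize(osm_name),
                            normalize(massgis_name)).ratio()
    return ratio >= threshold


print(normalize("New Miller's Street"))   # NEWMILLERS
print(normalize("NEW MILLER ROAD"))       # NEWMILLER
print(same_street("New Miller's Street", "NEW MILLER ROAD"))  # True
```

Upper-casing happens first here only because it keeps the lookup tables
small; the result is the same as doing it after the street type is dropped.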
>>
>> The resulting files are in the folder:
>> https://mega.nz/#F!79Ny3KKL!JemAt7yZKSUctrza8QU4Tg
>>
>> The fuzzy match shows that there are not that many severe problems:
>> around 300 points and 400 buildings with addresses in OSM need some
>> attention (compared to 1K and 2K when using exact matches on street
>> names). In addition, maybe 5-10 streets per town need corrections
>> after being compared to MassGIS (mostly streets without names or with
>> some extra words, like "Main Street Extension" or "East Main Street" vs
>> "Main Street").
>>
>> I would suggest that we add/correct the names of the streets (350 towns,
>> 5-10 streets in each town -- sounds doable for manual edits), re-run the
>> fuzzy matching code, and then inspect individually whatever MassGIS points
>> are still marked as problematic.
>>
>> Any feedback is more than welcome!
>> --
>> Yury Yatsynovich
>>
>
--
Yury Yatsynovich