[Talk-us-massachusetts] A simple check for addresses before the import, iteration #2
Greg Troxel
gdt at lexort.com
Fri Aug 10 19:49:53 UTC 2018
Yury Yatsynovich <yury.yatsynovich at gmail.com> writes:
> I've modified my code so that now it does some fuzzy matches between OSM
> streets and MassGIS addresses and marks as problematic only those MassGIS
> point that do not pass this fuzzy match.
That's interesting. I guess some of this fuzz is in the "close enough" category
and some of it isn't.
> Details on the steps implemented for fuzzy matches:
> 1) the code expands abbreviations in OSM streets' names like "Str", "Ln",
> etc. to "Street", "Lane", etc.
I have the impression we are somehow going to fix that. Perhaps by a
full-scale mechanical edit. (Once there is code for import to
postgis/process/produce-changeset-file, this will be easy.)
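
A minimal sketch of the expansion step, for illustration only. The
abbreviation table and the function name are assumptions and are not taken
from the linked script. It expands only the last word, so that a leading
"St" (which may mean "Saint") is left alone.

```python
import re

# Hypothetical abbreviation table; the list used by the real script may differ.
ABBREVIATIONS = {
    "St": "Street",
    "Str": "Street",
    "Ln": "Lane",
    "Rd": "Road",
    "Dr": "Drive",
    "Ave": "Avenue",
}


def expand_abbreviation(name: str) -> str:
    """Expand a trailing abbreviation such as 'Ln' or 'Ln.' to its full word."""
    name = name.strip()
    # Split into "everything up to the last blank" and the last word,
    # tolerating one trailing period after the last word.
    match = re.fullmatch(r"(.*\s)(\w+)\.?", name)
    if not match:
        return name  # single-word names are returned unchanged
    prefix, last = match.groups()
    full = ABBREVIATIONS.get(last.capitalize())
    return prefix + full if full else name


if __name__ == "__main__":
    for example in ["Sunset Ln", "Main St.", "St James Ave", "Broadway"]:
        print(example, "->", expand_abbreviation(example))
```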
> 2) the status parts at the end of the streets' names (like "Street",
> "Road", "Lane") are dropped. So "Sunset Street" and "Sunset Drive" turn
> into just "Sunset"
This seems unsound if it is on the path to import, rather than just a
way of prioritizing issues. There are towns where there is "Foo Drive"
and "Foo Road", and they are different streets in different parts of
town. So I see those as different names.
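
One way to see how often this matters in a given town is to list the names
that become identical once the suffix is dropped. This is a sketch under
assumptions: the suffix set is illustrative and the function names are
hypothetical.

```python
from collections import defaultdict

# Illustrative suffix set; a real run would need a fuller list.
SUFFIXES = {"STREET", "ROAD", "LANE", "DRIVE", "AVENUE"}


def strip_suffix(name):
    """Upper-case a street name and drop a trailing suffix word, if any."""
    words = name.upper().split()
    if len(words) > 1 and words[-1] in SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def suffix_collisions(names):
    """Return groups of distinct names that collapse to the same stripped name."""
    groups = defaultdict(set)
    for name in names:
        groups[strip_suffix(name)].add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    town_streets = ["Foo Drive", "Foo Road", "Sunset Street"]
    print(suffix_collisions(town_streets))
```

Any group reported here is a pair (or more) of streets that suffix-dropping
would treat as one.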
> 3) the code converts OSM and MassGIS street names to upper case.
OK for matching, but we know the MAD names are wrong in that real street
names have mixed case (or at least we believe so?). Considering it a
match if the upcased OSM value is the same, and then using the OSM value
for the address point instead, makes sense.
> 4) the code removes symbols like ".", "'", "," and blanks
There's a separate QA question: how often do symbols and spurious blanks
appear
- in the MassGIS MAD
- in the existing OSM data
?
Do any of those uses make sense, or are they just errors?
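
A rough way to answer the "how often" part for either data set is to count
the names that contain each symbol or have irregular whitespace. This is
only a sketch: the function name is hypothetical, and "spurious blank" is
assumed to mean leading, trailing, or repeated whitespace.

```python
import re
from collections import Counter

# The symbols named in step 4: period, apostrophe, comma.
SYMBOLS = re.compile(r"[.',]")


def symbol_report(names):
    """Count how many names contain each symbol or irregular whitespace."""
    counts = Counter()
    for name in names:
        # Count each symbol once per name, not once per occurrence.
        for symbol in set(SYMBOLS.findall(name)):
            counts[symbol] += 1
        # A name with clean spacing equals itself after whitespace collapsing.
        if name != " ".join(name.split()):
            counts["spurious blank"] += 1
    return counts


if __name__ == "__main__":
    sample = ["New Miller's Street", "Main  St.", "Elm Street"]
    for key, count in symbol_report(sample).items():
        print(repr(key), count)
```

Running it separately over the MAD names and the OSM names gives the two
counts to compare; it does not say whether a given use is legitimate.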
> 5) the code considers similar strings (up to 90% similarity) as the same
>
> E.g., if OSM has "New Miller's Street", while MassGIS has nearby address
> points with "NEW MILLER ROAD", the above mentioned steps will convert the
> streets' names into "NEWMILLERS" and "NEWMILLER" and consider them as the
> same. For more details, please, see
> https://github.com/yyatsyn/MassGIS-address-import/blob/master/import_addresses_fuzzy_match_names_work_in_progress.py
That seems like a great filter (like merging road/street) for separating
the things that are hopefully easy to fix from the things that we really
need to dig into, but it seems aggressive if we are transforming the MAD
value to the OSM value based on a fuzzy match. If you are proposing to
allow importing an address point with its included street name when
there is a nearby street with a close name, but still to use the MassGIS
street name in addr:street, that sounds ok. But I can't tell where you
are moving from "let's understand what we are dealing with" to "let's
transform it like this to produce an osc file to upload".
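
To make the distinction concrete, here is a sketch that uses the fuzzy
match only to sort points into categories and never rewrites a name. It is
an assumption-laden illustration: difflib's ratio stands in for whatever
similarity measure the linked script uses, and the suffix set, function
names, and category labels are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Illustrative suffix set; a real run would need a fuller list.
SUFFIXES = {"STREET", "ROAD", "LANE", "DRIVE", "AVENUE"}


def normalize(name):
    """Apply steps 2-4: drop the suffix, upper-case, remove symbols and blanks."""
    words = name.upper().split()
    if len(words) > 1 and words[-1] in SUFFIXES:
        words = words[:-1]
    return re.sub(r"[.', ]", "", "".join(words))


def classify(mad_name, osm_name, threshold=0.9):
    """Label a MAD/OSM name pair without changing either name.

    'exact'    : same name ignoring case
    'fuzzy'    : similar after normalization; worth a human look
    'mismatch' : not similar; needs real investigation
    """
    if mad_name.upper() == osm_name.upper():
        return "exact"
    ratio = SequenceMatcher(None, normalize(mad_name), normalize(osm_name)).ratio()
    return "fuzzy" if ratio >= threshold else "mismatch"


if __name__ == "__main__":
    print(classify("MAIN STREET", "Main Street"))
    print(classify("NEW MILLER ROAD", "New Miller's Street"))
    print(classify("MAIN STREET", "Elm Street"))
```

With the example from the quoted message, "NEWMILLER" and "NEWMILLERS"
land in the "fuzzy" category, which is a reason to review the pair rather
than a license to substitute one name for the other.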
> The resulting files are in the folder:
> https://mega.nz/#F!79Ny3KKL!JemAt7yZKSUctrza8QU4Tg
Thanks - I will look over my town's data as soon as I can. It's small
enough that I can look at signs and talk to people at town hall about
anything I can't resolve.