<div dir="ltr"><div>Thanks for the feedback, Greg!<br><br>The purpose of this exercise (matching MassGIS points to OSM streets) was to find MassGIS points that are obviously misplaced.<br>As it turned out, a MassGIS point may look "misplaced" either because the MassGIS data are wrong or (and so far this second reason looks more likely) because many streets in OSM have no name (or have a wrong name; these cases need careful checking).</div><div><br></div>So, an easy takeaway from this exercise is to add names to unnamed streets: the resulting shapefiles give us an idea of which streets in OSM currently have no name and what names they most likely should have. <div><br><div>Fuzzy matching is used to filter for the most severe discrepancies. I first wrote the code with exact matching, but it gave us too many points to check manually, and most of those points had relatively small discrepancies (abbreviations, spelling errors, etc.; hopefully these can be corrected automatically later).<br><br>As for blanks and apostrophes ("'"), they are a quite frequent cause of mismatches: "Miller's" vs "Millers", "Mac Arthur" vs "MacArthur", "Hill Top" vs "Hilltop".<br><br>The matches were also based on distance. So if there are a "First Street" and a "First Avenue" in the same town, but they are not both among the 10 streets nearest to a given point, they will not be confused with each other.</div></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Aug 10, 2018 at 3:41 PM Greg Troxel &lt;<a href="mailto:gdt@lexort.com">gdt@lexort.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Yury Yatsynovich &lt;<a href="mailto:yury.yatsynovich@gmail.com" target="_blank">yury.yatsynovich@gmail.com</a>&gt; writes:<br>
<br>
> I would suggest that we add/correct names of the streets (350 towns, 5-10<br>
> streets in each town -- sounds doable for manual edits), re-run the fuzzy<br>
> matching code again and whatever MassGIS points are marked as problematic<br>
> after that -- will be inspected individually.<br>
<br>
That's interesting that it's fewer when you allow some fuzz.<br>
<br>
When you say "add/correct", I don't really follow this. We can't make<br>
large-scale changes based on a data set without import/mechanical-edit<br>
approval. We don't really know that what's in the address dataset is<br>
right, vs<br>
<br>
- what was in OSM (from the previous roads import, or from hand<br>
editing), vs<br>
- what's in the current roads layer, vs<br>
- what's in the current L3 Parcels layer, vs<br>
- what the local people and government call it, vs<br>
- what's on the road signs<br>
<br>
In looking at the one example I mentioned earlier (on the Cape), it was<br>
highly unclear what ought to be, and overwriting with one database what<br>
came from another seems messy.<br>
<br>
I don't have a problem with expanding abbreviations semi-mechanically;<br>
while that technically needs mechanical edit approval, it's a normal<br>
thing to do and we are the locals.<br>
<br>
So, please don't say "correct" without addressing the basis for making<br>
changes and why it's an ok thing to do. In particular, we cannot assume<br>
that the Master Address Database is an unerring source of truth.<br>
<br>
If you mean "flag this street as having conflicting data and ask locals<br>
to look into it and really figure out the right answer", that's totally<br>
fine of course. But it's not armchair work.<br>
<br>
Also, there is a notion that if an address (on a building) in OSM has an<br>
addr:street that doesn't match a nearby road, some apps will not deal<br>
with it. That's not a reason to put things in the DB that aren't right.<br>
It is entirely possible that one town department has assigned a street<br>
name with one value and a different town department has assigned an<br>
address with a name that is different. If so, we should probably enter<br>
it that way. This is of course messy and I'm open to discussion, but<br>
"App X chokes if property Y doesn't hold" does not lead to "we must make<br>
property Y hold in the DB, even if it isn't really true".<br>
<br>
As Jason said earlier, I think we should be taking the approach of<br>
identifying the subset of data that can be imported without difficulty,<br>
and doing that, and then working on the complicated stuff, which will<br>
take how long it takes.<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Yury Yatsynovich</div>