[Talk-us-massachusetts] A simple check for addresses before the import, iteration #2

Yury Yatsynovich yury.yatsynovich at gmail.com
Fri Aug 10 20:25:54 UTC 2018


Thanks for the feedback, Greg!

The purpose of this exercise (matching MassGIS points to OSM streets) was to
find MassGIS points that are obviously misplaced.
As it turned out, MassGIS points may look "misplaced" either because the
MassGIS data are wrong or (and this second reason looks more likely so far)
because many streets in OSM have no names (or have wrong names -- these
cases need careful checks).

So, an easy takeaway from this exercise is to add names to unnamed streets
-- the resulting shp files give us an idea of which streets in OSM
currently have no names and what names they most likely should have.

Fuzzy matching is used to filter for the most severe discrepancies. I first
wrote the code with exact matching, but it gave us too many points to check
manually, and most of those points had relatively small discrepancies
(abbreviations, spelling errors, etc. -- hopefully, these can later be
corrected automatically).
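
To illustrate the difference between exact and fuzzy matching, here is a
minimal sketch. It is not the code used for the check: the function names,
the use of Python's difflib, and the 0.6 threshold are all assumptions made
for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two street names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_severe_mismatch(addr_street: str, osm_street: str, threshold: float = 0.6) -> bool:
    """Flag a pair only when the names are very different.

    The threshold is a hypothetical value; it would need tuning on real data.
    """
    return similarity(addr_street, osm_street) < threshold
```

With exact matching, "Main St" vs "Main Street" would be flagged for a
manual check; with a similarity threshold, only pairs like "Main Street" vs
"Oak Avenue" are.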

As for blanks and "'" symbols -- they are quite a frequent cause of
mismatches: "Miller's" vs "Millers", "Mac Arthur" vs "MacArthur", "Hill
Top" vs "Hilltop".
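
One way to keep such pairs from being reported as mismatches is to
normalize both names before comparing them. The sketch below is only an
illustration; normalize_name is a hypothetical helper, not part of the
actual matching code.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase the name and drop apostrophes (straight or curly) and whitespace."""
    return re.sub(r"['\u2019\s]", "", name.lower())


def same_after_normalizing(a: str, b: str) -> bool:
    """True when two names differ only by case, blanks, or apostrophes."""
    return normalize_name(a) == normalize_name(b)
```

Names that differ in more than blanks and apostrophes, such as "First
Street" and "First Avenue", still compare as different.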

The matches were also based on distance. So, if there are a "First Street"
and a "First Avenue" in the same town, but they are not both among the 10
streets nearest to a given point, they will not be confused.
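
The distance restriction can be sketched roughly as follows. This is a
simplified illustration under stated assumptions: each street is reduced to
a single representative (x, y) point in planar coordinates, whereas real
street geometries are lines, and the function names are hypothetical. The
name comparison here is a plain case-insensitive check; a fuzzy comparison
could be used in its place.

```python
from math import hypot


def nearest_street_names(point, streets, k=10):
    """Return the names of the k streets closest to the point.

    streets is a list of (name, (x, y)) pairs; one point per street is an
    assumption made only for this example.
    """
    px, py = point
    ranked = sorted(streets, key=lambda s: hypot(s[1][0] - px, s[1][1] - py))
    return [name for name, _ in ranked[:k]]


def has_nearby_match(addr_street, point, streets, k=10):
    """True if a street with the address's street name is among the k nearest."""
    wanted = addr_street.lower()
    return any(name.lower() == wanted for name in nearest_street_names(point, streets, k))
```

A street with a similar name elsewhere in town is never considered, because
it does not make it into the list of nearest candidates.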

On Fri, Aug 10, 2018 at 3:41 PM Greg Troxel <gdt at lexort.com> wrote:

>
> Yury Yatsynovich <yury.yatsynovich at gmail.com> writes:
>
> > I would suggest that we add/correct names of the streets (350 towns, 5-10
> > streets in each town -- sounds doable for manual edits), re-run the fuzzy
> > matching code again and whatever MassGIS points are marked as problematic
> > after that -- will be inspected individually.
>
> That's interesting that it's fewer when you allow some fuzz.
>
> When you say "add/correct", I don't really follow this.  We can't make
> large-scale changes based on a data set without import/mechanical-edit
> approval.  We don't really know that what's in the address dataset is
> right, vs
>
>   - what was in OSM (from the previous roads import, or from hand
>     editing), vs
>   - what's in the current roads layer, vs
>   - what's in the current L3 Parcels layer, vs
>   - what the local people and government call it, vs
>   - what's on the road signs
>
> In looking at the one example I mentioned earlier (on the Cape), it was
> highly unclear what ought to be, and overwriting with one database what
> came from another seems messy.
>
> I don't have a problem with expanding abbreviations semi-mechanically;
> while that technically needs mechanical edit approval, it's a normal
> thing to do and we are the locals.
>
> So, please don't say "correct" without addressing the basis for making
> changes and why it's an ok thing to do.  In particular, we cannot assume
> that the Master Address Database is an unerring source of truth.
>
> If you mean "flag this street as having conflicting data and ask locals
> to look into it and really figure out the right answer", that's totally
> fine of course.  But it's not armchair work.
>
> Also, there is a notion that if an address (on a building) in OSM has an
> addr:street that doesn't match a nearby road, some apps will not deal
> with it.  That's not a reason to put things in the DB that aren't right.
> It is entirely possible that one town department has assigned a street
> name with one value and a different town department has assigned an
> address with a name that is different.   If so, we should probably enter
> it that way.   This is of course messy and I'm open to discussion, but
> "App X chokes if property Y doesn't hold" does not lead to "we must make
> property Y hold in the DB, even if it isn't really true".
>
> As Jason said earlier, I think we should be taking the approach of
> identifying the subset of data that can be imported without difficulty,
> and doing that, and then working on the complicated stuff, which will
> take how long it takes.
>


-- 
Yury Yatsynovich