[Talk-us-massachusetts] A simple check for addresses before the import, iteration #2

Greg Troxel gdt at lexort.com
Fri Aug 10 20:38:07 UTC 2018


Yury Yatsynovich <yury.yatsynovich at gmail.com> writes:

> The purpose of this exercise (match MassGIS points to OSM streets) was to
> find MassGIS points that are obviously mis-placed.
> As it turned out, the MassGIS points might be "mis-placed" either because
> MassGIS data are wrong or (and this second reason so far looks more likely)
> because many streets in OSM do not have names (or have wrong names -- these
> cases need scrupulous checks).

That sounds great.  You are making a lot of progress understanding this
dataset's properties.

> So, an easy take-away from this exercise is to add names to unnamed streets
> -- the resulting shp-files give us an idea on what streets in OSM are
> currently w/o names and what names they most likely should have.

If done by a local mapper with some clue, on a think-per-street basis,
checking other sources (L3 parcels, maybe looking at signs), that
sounds fine.

> Fuzzy match is used to filter the most severe discrepancies. I wrote the
> code with the exact match first, but it gave us too many points to check
> manually and most of those points were with relatively small discrepancies
> (abbreviations, spelling errors, etc. -- hopefully, these can later be
> corrected automatically).

Great - as long as we are sorting issues by priority, and not treating
a fuzzy match as good enough, I'm with you.
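
To make the "sort by priority" idea concrete, here is a minimal sketch
that scores each (MassGIS name, OSM name) pair and lists the worst
mismatches first. It uses Python's difflib as a stand-in; I don't know
which fuzzy matcher the actual script uses, and the function names and
the threshold parameter are made up for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score; 1.0 means identical ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_mismatches(pairs, threshold=1.0):
    """Return (score, massgis_name, osm_name) tuples, most severe first.

    `pairs` is an iterable of (massgis_name, osm_name) strings.
    Pairs scoring at or above `threshold` are treated as matches and
    dropped. The default of 1.0 keeps every non-identical pair, so
    nothing is silently accepted just because it is "close".
    """
    scored = [(name_similarity(m, o), m, o) for m, o in pairs]
    mismatches = [s for s in scored if s[0] < threshold]
    return sorted(mismatches, key=lambda s: s[0])
```

The point of keeping the default threshold at 1.0 is that the score
only orders the work queue; small discrepancies still show up, just
further down the list.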

> For blanks and "'" symbols -- they are a quite frequent reason of
> mismatches: "Miller's" vs "Millers", "Mac Arthur" vs "MacArthur", "Hill
> Top" vs "Hilltop".

I suppose then it's a really interesting question what's right.  In all
of these I lean toward really figuring out some of them, so that we can
judge whether OSM data, MAD, L3 parcels, or MassGIS roads is most likely
to be correct.
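
A rough sketch of how these cases could be pulled out as their own
bucket for review: two names land in it if they differ only by
whitespace or apostrophes. This only detects the pattern; it says
nothing about which spelling is right. The function names are
hypothetical, not from the actual script.

```python
import re

# Whitespace plus straight and typographic apostrophes.
_IGNORED = re.compile(r"[\s'\u2019]+")


def squash(name: str) -> str:
    """Drop spaces and apostrophes and fold case, for comparison only."""
    return _IGNORED.sub("", name).casefold()


def differs_only_by_spacing_or_apostrophe(a: str, b: str) -> bool:
    """True if a and b are different strings that squash to the same key."""
    return a != b and squash(a) == squash(b)
```

The squashed form is a comparison key, not a proposed name, so it
should never be written back to OSM.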

> The matches were also based on distance. So, if there are "First Street"
> and "First Avenue" in the same town, yet, they are not both within 10
> nearest streets to a given point, they will not be mixed.

Sounds good, but I meant if there is street/avenue confusion, that's a
real issue to be sorted out.  But fine to defer it for another pass.  I
really would not expect much of this.
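
For that later pass, a minimal sketch of how street/avenue confusion
could be flagged among the nearest streets. It assumes each street is
reduced to a single representative point in projected (planar)
coordinates, which is a simplification; real code would measure
distance to the way geometry. The function names and the data layout
are assumptions for illustration.

```python
import math


def nearest_streets(point, streets, k=10):
    """Return the names of the k streets closest to `point`.

    `streets` is a list of (name, (x, y)) tuples in the same planar
    coordinate system as `point`.
    """
    ranked = sorted(streets, key=lambda s: math.dist(point, s[1]))
    return [name for name, _ in ranked[:k]]


def suffix_conflicts(address_street, candidates):
    """Return candidates with the same base name but a different last word.

    For example, "First Avenue" conflicts with "First Street".
    """
    base, _, suffix = address_street.rpartition(" ")
    conflicts = []
    for cand in candidates:
        cbase, _, csuffix = cand.rpartition(" ")
        same_base = bool(base) and cbase.casefold() == base.casefold()
        if same_base and csuffix.casefold() != suffix.casefold():
            conflicts.append(cand)
    return conflicts
```

An address point on "First Street" whose nearest streets include
"First Avenue" but no "First Street" would be a candidate for a
by-hand look.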