[Talk-us-massachusetts] Update on the MassGIS address import effort

Yury Yatsynovich yury.yatsynovich at gmail.com
Fri Apr 12 02:59:23 UTC 2019


Greetings!

As I've recently had a bit of time to devote to this import, here is a
quick update. Links to the Mega and Dropbox folders mentioned below, as
well as to the Google Docs spreadsheet and the code files, are at
https://github.com/yyatsyn/MassGIS-address-import.

1. Before importing MassGIS (MAD) addresses we need to make sure they are
accurate. For this I used two checks to mark MAD addresses as "suspicious"
(see the file "massgis_check_validity.py" on GitHub for the code):
a) a given MAD address shares its street name with at most 2 other
addresses, and that street name is not found within the boundaries of the
given town. Many points marked as suspicious by this criterion are simply
points on unmapped streets.
b) a given MAD address point is TOO FAR from the other address points with
the same street name.
The resulting "suspicious" MAD addresses are saved as shp-files in an
archive "check_massgis_validity.zip" that is uploaded to both Dropbox and
Mega. It would be great if we could go over each town and check those
"suspicious" addresses to figure out whether they are worth importing. I
usually do it in QGIS by loading the shp-file with the "suspicious" MAD
points over OSM as a background layer, and then checking each "suspicious"
point against other sources such as Bing street view or the town's GIS
website. To keep track of progress at this stage I suggest using the Google
Spreadsheet, sheet "street names mismatches", where I record the IDs of all
MAD addresses that are not worth importing (the IDs are also in the
shp-files), plus any other irregularities in the MAD data (such as streets
whose name on the street sign differs from the name in MAD). My
intuition is that the MAD addresses of BC-type (assigned to building
centroids) are pretty accurate, so I would focus on checking and importing
them first, and import PC (plot centroid) and other address points
later.
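
To make the two checks above concrete, here is a minimal sketch in plain
Python. It is not the code from "massgis_check_validity.py". The record
layout (id, town, street, lon, lat), the per-town set of known street names,
the 500 m threshold, and the reading of "too far" as "distance to the nearest
other point on the same street" are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def flag_suspicious(addresses, known_streets_by_town, max_gap_m=500.0):
    """Return {address id: [reasons]} for addresses failing check a) or b).

    addresses: list of dicts with keys id, town, street, lon, lat (assumed layout).
    known_streets_by_town: {town: set of street names already mapped there}.
    max_gap_m: hypothetical threshold for check b).
    """
    # Group address points by (town, street name).
    groups = defaultdict(list)
    for addr in addresses:
        groups[(addr["town"], addr["street"])].append(addr)

    flags = {}
    for (town, street), pts in groups.items():
        known = street in known_streets_by_town.get(town, set())
        for addr in pts:
            reasons = []
            # Check a): at most 2 other addresses share the street name
            # (group size <= 3) and the street name is unknown in the town.
            if len(pts) <= 3 and not known:
                reasons.append("rare_unmapped_street")
            # Check b): nearest other point on the same street is too far away.
            others = [p for p in pts if p is not addr]
            if others:
                nearest = min(
                    haversine_m(addr["lon"], addr["lat"], p["lon"], p["lat"])
                    for p in others
                )
                if nearest > max_gap_m:
                    reasons.append("too_far")
            if reasons:
                flags[addr["id"]] = reasons
    return flags
```

The flagged IDs could then be written out per town for manual review in
QGIS, as described above.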

2. After we figure out which MAD points should be excluded from the import,
we can match BC-points to buildings. I've written a piece of code for that,
which combines several stacked address points into one ";"-separated
point and also checks that no duplicates are created by the import.
For the code, please see the file "match_mgis_addr_to_osm_buildings.py" on
GitHub. Within the next couple of days I'll do my best to finish the code
for this step (namely, to convert the resulting csv-files with "OSM
buildings' full_id -> MAD address" concordances into import-ready osc/osm
files).
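
As a rough illustration of the matching step, here is a minimal sketch in
plain Python. It is not the code from "match_mgis_addr_to_osm_buildings.py".
The input layouts, the function names, the ray-casting point-in-polygon test,
and the duplicate rule (skip a house number if the same number and street
already exist in OSM) are assumptions made for illustration.

```python
from collections import defaultdict


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon given by ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def match_addresses(buildings, points, existing):
    """Return {building full_id: address tags} for the import.

    buildings: list of dicts with keys full_id, ring (list of (lon, lat)).
    points: list of dicts with keys lon, lat, housenumber, street.
    existing: set of (housenumber, street) pairs already present in OSM.
    All three layouts are hypothetical.
    """
    # Assign each address point to the first building that contains it.
    per_building = defaultdict(list)
    for p in points:
        for b in buildings:
            if point_in_ring(p["lon"], p["lat"], b["ring"]):
                per_building[b["full_id"]].append(p)
                break

    result = {}
    for full_id, pts in per_building.items():
        streets = {p["street"] for p in pts}
        if len(streets) != 1:
            continue  # mixed street names: leave for manual review
        street = streets.pop()
        # De-duplicate stacked points, keeping first-seen order.
        numbers = list(dict.fromkeys(p["housenumber"] for p in pts))
        # Drop numbers that would duplicate addresses already in OSM.
        numbers = [n for n in numbers if (n, street) not in existing]
        if numbers:
            result[full_id] = {
                "addr:housenumber": ";".join(numbers),
                "addr:street": street,
            }
    return result
```

The resulting mapping corresponds to the "OSM buildings' full_id -> MAD
address" concordance mentioned above. Writing it out as csv or osc/osm is
left out of the sketch.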

I would appreciate your feedback, especially suggestions for additional
approaches to checking the quality of the MAD address points.

With kind regards,
-- 
Yury Yatsynovich