[Talk-us-massachusetts] A simple check for addresses before the import
Jason Remillard
remillard.jason at gmail.com
Tue Aug 7 01:48:34 UTC 2018
Hi Yury,
The shapefiles are fantastic. There are a lot of differences!! However, 90%
of them are trivial whitespace differences or differences in road vs. street.
The other 10% are very interesting, though. There are roads missing from
OSM, road names that are just different, and obvious spelling errors in
both the MassGIS data and OSM.
I think it makes sense to fix the severe errors first. Could you make a new
set of shapefiles that ignores all whitespace and words like road, street,
lane, way, avenue, etc. when comparing road names, so that we can focus on
the really large discrepancies?
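As a rough sketch of that normalized compare (the suffix list and the function names are placeholders, not anything taken from Yury's script), something along these lines would do:

```python
import re

# Hypothetical list of street-type words to ignore; extend as needed.
SUFFIXES = {"ROAD", "RD", "STREET", "ST", "LANE", "LN", "WAY", "AVENUE", "AVE"}

def normalize_name(name):
    """Upper-case the name, drop punctuation, whitespace and street-type words."""
    words = re.findall(r"[A-Z0-9]+", name.upper())
    kept = [w for w in words if w not in SUFFIXES]
    return "".join(kept)

def is_large_discrepancy(osm_name, massgis_name):
    """True when the names still differ after normalization."""
    return normalize_name(osm_name) != normalize_name(massgis_name)
```

With this, "MEDOUIE CREEK ROAD" and "MEDOUIE CREEK" compare equal, while "HELLER WAY" vs "HELLERS WAY" is still reported. One caveat: dropping "ST" everywhere also drops it where it means "Saint", which is harmless only as long as both sides are normalized the same way.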
We can use Bing Streetside, Mapillary, or OpenStreetCam to hopefully get a
view of a road sign; worst case, jump in the car and take a look. We can fix
anything that looks wrong in OSM, and send the MassGIS mistakes back to them
(with images of the road signs).
For roads that differ only by whitespace, I would run a spell checker over
the words, and if the OSM street name has a word that isn't an actual word,
put it into a shapefile so we can investigate. It is pretty easy to get a
spell checker in Python if you don't need to suggest corrections: just load
a word list into a hash.
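A minimal sketch of that kind of checker, assuming a plain one-word-per-line word list (the function names are made up for illustration):

```python
import re

def load_words(lines):
    """Build a set of lower-cased words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}

def unknown_words(street_name, dictionary):
    """Return the words of a street name that are not in the dictionary."""
    words = re.findall(r"[a-z]+", street_name.lower())
    return [w for w in words if w not in dictionary]
```

On many Unix systems a word list lives at /usr/share/dict/words, so `load_words(open("/usr/share/dict/words"))` may be enough to get started. Street names are full of surnames and place names, so expect plenty of false positives; the output is only a list of candidates to look at.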
Over the next couple of weeks we can at least get the bad errors in OSM
straightened out.
MassGIS uses this data for 911, so they might fix errors that are reported.
But if they don't fix them before we are ready for the import, or don't
want to accept feedback, we should skip addresses that don't closely match
an OSM road name, and hopefully, we will get them the next time we re-run
the import.
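For "closely match", one possible test is a similarity ratio from Python's standard difflib module. This is only a sketch: the 0.9 threshold and the function names are assumptions and would need tuning against the real data.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between two names, between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

def closely_matches(addr_street, osm_names, threshold=0.9):
    """True if addr_street is close enough to at least one nearby OSM name."""
    return any(similarity(addr_street, name) >= threshold for name in osm_names)
```

An address whose addr:street fails `closely_matches` against the nearby OSM road names would be skipped for this run. With the threshold above, "TENNESSEE AVENUE" still matches "TENNESSE AVENUE", while a completely different name does not.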
I am sure MassGIS doesn't want 100 emails from us, so perhaps we could
set up some kind of shared document that we can all edit and add in
addresses that we think are wrong.
Also, I noticed the MassGIS data has building names, which we should try to
import too.
Jason
On Mon, Aug 6, 2018 at 9:29 AM, Yury Yatsynovich <yury.yatsynovich at gmail.com> wrote:
> Greetings!
>
> I've recently written some simple code (see lines 107-202 in
> https://github.com/yyatsyn/MassGIS-address-import/blob/master/import_addresses_work_in_progress.py)
> that looks for the nearest 7 streets for each address point (or each
> building with address information) and marks this point/building as
> problematic if none of the names of the 7 streets matches the addr:street
> tag value for the point/building.
> I've done this check for points/buildings that are already in OSM as well
> as those that are in the MassGIS database of addresses.
>
> The resulting shapefiles are stored in
> https://mega.nz/#F!75M1CAAJ!8r63YpTy3HIACDcAUO4c2g (make sure you download
> all files with the same name to be able to open the corresponding .shp file):
> -- problem_pnt_addr.shp and problem_bld_addr.shp -- contain points/buildings
> that are already in OSM
> -- *COUNTY*_problem_mgis.shp -- contain points from MassGIS (split by
> county).
>
> Most of the problems with MassGIS come from relatively small mismatches in
> street names (e.g. MassGIS has addresses with "MEDOUIE CREEK ROAD", while
> in OSM it is just "MEDOUIE CREEK", or "HELLER WAY" vs "HELLERS WAY", or
> "TENNESSEE AVENUE" vs "TENNESSE AVENUE").
>
> I guess I could also add a fuzzy matching mechanism to the code (so that
> "TENNESSEE AVENUE" and "TENNESSE AVENUE" would be considered the same) in
> order to separate those MassGIS addresses that are definitely located in
> the wrong places (those MassGIS points for which addr:street is not even
> somewhat similar to the names of nearby OSM streets) from points that are
> next to a street with a misspelled name.
>
> If there are mismatches in names of streets in OSM and MassGIS, how do we
> figure out which source is right?
>
> As far as I know, some OSM apps (MAPS.ME, 7 ways) need addr:street and
> the name of the highway to match exactly in order to convert and properly
> search over the address data. So, before we continue with the import, shall
> we correct all mismatches between addr:street on the existing
> points/buildings and the street names, along with the misspelled streets?
>
> Best,
> --
> Yury Yatsynovich
>