[Talk-us-massachusetts] Some observations and preliminary stats on MassGIS address import

Peter Dobratz peter at dobratz.us
Wed Jul 18 15:14:36 UTC 2018


One useful tool is the FixAddresses JOSM plugin:
https://wiki.openstreetmap.org/wiki/JOSM/Plugins/FixAddresses

This plugin finds inconsistencies between values in addr:street of an
address and nearby name tags on roads.  You can find problems with
capitalization as well as things like inconsistencies in prefixes and
suffixes. Also, you can sometimes find missing roads altogether.  Either
way, the FixAddresses plugin is a great way to identify potential problems
with data being imported and/or existing OSM data.  If you identify a value
for addr:street that you want to change all at once, the JOSM search
function is fairly advanced so you can search for something like
"addr:street"="Mccabe Street" and then change them all at once.  Then run
the FixAddresses plugin again and see them drop off the list of potential bad
addresses.
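
The kind of comparison described above can be roughly approximated outside JOSM. The sketch below is not the FixAddresses plugin's code; it assumes you already have a plain list of addr:street values and a list of road names (both are hypothetical sample data here), and it separates case-only mismatches from values that match no road at all.

```python
from collections import Counter

def check_streets(addr_streets, road_names):
    """Compare addr:street values against known road names.

    Returns (case_mismatches, unknown): case_mismatches maps a bad value to
    the road name it matches when case is ignored; unknown counts values
    that match no road name at all.
    """
    exact = set(road_names)
    by_folded = {name.casefold(): name for name in road_names}
    case_mismatches = {}
    unknown = Counter()
    for street in addr_streets:
        if street in exact:
            continue
        match = by_folded.get(street.casefold())
        if match is not None:
            case_mismatches[street] = match
        else:
            unknown[street] += 1
    return case_mismatches, unknown

# Hypothetical sample data, not taken from the import.
roads = ["McCabe Street", "Main Street", "North Elm Street"]
addresses = ["Mccabe Street", "Main Street", "Elm Street", "Mccabe Street"]
fixes, missing = check_streets(addresses, roads)
print(fixes)
print(dict(missing))
```

Values in the first result are candidates for a bulk search-and-replace; values in the second may point to a missing road or to a prefix/suffix difference that needs a human look.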

I'm not sure if any towns in Massachusetts are doing this, but within the
last decade or so some New Hampshire towns have been renaming roads to
reduce confusion.  Roads have been renamed to avoid similarly sounding
names or roads with disjoint sections.  If anything similar is happening in
Mass then this import might be useful in finding and fixing those name
mismatches.

Peter

On Wed, Jul 18, 2018 at 4:08 AM Angela Morley <amorley at protonmail.com>
wrote:

> The double capital problem is unique. If we can find a master list of
> streets with double capitals in MA, we can apply it, but when our own
> government is releasing data in ALL CAPS it makes that difficult. I'd love
> to find a way to do it, but if we can't, I think the error is acceptable
> considering the massive benefit the rest of the import would have.
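
One possible way to get such a list is to learn it from road names that are already mapped with mixed case. The sketch below is only an illustration under that assumption: `existing_osm_names` is a hypothetical stand-in for names extracted from OSM, and `build_case_overrides` and `smart_title` are made-up helper names, not part of any existing import script.

```python
def build_case_overrides(known_names):
    """Map lowercased words to their mixed-case spelling, e.g. 'mccabe' -> 'McCabe'.

    Only words whose spelling differs from plain str.title() are kept.
    """
    overrides = {}
    for name in known_names:
        for word in name.split():
            if word != word.title():
                overrides[word.lower()] = word
    return overrides

def smart_title(upper_name, overrides):
    """Title-case an ALL CAPS street name, applying known overrides per word."""
    words = []
    for word in upper_name.split():
        words.append(overrides.get(word.lower(), word.title()))
    return " ".join(words)

# Hypothetical stand-in for road names already mapped in OSM.
existing_osm_names = ["McCabe Street", "MacArthur Boulevard", "Main Street"]
overrides = build_case_overrides(existing_osm_names)

print("MCCABE STREET".title())                  # what plain title() produces
print(smart_title("MCCABE STREET", overrides))
print(smart_title("MAIN STREET", overrides))
```

The overrides apply per word everywhere, so a name that legitimately exists in both spellings (say, Macdonald and MacDonald) would still need manual review.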
>
> In the case of apartment buildings where multiple addresses as POI sit on
> top of a building polygon, I believe Conflation iterates through linearly.
> User would have to choose to merge these values manually, as JOSM would
> apply first point, then try to apply second and would bring up a conflict
> dialog.
>
> I can double check this behavior a bit later.
>
> If there is a better conflation tool we can use, I'd be up for it. This is
> just a process I discovered without using programming, which I'm not
> skilled at.
>
>
> Sent from ProtonMail mobile
>
>
>
> -------- Original Message --------
> On Jul 17, 2018, 10:18 PM, Yury Yatsynovich < yury.yatsynovich at gmail.com>
> wrote:
>
>
> Couple of comments:
> 1) instead of converting all street names with the title() function maybe
> we can look at actual names of streets in MA and identify those that
> require double capitalization
> 2) how does the conflation plugin work if there are several address points
> within one building? Does it create something like
> addr:housenumber=13,15,17 or does it just skip such many-to-1 matches?
>
> On Tue, Jul 17, 2018 at 9:53 PM Angela Morley <amorley at protonmail.com>
> wrote:
>
>> Some edits since I sent this, including a python script to automate data
>> cleaning before import and make QGIS unnecessary. See current revision at
>> https://wiki.openstreetmap.org/wiki/Import/Catalogue/MassGIS_Addresses
>>
>>
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On July 17, 2018 8:23 PM, Angela Morley <amorley at protonmail.com> wrote:
>>
>> Alright all,
>>
>> Here's the import process I came up with. Your criticism, or following
>> along at home, would be welcome. Try it out, see if there are issues.
>>
>>
>> *Import process*
>> The goal of this process is to conflate/merge POI address data with
>> building polygons within the import region. A side effect of this process
>> is that it breaks double-capitalized streetnames, like McCabe Street, into
>> Mccabe Street. I'm not sure there's a better way around these outliers,
>> considering the source data is ALL IN CAPS. It should be noted that
>> Massachusetts data is separated by town, and this process would have to be
>> repeated about 350 times.
>>
>> Source data:
>> https://docs.digital.mass.gov/dataset/massgis-data-master-address-data-basic-address-points,
>> select a town from the map, download Point Data. Saves as a .zip file.
>> Extract zip file to a folder before continuing
>>
>> *Changing STREETNAME to first-letter-capitalized form*
>> 1. Open QGIS, load .shp Shapefile for town into layers. You can do so by
>> navigating to the .shp file within QGIS's browser section
>> 2. Right click the AddressPoints_M* file, and Open Attribute Table
>> 3. Open Field Calculator.
>> 4. Check Update Existing Field, and select STREETNAME
>> 5. In the Expression editor below, copy and paste in the phrase
>> "title(STREETNAME)" Click OK, and wait for the data to process.
>> 6. Exit Attribute Table, saving any changes
>> 7. Right click layer name, Save As. Set format as ESRI Shapefile, and
>> pick a path and name to export modified case shapefile
>> Note: I had issues with the actual XY point values being lost if I saved
>> the file as a name different than the original and then opened it with
>> JOSM. I don't exactly know how to reproduce this error. Let me know if you
>> discover it.
>>
>> *Import into JOSM*
>> Requires OpenData and Conflation plugins
>> 1. Download area for town to be modified (NOTE: I don't know how to
>> download large areas, as JOSM won't let you do so. Can someone please tell
>> me how you'd download town-sized maps into JOSM?)
>> 2. Open the modified .shp shapefile from before into JOSM. You should now
>> have two layers.
>> 3. Select the shapefile layer into view, and under objects, edit the name
>> of the field "ADDR_NUM" to be "addr:housenumber" and STREETNAME to be
>> "addr:street" (Don't edit the values, just the labels of the values).
>> 4. Select the "Data Layer 1" layer into view, and click Search under
>> Selection window. Start a search to select all features with building=yes as
>> its tag.
>> 5. Select the .shp shapefile into view, do a Select All. Grab a coffee.
>> 6. Open the Conflation window with the button on the left. Click
>> Configure within the Conflation window. Select into view the Data Layer 1,
>> and click "Freeze" next to Subject within Conflation's popup. Select into
>> view the shapefile and click "Freeze" next to Reference.
>> 7. In the conflation settings, keep simple configuration, ensure you're
>> using Disambiguating, Centroid of <10-20, and set Tags to
>> addr:housenumber;addr:street. Under the merging area, uncheck Replace
>> Geometry and All, leave Merge Tags checked. The box directly to the right
>> of All, add addr:housenumber;addr:street
>> 8. After some processing, you will see cyan arrows on the screen to
>> indicate correlations between the target and reference data layers, and
>> which points will be applied to which buildings. You can review these if
>> you want. Also, in the Conflation docked window, there's a list of nodes.
>> Distance indicates how far away the node was from the building, score is
>> the percentile of the possible match, and No Conflicts appears if there's
>> no conflicts found. If a conflict is found, it brings up a window that asks
>> you what to do.
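
For readers who prefer to see the idea in code, the steps above can be loosely sketched as a field rename followed by a nearest-centroid match. This is not the Conflation plugin's algorithm. It assumes planar coordinates in meters, rings given without a repeated closing vertex, a simple vertex-average centroid, and a 15 m cutoff picked from the middle of the 10-20 range; the sample buildings and points are hypothetical.

```python
import math

# Field renaming from step 3; the keys are the MassGIS column names.
FIELD_MAP = {"ADDR_NUM": "addr:housenumber", "STREETNAME": "addr:street"}

def rename_fields(record):
    """Return only the mapped fields, under their OSM tag names."""
    return {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}

def centroid(ring):
    """Vertex average of a ring; a simplification of a true polygon centroid."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def match_points(buildings, points, max_dist=15.0):
    """Pair each address point with the nearest building centroid within max_dist.

    Buildings claimed by more than one point are reported as conflicts
    instead of being merged.
    """
    claims = {}
    for pt in points:
        best_id, best_d = None, max_dist
        for b_id, ring in buildings.items():
            cx, cy = centroid(ring)
            d = math.hypot(pt["x"] - cx, pt["y"] - cy)
            if d <= best_d:
                best_id, best_d = b_id, d
        if best_id is not None:
            claims.setdefault(best_id, []).append(rename_fields(pt["attrs"]))
    merged = {b: tags[0] for b, tags in claims.items() if len(tags) == 1}
    conflicts = {b: tags for b, tags in claims.items() if len(tags) > 1}
    return merged, conflicts

# Hypothetical sample data in meters.
buildings = {
    "b1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "b2": [(100, 0), (110, 0), (110, 10), (100, 10)],
}
points = [
    {"x": 6, "y": 4, "attrs": {"ADDR_NUM": "12", "STREETNAME": "Main Street", "TOWN": "X"}},
    {"x": 104, "y": 6, "attrs": {"ADDR_NUM": "13", "STREETNAME": "Main Street"}},
    {"x": 107, "y": 3, "attrs": {"ADDR_NUM": "15", "STREETNAME": "Main Street"}},
    {"x": 50, "y": 50, "attrs": {"ADDR_NUM": "99", "STREETNAME": "Far Road"}},
]
merged, conflicts = match_points(buildings, points)
print(merged)
print(conflicts)
```

Here "b1" gets a single address merged, "b2" is flagged because two points claim it, and the far-away point is left unmatched.
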
>>
>>
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On July 17, 2018 12:57 PM, Yury Yatsynovich <yury.yatsynovich at gmail.com>
>> wrote:
>>
>> Thanks for creating the wiki page, Angela!
>>
>> I did some simple spatial joins of MassGIS address points with existing
>> buildings without addresses (either house number or street name missing)
>> using python (geopandas and osmnx). Here are some stats from such joins by
>> counties:
>>
>> County      OSM buildings   MassGIS   Matched    1-to-1   Share of   Share of
>>             w/o addresses    points    points   matches    matched     1-to-1
>>                                                             points    matches
>> Barnstable         188370    192619    157235    121330       0.82       0.77
>> Berkshire           89520     79138     62324     39290       0.79       0.63
>> Bristol            233590    284883    247625    112776       0.87       0.46
>> Dukes               23647     20330     13639     11236       0.67       0.82
>> Essex              268122    379036    328679    147073       0.87       0.45
>> Franklin            50091     40853     31336     18146       0.77       0.58
>> Hampden            206111    225352    198221    100958       0.88       0.51
>> Hampshire           78709     73675     60417     34122       0.82       0.56
>> Middlesex          429556    779421    594647    219696       0.76       0.37
>> Nantucket           14048     12963      8408      6445       0.65       0.77
>> Norfolk            361719    337260    302017    147594       0.90       0.49
>> Plymouth           233374    244200    207421    134116       0.85       0.65
>> Suffolk            118763    457611    426855     36796       0.93       0.09
>> Worcester          383670    407001    347611    173147       0.85       0.50
>> TOTAL             2679290   3534342   2986435   1302725       0.84       0.44
>>
>> Most of the address points (84%) lie within the boundaries of buildings w/o
>> addresses, and almost half of those matched points (44%) are the only
>> address point within their building.
>>
>> For those 1-to-1 matches the address info can be added directly to the
>> buildings. For the many-to-1 matches (several addresses within one
>> building) the options are either creating separate address points within a
>> building or combining addresses of all such points and adding it to the
>> building (e.g. addr:housenumber = 11,13,15 as suggested in
>> https://wiki.openstreetmap.org/wiki/Addresses#Buildings_with_multiple_house_numbers).
>> It seems that the second approach has been used for the existing addresses
>> in Malden and Boston. If points in the many-to-1 matches have different
>> street names then adding separate address points seems to me the only
>> solution.
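
The decision rule in the paragraph above can be sketched in plain Python. This is not the geopandas/osmnx code used for the stats; it is a minimal illustration that assumes planar coordinates, simple building rings without holes, and hypothetical sample data. Points inside a building are grouped, house numbers are combined when all points share a street, and separate address nodes are kept otherwise.

```python
from collections import defaultdict

def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test for a simple ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def join_addresses(buildings, points):
    """Group address points by containing building and decide how to tag each.

    Returns {building_id: ("building", tags)} when the tags can go on the
    building, or {building_id: ("nodes", [tags, ...])} when street names differ.
    """
    hits = defaultdict(list)
    for pt in points:
        for b_id, ring in buildings.items():
            if point_in_ring(pt["x"], pt["y"], ring):
                hits[b_id].append(pt)
                break
    result = {}
    for b_id, pts in hits.items():
        streets = {p["street"] for p in pts}
        if len(streets) == 1:
            numbers = ",".join(p["number"] for p in pts)
            result[b_id] = ("building", {"addr:housenumber": numbers,
                                         "addr:street": streets.pop()})
        else:
            result[b_id] = ("nodes", [{"addr:housenumber": p["number"],
                                       "addr:street": p["street"]} for p in pts])
    return result

# Hypothetical sample data.
buildings = {
    "single": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "triple": [(20, 0), (40, 0), (40, 10), (20, 10)],
    "corner": [(50, 0), (60, 0), (60, 10), (50, 10)],
}
points = [
    {"x": 5, "y": 5, "number": "7", "street": "Oak Street"},
    {"x": 23, "y": 5, "number": "11", "street": "Elm Street"},
    {"x": 30, "y": 5, "number": "13", "street": "Elm Street"},
    {"x": 37, "y": 5, "number": "15", "street": "Elm Street"},
    {"x": 52, "y": 5, "number": "1", "street": "Elm Street"},
    {"x": 58, "y": 5, "number": "2", "street": "Pine Street"},
    {"x": 100, "y": 100, "number": "99", "street": "Far Road"},
]
result = join_addresses(buildings, points)
for b_id, decision in result.items():
    print(b_id, decision)
```

The point that falls inside no building is simply absent from the result, which is where a buffer-based or manual step would pick it up.
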
>>
>> Having visualized the data in QGIS I've noticed a couple of issues:
>> 1) some points lie outside, yet, very close to a building so that it is
>> unambiguous to which building they belong. Maybe creating a small buffer
>> (5-10m) around such points and merging them with unique buildings that
>> these buffers intersect can help match them.
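
A buffer of radius r around a point intersects a building exactly when the point lies within r of that building, so the idea can be sketched as a distance check. The code below is a minimal illustration, assuming planar coordinates in meters, points already known to lie outside every building, and a 7.5 m buffer picked from the 5-10 m range; the helper names and sample data are hypothetical.

```python
import math

def dist_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def dist_to_ring(px, py, ring):
    """Shortest distance from a point to the outline of a building."""
    n = len(ring)
    return min(
        dist_to_segment(px, py, *ring[i], *ring[(i + 1) % n]) for i in range(n)
    )

def match_near_miss(point, buildings, buffer_m=7.5):
    """Return the id of the only building within buffer_m of the point, else None."""
    near = [b_id for b_id, ring in buildings.items()
            if dist_to_ring(point[0], point[1], ring) <= buffer_m]
    return near[0] if len(near) == 1 else None

# Hypothetical sample data in meters.
buildings = {
    "house": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "garage": [(14, 0), (20, 0), (20, 10), (14, 10)],
    "far": [(100, 0), (110, 0), (110, 10), (100, 10)],
}
print(match_near_miss((-3, 5), buildings))   # 3 m from "house" only
print(match_near_miss((12, 5), buildings))   # 2 m from both "house" and "garage"
print(match_near_miss((50, 5), buildings))   # nothing within the buffer
```

Returning None for the ambiguous case keeps the rule conservative: a point is attached to a building only when exactly one building is in range.
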
>>
>> 2) some points, as it was mentioned in previous messages, are assigned to
>> parcels. E.g. there are many cases when a group of buildings (a house and,
>> say, barns) have an address point next to them. It could be possible to
>> identify a house among those buildings manually and add the address only to
>> it, yet, as there might be several thousands of such cases, it can be very
>> time consuming. So, for parcels I would suggest simply adding an address
>> point in the middle of a parcel (as it is placed in MassGIS) without
>> identifying the exact building to which the address belongs.
>>
>> --
>> Yury Yatsynovich
>>
>>
>>
>>
>
> --
> Yury Yatsynovich
> _______________________________________________
> Talk-us-massachusetts mailing list
> Talk-us-massachusetts at openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk-us-massachusetts
>
>