[Talk-us-massachusetts] Update on the MassGIS address import effort

Greg Troxel gdt at lexort.com
Fri Apr 12 14:02:54 UTC 2019


That's great to hear of progress.

In the wiki page, I earlier added a lot of qa checks, and I don't think
your list above includes them all.  I am guessing they would almost all
be easy, since the hard part is the database structure and environment.

In order to meet the import guidelines, our text description and our
processing have to match.  I'm not saying we can't change the text, but
I object to removing quality checks if they seem sensible.

As for addresses excluded by qa, I would say that we should be looking
into why they are off, and think about if our qa check is wrong, or if
something is wrong in osm, or if those points are wrong in MAD.  The
process of manual inspection of things that mismatch has been really
useful, and I think processes like those will reduce the list of
addresses that fail qa checks.  So the idea of deciding to import things
that fail qa checks anyway does not seem right to me.  (It could be that
if a check is probabilistic, like some of the ones you describe, failing
is ok.  But I am trying to write non-probabilistic checks.)

Checks I think should be added are:

  1) town of address matches the town that the point is in.  This is I
  think particularly tricky for barnstable, where the notion of town is
  messy.  It might be sensible to omit all of barnstable for the initial
  pass.  We really need to avoid getting this wrong, as it's a lot of
  data.  We probably need to have somebody talk to the town officials to
  really understand things there.

  2) street of address matches (exactly, modulo a table of translations
  like ln/lane, and you clearly have this code already :-) a nearby
  street name that is in the *same town*.  (Alan found a point on the
  stow/acton border where the address point had stow as the town and the
  street name for the road in acton, which is clearly wrong.)
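
  A minimal sketch of checks 1 and 2, assuming the containing town and
  the nearby OSM roads have already been looked up for each point.  The
  abbreviation table and the function names here are hypothetical, not
  the ones in the existing scripts.

```python
# Hypothetical abbreviation table; the real translation table lives in the
# import scripts and is surely longer.
ABBREVIATIONS = {"ln": "lane", "rd": "road", "ave": "avenue", "dr": "drive"}


def normalize_street(name):
    """Lowercase, drop periods, and expand known abbreviations word by word."""
    words = name.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def passes_town_and_street_checks(addr_town, addr_street, containing_town,
                                  nearby_streets):
    """Return True only if both checks pass.

    nearby_streets is an iterable of (street_name, town) pairs for OSM
    roads near the address point (an assumed input format).
    """
    # Check 1: the town in the address must be the town the point lies in.
    if addr_town.lower() != containing_town.lower():
        return False
    # Check 2: some nearby road in the *same town* must carry the same
    # street name, exactly, after abbreviation expansion.
    wanted = normalize_street(addr_street)
    return any(
        town.lower() == addr_town.lower() and normalize_street(name) == wanted
        for name, town in nearby_streets
    )
```

  With this, a Stow address whose only matching road is in Acton fails
  check 2, which is the border case described above.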


This will omit address points for roads that are not in OSM.  I think
that's good; we will then have a list of roads to add, and a rerun of
the scripts later will then qa-pass those points.  We have been finding
road name issues in MAD (e.g. "parker rd" in Stow, which
does not actually exist, was apparently in an earlier MAD dataset and
now not, and yesterday I found two roads spelled wrong in MAD).

As for merging points with ;, I think we need to be careful to see if
the MAD data is right in some of these cases.  Maybe you have been
looking at that, but if there are many units, we could end up with one
point for all when there are multiple buildings.  So perhaps for now
exclude any points with more than 4 addresses.  We've been talking about
omitting the more complicated cases and starting with the cases that are
100% clearly correct.  We can certainly import more later, and it's much
more work to amend things.  It is likely that manual review of some of
these multi-unit addresses will lead us to understand what is and isn't
safe.  Again, perhaps you already understand, but the basis for knowing
that the import is 100% structurally correct has to be documented in the
wiki.
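
A rough sketch of that exclusion, assuming stacked points are grouped by
identical coordinates and that each input row is a (lat, lon,
housenumber) tuple.  The function name and the input format are
illustrative assumptions; the threshold of 4 is the one suggested above.

```python
from collections import defaultdict

MAX_ADDRESSES_PER_POINT = 4  # suggested cutoff; to be tuned after review


def merge_stacked_points(points):
    """Group points by coordinate and merge house numbers with ';'.

    points: iterable of (lat, lon, housenumber) tuples (assumed format).
    Returns (merged, excluded): merged maps coordinate -> "a;b;c",
    excluded maps coordinate -> list of house numbers held for review.
    """
    by_coord = defaultdict(list)
    for lat, lon, housenumber in points:
        by_coord[(lat, lon)].append(housenumber)

    merged, excluded = {}, {}
    for coord, numbers in by_coord.items():
        unique = sorted(set(numbers))  # string sort, so "10" sorts before "2"
        if len(unique) > MAX_ADDRESSES_PER_POINT:
            excluded[coord] = unique  # too many units; inspect by hand
        else:
            merged[coord] = ";".join(unique)
    return merged, excluded
```

The excluded dictionary is what would be written out for manual review
rather than silently dropped.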

I think adding what I suggest will (aside from the multi-unit case)
remove only a tiny fraction of points.

  2.  After we figure out which MAD points should be excluded from the import
  we can match BC-points to buildings. I've written a piece of code for that,
  which would combine several stacked address points into one ";"-separated
  point and would also check that no duplicates are created by the import.
  For the code, please, see the file "match_mgis_addr_to_osm_buildings.py" on
  github. Within next couple of days I'll do my best to finish the code for this
  step (namely, to convert the resulting csv-files with "OSM buildings'
  full_id -> MAD address" concordances into import-ready osc/osm files).

There are multiple things lurking in this.

One is comparing MAD addresses to existing OSM addresses.  That would be
very useful to see how the set of addresses already in OSM differs.  And,
if an address exists in OSM, the MAD point should not be imported at
all.  I think you meant that, but I think we need to be very explicit
about that as it's a bright line in import rules not to overwrite in any
way hand-mapped data.  But these excluded points are either in the same
place (great) or info to be investigated.
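
As a sketch of that bright line, assuming both MAD points and existing
OSM addresses are reduced to (town, street, housenumber) tuples.  The
key format and function names are hypothetical, and real matching would
also want the street-name normalization the scripts already do.

```python
def address_key(town, street, housenumber):
    """Case-insensitive key for comparing a MAD address with an OSM one."""
    return (town.strip().lower(), street.strip().lower(),
            housenumber.strip().lower())


def split_by_existing(mad_points, osm_addresses):
    """Separate MAD points into import candidates and already-mapped ones.

    Both inputs are iterables of (town, street, housenumber) tuples.
    Points whose address already exists in OSM are never imported; they
    are returned separately so their positions can be compared by hand.
    """
    existing = {address_key(*addr) for addr in osm_addresses}
    candidates, already_mapped = [], []
    for point in mad_points:
        if address_key(*point) in existing:
            already_mapped.append(point)
        else:
            candidates.append(point)
    return candidates, already_mapped
```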

You say "create no duplicates".  But, I think something stronger is
appropriate: after creating an address point from MAD (presumably, all
the addresses with a single value for coordinates), and finding a
building that contains the point and has a centroid close to the point,
there is the question of "does that building have an address".  If so, I
am very uncomfortable adding addresses from MAD into an address that is
already on the building, and I think this case also needs to be diverted
into an exception file.
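
One way to express that rule, as a sketch only.  It assumes the
containing building has already been found, that its OSM tags are
available as a dict, and that the distance from the point to the
building centroid has been computed in meters.  The 10 m threshold, the
function name, and the "unmatched" outcome for far-off centroids are my
assumptions, not anything in the current code.

```python
def classify_match(building_tags, centroid_distance_m, max_distance_m=10.0):
    """Decide what to do with a MAD point that lies inside a building.

    Returns "exception" if the building already has any addr:* tag,
    "unmatched" if the point is too far from the centroid to trust the
    match, and "import" otherwise.
    """
    # Hand-mapped address data on the building is never modified.
    if any(key.startswith("addr:") for key in building_tags):
        return "exception"
    # Hypothetical threshold for "centroid close to the point".
    if centroid_distance_m > max_distance_m:
        return "unmatched"
    return "import"
```

Everything returned as "exception" would go to the exception file for
inspection instead of being merged into the existing tags.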


I think what I'm suggesting are very minor tweaks to what you are doing,
code wise, and in terms of reduced import points.


Are you thinking of trying to do Plymouth as the first case?   Or some
other town?  I realize that your scripts will almost certainly output
files per town for all towns, and then we can see what's next.


I will have a look at your code and the wiki.



