[Talk-us-massachusetts] Update on the MassGIS address import effort
Yury Yatsynovich
yury.yatsynovich at gmail.com
Fri Apr 12 17:08:11 UTC 2019
Thanks for the feedback, Greg!
0) You're right. I'll try both to add to the wiki what has been done and
to include in my code what is mentioned on the wiki.
1) For Barnstable and other cities like it with a lot of villages/suburbs,
it is really important to add those as addr:suburb; otherwise there would
be duplicated addresses. MassGIS takes this issue into account and
provides a variable COMMUNITY for such cases.
2) Matching street names is painful :) I've done another quick check (file
'streets_to_correct.py') in which I tried to match OSM streets to MAD
streets and listed all "close enough, but not exact" matches on the sheet
"street names to correct" in the Google Spreadsheet (it has the OSM ID of
the matched street, its OSM name and its MAD name). The task is to figure
out what the right name is (i.e., what is on the street sign) and then
either modify OSM or add a note to the spreadsheet, so that we can later
modify the imported MAD addresses appropriately. I'd say that in 80% of
cases MAD is correct, but in the other 20% MAD needs corrections.
3) When merging BC MAD points with OSM buildings I considered multiple
checks (see the file "match_mgis_addr_to_osm_buildings.pdf" on GitHub for a
flowchart). All the conflicting cases I could think of are separated
into different shp- or csv-files for manual checks. I've done a preliminary
"BC MAD to OSM-buildings match" for Plymouth; see the file
"for_first_check_by_mappers_PLYMOUTH, PLYMOUTH.zip" on GitHub (for the
meaning of each file in the archive, see the above-mentioned flowchart).
This match is still a work in progress: namely, it was done without
excluding erroneous MAD addresses, so please don't add it to OSM! It is
just for a first check of whether it makes sense and whether there are
obvious flaws.
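For item 2, a "close enough, but not exact" comparison can be sketched with Python's difflib. This is a minimal sketch, not what streets_to_correct.py does: the abbreviation table ABBREVIATIONS and the 0.85 similarity threshold are hypothetical, illustrative choices.

```python
import difflib
import re

# Hypothetical abbreviation table (illustrative only). Note that "st" can
# also mean "Saint", so a real table needs more care than this.
ABBREVIATIONS = {
    "st": "street",
    "rd": "road",
    "ln": "lane",
    "ave": "avenue",
    "dr": "drive",
}


def normalize(name: str) -> str:
    """Lower-case, drop punctuation and expand common suffix abbreviations."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def classify_match(osm_name: str, mad_name: str, threshold: float = 0.85) -> str:
    """Return 'exact', 'close' (worth a human look) or 'different'."""
    a, b = normalize(osm_name), normalize(mad_name)
    if a == b:
        return "exact"
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    return "close" if ratio >= threshold else "different"


if __name__ == "__main__":
    for osm, mad in [("Main Street", "MAIN ST"),
                     ("Whitney Road", "Whitny Rd"),
                     ("Main Street", "Elm Street")]:
        print(f"{osm!r} vs {mad!r}: {classify_match(osm, mad)}")
```

Only pairs classified as "close" would need to go on the review sheet; "exact" pairs need no action, and "different" pairs are most likely different streets.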
With kind regards,
On Fri, Apr 12, 2019 at 10:02 AM Greg Troxel <gdt at lexort.com> wrote:
> That's great to hear of progress.
>
> In the wiki page, I earlier added a lot of qa checks, and I don't think
> your list above includes them all. I am guessing they would almost all
> be easy, since the hard part is the database structure and environment.
>
> In order to meet the import guidelines, our text description and our
> processing have to match. I'm not saying we can't change the text, but
> I object to removing quality checks if they seem sensible.
>
> As for addresses excluded by qa, I would say that we should be looking
> into why they are off, and think about if our qa check is wrong, or if
> something is wrong in osm, or if those points are wrong in MAD. The
> process of manual inspection of things that mismatch has been really
> useful, and I think processes like those will reduce the list of
> addresses that fail qa checks. So the idea of deciding to import things
> that fail qa checks anyway does not seem right to me. (It could be that
> if a check is probabilistic, like some of the ones you describe, failing
> is ok. But I am trying to write non-probabilistic checks.)
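The "fail any check, don't import" rule can be expressed as a small gate that records why each point failed, so failed points become a to-investigate list instead of being imported anyway. A minimal sketch; run_qa and the two example checks are hypothetical stand-ins for the checks listed on the wiki.

```python
from typing import Callable, Dict, List, Tuple

Point = Dict[str, str]
Check = Callable[[Point], bool]


def run_qa(points: List[Point], checks: Dict[str, Check]
           ) -> Tuple[List[Point], List[Tuple[Point, List[str]]]]:
    """Split points into those passing every check and those failing at
    least one; failures keep the names of the checks they failed."""
    passed: List[Point] = []
    failed: List[Tuple[Point, List[str]]] = []
    for point in points:
        reasons = [name for name, check in checks.items() if not check(point)]
        if reasons:
            failed.append((point, reasons))  # set aside for investigation
        else:
            passed.append(point)
    return passed, failed


if __name__ == "__main__":
    # Hypothetical example checks; the real list lives on the wiki page.
    checks = {
        "has_housenumber": lambda p: bool(p.get("addr:housenumber")),
        "has_street": lambda p: bool(p.get("addr:street")),
    }
    points = [
        {"addr:housenumber": "12", "addr:street": "Main Street"},
        {"addr:housenumber": "", "addr:street": "Main Street"},
    ]
    ok, bad = run_qa(points, checks)
    print(len(ok), "pass;", [reasons for _, reasons in bad])
```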
>
> Checks I think should be added are:
>
> 1) town of address matches the town that the point is in. This is, I
> think, particularly tricky for Barnstable, where the notion of town is
> messy. It might be sensible to omit all of Barnstable for the initial
> pass. We really need to avoid getting this wrong, as it's a lot of
> data. We probably need to have somebody talk to the town officials to
> really understand things there.
>
> 2) street of address matches (exactly, modulo a table of translations
> like ln/lane, and you clearly have this code already :-) a nearby
> street name that is in the *same town*. (Alan found a point on the
> Stow/Acton border where the address point had Stow as the town and the
> street name for the road in Acton, which is clearly wrong.)
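A sketch of check 1 (town of address vs. town the point lies in) using a plain even-odd point-in-polygon test. It assumes town boundaries are already loaded as lists of (x, y) vertices in the same projected coordinates as the MAD points; the function names and the toy square "towns" are hypothetical, and a real script would more likely use a GIS library. It does not address the Barnstable village question.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]


def point_in_polygon(pt: Coord, polygon: List[Coord]) -> bool:
    """Even-odd ray-casting test for a simple polygon given as (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containing_town(pt: Coord, towns: Dict[str, List[Coord]]) -> Optional[str]:
    """Name of the first town polygon containing pt, or None."""
    for name, polygon in towns.items():
        if point_in_polygon(pt, polygon):
            return name
    return None


def town_matches(pt: Coord, addr_town: str, towns: Dict[str, List[Coord]]) -> bool:
    """Check 1: the address's town equals the town the point lies in."""
    return containing_town(pt, towns) == addr_town


if __name__ == "__main__":
    # Toy square "towns" sharing a border, standing in for real boundaries.
    towns = {
        "Stow": [(0, 0), (1, 0), (1, 1), (0, 1)],
        "Acton": [(1, 0), (2, 0), (2, 1), (1, 1)],
    }
    print(town_matches((0.5, 0.5), "Stow", towns))  # prints True
    print(town_matches((1.5, 0.5), "Stow", towns))  # prints False (point is in Acton)
```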
>
>
> This will omit address points for roads that are not in OSM. I think
> that's good; we will then have a list of roads to add, and a rerun of
> the scripts later will then qa-pass those points. We have been finding
> road-name issues in MAD (e.g. "parker rd" in Stow, which
> does not actually exist, was apparently in an earlier MAD dataset and
> now not, and yesterday I found two roads spelled wrong in MAD).
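Check 2 and the resulting "roads to add" list might look like the sketch below. Assumptions: a hypothetical TRANSLATIONS table, points as dicts with town and street keys, and OSM street names already grouped per town. For brevity it only requires the street to exist somewhere in the same town and skips the "nearby" distance test.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical translation table (abbreviation -> full word), in the spirit
# of the ln/lane table mentioned above; a real one would be longer.
TRANSLATIONS = {"ln": "lane", "rd": "road", "st": "street", "ave": "avenue"}


def canon(street: str) -> str:
    """Lower-case a street name and expand abbreviations word by word."""
    return " ".join(TRANSLATIONS.get(w, w) for w in street.lower().split())


def check_streets(points: List[Dict[str, str]],
                  osm_streets_by_town: Dict[str, Set[str]]
                  ) -> Tuple[List[Dict[str, str]], Dict[str, Set[str]]]:
    """Simplified check 2: the street must exist in OSM in the same town.
    Returns the passing points and, per town, the MAD street names not found
    in OSM (i.e. roads to add to OSM or to investigate in MAD)."""
    known = {town: {canon(s) for s in names}
             for town, names in osm_streets_by_town.items()}
    passed: List[Dict[str, str]] = []
    missing: Dict[str, Set[str]] = defaultdict(set)
    for p in points:
        if canon(p["street"]) in known.get(p["town"], set()):
            passed.append(p)
        else:
            missing[p["town"]].add(p["street"])
    return passed, dict(missing)


if __name__ == "__main__":
    osm = {"Stow": {"Main Street"}, "Acton": {"Elm Street"}}  # made-up data
    pts = [
        {"town": "Stow", "street": "MAIN ST"},
        {"town": "Stow", "street": "Elm St"},  # street exists only in Acton
        {"town": "Acton", "street": "Elm St"},
    ]
    ok, roads_to_check = check_streets(pts, osm)
    print(len(ok), roads_to_check)
```

Rerunning the same function after the missing roads have been added to OSM would let those points pass.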
>
> As for merging points with ;, I think we need to be careful to see if
> the MAD data is right in some of these cases. Maybe you have been
> looking at that, but if there are many units, we could end up with one
> point for all when there are multiple buildings. So perhaps for now
> exclude any points with more than 4 addresses. We've been talking about
> omitting the more complicated cases and starting with the cases that are
> 100% clearly correct. We can certainly import more later, and it's much
> more work to amend things. It is likely that manual review of some of
> these multi-unit addresses will lead us to understand what is and isn't
> safe. Again, perhaps you already understand, but the basis for knowing
> that the import is 100% structurally correct has to be documented in the
> wiki.
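The "exclude any points with more than 4 addresses" rule amounts to grouping points by location and setting aside large stacks. A minimal sketch; it groups on exact coordinate equality, which assumes stacked MAD points share identical x/y values (real data might need rounding first), and the x/y field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MAX_ADDRESSES_PER_POINT = 4  # the "more than 4" cut-off suggested above


def split_by_stack_size(points: List[Dict],
                        max_addresses: int = MAX_ADDRESSES_PER_POINT
                        ) -> Tuple[List[List[Dict]], List[List[Dict]]]:
    """Group address points sharing identical coordinates ("stacks").
    Stacks with more than max_addresses points are held back for review."""
    stacks: Dict[Tuple[float, float], List[Dict]] = defaultdict(list)
    for p in points:
        stacks[(p["x"], p["y"])].append(p)
    keep: List[List[Dict]] = []
    review: List[List[Dict]] = []
    for group in stacks.values():
        if len(group) > max_addresses:
            review.append(group)
        else:
            keep.append(group)
    return keep, review
```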
>
> I think adding what I suggest will (aside from the multi-unit case)
> remove only a tiny number of points.
>
>> 2. After we figure out which MAD points should be excluded from the
>> import, we can match BC-points to buildings. I've written a piece of code
>> for that, which would combine several stacked address points into one
>> ";"-separated point and would also check that no duplicates are created
>> by the import. For the code, please see the file
>> "match_mgis_addr_to_osm_buildings.py" on GitHub. Within the next couple
>> of days I'll do my best to finish the code for this step (namely, to
>> convert the resulting csv-files with "OSM buildings' full_id -> MAD
>> address" concordances into import-ready osc/osm files).
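Combining stacked points into one ";"-separated point and checking for duplicates could be sketched as below. This is not the code in match_mgis_addr_to_osm_buildings.py; it assumes points are dicts with hypothetical x, y, town, street and housenumber keys, merges only points that also share town and street, and treats an address that appears at more than one location as a duplicate.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Address = Tuple[str, str, str]  # (town, street, housenumber)


def combine_stacked(points: List[Dict]) -> List[Dict]:
    """Merge points with identical coordinates, town and street into one
    record whose housenumber is a ';'-separated list (first-seen order,
    repeated numbers dropped)."""
    stacks: Dict[Tuple, List[str]] = defaultdict(list)
    for p in points:
        stacks[(p["x"], p["y"], p["town"], p["street"])].append(p["housenumber"])
    return [
        {"x": x, "y": y, "town": town, "street": street,
         "housenumber": ";".join(dict.fromkeys(numbers))}
        for (x, y, town, street), numbers in stacks.items()
    ]


def find_duplicates(points: List[Dict]) -> Set[Address]:
    """Addresses that would end up at more than one location after import."""
    locations: Dict[Address, Set[Tuple]] = defaultdict(set)
    for p in points:
        locations[(p["town"], p["street"], p["housenumber"])].add((p["x"], p["y"]))
    return {addr for addr, locs in locations.items() if len(locs) > 1}
```

Anything returned by find_duplicates would go to a manual-check file rather than into the import.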
>
> There are multiple things lurking in this.
>
> One is comparing MAD addresses to existing OSM addresses. That would be
> very useful to see how the set of addresses already in OSM differs. And,
> if an address exists in OSM, the MAD point should not be imported at
> all. I think you meant that, but I think we need to be very explicit
> about that, as it's a bright line in import rules not to overwrite in any
> way hand-mapped data. But these excluded points are either in the same
> place (great) or info to be investigated.
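One way to make the "never touch hand-mapped addresses" rule explicit, while also sorting the excluded points into "same place" and "to investigate", is sketched below. The (town, street, housenumber) key, the 20 m threshold and the assumption of projected coordinates in metres are illustrative, not agreed-upon values.

```python
import math
from typing import Dict, List, Tuple

Address = Tuple[str, str, str]  # (town, street, housenumber)


def compare_with_osm(mad_points: List[Dict],
                     osm_addresses: Dict[Address, Tuple[float, float]],
                     max_dist: float = 20.0
                     ) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Split MAD points into (new, same_place, investigate).

    Only 'new' points are import candidates; a MAD point whose address is
    already in OSM is never imported. Coordinates are assumed projected (m)."""
    new: List[Dict] = []
    same_place: List[Dict] = []
    investigate: List[Dict] = []
    for p in mad_points:
        loc = osm_addresses.get((p["town"], p["street"], p["housenumber"]))
        if loc is None:
            new.append(p)
        elif math.hypot(p["x"] - loc[0], p["y"] - loc[1]) <= max_dist:
            same_place.append(p)   # MAD and OSM agree
        else:
            investigate.append(p)  # same address, different place
    return new, same_place, investigate
```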
>
> You say "create no duplicates". But, I think something stronger is
> appropriate: after creating an address point from MAD (presumably, all
> the addresses with a single value for coordinates), and finding a
> building that contains the point and has a centroid close to the point,
> there is the question of "does that building have an address?". If so, I
> am very uncomfortable adding addresses from MAD to a building that
> already has an address, and I think this case also needs to be diverted
> into an exception file.
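A sketch of the building step with the existing-address case diverted to an exception list. It assumes buildings are dicts with hypothetical id, polygon and tags keys and do not overlap; "has an address" is taken to mean any addr:* tag, and the 15 m centroid distance is an arbitrary placeholder.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


def point_in_polygon(pt: Coord, polygon: List[Coord]) -> bool:
    """Even-odd ray-casting test for a simple polygon."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges straddling the horizontal line through pt can cross the ray.
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def centroid(polygon: List[Coord]) -> Coord:
    """Area centroid of a simple, non-degenerate polygon (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


def assign_to_buildings(points: List[Dict], buildings: List[Dict],
                        max_centroid_dist: float = 15.0
                        ) -> Tuple[List[Tuple[str, Dict]], List[Dict], List[Dict]]:
    """Return (merge, unmatched, exceptions).

    merge:      (building id, point): building contains the point, its
                centroid is close enough, and it has no addr:* tag yet.
    unmatched:  no building qualifies.
    exceptions: the qualifying building already has an address.
    """
    merge, unmatched, exceptions = [], [], []
    for p in points:
        pt = (p["x"], p["y"])
        host = None
        for b in buildings:
            if point_in_polygon(pt, b["polygon"]):
                cx, cy = centroid(b["polygon"])
                if math.hypot(pt[0] - cx, pt[1] - cy) <= max_centroid_dist:
                    host = b
                break  # buildings are assumed not to overlap
        if host is None:
            unmatched.append(p)
        elif any(key.startswith("addr:") for key in host["tags"]):
            exceptions.append(p)  # never add MAD data to an addressed building
        else:
            merge.append((host["id"], p))
    return merge, unmatched, exceptions
```

Points in the unmatched list (no building, or a building whose centroid is too far away) would presumably need their own manual-check file as well.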
>
>
> I think what I'm suggesting are very minor tweaks to what you are doing,
> code wise, and in terms of reduced import points.
>
>
> Are you thinking of trying to do Plymouth as the first case? Or some
> other town? I realize that your scripts will almost certainly output
> files per town for all towns, and then we can see what's next.
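Per-town output could be as simple as the following sketch, which writes one CSV per town. The column list FIELDS and the file-naming scheme are hypothetical; the real scripts may well produce shapefiles or .osc files instead.

```python
import csv
import os
import tempfile
from collections import defaultdict
from typing import Dict, Iterable, List

FIELDS = ["town", "street", "housenumber", "x", "y"]  # hypothetical column set


def write_per_town(points: Iterable[Dict], out_dir: str) -> Dict[str, str]:
    """Write one CSV per town into out_dir; return {town: file path}."""
    by_town: Dict[str, List[Dict]] = defaultdict(list)
    for p in points:
        by_town[p["town"]].append(p)
    os.makedirs(out_dir, exist_ok=True)
    paths: Dict[str, str] = {}
    for town, rows in by_town.items():
        safe_name = town.replace(" ", "_").replace("/", "_")
        path = os.path.join(out_dir, f"{safe_name}.csv")
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        paths[town] = path
    return paths


if __name__ == "__main__":
    demo = [
        {"town": "Plymouth", "street": "Main Street", "housenumber": "1", "x": 0, "y": 0},
        {"town": "Stow", "street": "Elm Street", "housenumber": "2", "x": 1, "y": 1},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for town, path in write_per_town(demo, tmp).items():
            print(town, "->", os.path.basename(path))
```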
>
>
> I will have a look at your code and the wiki.
>
>
--
Yury Yatsynovich