[Talk-us-massachusetts] Massachusetts Address Imports - Latest Updates and Request for Comment
Greg Troxel
gdt at lexort.com
Tue Jan 8 02:17:52 UTC 2019
Angela Morley <amorley at protonmail.com> writes:
> Alan, Yury, and myself have been working hard to address the latest
> commentaries and ideas that came out of our previous emails regarding
> importing addresses. We've come up with a solid framework for
> importing and handling the data, and I wanted to bounce this off the
> MA talk list for commentary and input before we go further forward
> with it.
Sorry for taking so long to respond - I have been distracted with things
other than mapping. I'm trying to get useful comments out fast, so I am a
bit terse.
I see that the goals section is at least mostly as I wrote it (I'm fuzzy
on details). It sounds then like there is agreement on those, but it's
hard to map the goals onto the process.
* presence on imports@ list
I think that the people proposing the import should be on the imports@
list for several months before proposing, reading the comments about
other imports. They should also be on talk-us@ (the "local list" that I
think is required for this) before proposing, but I don't think a long
presence is necessary there.
* phases
I'm a big fan of phases, but it would be good to adjust the wording to make
it clear that approval is sought for phase 1 only (phase 0 is not an
actual import), with the expectation that phase 2 will be defined and
re-emailed to imports@ before it happens. I think it's reasonable for
everyone to expect a smoother review on subsequent phases.
* data QA
Part of an import is or should be an explanation of why a skeptical
party should believe that the data that will be imported is of very high
quality. I know there has been a lot of looking at various parts of
MAD, but that doesn't come across in this page.
Part of what makes this hard is that the MAD is a complicated dataset
with multiple kinds of information. The plan is to subset it for the
first phase, but it's a little hard to tell exactly how. Defining the
subset is key for QA.
We've found a number of issues, and it seems our plan is to carve those
out of the first phase. I of course think that's great :-) But I would
then define the carveouts, and explain why someone should believe that
what remains (or is selected) is 100% right.
* mapping from MAD schema to OSM
I would maybe put this earlier, with a caution that not everything will
be imported.
If I were someone who generally didn't like imports, I'd ask questions
like "how many of the various codes are there", especially about
MODIFIED and LINKED. This is confusing because it seems that a MODIFIED
point should also be ACTIVE. Presumably that is understood and should
be explained.
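Answering "how many of each code" only takes a quick tally. Here is a minimal sketch. It assumes the MAD points have already been loaded as Python dicts with a status field. The `status` key and the sample records are hypothetical and do not reflect the real MAD schema:

```python
from collections import Counter

# Hypothetical sample records; the "status" key and these values are
# illustrative only, not the actual MAD schema.
records = [
    {"status": "ACTIVE", "housenumber": "12"},
    {"status": "MODIFIED", "housenumber": "14"},
    {"status": "LINKED", "housenumber": "14"},
    {"status": "ACTIVE", "housenumber": "16"},
]


def tally_status_codes(points):
    """Count how many points carry each status code."""
    return Counter(p["status"] for p in points)


counts = tally_status_codes(records)
for code, n in counts.most_common():
    print(f"{code}: {n}")
```

Running the same tally over the whole dataset, and over the phase 1 subset, would give reviewers the numbers they are likely to ask for.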
* Phase 1 contents
I am having a little trouble following. Some of these things are about
selecting data from the MAD, and some are about OSM.
We have talked about a bunch of this and I'll try to write something
that's sort of English but could easily be implemented.
There are a number of checks that need to be done on each MAD point
(MADP) that I think we had consensus about on the list, but I don't
really see how they are applied:
  town name in the MADP matches the enclosing admin8 polygon
  street name in the MADP matches (exactly) the name of a nearby street
    that is also in the same admin8 polygon. (This is complicated by
    Route 6, which isn't really a name and should not even be in alt_name,
    but could be defined to match for addressing purposes.) Note that we
    found an address on the Stow/Acton line which is wrong in MAD, where
    MAD was confused about which road is in which town.
  housenumber is not 0
  MADP point of type BC (all of them in phase 1) is within a building=yes
    polygon (or near the centroid of one?), and all MADP points that
    match that building are somehow consistent. This requirement still
    needs to be addressed in the plan.
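Here is a rough illustration of how the town, street, housenumber and BC-in-building checks could be mechanized. Everything in it is an assumption. `MadPoint`, `Area` and `Street` are hypothetical stand-ins for MAD and OSM data, and polygons are single outer rings. "Nearby" means a street node within `radius` in raw coordinate units; a real run would use projected coordinates and street segments. The "all points in a building are consistent" rule is left out because it isn't defined yet.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)


@dataclass
class Area:
    """An admin8 boundary or a building outline, as one outer ring."""
    name: str
    ring: List[Point]


@dataclass
class Street:
    name: str
    nodes: List[Point]


@dataclass
class MadPoint:  # hypothetical stand-in for one MAD record
    town: str
    street: str
    housenumber: str
    point_type: str
    location: Point


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray casting: count edges crossed by a ray running right from pt."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_madp(p, towns, streets, buildings, radius):
    """Return a list of reasons p fails the checks (empty list = passes)."""
    problems = []
    enclosing = [t for t in towns if point_in_polygon(p.location, t.ring)]
    if len(enclosing) != 1 or enclosing[0].name != p.town:
        problems.append("town name does not match enclosing admin8")
    else:
        town = enclosing[0]
        # Street names with a node within `radius` that is also inside the town.
        nearby = {
            s.name
            for s in streets
            for n in s.nodes
            if math.dist(n, p.location) <= radius and point_in_polygon(n, town.ring)
        }
        if p.street not in nearby:  # exact string comparison
            problems.append("street name does not match a nearby street in the town")
    if p.housenumber == "0":
        problems.append("housenumber is 0")
    if p.point_type == "BC" and not any(
        point_in_polygon(p.location, b.ring) for b in buildings
    ):
        problems.append("BC point is not inside a building polygon")
    return problems


# Toy data: a square town, a street across it, and one building.
stow = Area("Stow", [(0, 0), (10, 0), (10, 10), (0, 10)])
main_st = Street("Main Street", [(1, 5), (5, 5), (9, 5)])
house = Area("", [(4, 5.5), (6, 5.5), (6, 7), (4, 7)])

good = MadPoint("Stow", "Main Street", "12", "BC", (5, 6))
bad = MadPoint("Acton", "Main St", "0", "BC", (5, 8))
print(check_madp(good, [stow], [main_st], [house], radius=1.5))
print(check_madp(bad, [stow], [main_st], [house], radius=1.5))
```

The function returns reasons rather than a yes/no so that rejected points can be counted per reason, which would feed directly into the QA explanation.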
** exclusions
There seems to be a notion of not importing units. I am 99% for this
position. But I think it has to be "all MADP points with a unit are
ignored".
I am not in favor of importing address ranges, or anything other than
MADP points with single addresses. I suspect there aren't that many,
and I am very skeptical about quality.
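The "ignore every point with a unit, no ranges" rule is simple to state as a filter. Here is a minimal sketch. It assumes each point is a dict with hypothetical `unit` and `housenumber` keys. It also assumes a range shows up as a hyphen in the housenumber; how MAD actually encodes ranges would need checking.

```python
def is_importable(p):
    """Phase 1 exclusions: any unit at all, or an apparent address range."""
    if p.get("unit"):  # a point with a unit is ignored outright
        return False
    if "-" in p["housenumber"]:  # assumed range encoding, e.g. "14-18"
        return False
    return True


# Hypothetical sample points.
points = [
    {"housenumber": "10", "unit": ""},
    {"housenumber": "12", "unit": "2B"},
    {"housenumber": "14-18", "unit": None},
]
kept = [p for p in points if is_importable(p)]
print(kept)
```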
It seems OK to take multiple MADP points that match a building and have
the same coordinates (or maybe just close), and, if they match in all
respects other than addr:housenumber, to make the result a
semicolon-separated list.
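Here is one way to do the semicolon merge, sketched under several assumptions. Each candidate already carries the id of the building it matched (a hypothetical `building` key). "Same coordinates" means exactly equal. All tags except addr:housenumber must be identical for two points to be merged.

```python
from collections import defaultdict


def merge_same_spot(points):
    """Merge points in the same building at the same coordinates whose tags
    differ only in addr:housenumber, joining the numbers with ';'."""
    groups = defaultdict(list)
    for p in points:
        others = tuple(sorted(
            (k, v) for k, v in p["tags"].items() if k != "addr:housenumber"
        ))
        key = (p["building"], p["location"], others)
        groups[key].append(p["tags"]["addr:housenumber"])
    merged = []
    for (building, location, others), numbers in groups.items():
        tags = dict(others)
        # dict.fromkeys drops duplicates while keeping first-seen order
        tags["addr:housenumber"] = ";".join(dict.fromkeys(numbers))
        merged.append({"building": building, "location": location, "tags": tags})
    return merged


common = {"addr:street": "Main Street", "addr:city": "Stow"}
points = [
    {"building": "w1", "location": (42.43, -71.50),
     "tags": {**common, "addr:housenumber": "10"}},
    {"building": "w1", "location": (42.43, -71.50),
     "tags": {**common, "addr:housenumber": "12"}},
    {"building": "w2", "location": (42.44, -71.51),
     "tags": {**common, "addr:housenumber": "14"}},
]
for m in merge_same_spot(points):
    print(m["building"], m["tags"]["addr:housenumber"])
```

Allowing "close" instead of identical coordinates would mean replacing the exact-location key with some clustering step, which is a separate decision.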
I think you mean "if community name is barnstable, we're going to ignore
this point".
* data transformation
There is no section that says "here is how we take a MADP and turn it
into OSM tags".
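Something like the following is the level of detail that section could pin down. Here is a minimal sketch. It assumes hypothetical MAD field names `housenumber` and `street`, takes the town name from the enclosing admin8 polygon, and assumes addr:state is always "MA":

```python
def madp_to_osm_tags(madp, admin8_name):
    """Turn one MAD point into OSM address tags.

    The MAD field names used here are hypothetical; addr:city comes from the
    enclosing admin8 polygon rather than from the MAD record.
    """
    return {
        "addr:housenumber": madp["housenumber"].strip(),
        "addr:street": madp["street"].strip(),
        "addr:city": admin8_name,
        "addr:state": "MA",
    }


print(madp_to_osm_tags({"housenumber": " 12 ", "street": "Main Street"}, "Stow"))
```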
** odd numbers
Regarding this, there are housenumbers like "1R" and "1 1/2", and there
seems to be consensus to just import them as written. There is probably
existing OSM practice on this, in tagging wiki pages or in actual tags.
There is an obvious question about excluding housenumbers that don't
match a grammar that we define. I think it is (where N is a digit and L
is a capital letter):
  NNNNN
  NNNNN L
  NNNNN 1/2
and maybe that's it.
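That grammar is easy to check mechanically. Here is a minimal sketch as a Python regular expression. It assumes N means a digit, with 1 to 5 of them and no leading zero; that also rules out a housenumber of 0. It allows the letter with or without a space, since "1R" is written without one.

```python
import re

# N = digit, L = capital letter. Assumed: 1 to 5 digits with no leading zero
# (which also rejects "0"); the letter may follow with or without a space.
HOUSENUMBER = re.compile(r"[1-9][0-9]{0,4}(?: ?[A-Z]| 1/2)?")


def valid_housenumber(s):
    """True if s matches NNNNN, NNNNN L, or NNNNN 1/2."""
    return HOUSENUMBER.fullmatch(s) is not None


for s in ["12", "1R", "12 A", "1 1/2", "0", "012", "12-14", "12a"]:
    print(s, valid_housenumber(s))
```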
** city/state
Generally, when handmapping, we have been putting in city and state
(city in OSM terms means admin8 in MA, excluding odd places like Barnstable).
* sample data
The imports world wants to be able to download
  the source data
  data that has been transformed
and I don't really see this.
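For the transformed data, one option is an OSM XML file that reviewers can open in an editor such as JOSM. Here is a minimal sketch. It assumes the transformed features are (lat, lon, tags) triples and writes each one as a new node with a negative id. In the real import the tags might go on building ways instead, and the generator string is made up.

```python
import xml.etree.ElementTree as ET


def to_osm_xml(features):
    """Serialize (lat, lon, tags) triples as OSM XML, one new node each.

    Negative ids mark objects that do not exist in the database yet.
    """
    root = ET.Element("osm", version="0.6", generator="madp-import-sketch")
    for i, (lat, lon, tags) in enumerate(features, start=1):
        node = ET.SubElement(
            root, "node", id=str(-i), lat=f"{lat:.7f}", lon=f"{lon:.7f}"
        )
        for k, v in sorted(tags.items()):
            ET.SubElement(node, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")


sample = [(42.43, -71.5, {"addr:housenumber": "12", "addr:street": "Main Street"})]
print(to_osm_xml(sample))
```

Writing that string to a .osm file gives something that can be posted next to an extract of the source data.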