[Talk-us-massachusetts] Talk-us-massachusetts Digest, Vol 31, Issue 8
Greg Troxel
gdt at lexort.com
Sat Apr 13 01:59:51 UTC 2019
Yury Yatsynovich <yury.yatsynovich at gmail.com> writes:
> One question on Phase 1 of address import, namely on NO UNITS:
> In my code I exclude units whenever possible, but I kept them if addr:unit
> is the only way to identify buildings (i.e. a group of buildings may have
> the same street name and housenumber and be differentiated only with UNIT).
I know I keep saying this, but imports are much harder than anybody who
has not been through an import thinks. People who have not yet been
through one are likely not to really believe this, so I can only ask
that you listen to experience. Perhaps Jason can comment based on the
buildings import -- and buildings are very simple compared to addresses.
Therefore I lean very strongly to a simplified subset. Units are hard,
and I am pretty sure that we don't fully understand their complexity.
Phases are fairly cheap, in terms of separating review, so I would like
to see us omit all addresses with units for the first pass. After
that, I would suggest looking at what remains and identifying patterns
such that we can convince ourselves that 99.99% of addresses matching
that pattern will be correct on import, and then repeat.
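
A minimal sketch of the "omit all addresses with units" first pass described
above. It assumes each address point has already been converted to a dict of
OSM-style tags; the function name split_by_unit and the sample records are
hypothetical and are not taken from Yury's code or from the MAD schema.

```python
def split_by_unit(records):
    """Split records into (first_pass, deferred).

    first_pass: records with no unit, candidates for the first phase.
    deferred:   records carrying any non-blank unit, held for later phases.
    """
    first_pass, deferred = [], []
    for rec in records:
        # Treat a missing, None, or whitespace-only unit as "no unit".
        unit = (rec.get("addr:unit") or "").strip()
        if unit:
            deferred.append(rec)
        else:
            first_pass.append(rec)
    return first_pass, deferred


# Hypothetical sample data, for illustration only.
sample = [
    {"addr:housenumber": "12", "addr:street": "Main Street", "addr:unit": ""},
    {"addr:housenumber": "14", "addr:street": "Main Street", "addr:unit": "B"},
    {"addr:housenumber": "16", "addr:street": "Main Street"},
]
first_pass, deferred = split_by_unit(sample)
print(len(first_pass), "for the first pass;", len(deferred), "deferred")
```

Keeping the deferred records in a separate list, rather than discarding them,
leaves them available for the later pattern-by-pattern review.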
I agree that if there are address points with the same number/name and
differing in unit, and there are no duplicated coordinates, and each
point lines up with a building in osm, then that's probably a good thing
to do. But that's much more complicated than points without units, and
I think the experience of dealing with simpler points will be helpful in
getting the more complicated cases correct.
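
The first two conditions above (same number/name differing only in unit, and
no duplicated coordinates) could be checked roughly as follows. This is only a
sketch: the "lat"/"lon" keys, the function name check_unit_groups, and the
sample records are assumptions, and the third condition (each point lining up
with a building in OSM) needs geometry and is not attempted here.

```python
from collections import defaultdict


def check_unit_groups(records):
    """Map (housenumber, street) -> True if the group looks safe.

    A group is "safe" here only if every unit is distinct and no two
    points share exactly the same coordinates. Records without a unit
    are ignored.
    """
    groups = defaultdict(list)
    for rec in records:
        if (rec.get("addr:unit") or "").strip():
            key = (rec["addr:housenumber"], rec["addr:street"])
            groups[key].append(rec)

    result = {}
    for key, recs in groups.items():
        units = [r["addr:unit"].strip() for r in recs]
        coords = [(r["lat"], r["lon"]) for r in recs]
        distinct_units = len(set(units)) == len(units)
        distinct_coords = len(set(coords)) == len(coords)
        result[key] = distinct_units and distinct_coords
    return result


# Hypothetical sample data, for illustration only.
records = [
    {"addr:housenumber": "5", "addr:street": "Elm Street",
     "addr:unit": "A", "lat": 42.1000, "lon": -71.1000},
    {"addr:housenumber": "5", "addr:street": "Elm Street",
     "addr:unit": "B", "lat": 42.1001, "lon": -71.1000},
    {"addr:housenumber": "9", "addr:street": "Oak Road",
     "addr:unit": "1", "lat": 42.2000, "lon": -71.2000},
    {"addr:housenumber": "9", "addr:street": "Oak Road",
     "addr:unit": "2", "lat": 42.2000, "lon": -71.2000},  # stacked points
]
report = check_unit_groups(records)
for key, ok in sorted(report.items()):
    print(key, "ok" if ok else "needs review")
```

Groups that fail either test are flagged for review rather than imported.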
> Another issue I would want to discuss is whether the tag "addr:housename"
> should be part of the imported information. MAD provides some data on that,
> but usually that data includes either addr:unit values (B, A, Rear, etc.) or the
> name of an amenity (fire station, police dept, school, etc.) -- which
> should rather be added as amenity=* + name=*, but not the name of a
> building. Shall we separate "building names" by MAD into addr:unit and
> amenity name rather than importing them as addr:housename?
I think that for now we should just not use the MAD housename. If they
are blurring unit and other things, it seems clear that the required
quality level -- which I'm arbitrarily thinking of as 99.99% correct,
but one could argue 99.9% or 99.999% are better values -- is not met.
There is no need to use every field, and we should not try -- that has
led to messes in previous imports. The MAD is about addresses and there
are other data sets to curate other things (like police stations). For
these other amenity/names, I think a process of producing a QA output
showing differences for manual thought is probably appropriate. If we
can identify a pattern such that 99.99% of edits from that pattern are
correct, that's something else and we can discuss that specific pattern.
But I think as soon as we deviate from the main path of the core
dataset, we are no longer at the required quality.
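
One possible shape for the "QA output showing differences" mentioned above,
as a sketch only. It assumes the MAD names and the OSM names have already
been matched to a common key; that key, the function name qa_differences,
and the sample values are all hypothetical. Nothing is edited: the output is
a list for a person to look through.

```python
def qa_differences(mad_names, osm_names):
    """Return (key, mad_name, osm_name) rows where the two disagree.

    mad_names / osm_names: dicts mapping a shared, hypothetical address
    key to a name string (or None). Rows with no MAD name are skipped,
    since there is then nothing to compare.
    """
    rows = []
    for key in sorted(mad_names):
        mad = (mad_names[key] or "").strip()
        osm = (osm_names.get(key) or "").strip()
        # Case-insensitive comparison; anything else counts as a difference.
        if mad and mad.casefold() != osm.casefold():
            rows.append((key, mad, osm))
    return rows


# Hypothetical sample data, for illustration only.
mad_names = {
    "10 Main Street": "FIRE STATION",
    "20 Main Street": "Town Hall",
    "30 Main Street": None,
}
osm_names = {
    "10 Main Street": "Fire Station",
    "20 Main Street": "",
}
for key, mad, osm in qa_differences(mad_names, osm_names):
    print(f"{key}: MAD={mad!r} OSM={osm!r}")
```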