[Talk-us-massachusetts] address import idea

Greg Troxel gdt at lexort.com
Wed Aug 15 14:12:39 UTC 2018


Jason Remillard <remillard.jason at gmail.com> writes:

> I corrected OSM in Groton and Littleton this weekend based on the last
> address conflict file Yury created for us. It was about an hour for each
> town. Alan mentioned he spent 8 hours cleaning up Concord. That is 10
> hours for 3 towns. We have 348 towns to go, so it will be over 1,000 hours
> of work if each town averages 3 hours. It is a lot of time; we need to
> think about this!

Stow is not fixed, because there is one road with imagery, one without
imagery, and two that I don't really believe.  But I concur with the
notion that this is a big job.

Do you think that, after fixing the mismatches, there is a simple
path to matching MAD to buildings and dealing with the rest of the MAD
points that don't match buildings?  I don't see that we have that worked
out.

> I suggest the following. We press ahead on the import, get the tools
> working, wiki, import list, etc. Let's call it phase 1. It will include just
> the 10-20 towns we can clean up in less than a month. At the same time, we
> can make a MapRoulette task called missing roads in MA (I would be happy to
> do this). Each unique MAD street name will become a task.  Long term
> (phase 2), when a town is cleaned up via MapRoulette we pull the trigger
> and do the address import.

I don't really disagree with that total list of steps, but I have one
issue and one alternative suggestion.

The issue is that we've been focusing on points that fail even a fuzzy
match.  That is a great way to find things that need thinking about and
human attention.  But it doesn't lead to believing that the rest of the
points are ok.  So I think we should be proceeding with writing down
what we're going to do (so far we've been really exploring the data,
which is great and a necessary first step), and scripts to produce not
only proposed changesets to be imported, but output for the various
other categories.  There are still a lot of things to be worked out,
like address points without a matching building, and what to do about
stacked unit points (I lean to not importing them, as they seem likely
to be wrong and more noise than help).
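
As a rough illustration of separating stacked unit points from ordinary
ones, here is a minimal Python sketch.  It assumes each address point is
a dict with hypothetical keys 'lat', 'lon' and 'addr'; the sample
addresses are invented, and the rounding precision is an arbitrary
choice, not something taken from the MAD documentation.

```python
from collections import defaultdict

def split_stacked(points, precision=7):
    """Split address points into singletons and stacks sharing a coordinate pair.

    `points` is an iterable of dicts with hypothetical keys 'lat', 'lon', 'addr'.
    Coordinates are rounded to `precision` decimals before being compared.
    """
    by_coord = defaultdict(list)
    for p in points:
        key = (round(p["lat"], precision), round(p["lon"], precision))
        by_coord[key].append(p)
    singles = [grp[0] for grp in by_coord.values() if len(grp) == 1]
    stacks = [grp for grp in by_coord.values() if len(grp) > 1]
    return singles, stacks

# Invented sample data: one ordinary point and two units stacked on one spot.
sample = [
    {"lat": 42.4370, "lon": -71.5060, "addr": "12 Main St"},
    {"lat": 42.4401, "lon": -71.5102, "addr": "5 Elm St Unit 1"},
    {"lat": 42.4401, "lon": -71.5102, "addr": "5 Elm St Unit 2"},
]
singles, stacks = split_stacked(sample)
```

The stacks could then be written to their own output file for review
rather than being put into a proposed changeset.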

For any category of match, we need to do QA and examine some and see if
they are really right enough to import.  I'm starting to feel that the
data for big multi-unit buildings is not quite as good as the data for a
single house on an L3 parcel, but that's just an impression from
watching the list and a bit from the Stow data.

But, the process of taking the MAD and sorting each point into a
category is going to clear up a lot of this.  As I've said before, I'd
like to see this in postgis as I think we're going to need to do some
complicated queries, like find all points that are in an address, find
all parcels that contain an address, find all address points within each
of those, find buildings within/near, and so on.  Some of these are
going to be simple (single MAD point for a coordinate pair, and it's in
an existing building), but I think there are going to be a lot of more
complicated cases.
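
To make the "find address points within each building" kind of query
concrete, here is a small pure-Python stand-in.  It is only a sketch: in
PostGIS the same question would more likely be asked with spatial
predicates such as ST_Contains or ST_DWithin, and the ids and
coordinates below are invented planar values, not real MAD or OSM data.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def points_per_building(points, buildings):
    """Return ({building id: [point ids inside it]}, [point ids in no building])."""
    result = {b_id: [] for b_id in buildings}
    unmatched = []
    for p_id, (x, y) in points.items():
        for b_id, ring in buildings.items():
            if point_in_polygon(x, y, ring):
                result[b_id].append(p_id)
                break
        else:
            unmatched.append(p_id)
    return result, unmatched

# Invented example: two square buildings and four address points.
buildings = {
    "b1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "b2": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
points = {"p1": (5, 5), "p2": (6, 4), "p3": (25, 5), "p4": (15, 5)}
matched, unmatched = points_per_building(points, buildings)
```

Here "b2" is the simple case (one point in an existing building), "b1"
holds two points and needs more thought, and "p4" has no building at
all.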

The alternative suggestion is to not require a town to be clean before
import, but rather, since we have to have the script sort points into
importable, already imported, conflicting, and not resolvable, simply
import the subset that can be cleanly imported.  That will get us a lot,
and then on rerunning, we'll have few importable points left plus the
remaining issues, and cleanup can happen as it happens.  Even partial
address coverage is really useful, as routers and humans can interpolate
or at least get close.
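
A sketch of what sorting into those four categories might look like, in
Python.  The rules here are assumptions made up for illustration (the
thread has not settled them), and the field names 'number' and 'street'
and the sample values are hypothetical.

```python
from collections import Counter

def categorize(mad_point, osm_addresses, building_id):
    """Return 'importable', 'already_imported', 'conflicting' or 'not_resolvable'.

    mad_point: dict with hypothetical keys 'number' and 'street'.
    osm_addresses: list of (number, street) pairs already on the matched building.
    building_id: id of the matched building, or None if none was found.
    """
    if building_id is None:
        # Assumed rule: no building to attach the address to.
        return "not_resolvable"
    key = (mad_point["number"], mad_point["street"].lower())
    existing = [(n, s.lower()) for n, s in osm_addresses]
    if not existing:
        return "importable"
    if key in existing:
        return "already_imported"
    return "conflicting"

# Invented sample rows: (MAD point, existing OSM addresses, matched building).
rows = [
    ({"number": "12", "street": "Main Street"}, [], "b1"),
    ({"number": "5", "street": "Elm Street"}, [("5", "Elm Street")], "b2"),
    ({"number": "7", "street": "Oak Road"}, [("9", "Oak Road")], "b3"),
    ({"number": "3", "street": "Pine Lane"}, [], None),
]
counts = Counter(categorize(p, osm, b) for p, osm, b in rows)
```

After the importable subset is imported, a rerun would report those
points as already imported, leaving only the other categories to work
through.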

> It is such a big job, I don't want to see us spend a bunch of time on this
> and give up because we are trying to bite off the entire state at once.
> Doing 20 towns and getting good tooling debugged would be fantastic.

I basically agree, but I look at it as 70% of the data for all 351 towns
rather than 100% of 20 as the first step.  But it's the same code and
docs that have to be written, and really a minor difference.  And,
however we do it, we definitely need to import a town, wait a few weeks
to make sure it causes no trouble, and then do a few more; as with
buildings, if there are no issues we can then speed up.  But I think
addresses are at least 10x harder than buildings, if not 20x.


