[Talk-us-massachusetts] MassGIS address import workflow

Greg Troxel gdt at lexort.com
Thu Aug 2 11:15:15 UTC 2018


Yury Yatsynovich <yury.yatsynovich at gmail.com> writes:

> Greetings!
> I agree with what Jason said.
> These days in my free time I've been working on a script that would do all
> those above described steps without manual intervention so that it can be
> run repetitively (e.g., in a year or so when MassGIS updates its database).
> Currently the script
> 1) converts MassGIS gdb-file into a shp-file.

Do you have a gdb of the full MAD?  Where did you get that?  I can only
see xlsx files of the complete address points on the website, and those
can only be downloaded town by town, manually, with clicks.

> 2) downloads the entire OSM data for MA from Geofabrik and selects from it
> the existing buildings with and without addresses as well as points with
> both addr:housenumber and addr:street.
>
> I guess, the next step will be to filter the networks of streets from the
> master MA file and check if the already existing and imported addresses
> actually have streets with the corresponding names in their vicinity (a
> buffer of, say, 100 m).

I think Jason's suggestion of essentially processing each massgis
address and sorting it into various categories and outputting them
separately is excellent.  That will let us deal with exceptions, and
understand how good a match things are.

And this lets us deal with the various categories of massgis addrs and
decide to skip some this time, or treat them differently.
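
To make the sorting idea concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration: the category names,
the record fields (housenumber, street, building_id), and the idea that
each massgis address already carries the id of the OSM building it falls
in.  It is not anyone's actual script.

```python
from collections import defaultdict


def classify(addr, osm_addr_keys, osm_building_ids):
    """Return a (hypothetical) category name for one massgis address record."""
    key = (addr["housenumber"], addr["street"].casefold())
    if key in osm_addr_keys:
        # OSM already has this housenumber/street pair.
        return "in_osm_match"
    if addr.get("building_id") in osm_building_ids:
        # The address falls inside a known OSM building without that address.
        return "add_to_building"
    # No matching address and no building: candidate for a standalone point.
    return "add_as_point"


def sort_into_categories(addresses, osm_addr_keys, osm_building_ids):
    """Group address records by category so each group can be output separately."""
    buckets = defaultdict(list)
    for addr in addresses:
        buckets[classify(addr, osm_addr_keys, osm_building_ids)].append(addr)
    return dict(buckets)
```

Skipping a category "this time" is then just a matter of not writing
out that bucket.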

Also, I think we should be matching the street names in massgis with
street names in OSM (bounded in distance so we get the right local one),
building up a table to convert the UPPER to Initial or McMixed case to
match.  That will also generate a list of exceptions where street names
in OSM do not match the massgis MAD, and we can see how big that list is
and what we should do about it.
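
A rough sketch of how that table could be built, in Python.  The
assumptions are mine: each OSM street is reduced to one representative
point (real streets are lines, so a real version would measure distance
to the way), the 100 m bound is borrowed from the buffer mentioned above,
and the function names are made up for this example.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def build_case_table(massgis_points, osm_streets, max_dist_m=100.0):
    """Map UPPER-case massgis street names to the nearby OSM spelling.

    massgis_points: iterable of (street_name, lat, lon)
    osm_streets:    iterable of (street_name, lat, lon), one point per street
    Returns (table, exceptions); exceptions lists the massgis points with
    no same-named OSM street within max_dist_m.
    """
    table = {}
    exceptions = []
    for name, lat, lon in massgis_points:
        nearby = [
            osm_name
            for osm_name, olat, olon in osm_streets
            if osm_name.upper() == name.upper()
            and distance_m(lat, lon, olat, olon) <= max_dist_m
        ]
        if nearby:
            table[name] = nearby[0]
        else:
            exceptions.append((name, lat, lon))
    return table, exceptions
```

The nested loop is quadratic and only fine for a toy; a spatial index is
what makes this practical at state scale.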

I strongly suggest that you do this using postgis, which allows indices
for quickly finding nearby things and buffer operations, etc.  You can
do things like SELECT DISTINCT all the coordinates for addresses, and
then iterate over them getting all address points for each coordinate.
With a full db, it's easy to add a table to note that an id has been
processed to control the iteration.  But really my point is that this is
going to get 10x messier than anticipated before it's done, so it's good
to plan for a larger system than seems necessary.  (I learned this from
experience: being on the imports list for years, and being along for the
ride with the buildings import, which was logically 10x simpler.)
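
Here is a small sketch of that iteration pattern.  To keep it runnable
without a database server it uses Python's built-in sqlite3 as a stand-in
for postgis, with plain lat/lon columns instead of a geometry column; the
table and column names (address_points, processed, label) are
hypothetical.  In postgis the same shape would apply, with a spatial
index on the geometry and something like ST_DWithin for the nearby
queries.

```python
import sqlite3


def process_by_coordinate(conn):
    """Visit each distinct coordinate once and collect its address points.

    Expects a table address_points(id, lat, lon, label).  Ids that have
    been handled are recorded in a 'processed' table, so a rerun skips them.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS processed (id INTEGER PRIMARY KEY)")
    # Distinct coordinates that still have at least one unprocessed address.
    coords = conn.execute(
        "SELECT DISTINCT lat, lon FROM address_points "
        "WHERE id NOT IN (SELECT id FROM processed) "
        "ORDER BY lat, lon"
    ).fetchall()
    groups = []
    for lat, lon in coords:
        # All address points stacked on this coordinate.
        rows = conn.execute(
            "SELECT id, label FROM address_points "
            "WHERE lat = ? AND lon = ? ORDER BY id",
            (lat, lon),
        ).fetchall()
        groups.append(((lat, lon), [label for _id, label in rows]))
        # Note the ids as done to control the iteration.
        conn.executemany(
            "INSERT OR IGNORE INTO processed (id) VALUES (?)",
            [(row_id,) for row_id, _label in rows],
        )
    conn.commit()
    return groups
```

Calling it a second time on the same database returns an empty list,
which is the point of the processed table.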

I think it will also be good to be able to have the multiple output
files suggested by Jason (addresses in OSM that match, addresses in OSM
that don't match, addresses to be added to buildings, perhaps addresses
to be added as points, etc.) split out by town, so that local people can
review their area more easily.  That worked out really well for
buildings.
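
As a sketch of the per-town split, assuming the addresses have already
been sorted into categories and each record has a town field: the
directory layout (one directory per town, one CSV per category) and the
field names are just one possible choice, not what the buildings import
did.

```python
import csv
from collections import defaultdict
from pathlib import Path

FIELDS = ["housenumber", "street", "town"]  # hypothetical output columns


def write_by_town(categorized, out_dir):
    """Write <out_dir>/<town>/<category>.csv for every town and category.

    categorized: dict mapping category name -> list of address dicts.
    Returns the list of files written.
    """
    out_dir = Path(out_dir)
    written = []
    for category, addrs in categorized.items():
        by_town = defaultdict(list)
        for addr in addrs:
            by_town[addr["town"]].append(addr)
        for town, rows in sorted(by_town.items()):
            town_dir = out_dir / town.replace(" ", "_")
            town_dir.mkdir(parents=True, exist_ok=True)
            path = town_dir / f"{category}.csv"
            with path.open("w", newline="", encoding="utf-8") as f:
                # extrasaction="ignore" drops fields not listed in FIELDS.
                writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
                writer.writeheader()
                writer.writerows(rows)
            written.append(path)
    return written
```

A local mapper would then only need to open the directory for their own
town.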

As you pointed out, it is good to think ahead to a notion of rerunning
the import script every 6 months, which will hopefully generate a small
list of things to add, and a report that most things match.  That may
lead to a desire to rerun the buildings import, and to crosscheck the
current massgis street data against Massachusetts OSM data.  So really
there's infinite work to do :-)

> Jason, I'd be grateful if you could share the scripts that were used for
> importing MA buildings into OSM.

They are all linked from here:

  https://wiki.openstreetmap.org/wiki/MassGIS_Buildings_Import

Do you have a plan for how to manage the source code while we work on it?
I'm not a raging fan of github, but that, or something like it that
allows others to get the code and make changes while staying within the
revision control tool, will be really helpful.  Which is a long way of
saying that github, gitlab, or self-hosted gitlab would all work.  I
will be able to help with this.


