[Talk-us] Available Building Footprints

Tue Mar 28 18:30:15 UTC 2017

On Tue, Mar 28, 2017 at 1:46 PM, Nathan Mixter <nmixter at gmail.com> wrote:
> Denis was right on with his response, and those are the type of responses
> that we need if ... and I do say if ... this project is to move forward.
> There are several hurdles in using this data, one being the size and scope.
> That is why the wiki page was created: to hash out the best way to
> proceed from people who have successfully done large scale imports. The data
> can't be reviewed effectively as one big file. The project is still young,
> and before we can even post on the imports list we need to have a procedure
> in place.
>
> Let's continue to pool resources and add ideas and suggestions to the wiki
> page as we look into the possibility of importing this data. Looking forward
> to hearing what others have to say and the ideas we can come up with
> together.

You correctly observe that the first job is to get the data split into
manageable pieces so that people can review the data quality and the
extent to which it's duplicative of what's already there. A bad import
is worse than no import, and an import that overwrites the hard work
of local mappers is many times as bad - it loses good data, brings in
bad data, and risks losing good mappers, which dries up future good
data.

I've done at most 'medium-scale' imports (a few hundred or a few
thousand multipolygons, representing boundaries of things like parks
and forests). Even with those, I found that taking raw shapefiles and
converting them directly to .OSM format was totally inadequate. The
approach I took was to pour the shapefiles into PostGIS using ogr2ogr,
and then preprocess the data there (making topology consistent,
simplifying ways, and so on). I was then able to break things into
manageable pieces, produce individual .OSM files, and start the hard work of
conflation. For the "New York State DEC Lands", the import turned into
nearly 1500 separate changesets, and a few of those were pretty
unmanageable (Saranac Lakes Wild Forest took hours to conflate, with
all the shoreline.)
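
For readers who have not used ogr2ogr, the loading step described above
might look roughly like the sketch below. It only builds and prints the
command line; it does not run it. The shapefile path, database name and
table name are placeholders chosen for illustration, not the ones used
for the DEC Lands import, and the reprojection to EPSG:4326 is an
assumption about what a given dataset would need.

```python
import shlex


def build_ogr2ogr_command(shapefile, dbname, table):
    """Return an ogr2ogr argument list that loads a shapefile into PostGIS.

    All three arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",          # output driver
        f"PG:dbname={dbname}",       # destination database
        shapefile,                   # source shapefile
        "-nln", table,               # name of the table to create
        "-nlt", "PROMOTE_TO_MULTI",  # store polygons as multipolygons
        "-t_srs", "EPSG:4326",       # assumed: reproject to WGS 84
    ]


cmd = build_ogr2ogr_command("footprints.shp", "osm_import", "footprints_raw")
# Print a shell-quoted form that can be inspected before running it by hand.
print(shlex.join(cmd))
```

Once the data are in a table, the topology cleanup and simplification
can be done in SQL, where the whole dataset can be queried at once
rather than opened as one big file.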

I would think that someone who is not fluent enough with some
geospatial database (PostGIS, SpatiaLite, Oracle Spatial, ArcGIS's
homegrown database, whatever...) is probably a poor choice of
individual to lead a major import.  If tasks are to be effectively
subdivided, someone needs to work out how the data are partitioned,
both from the spatial perspective (political boundaries?  census
enumeration districts? Groups of city blocks?) and the technological
perspective. (How is the split to be done? How are things patched
together at the margins? How are preëxisting data identified as
candidates for conflation? How is inconsistent topology to
be addressed?) Someone who cannot work with this sort
of file merely because of its size is likely not to have the
background to do these other tasks competently.
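
As one illustration of the spatial side of that question, here is a
minimal sketch that assigns each footprint to exactly one grid cell by
the average of its vertices. The grid size, the feature ids and the
coordinates are all made up for the example; a real import might
instead split on political boundaries or city blocks, and would do this
inside the database rather than in plain Python.

```python
import math
from collections import defaultdict


def ring_centroid(ring):
    """Average the vertices of a ring of (lon, lat) pairs.

    This is a crude stand-in for a true centroid, good enough to pick a cell.
    """
    # Drop the repeated closing vertex so it is not counted twice.
    pts = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring
    lon = sum(p[0] for p in pts) / len(pts)
    lat = sum(p[1] for p in pts) / len(pts)
    return lon, lat


def tile_key(lon, lat, cell_deg):
    """Return the integer (column, row) of the grid cell containing a point."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def partition(footprints, cell_deg=0.01):
    """Group feature ids by grid cell; each feature lands in exactly one cell."""
    tiles = defaultdict(list)
    for fid, ring in footprints.items():
        lon, lat = ring_centroid(ring)
        tiles[tile_key(lon, lat, cell_deg)].append(fid)
    return dict(tiles)


# Hypothetical footprints: "a" and "b" are near each other, "c" is far away.
footprints = {
    "a": [(-73.9995, 40.7005), (-73.9994, 40.7005), (-73.9994, 40.7006),
          (-73.9995, 40.7006), (-73.9995, 40.7005)],
    "b": [(-73.9986, 40.7014), (-73.9984, 40.7014), (-73.9984, 40.7016),
          (-73.9986, 40.7016), (-73.9986, 40.7014)],
    "c": [(-73.9506, 40.7504), (-73.9504, 40.7504), (-73.9504, 40.7506),
          (-73.9506, 40.7506), (-73.9506, 40.7504)],
}
tiles = partition(footprints)
for key, ids in sorted(tiles.items()):
    print(key, ids)
```

Putting each building in a single cell avoids importing it twice, but
it does not answer the margin question: existing data that straddles a
cell edge still has to be found and conflated by whoever takes either
neighbouring piece.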

These tasks must be done by the project leader or a trusted delegate.
They cannot simply be farmed out to the community - or rather, farming
them out will succeed only if someone in the community steps forward
to do them, becoming de facto the project leader.

As long as the Wiki page is merely identifying this as a potential
project that someone might sign up for someday, that's fine. As it
stands, it is incoherent as a project proposal.

(And for what it's worth, I most assuredly am not a suitable candidate.
I lack experience with coordinating or performing a large-scale import.
While I like to flatter myself that the imports I've done were done
with at least minimal competence, I'm surely not qualified to jump
to one this huge.)