[Imports] Potential data source: New York City watershed recreation lands

Kevin Kenny kkenny2 at nycap.rr.com
Mon May 23 03:35:09 UTC 2016


One-line summary: I want to import the boundaries of New York City
watershed recreation areas.

Side note: This project ties in closely with Paul Norman's
identification of a need to clean up the NYS DEC Lands import.
Many of the NYC DEP watershed lands share borders with the DEC lands,
and performing this import together with or after the DEC Lands
cleanup would yield a more nearly consistent topology. (Some property
lines are simply inconsistent in the real world as well as in the
digital one, so some parcels will unavoidably be misaligned after the
import.)

I welcome comments about any aspects of this proposal. I'm still new
to this game.

------------------------------------------------------------------------

PROPOSED IMPORT:
     New York City
     Department of Environmental Protection
     Bureau of Water Supply
     Open Recreation Areas and Use Designations
     http://www.nyc.gov/html/dep/pdf/recreation/open_rec_areas.pdf

1. OVERVIEW

New York City owns, and makes accessible for public recreational use
(activities such as hiking, fishing, hunting and trapping) about four
hundred parcels of land in the Catskill and Croton watersheds. All of
these lands are outside the boundaries of the city itself. The vast
majority of these parcels do not yet appear in OpenStreetMap.  This
proposal is made to solicit community buy-in for the project of
importing multipolygons giving the boundaries of these reserves.

I expect that this import should be relatively non-controversial. The
data arise from an authoritative source - the agency that manages the
lands in question. The data are not readily obtainable in any other
way. Cadastral data for public parks, nature reserves, and the like
have been imported many times before.

The import is of relatively small scale, comprising fewer than 400
multipolygons and associated tags. The total area of the parcels in
question is roughly 145 square miles (375 km²).

2. LICENSING

I believe that the data are, by law, in the public domain under New
York City's open data access policy. The OSM community has relied on
this policy in the past, most notably in the import of the New York
City address and building footprint data. The relevant paragraph is
in the Administrative Code of the City of New York, Chapter 5,
paragraph 23-502, subparagraph d. The text may be found at
http://www1.nyc.gov/assets/doitt/downloads/pdf/nyc_open_data_tsm.pdf,
page 27. The data in question do appear on the single web portal
described in subparagraph a.

3. TECHNICAL DETAILS

The data in question consist of the PDF file
http://www.nyc.gov/html/dep/pdf/recreation/open_rec_areas.pdf, and the
PDF maps to which it links. I've written a script that scrapes the
tabular data from the PDF, yielding a set of 367 distinct unit names,
together with the 'paa', 'hike', 'fish', 'hunt', 'trap' and 'dua'
columns, and the URLs of the corresponding maps.
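
For concreteness, the scraping step looks roughly like the sketch
below. pdfplumber is one PDF-table library that handles this sort of
job; the column order and the per-page header row here are assumptions
that the real script has to verify against the document itself:

    import pdfplumber

    def scrape_units(pdf_path):
        """Yield one record per recreation unit from the tabular PDF."""
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                for table in page.extract_tables():
                    for row in table:
                        # Assumed column order: unit, paa, hike, fish,
                        # hunt, trap, dua, map URL. Skip blank rows and
                        # the header row repeated on each page.
                        if not row or not row[0] or row[0] == 'Unit':
                            continue
                        unit, paa, hike, fish, hunt, trap, dua, url = row
                        yield {'unit': unit, 'paa': paa, 'hike': hike,
                               'fish': fish, 'hunt': hunt, 'trap': trap,
                               'dua': dua, 'map_url': url}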

These maps are all in PDF format. They are fully georeferenced, and
I've been able to work out GDAL scripts to extract the boundaries and
produce well-formed polygons from all but four of them. These four are
the "Day Use Areas" or "Designated Use Areas" (the web site fairly
consistently uses the former phrasing, the posters on the land use the
latter) of Devasego Park, the Ashokan fountains, and the Kensico and
Cross River dams. These are popular areas for walking and picnicking,
but are more of the nature of city parks than of nature reserves.
On the initial import I propose simply to ignore these four, leaving
363 recreation areas to import.
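
The extraction itself runs along these lines. This is a sketch that
assumes the boundary is present as a vector layer in the geospatial
PDF (which the GDAL PDF driver can read directly) and that it is the
first layer in the file; if a map turned out to carry only raster
linework, polygonizing the raster would be needed instead:

    from osgeo import ogr, osr

    def extract_boundary(pdf_path):
        """Pull polygon geometries out of a georeferenced PDF map,
        reprojected to WGS84 for OSM."""
        ds = ogr.Open(pdf_path)      # requires GDAL built with PDF support
        layer = ds.GetLayer(0)       # assumed: boundary is the first layer
        wgs84 = osr.SpatialReference()
        wgs84.ImportFromEPSG(4326)
        to_wgs84 = osr.CoordinateTransformation(layer.GetSpatialRef(),
                                                wgs84)
        polygons = []
        for feature in layer:
            geom = feature.GetGeometryRef()
            if geom is None:
                continue
            if geom.GetGeometryType() not in (ogr.wkbPolygon,
                                              ogr.wkbMultiPolygon):
                continue
            geom = geom.Clone()
            geom.Transform(to_wgs84)
            polygons.append(geom.ExportToWkt())
        return polygons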

The proposed tagging is as follows (a sketch of the tag-generation
logic follows the list):
     leisure=nature_reserve
         For the benefit of legacy renderers that do not yet comprehend
         the details of boundary=protected_area
     boundary=protected_area
     protect_class=12
     protection_object=water
         Tailor-made for this data set!
     operator='New York City, Department of Environmental Protection,
               Bureau of Water Supply'
     website=http://www.nyc.gov/html/dep/html/recreation/index.shtml
     name=(obtained from the 'unit' column of the list of sites, with
             the word 'Unit' appended)
     access=yes (if the 'PAA' column is 'Y') or access=license (if the
             PAA column is 'N')
     access:license=http://www.nyc.gov/html/dep/html/watershed_protection/recreation.shtml
             (only if access=license)
     access:hiking=(value of the 'hike' column, normalized to 'yes' or 'no')
     access:fishing=(value of the 'fish' column, normalized to 'yes' or 'no')
     access:hunting=(value of the 'hunt' column, normalized to 'yes' or 'no')
     access:trapping=(value of the 'trap' column, normalized to 'yes'
             or 'no')
     nycdep:version=YYYYMMDDHHMMSS
         UTC time returned as Date-Modified from the web site. See
         below for rationale of retaining this information.
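
To make that mapping concrete, the tag-generation logic amounts to
something like the sketch below, where the row dictionary is the one
produced by the scraper sketch above and 'version' is the Date-Modified
timestamp already formatted as YYYYMMDDHHMMSS:

    OPERATOR = ('New York City, Department of Environmental Protection, '
                'Bureau of Water Supply')
    LICENSE_URL = ('http://www.nyc.gov/html/dep/html/'
                   'watershed_protection/recreation.shtml')

    def yn(value):
        """Normalize a Y/N column to OSM's yes/no convention."""
        return 'yes' if value.strip().upper() == 'Y' else 'no'

    def tags_for(row, version):
        """Build the OSM tag dictionary for one scraped unit."""
        tags = {
            'leisure': 'nature_reserve',
            'boundary': 'protected_area',
            'protect_class': '12',
            'protection_object': 'water',
            'operator': OPERATOR,
            'website': 'http://www.nyc.gov/html/dep/html/recreation/index.shtml',
            'name': row['unit'] + ' Unit',
            'access': 'yes' if yn(row['paa']) == 'yes' else 'license',
            'access:hiking': yn(row['hike']),
            'access:fishing': yn(row['fish']),
            'access:hunting': yn(row['hunt']),
            'access:trapping': yn(row['trap']),
            'nycdep:version': version,  # Date-Modified, as YYYYMMDDHHMMSS
        }
        if tags['access'] == 'license':
            tags['access:license'] = LICENSE_URL
        return tags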

I'm more than open to a different tagging scheme for 'access'. The
relevant restrictions are these:

PAA=Y areas are open to all comers, no permission needed, for the
activities specified. PAA=N areas require a free access permit,
obtainable at the web site
http://www.nyc.gov/html/dep/html/watershed_protection/recreation.shtml

HIKE, FISH, HUNT, and TRAP describe the permitted activities (HIKE
encompasses related activities such as photography, bird watching,
etc.)

The areas in which HIKE=N are all areas adjoining the
reservoirs. Hiking with no other purpose is forbidden in these areas,
as is the trapping of game. Hunters, fishermen and boaters accessing
these areas must have valid licenses for these activities, and boats
must be tagged by NYCDEP. Since all of the HIKE=N areas are also
PAA=N, lawful users will have applied for an access permit and been
presented with the restrictions, so I don't propose to model this
complexity in the tagging, unless someone suggests a more obvious
tagging scheme than I've been able to invent.

4. CONFLATION AND UPDATE PLAN

The initial conflation should be quite straightforward - simply query
a PostGIS mirror for area features that overlap the supplied
multipolygons by more than a trivial amount. (The cadastral data from
the different agencies are not 100% consistent, so I expect that a few
per cent of some parcels will overlap adjacent state forests, and
intend to import these data as is. Rectifying misdrawn property lines
is not our problem!) I propose simply to import the parcels into JOSM,
resolve any JOSM-reported errors and warnings, and upload. I will
likely work either by county or by township, depending on the number
of parcels in a county, to keep each upload to a manageable size.
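
The overlap query itself is short. In the sketch below,
planet_osm_polygon and its 'way' geometry column are the default
osm2pgsql schema, the database name is an assumption, and the
five-per-cent threshold is just a placeholder for "more than a
trivial amount":

    import psycopg2

    # The parcel is passed as EWKT in the mirror's projection
    # (SRID 3857 for a default osm2pgsql import).
    OVERLAP_SQL = """
        WITH p AS (SELECT ST_GeomFromEWKT(%(parcel)s) AS geom)
        SELECT osm_id, name,
               ST_Area(ST_Intersection(way, p.geom)) / ST_Area(p.geom)
                   AS overlap_fraction
          FROM planet_osm_polygon, p
         WHERE ST_Intersects(way, p.geom)
    """

    def conflicts(conn, parcel_ewkt, threshold=0.05):
        """List existing OSM area features that overlap the parcel by
        more than the given fraction."""
        with conn.cursor() as cur:
            cur.execute(OVERLAP_SQL, {'parcel': parcel_ewkt})
            return [(osm_id, name, frac)
                    for osm_id, name, frac in cur.fetchall()
                    if frac > threshold]

    conn = psycopg2.connect(dbname='gis')  # local mirror (name assumed)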

Further updates in semi-automatic fashion should also be fairly
straightforward. I propose to maintain a record of what has been
uploaded and, when changes appear, check whether the OSM data for a
parcel have changed since the previous upload. For unchanged parcels,
the old can be replaced with the new without stepping on any mapper's
manual work. For new parcels, the upload can proceed. For changed
parcels, the change has to be flagged for manual review. I expect that
this last situation will be vanishingly rare. Of course, if the new
upload results in a conflict (e.g., a substantial overlap with an area
feature already in the database), the change will also have to be
flagged for manual review.
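
The per-parcel decision then reduces to a few lines. In this sketch
the action labels are placeholders, and the exact-equality geometry
comparison is a simplification (the real script would more likely
compare a canonicalized form or a hash):

    def update_action(previous, current_osm, incoming):
        """Decide what to do with one parcel on a refresh run.

        previous    -- geometry from our last upload (None if we have
                       never uploaded this parcel)
        current_osm -- geometry now in OSM for the same object
        incoming    -- new geometry from NYCDEP
        """
        if previous is None:
            return 'upload-new'     # brand-new parcel: just add it
        if incoming == previous:
            return 'no-change'      # source data unchanged: nothing to do
        if current_osm == previous:
            return 'replace'        # untouched since our upload: safe
        return 'manual-review'      # a mapper has edited it: don't clobber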

5. FURTHER NOTES

I'd much rather work from the bureau's own shapefiles, of course, but
I've not yet managed to locate an appropriate contact to request
them. Filing a demand under the Freedom of Information Law is often
regarded as a hostile act, and I'd rather stay on good terms with the
officials involved, so I prefer to proceed by less formal means. The
'web scraping' outlined above at least works, although I expect that
it will be a brittle process in the long run. I'll keep casting about
for a more robust way to handle this data set.

6. NEXT STEPS

Of course, I'll make the source code of all the scripts available,
both for review and so that the baton can be passed to others to carry
out the semi-automated update process if needed.

If this proposal doesn't get roundly shot down, the next steps will be
to create a project page on the wiki, link it to Import/Catalogue,
clean up and publish the scripts, perform the import onto the test
server, get a data review, and then update the Contributors page and
do the import for real.

Comments?

-- 
73 de ke9tv/2, Kevin



