<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<body>
<div dir="auto">
<div dir="auto">Thank you so much for your reply! That's exactly the kind of insight I was hoping for by posting here.</div>
<div dir="auto"><br></div>
<div dir="auto">On July 16, 2020 12:16:19 Kevin Kenny <kevin.b.kenny@gmail.com> wrote:</div>
<blockquote type="cite" class="gmail_quote" style="margin: 0 0 0 0.75ex; border-left: 1px solid #0099CC; padding-left: 0.75ex;">
<div dir="auto"><br></div>
<div dir="auto">I'm less sanguine than Skyler is about the data quality. I suspect</div>
<div dir="auto">s/he (the given name doesn't clearly identify a preferred pronoun) has</div>
<div dir="auto">been looking at urban or suburban areas in counties whose GIS</div>
<div dir="auto">departments have relatively stable funding. In those situations, yes,</div>
<div dir="auto">the data are fairly good. There is still a serious conflation issue</div>
<div dir="auto">that isn't addressed, with respect to buildings whose footprints are</div>
<div dir="auto">already mapped but do not bear addresses, where the address point may</div>
<div dir="auto">or may not be in the building footprint. Many address points, too,</div>
<div dir="auto">get clustered at the entrance of a private or shared driveway, rather</div>
<div dir="auto">than being on the individual dwellings. I seem to recall that at least</div>
<div dir="auto">one or two of the apartment and townhouse complexes in the general</div>
<div dir="auto">area of <a href="https://www.openstreetmap.org/#map=18/42.83211/-73.89931">https://www.openstreetmap.org/#map=18/42.83211/-73.89931</a> had</div>
<div dir="auto">to have their house numbers collected on foot, because the E911 data</div>
<div dir="auto">showed all the address points in a single cluster.</div>
<div dir="auto"><br></div>
<div dir="auto">In the rural areas, particularly in the counties with tiny</div>
<div dir="auto">populations, the situation is grimmer. I'm not certain that Schuyler</div>
<div dir="auto">or Wyoming Counties even would _have_ dedicated GIS departments!</div>
<div dir="auto">Until relatively recently, when grant money was available to have this</div>
<div dir="auto">information in GIS systems for E911 use, they mostly were still using</div>
<div dir="auto">paper maps, often referenced to an unknown datum. (The first job in</div>
<div dir="auto">dealing with any scanned tax plat is figuring out what coordinate</div>
<div dir="auto">frame it's using - around here, NAD27 differs from NAD83 by a few tens</div>
<div dir="auto">of metres.) The address points may be parcel centroids, or building</div>
<div dir="auto">centroids, or the point where the driveway meets the road, or even</div>
<div dir="auto">just something that was digitized from a pencil sketch made by an</div>
<div dir="auto">assessor. Import of this sort of data could well prove to be a</div>
<div dir="auto">short-term gain but impose a heavy long-term burden; consider the</div>
<div dir="auto">love-hate relationship that we all have with TIGER. (The import means</div>
<div dir="auto">that we've got a nearly-filled-in map, a lot of which is of</div>
<div dir="auto">halfway-decent quality, and we don't have the mappers to have done it</div>
<div dir="auto">nearly as quickly any other way. Nevertheless, for some years we've</div>
<div dir="auto">been paying the price in bad data and worse conflation.)</div><div dir="auto"><br></div><div dir="auto">So, my advice for both legal and technical reasons would be to use</div><div dir="auto">caution, and recognize that mechanical import is likely to be a</div><div dir="auto">disaster - the data will need to be eyeballed by human beings and</div><div dir="auto">corrected.</div>
</blockquote>
<div dir="auto"><br></div>
<div dir="auto">I certainly did not do an extensive check of the quality, so this is a super useful perspective. (I wanted more clarity on the legal aspect before investing more time in that, since, after all, if it's a definite no go from a legal perspective, why waste any time at all?) It's unfortunate that there's such a big variation in quality, although not unexpected, since they come from the counties themselves.</div>
<div dir="auto"><br></div>
<div dir="auto">However, at least the examples you gave would not necessarily make me consider the data unusable without extensive correction. The way I look at this is: if a person stood at the exact spot of the address point, could they find the place they are looking for? If the answer is yes for the vast majority of the data, then I would call that a net gain for OSM.</div><div dir="auto"><br></div><div dir="auto">Furthermore, if the data were never manually reviewed and corrected, would it still be valuable enough to import? You obviously have extensive experience with this data set, so I would trust your judgment on this, but if the worst problems we see are mostly the ones you described, it sounds to me like the pros outweigh the cons, even if the points were never corrected.</div><div dir="auto"><br></div><div dir="auto">For example, I've personally seen many roads from TIGER imports that are way, way off, or even nonexistent, especially long driveways in deeply rural areas. But the fact that the main named roads are there at all is a huge benefit to OSM, even if not every road is perfectly accurate, and many will simply never be reviewed.</div><div dir="auto"><br></div><div dir="auto">(With that said, obviously I would want the data to be as accurate as possible, and I'm not making a case to import all the data as is with no review or correction, but simply thinking through the practical reality of the task of making all the data completely accurate. We don't want perfect to be the enemy of good.)</div><div dir="auto"><br></div><div dir="auto">As for conflation with existing buildings that have no address tags, that might be too difficult a problem to solve without reviewing each and every case by hand, which might be practically infeasible. I've seen a lot of cases where there is a house and a detached garage, or an in-law unit right next to the house. 
It might be possible to detect the cases where exactly one point falls inside a building, but for the other cases you mentioned, where the point might instead be the centroid of the parcel, or at the intersection of the driveway and the street, I don't think there would be a way around fixing these by hand, which indeed would be infeasible without a large number of people participating.</div><div dir="auto"><br></div><div dir="auto">I think this goes back to my earlier point: if the address points were added and not conflated with an existing building, would that still be valuable? It may not be perfect. It may go against the "one feature, one object" principle. But I think at the end of the day, it might provide enough value to do it anyway.</div><div dir="auto"><br></div><div dir="auto">Thinking about it in terms of short- vs long-term gains vs work, I don't have extensive experience cleaning up bad imports, so I appreciate that I may be missing some perspective on the woes of bad data... but one could also see all of the missing addresses and houses as long-term work, the same way that fixing the accuracy of imported data is long-term work. If you see *all* of it as work, then the question is whether, at the other end of an import, there was a net gain in work accomplished. If there aren't extensive problems with the address data, then you could argue that the import accomplished more work by adding good address data than it created by adding bad or not-perfect-but-usable data.</div>
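<div dir="auto"><br></div>
<div dir="auto">To make the "only one point inside a building" idea concrete, here is a rough sketch of how such a check might look. It is not a tested import script: the function names, the sample footprints and the sample addresses are all made up for illustration, and it assumes coordinates that can be treated as planar (reasonable at the scale of a single footprint). Anything that is not a clean one-point-to-one-building match is set aside for a human to look at.</div>

```python
from collections import defaultdict


def point_in_polygon(point, polygon):
    """Ray-casting test. `polygon` is a list of (x, y) vertices (ring not closed)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def match_addresses(buildings, address_points):
    """Split addresses into clean matches and ones needing manual review.

    A match is clean only if the point lies in exactly one building and
    that building contains no other address point.
    """
    hits = defaultdict(list)
    needs_review = []
    for addr, pt in address_points.items():
        containing = [bid for bid, poly in buildings.items()
                      if point_in_polygon(pt, poly)]
        if len(containing) == 1:
            hits[containing[0]].append(addr)
        else:
            needs_review.append(addr)  # in no building (or overlapping ones)
    matched = {}
    for bid, addrs in hits.items():
        if len(addrs) == 1:
            matched[bid] = addrs[0]
        else:
            needs_review.extend(addrs)  # several points share one footprint
    return matched, sorted(needs_review)


# Hypothetical sample data: a house, its detached garage, a townhouse row.
buildings = {
    "house": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "garage": [(20, 0), (25, 0), (25, 5), (20, 5)],
    "townhouses": [(40, 0), (60, 0), (60, 10), (40, 10)],
}
address_points = {
    "12 Main St": (5, 5),       # inside the house
    "14 Main St": (45, 5),      # shares the townhouse footprint
    "16 Main St": (55, 5),      # shares the townhouse footprint
    "18 Main St": (100, 100),   # e.g. a parcel centroid, in no building
}

matched, needs_review = match_addresses(buildings, address_points)
print(matched)
print(needs_review)
```

<div dir="auto">In this toy example only the house gets its address automatically; the two townhouse addresses and the stray point end up in the review list, which is exactly the part I suspect cannot be automated.</div>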
<div dir="auto"><br></div>
<blockquote type="cite" class="gmail_quote" style="margin: 0 0 0 0.75ex; border-left: 1px solid #0099CC; padding-left: 0.75ex;">
<div dir="auto"><span style="font-size: 12pt;">From the legal standpoint, it would be best to proceed only</span></div>
<div dir="auto">with those counties that have granted fairly broad authority to use</div>
<div dir="auto">their cadastral data. Those include the five boroughs of New York City</div>
<div dir="auto">(that is, Bronx, Kings, New York, Richmond and Queens Counties), and</div>
<div dir="auto">the counties of Cayuga, Chautauqua, Cortland, Erie, Genesee, Greene,</div>
<div dir="auto">Lewis, Ontario, Orange, Rensselaer, Sullivan, Tioga, Tompkins, Ulster,</div>
<div dir="auto">Warren and Westchester. In New York City, the job is essentially</div>
<div dir="auto">done, because there have been massive (and relatively well curated)</div>
<div dir="auto">imports of the public data from the city's GIS department. I'd</div>
<div dir="auto">recommend avoiding the Long Island counties of Nassau and Suffolk,</div>
<div dir="auto">because they've been litigious in the past about their data.</div>
<div dir="auto"><br></div></blockquote><div dir="auto"><br></div><div dir="auto">Thanks so much for this list! Is there anything specific we can reference as proof that these counties granted such authority? It might be useful to add that to the wiki.</div>
<div id="aqm-signature" dir="auto" style="color: black;"><div dir="auto">--</div><div dir="auto">Skyler</div></div>
</div></body>
</html>