[Imports] HIFLD

Sun Oct 2 20:43:01 UTC 2022

Hi Greg,

Thank you for the speedy response.

>   Are you an employee of or acting on behalf of DHS?
Nope. I've just been contributing to OSM for the past couple of years in
my free time. I just like seeing high-quality data added to the map.

> Really only high-quality data should be imported, so I don't follow a
> plan to import data of varying quality.  By high-quality I mean that
> substantially all (>= 99%) of objects in the data set exist and the
> positions are close (within 20m?) to the correct positions.

I totally agree. Some datasets, like "Land Mobile Commercial
Transmission Towers"
https://hifld-geoplatform.opendata.arcgis.com/datasets/geoplatform::land-mobile-commercial-transmission-towers-1/about

contain a large number of objects that either do not exist or are
extremely inaccurate, but enough items in the dataset do correspond to
real objects that I felt it could be worth my time, in certain
situations, to manually review extracts of the data and pick out the
good-quality entries.

> So data quality assessment
> needs to ask "do >= 99% of the objects in the db currently exist".
>
> In the web page, quality is labeled with subjective terms, and for an
> effort of this scope I'd like to see quantitative definitions.
Sure. I'd be happy to adopt a definition like "out of 100 objects taken
at random from this dataset, X exist and are within 20 m of the object's
actual location" and reassess all of the datasets against it.

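A scripted version of that sampling check might look like the following minimal sketch. It assumes each dataset record is a (lat, lon) pair in WGS84 and that the manual review step is represented by a hypothetical `verify` callback returning the surveyed position of the real object, or None if it doesn't exist. The 20 m threshold comes from the quoted message above; the haversine formula is just one standard way to measure the offset.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def score_sample(records, verify, sample_size=100, max_offset_m=20.0, seed=None):
    """Count how many randomly sampled records exist and lie within max_offset_m.

    records: list of (lat, lon) tuples taken from the dataset.
    verify:  hypothetical callback standing in for manual review; returns the
             real object's (lat, lon), or None if the object does not exist.
    Returns (good, checked).
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    good = 0
    for lat, lon in sample:
        actual = verify(lat, lon)
        if actual is not None and haversine_m(lat, lon, *actual) <= max_offset_m:
            good += 1
    return good, len(sample)
```

The resulting (good, checked) pair is exactly the "X out of 100" figure, so it could be recorded per dataset on the wiki page.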
> In general, I am uncomfortable with advice for people to download data,
> transform tags and upload.  I think it's far better to have a published
> program (e.g. python script)
>
> This way, people can run the conflation and examine the results to
> assess quality.  And, I think actually writing this as code and
> expecting it to be run repeatedly sharpens the thinking about the import
> transformation process and shines a more careful light on quality.
I suggested conflating manually because I have next to no programming
experience, but I would be happy to try writing such a script, or to
seek help getting one written.

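As a starting point, the core of such a conflation script could be as small as the following sketch. It assumes HIFLD candidates have already been loaded as dicts with "lat"/"lon" keys (a hypothetical layout) and that existing OSM features of the same kind have been fetched as (lat, lon) pairs. It simply splits the candidates into "already mapped" and "possibly new" by a 20 m proximity test. The equirectangular distance approximation is a simplification that is adequate at this scale, and the brute-force search would need a spatial index for a national-scale run.

```python
import math


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of distance in metres (fine for ~20 m checks)."""
    r = 6_371_000  # mean Earth radius in metres
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)


def conflate(candidates, existing, max_offset_m=20.0):
    """Split candidate points into (already_mapped, new) by proximity.

    candidates: list of dicts with 'lat', 'lon' (plus source tags); hypothetical layout.
    existing:   list of (lat, lon) for OSM features of the same kind.
    """
    already_mapped, new = [], []
    for c in candidates:
        # Distance to the nearest existing OSM feature (infinity if there are none).
        nearest = min(
            (approx_distance_m(c["lat"], c["lon"], lat, lon) for lat, lon in existing),
            default=math.inf,
        )
        if nearest <= max_offset_m:
            already_mapped.append(c)
        else:
            new.append(c)
    return already_mapped, new
```

Publishing something like this lets anyone rerun it and inspect both lists, which is the repeatability the quoted message asks for.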
> I think it's ok to take a dataset and do statistical quality control, where some fraction
> of points are checked (against on-the-ground reality), and then if >99%
> of them are correct, to assume they are all correct (enough that "fix
> later" is ok).
That would be the idea. I'm not comfortable uploading anything with even 
slight inaccuracy without manually reviewing the objects first.
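One caveat on sample size: if a dataset's true error rate were exactly 1%, a random sample of 100 would still come back with zero bad objects about 37% of the time (0.99^100), so 100 clean checks alone would not establish ">= 99%". The sketch below uses only the Python standard library to compute the relevant binomial probabilities and the number of zero-failure checks needed. The 95% confidence level is my assumption, not something agreed in this thread.

```python
import math


def binom_tail_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more good objects in n checks."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_samples_zero_failures(max_error_rate=0.01, confidence=0.95):
    """Smallest n such that finding 0 bad objects in n random checks would happen
    with probability <= 1 - confidence if the true error rate were max_error_rate."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))
```

With the defaults this gives 299 checks. It also shows that a dataset that is really only 97% accurate would still score 99 or more out of 100 roughly a fifth of the time.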
> Note that some states, including MA, have email lists, and a number of
> active mappers do not believe the use of Slack is legitimate (because
> it's a proprietary system requiring signing a contract with a particular
> company).  However others think it's ok.
>
> And obviously talk-us, but it makes sense to get a more baked proposal
> here.

To be clearer about how I plan to reach out to local mappers:

I will message all the active mappers I can find in each state,
regardless of whether they are active on mailing lists, Slack, etc., to
build as solid a base of local support for this import as I can, given
its scale and scope. I'll also reach out through the official channels.

> Some of these datasets seem to be compilations of other datasets.
> Nursing homes, that I picked because I can sort of armchair assess
> quality, seems to be copied from state databases, at least in MA.  The
> source data is by address, so it was geocoded somehow.  All of this is
> unclear about licensing, so that makes me really want to understand the
> "if published by the US, is PD" claim.
I think I may have been mistaken about the licensing situation when I
made that claim. However, the main page for the HIFLD datasets links to
a data catalog spreadsheet with information about each dataset. All of
the publicly accessible data on the website is open/in the public
domain, while all copyrighted data is secured and requires special
access, and as far as I am aware I haven't included any of the
copyrighted datasets in the list on the wiki.

> I am particularly skeptical of trail data,
I absolutely do not plan to import the HIFLD trail lines directly,
because their positional accuracy is lower than I expect from
OpenStreetMap, but I feel that the metadata is extremely useful and OSM
could benefit from it.

> and this page doesn't clearly
> separate import candidates from "recommend against import; useful as
> reference layer".

I'll be sure to consider this as well when I redo the quality assessment 
for the datasets.

-James Crawford (SherbetS)