[Imports] HIFLD
James Crawford
sherbets at disroot.org
Sun Oct 2 20:43:01 UTC 2022
Hi Greg,
Thank you for the speedy response.
> Are you an employee of or acting on behalf of DHS?
Nope. Just been contributing to OSM for the past couple years in my free
time. I just like seeing high quality data getting added to the map.
> Really only high-quality data should be imported, so I don't follow a
> plan to import data of varying quality. By high-quality I mean that
> substantially all (>= 99%) of objects in the data set exist and the
> positions are close (within 20m?) to the correct positions.
I totally agree. There are some datasets, like "Land Mobile Commercial
Transmission Towers"
https://hifld-geoplatform.opendata.arcgis.com/datasets/geoplatform::land-mobile-commercial-transmission-towers-1/about
where a large number of objects either do not exist or are placed
extremely inaccurately, but a few items in the dataset do correspond to
existing objects. Enough of these exist that I felt it could be worth my
time, in certain situations, to manually review extracts of the data to
find the good-quality records.
> So data quality assessment
> needs to ask "do >= 99% of the objects in the db currently exist".
>
> In the web page, quality is labeled with subjective terms, and for an
> effort of this scope I'd like to see quantitative definitions.
Sure. I'd be happy to create a definition like: "out of 100 objects
taken randomly from this dataset, X were accurate and within 20m of the
actual location of the object" and reassess all of the datasets.
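As a rough illustration of that definition, the check could look
something like the Python sketch below. The function names, the
(lat, lon) tuple layout, and the 20 m tolerance are assumptions made for
the example, not part of any existing tool. The verified positions would
still have to come from manual review of each sampled object.

```python
import math
import random


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def draw_sample(records, n=100, seed=0):
    """Pick up to n records at random; a fixed seed keeps the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


def accuracy_rate(reviewed, tolerance_m=20.0):
    """Fraction of reviewed objects that exist and lie within tolerance_m.

    reviewed is a list of (claimed, verified) pairs. Each item is a
    (lat, lon) tuple; verified is None when the reviewer found no such
    object on the ground or in imagery.
    """
    good = 0
    for claimed, verified in reviewed:
        if verified is not None and haversine_m(*claimed, *verified) <= tolerance_m:
            good += 1
    return good / len(reviewed)
```

The result of accuracy_rate is the "X out of 100" figure, expressed as a
fraction between 0 and 1.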
> In general, I am uncomfortable with advice for people to download data,
> transform tags and upload. I think it's far better to have a published
> program (e.g. python script)
>
> This way, people can run the conflation and examine the results to
> assess quality. And, I think actually writing this as code and
> expecting it to be run repeatedly sharpens the thinking about the import
> transformation process and shines a more careful light on quality.
The reason I suggested conflating manually is that I have next to no
experience with programming, although I would be happy to try to write
such a script, or to seek help in having one written.
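For reference, a very small conflation step might look like the Python
sketch below. It is only a starting point under stated assumptions:
points are plain (lat, lon) tuples, the 20 m match radius is borrowed
from the threshold suggested above, and the names conflate and
distance_m are hypothetical. A real script would also compare tags and
use a spatial index rather than a full scan.

```python
import math


def distance_m(a, b):
    """Great-circle distance in metres between two (lat, lon) tuples."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dphi = p2 - p1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def conflate(candidates, existing, radius_m=20.0):
    """Split candidate points into likely duplicates and likely new objects.

    A candidate counts as matched when the nearest existing point is
    within radius_m; otherwise it is reported as new and left for review.
    """
    matched, new = [], []
    for cand in candidates:
        nearest = min(existing, key=lambda e: distance_m(cand, e), default=None)
        if nearest is not None and distance_m(cand, nearest) <= radius_m:
            matched.append((cand, nearest))
        else:
            new.append(cand)
    return matched, new
```

Running the same function again after each data refresh gives a
repeatable list of matched and unmatched objects to examine.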
> I think it's ok to take a dataset and do statistical quality control, where some fraction
> of points are checked (against on-the-ground reality), and then if >99%
> of them are correct, to assume they are all correct (enough that "fix
> later" is ok).
That would be the idea. I'm not comfortable uploading anything with even
slight inaccuracy without manually reviewing the objects first.
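To give a feel for how large a checked sample has to be before a ">99%
correct" statement is supported, here is a small Python sketch. It
assumes a simple random sample and the special case where every checked
object turns out to be correct; the function names and the 95%
confidence level are choices made for the example.

```python
import math


def min_sample_size_zero_failures(target_accuracy=0.99, confidence=0.95):
    """Smallest n such that n correct checks out of n support the target.

    If the true accuracy were below target_accuracy, the chance of seeing
    n correct objects in a row would be under (1 - confidence).
    """
    return math.ceil(math.log(1 - confidence) / math.log(target_accuracy))


def error_upper_bound_zero_failures(n, confidence=0.95):
    """Upper bound on the error rate after n checks with no failures."""
    return 1 - (1 - confidence) ** (1 / n)


print(min_sample_size_zero_failures())       # sample size for 99% at 95% confidence
print(error_upper_bound_zero_failures(100))  # bound after 100 clean checks
```

Under these assumptions, 100 clean checks bound the error rate at
roughly 3%, so a 99% claim needs a sample of about 300 objects with no
failures, and more if any failures are found.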
> Note that some states, including MA, have email lists, and a number of
> active mappers do not believe the use of Slack is legitimate (because
> it's a proprietary system requiring signing a contract with a particular
> company). However others think it's ok.
>
> And obviously talk-us, but it makes sense to get a more baked proposal
> here.
To be a little clearer about how I plan to reach out to local mappers:
I will message all the active mappers I can find in each state,
regardless of whether they are active on lists, Slack, etc., so that I
can get as solid a base of local support for this import as I can,
given its scale and scope. I'll reach out through the official channels
as well.
> Some of these datasets seem to be compilations of other datasets.
> Nursing homes, that I picked because I can sort of armchair assess
> quality, seems to be copied from state databases, at least in MA. The
> source data is by address, so it was geocoded somehow. All of this is
> unclear about licensing, so that makes me really want to understand the
> "if published by the US, is PD" claim.
I think I may have been mistaken about the licensing situation when I
made that claim. However, on the main page for the HIFLD datasets, there
is a link to a data catalog spreadsheet with information about each
dataset. All of the data that is publicly accessible on the website is
open/in the public domain, while all copyrighted data is secured and
requires special access. As far as I am aware, I haven't included any of
the copyrighted datasets in the list on the wiki.
> I am particularly skeptical of trail data,
I absolutely do not plan to import the lines of the trail dataset
directly from HIFLD, because the accuracy of the lines as drawn is lower
than what I expect from OpenStreetMap, but I feel that the metadata is
extremely useful and OSM could benefit from it.
> and this page doesn't clearly
> separate import candidates from "recommend against import; useful as
> reference layer".
I'll be sure to consider this as well when I redo the quality assessment
for the datasets.
-James Crawford (SherbetS)