[Imports] Proposed Import - Cache Valley, UT Address Points

Fri Oct 7 17:19:04 UTC 2022

OK, I've made some updates to the procedure. Please see the updated 
readme section [1]. Summary of issues and their proposed resolutions:

	* Deduplication of existing and new data - Use the JOSM Conflation 
plugin to auto-merge. Then manually review the rest _before_ uploading.

  	* Use the temporary UGRC:address_type tag to merge only "primary" 
addresses to buildings (i.e., not "apt 4", etc.; leave those as points). 
See the example screenshot [2]. This tag should probably be removed 
before upload.

  	* Handling future updates - Use the UGRC:import_uuid tag to maintain 
a unique identifier back to the original database. Later, when updating, 
skip rows whose IDs already exist.

  	* Other discussion topics are on the wiki talk page [3].

	* I've added some sample data [4] to the git repository. Perhaps 
someone can go over that to see if it meets import standards.

  	* There are indeed still some issues with the data, mainly points 
stacked one on top of another. From what I've seen, the JOSM Validator 
detects them pretty well, so they can be fixed before upload.
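
To make the update and cleanup steps above concrete, here is a rough 
Python sketch. It is not the code in the repository, just an 
illustration under some assumptions: each row is a plain dict with 
"lat", "lon" and a "tags" dict, and the function names and the sample 
UUIDs used with them are made up. Only the two UGRC:* tag names come 
from the procedure.

```python
from collections import defaultdict

# Tag names from the procedure; everything else here is hypothetical.
UUID_TAG = "UGRC:import_uuid"
TEMP_TAG = "UGRC:address_type"


def rows_to_import(source_rows, existing_uuids):
    """Keep only source rows whose UUID is not already in OSM."""
    return [
        row for row in source_rows
        if row["tags"][UUID_TAG] not in existing_uuids
    ]


def strip_temp_tag(tags):
    """Return a copy of the tags without the temporary address_type tag."""
    return {key: value for key, value in tags.items() if key != TEMP_TAG}


def stacked_points(rows, precision=7):
    """Group UUIDs by rounded coordinate; return only groups of 2 or more."""
    by_coord = defaultdict(list)
    for row in rows:
        coord = (round(row["lat"], precision), round(row["lon"], precision))
        by_coord[coord].append(row["tags"][UUID_TAG])
    return {coord: ids for coord, ids in by_coord.items() if len(ids) > 1}
```

The idea would be to run rows_to_import() against the set of UUIDs 
already present in OSM, review anything stacked_points() reports, and 
apply strip_temp_tag() as the last step before upload. The JOSM 
Validator would still be the real check for stacked points.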

Jacob

On 2022-10-07 10:47, Greg Troxel wrote:

> Mike Thompson <miketho16 at gmail.com> writes:
> 
> On Thu, Oct 6, 2022 at 2:07 PM <jacob at cyptem.com> wrote:
> 
> Oh, that makes sense - thanks for the explanation. So the idea is to
> never let known issues into the main dataset in the first place.
> 
> Opinions may differ in a project as big as OSM, but that is my
> understanding of the consensus.

Yes, it's my understanding of consensus too, in addition to my opinion.

Note that there are two sets of standards for OSM.  One is for how a
person doing mapping is judged, when they are mapping at human speed in
an editor.  We tend to have notions of the right way, but are pretty
lenient about things going wrong and getting fixed later, because new
people get things wrong, but we need new people in order to have more
people, and that's how experienced people get their start.

The other set of standards is for imports and mechanical edits,
basically anything at scale.  There, we basically insist that everything
that is done fully meets standards of "done right".  Yes, it means it is
harder to do an import, and takes more time to write code, but if that's
what it takes to do it right, that's how it is.  We don't have any
notion that it's better to let deficient imports happen because it makes
it easier for people to do them.

Generally I think people agree that if an import is slowed down or even
doesn't happen because it's too hard to do it right, that's ok.

>> And thanks to all for your patience, understanding, and expertise!
> Thanks for listening to community input!

Agreed - that's what this list is for and it's great to see it working,
which it doesn't always seem to.

Links:
------
[1] https://github.com/lint3/cachevalley-address-import-osm#procedure
[2] https://raw.githubusercontent.com/lint3/cachevalley-address-import-osm/main/imgs/conflation_example.png
[3] https://wiki.openstreetmap.org/wiki/Talk:Utah/CacheValleyAddressImport
[4] https://github.com/lint3/cachevalley-address-import-osm/tree/main/sample_data