[Imports-us] Fwd: Vermont, U.S. address import
Jared
osm at wuntu.org
Fri Oct 7 17:27:31 UTC 2022
Adam,
That'd be great. I've split the VCGI e911 dataset up for all towns here:
https://github.com/JaredOSM/vermont-address-import/tree/main/town_e911_address_points
The Middlebury file is here:
https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_middlebury.geojson
The "generate_osm_file_from_e911_geojsom.php" script in the repo will
process the above file (clean up/expand street names and remove records
that have a housenumber of 0). It can output OSM, tab-delimited, or GeoJSON.
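
For a rough idea of what that cleanup step involves without reading the
PHP, here is a minimal Python sketch. It is not the script from the repo:
the property names HOUSE_NUMBER and STREET, the abbreviation tables, and
the function names are all assumptions made for illustration, and the real
expansion rules are likely more extensive.

```python
# Hypothetical abbreviation tables; the actual script's rules may differ.
STREET_SUFFIXES = {"ST": "Street", "RD": "Road", "AVE": "Avenue",
                   "LN": "Lane", "DR": "Drive"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_street_name(raw):
    """Expand common abbreviations, e.g. 'JERSEY ST S' -> 'Jersey Street South'."""
    words = raw.strip().split()
    expanded = []
    for i, word in enumerate(words):
        key = word.upper().rstrip(".")
        is_edge = i == 0 or i == len(words) - 1
        if key in DIRECTIONS and is_edge and len(words) > 1:
            # Only treat a single letter as a direction at the start or end.
            expanded.append(DIRECTIONS[key])
        elif key in STREET_SUFFIXES and i > 0:
            expanded.append(STREET_SUFFIXES[key])
        else:
            expanded.append(word.capitalize())
    return " ".join(expanded)


def clean_features(features, number_key="HOUSE_NUMBER", street_key="STREET"):
    """Drop records with a housenumber of 0 and emit OSM-style addr:* tags.

    number_key and street_key are assumed property names, not confirmed
    against the VCGI data.
    """
    cleaned = []
    for feature in features:
        props = feature.get("properties", {})
        number = str(props.get(number_key, "")).strip()
        if number == "0":
            continue  # a housenumber of 0 is not a usable address
        cleaned.append({
            "type": "Feature",
            "geometry": feature.get("geometry"),
            "properties": {
                "addr:housenumber": number,
                "addr:street": expand_street_name(str(props.get(street_key, ""))),
            },
        })
    return cleaned
```

Feeding it the "features" list from one of the town GeoJSON files (after
adjusting the two key names to match the data) would give a list that can
be written back out as GeoJSON for review in JOSM.
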
Let me know how it goes.
Jared
On Fri, Oct 7, 2022 at 12:19 PM Adam Franco <adamfranco at gmail.com> wrote:
> Thanks for continuing with this, Jared. Would you be able to generate an
> import file for Middlebury? Over the last few years I've mapped every
> address in Middlebury by hand or using RAPID, referencing the VCGI E911
> data. I'd be interested in comparing my manual and RAPID mapping with the
> import to look for discrepancies. In the process I'll attempt a conflation
> workflow in JOSM and see what I come up with.
>
> - Adam
>
> On Fri, Oct 7, 2022, 11:31 AM Jared <osm at wuntu.org> wrote:
>
>> Elliott,
>>
>> With the Addison, Vermont use case, I'm not talking about the source tag
>> (I'm fine with only including source in the changeset tags, and have already
>> updated the import proposal). I'm referring to the "ref:vcgi:esiteid" key
>> that stores a unique ID for the e911 address from the source VCGI database.
>> Greg was suggesting that this should also not be included, and is not
>> useful. But it has already been useful to me for removing existing OSM
>> addresses from my import files. So I'm trying to understand if my use of
>> this ref:vcgi:esiteid tag is flawed, or if it causes harm to others. For
>> what it's worth, the "ref:vcgi:esiteid" tag was modeled on the
>> "nysgissam:nysaddresspointid" tag that was used for the recent "New York
>> (state)/NYS GIS SAM Address Points Import":
>> https://wiki.openstreetmap.org/wiki/New_York_(state)/NYS_GIS_SAM_Address_Points_Import
>>
>> Thanks,
>> Jared
>>
>> On Fri, Oct 7, 2022 at 11:04 AM Elliott Plack <elliott.plack at gmail.com>
>> wrote:
>>
>>> Jared,
>>>
>>> I see how the source could be useful with that specific Overpass query,
>>> but I also have a better option that will let you get more information from
>>> Overpass. It is very simple.
>>>
>>> Instead of using the command 'out body', use 'out meta'. The meta
>>> includes all available metadata. In it I can see the changeset, user, and
>>> version of every node. That should help you narrow it down.
>>>
>>> Example query: https://overpass-turbo.eu/s/1my4
>>>
>>> Example output:
>>>
>>> {
>>>   "type": "node",
>>>   "id": 8825645506,
>>>   "lat": 44.0520033,
>>>   "lon": -73.3760089,
>>>   "timestamp": "2021-06-11T16:30:27Z",
>>>   "version": 1,
>>>   "changeset": 106224744,
>>>   "user": "jared",
>>>   "uid": 3887,
>>>   "tags": {
>>>     "addr:city": "Addison",
>>>     "addr:housenumber": "1245",
>>>     "addr:postcode": "05491",
>>>     "addr:state": "VT",
>>>     "addr:street": "Jersey Street South",
>>>     "source": "esri/USA_NAD_Addresses"
>>>   }
>>> }
>>>
>>> - Elliott
>>>
>>>
>>> On Fri, Oct 7, 2022 at 10:30 AM Jared <osm at wuntu.org> wrote:
>>>
>>>> Elliott or Greg,
>>>>
>>>> Can you walk me through a real example so I can understand how you
>>>> would identify existing addresses?
>>>>
>>>> Let's take Addison, Vermont for example.
>>>>
>>>> The VCGI e911 dataset has 987 address points in Addison. Here's the
>>>> data file:
>>>>
>>>> https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_addison.geojson
>>>>
>>>> When I run an overpass query for all elements in Addison that have a
>>>> housenumber or street: https://overpass-turbo.eu/s/1mxX
>>>> I find that there are already a total of 142 nodes and ways with
>>>> address information in OSM.
>>>>
>>>> By looking at the overpass results, I can immediately see that 55 of
>>>> the existing OSM elements have a "ref:vcgi:esiteid" Key/Value pair.
>>>> Without any further queries, I have a high level of confidence that I can
>>>> remove all 55 address points from my import file, as they are not even
>>>> worth considering for an automated import. This seems like a safe and
>>>> efficient way of eliminating the chance of importing duplicate data.
>>>> Obviously the other data points need to be evaluated, but why not remove
>>>> the 55 for which I have high confidence?
>>>>
>>>> Thanks for helping walk me through how you would approach it, or
>>>> explain why my technique could be flawed.
>>>> Jared
>>>>
>>>> On Fri, Oct 7, 2022 at 9:50 AM Elliott Plack <elliott.plack at gmail.com>
>>>> wrote:
>>>>
>>>>> Jared,
>>>>>
>>>>> This looks great! I want to thank you for the due diligence. The
>>>>> process looks sound.
>>>>>
>>>>> I do agree about the source tags on the nodes; they may not be as
>>>>> reliable. In my experience, I check the editor/history of a node for
>>>>> authority, and if I see it was made via an import account, I might hold it
>>>>> to a different standard--not a bad thing. If you are concerned about
>>>>> downstream querying of previously imported addresses, you can query out
>>>>> things from the import using the changeset (keep a record), user, or
>>>>> version with overpass. I'd recommend looking at that option.
>>>>>
>>>>> Otherwise I applaud the effort.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Elliott Plack
>>>>>
>>>>>
>>>>> On Fri, Oct 7, 2022 at 8:33 AM Greg Troxel <gdt at lexort.com> wrote:
>>>>>
>>>>>>
>>>>>> Jared <osm at wuntu.org> writes:
>>>>>>
>>>>>> > On Thu, Oct 6, 2022 at 8:13 AM Greg Troxel <gdt at lexort.com> wrote:
>>>>>> >
>>>>>> >> You are going to have to deal with matching addresses between import
>>>>>> >> source and OSM programmatically like in #1 above, once you move beyond
>>>>>> >> non-addressed towns. Once you do that, the ref won't help, as it won't
>>>>>> >> be 100% reliable. Therefore it's noise.
>>>>>> >
>>>>>> > I was thinking of using the foreign key for a different use case. I agree
>>>>>> > that relying on this key for *overwriting* OSM data does not seem safe.
>>>>>> > The scenario I'm thinking about is for NEW addresses that are added to the
>>>>>> > VCGI dataset. To determine if a NEW VCGI e911 address exists in OSM, the
>>>>>> > "ref:vcgi:esiteid" tag would seem to be very helpful. If an address in OSM
>>>>>> > already has that unique esiteid key, then we can be confident that it
>>>>>> > should be skipped. If the esiteid does not exist in OSM, then other
>>>>>> > signals should be evaluated (housenumber, streetname, lat/long, etc.), but
>>>>>> > those can be less precise due to misspellings or slightly different
>>>>>> > coordinates.
>>>>>>
>>>>>> I see where you're going but I think you need to get the fuzzy match
>>>>>> right anyway and it's not going to help that much to have a key.
>>>>>>
>>>>>> > I'd like to hear the negative impact a foreign key causes. There are other
>>>>>> > similar foreign keys (eg. wikidata, wikipedia) and I've never found them to
>>>>>> > be detrimental to my work, but don't want to cause issues for others. The
>>>>>> > 55,000 VT addresses that have been added using the Esri layer in the RapiD
>>>>>> > editor include this "ref:vcgi:esiteid" key, and I've found it to be useful.
>>>>>>
>>>>>> A fair question, and it may be that the RapiD stuff is out of line.
>>>>>>
>>>>>> I don't think the foreign keys really hurt. I just think that the
>>>>>> history is that they are less useful than everybody thinks they are going
>>>>>> to be.
>>>>>>
>>>>>> >> Wow. Are you saying that apartment buildings have coordinates of entry
>>>>>> >> doors within the building, or that they are artificially skewed to make
>>>>>> >> rendering non-overlapping, or ? Surely Vermont has at least some
>>>>>> >> multi-floor apartment buildings that have the same floor design and thus
>>>>>> >> multiple units that actually do have the same horizontal coordinates.
>>>>>> >
>>>>>> > I've asked my contact at VCGI for clarification on how multi-tenant
>>>>>> > buildings are addressed. From what I've seen, some multi-tenant buildings
>>>>>> > just have one e911 address associated with them. I have seen other
>>>>>> > buildings that have multiple addresses, but I've never seen them overlap.
>>>>>> > I'll keep a close eye out for this and will see what VCGI has to say. I do
>>>>>> > have the VT data in a postgis database, but don't have experience using the
>>>>>> > GIS functions, so I'll try it out.
>>>>>>
>>>>>> Sounds good. There are hard questions about datasets and as you can see
>>>>>> my bias is to dig in and address them.
>>>>>> _______________________________________________
>>>>>> Imports-us mailing list
>>>>>> Imports-us at openstreetmap.org
>>>>>> https://lists.openstreetmap.org/listinfo/imports-us
>>>>>>