[Imports-us] Fwd: Vermont, U.S. address import
Jared
osm at wuntu.org
Sat Oct 8 12:11:03 UTC 2022
Adam,
Thanks for jumping in and helping out. Thanks also for pointing to the
conflation plugin.
I've fixed the script to handle the issues you reported, and I've also made
a note of the Valley View Drive street name issue you found in the source
data. This will be reported back to VCGI to get updated.
Thanks again,
Jared
On Fri, Oct 7, 2022 at 10:58 PM Adam Franco <adamfranco at gmail.com> wrote:
> I ran the script for Middlebury, loaded the .osm file into JOSM and used
> the conflation plugin
> <https://wiki.openstreetmap.org/wiki/JOSM/Plugins/Conflation> to compare
> it with an OSM data layer. There were a number of conflicts, some with
> street name expansion and casing, others with zip codes.
>
> The .osm file for Middlebury and some notes from spot checking results can
> be found in this PR:
> https://github.com/JaredOSM/vermont-address-import/pull/1
>
> The conflation plugin for JOSM is pretty easy to use and will compare old
> and new tags that are present in the reference layer. Conflicts occur
> when one of the subject fields differs and would get overwritten. This test
> also had 77 addresses that were missing in OSM but present in the E911 data.
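To make the notion of a conflict concrete, here is a minimal Python sketch, assuming each address is represented as a plain dict of OSM tags. `find_conflicts` and `SUBJECT_KEYS` are hypothetical names for illustration; this is not the JOSM conflation plugin's implementation, only the rule described above: a subject tag present on both sides with different values would be overwritten, while a tag only the import has is a plain addition.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical set of "subject" tags to compare; adjust to the import's schema.
SUBJECT_KEYS = ("addr:housenumber", "addr:street", "addr:city", "addr:postcode")


def find_conflicts(
    osm_tags: Dict[str, str],
    import_tags: Dict[str, str],
    subject_keys: Iterable[str] = SUBJECT_KEYS,
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Split subject keys into conflicts and additions.

    conflicts: key -> (osm_value, import_value) where both sides have the key
               but the values differ, so applying the import would overwrite OSM.
    additions: keys only the import has, which can be added without overwriting.
    """
    conflicts: Dict[str, Tuple[str, str]] = {}
    additions: List[str] = []
    for key in subject_keys:
        in_osm, in_import = key in osm_tags, key in import_tags
        if in_osm and in_import and osm_tags[key] != import_tags[key]:
            conflicts[key] = (osm_tags[key], import_tags[key])
        elif in_import and not in_osm:
            additions.append(key)
    return conflicts, additions
```

Called on an OSM node tagged `addr:street=Valley View Dr` and an import record tagged `addr:street=Valley View Drive`, it reports `addr:street` as a conflict, which is the kind of expansion difference described above.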
>
> One place where I did just find the ref:vcgi:esiteid=* tag valuable is in
> uniquely identifying objects when troubleshooting the import.
>
> On Fri, Oct 7, 2022 at 1:27 PM Jared <osm at wuntu.org> wrote:
>
>> Adam,
>>
>> That'd be great. I've split the VCGI e911 dataset up for all towns here:
>>
>> https://github.com/JaredOSM/vermont-address-import/tree/main/town_e911_address_points
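For anyone wanting to reproduce a per-town split locally, a minimal sketch, assuming the statewide file is a GeoJSON FeatureCollection and that each feature stores its town in a property called `TOWNNAME` (a hypothetical field name; check the actual VCGI schema). The output filename pattern follows the repo URLs above; lowercasing names and replacing spaces with underscores for multi-word towns is an assumption.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict

# Hypothetical property holding the town name; the real VCGI field may differ.
TOWN_KEY = "TOWNNAME"


def split_by_town(collection: Dict[str, Any], town_key: str = TOWN_KEY) -> Dict[str, Dict[str, Any]]:
    """Group the features of a FeatureCollection into one collection per town."""
    groups = defaultdict(list)
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        town = str(props.get(town_key) or "unknown").strip().lower().replace(" ", "_")
        groups[town].append(feature)
    return {
        town: {"type": "FeatureCollection", "features": features}
        for town, features in groups.items()
    }


def write_town_files(collection: Dict[str, Any], out_dir: str) -> None:
    """Write one e911_address_points_<town>.geojson file per town."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for town, fc in split_by_town(collection).items():
        (target / f"e911_address_points_{town}.geojson").write_text(json.dumps(fc))
```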
>>
>> The Middlebury file is here:
>>
>> https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_middlebury.geojson
>>
>> The "generate_osm_file_from_e911_geojsom.php" script in the repo will
>> process the above file (clean up/expand street names and remove records
>> that have a housenumber of 0). It can output OSM, tab-delimited, or GeoJSON.
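The PHP script itself isn't shown in this thread, so the following is only a rough Python sketch of the same steps (drop housenumber 0, expand and re-case street names, emit an .osm file). The `housenumber`/`street`/`lat`/`lon` record keys and the abbreviation tables are hypothetical; the real script's expansion rules are likely more extensive.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical abbreviation tables; the real script's rules are not shown.
SUFFIXES = {"RD": "Road", "ST": "Street", "DR": "Drive", "LN": "Lane", "AVE": "Avenue"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_street(raw: str) -> str:
    """Expand abbreviations and turn ALL-CAPS words into capitalized words."""
    out = []
    for i, word in enumerate(raw.split()):
        key = word.upper().rstrip(".")
        if key in DIRECTIONS:
            out.append(DIRECTIONS[key])
        elif i > 0 and key in SUFFIXES:  # never expand a leading word like "St"
            out.append(SUFFIXES[key])
        else:
            out.append(word.capitalize())
    return " ".join(out)


def clean_records(records: List[Dict]) -> List[Dict]:
    """Drop records whose housenumber is 0 and expand street names."""
    cleaned = []
    for rec in records:
        number = str(rec.get("housenumber", "")).strip()
        if number == "0":
            continue
        cleaned.append({**rec, "housenumber": number, "street": expand_street(rec["street"])})
    return cleaned


def to_osm_xml(records: List[Dict]) -> str:
    """Serialize records as new OSM nodes with negative ids."""
    root = ET.Element("osm", version="0.6", generator="address-import-sketch")
    for i, rec in enumerate(records, start=1):
        node = ET.SubElement(root, "node", id=str(-i), lat=str(rec["lat"]), lon=str(rec["lon"]))
        ET.SubElement(node, "tag", k="addr:housenumber", v=rec["housenumber"])
        ET.SubElement(node, "tag", k="addr:street", v=rec["street"])
    return ET.tostring(root, encoding="unicode")
```

Negative node ids are the usual convention in .osm files for objects that do not exist in the database yet, so an editor treats them as new.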
>>
>> Let me know how it goes.
>> Jared
>>
>> On Fri, Oct 7, 2022 at 12:19 PM Adam Franco <adamfranco at gmail.com> wrote:
>>
>>> Thanks for continuing with this, Jared. Would you be able to generate an
>>> import file for Middlebury? Over the last few years I've mapped every
>>> address in Middlebury by hand or using RAPID, referencing the VCGI E911
>>> data. I'd be interested in comparing my manual and RAPID mapping with the
>>> import to look for discrepancies. In the process I'll attempt a conflation
>>> workflow in JOSM and see what I come up with.
>>>
>>> - Adam
>>>
>>> On Fri, Oct 7, 2022, 11:31 AM Jared <osm at wuntu.org> wrote:
>>>
>>>> Elliott,
>>>>
>>>> With the Addison, Vermont use case, I'm not talking about the source
>>>> tag (I'm fine with only including source in the changeset tags, and have
>>>> already updated the import proposal). I'm referring to the
>>>> "ref:vcgi:esiteid" key that stores a unique ID for the e911 address from
>>>> the source VCGI database. Greg was suggesting that this should also not be
>>>> included, and is not useful. But it has already been useful to me for
>>>> removing existing OSM addresses from my import files. So I'm trying to
>>>> understand if my use of this ref:vcgi:esiteid tag is flawed, or if it
>>>> causes harm to others. For what it's worth, the "ref:vcgi:esiteid" tag was
>>>> modeled on the "nysgissam:nysaddresspointid" tag that was used for the
>>>> recent "New York (state)/NYS GIS SAM Address Points Import":
>>>> https://wiki.openstreetmap.org/wiki/New_York_(state)/NYS_GIS_SAM_Address_Points_Import
>>>>
>>>> Thanks,
>>>> Jared
>>>>
>>>> On Fri, Oct 7, 2022 at 11:04 AM Elliott Plack <elliott.plack at gmail.com> wrote:
>>>>
>>>>> Jared,
>>>>>
>>>>> I see how the source could be useful with that specific Overpass query,
>>>>> but I also have a better option that will let you get more information from
>>>>> Overpass. It is very simple.
>>>>>
>>>>> Instead of using the command 'out body', use 'out meta'. The meta
>>>>> includes all available metadata. In it I can see the changeset, user, and
>>>>> version of every node. That should help you narrow it down.
>>>>>
>>>>> Example query: https://overpass-turbo.eu/s/1my4
>>>>>
>>>>> Example output:
>>>>>
>>>>> {
>>>>>   "type": "node",
>>>>>   "id": 8825645506,
>>>>>   "lat": 44.0520033,
>>>>>   "lon": -73.3760089,
>>>>>   "timestamp": "2021-06-11T16:30:27Z",
>>>>>   "version": 1,
>>>>>   "changeset": 106224744,
>>>>>   "user": "jared",
>>>>>   "uid": 3887,
>>>>>   "tags": {
>>>>>     "addr:city": "Addison",
>>>>>     "addr:housenumber": "1245",
>>>>>     "addr:postcode": "05491",
>>>>>     "addr:state": "VT",
>>>>>     "addr:street": "Jersey Street South",
>>>>>     "source": "esri/USA_NAD_Addresses"
>>>>>   }
>>>>> }
>>>>>
>>>>> - Elliott
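With `out meta`, each element carries `changeset`, `user`, and `version` fields, as in the example output above, so objects from a given import can be picked out client-side if the import's changeset IDs are recorded. A minimal Python sketch, assuming the query requested JSON output (Overpass returns results in a top-level `elements` array); the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def elements_from_overpass(text: str) -> List[Dict[str, Any]]:
    """Parse an Overpass JSON response and return its 'elements' array."""
    return json.loads(text).get("elements", [])


def select_by_metadata(
    elements: Iterable[Dict[str, Any]],
    changesets: Optional[set] = None,
    users: Optional[set] = None,
) -> List[Dict[str, Any]]:
    """Keep elements whose 'out meta' fields match the given changesets and/or users.

    A filter left as None is not applied.
    """
    selected = []
    for el in elements:
        if changesets is not None and el.get("changeset") not in changesets:
            continue
        if users is not None and el.get("user") not in users:
            continue
        selected.append(el)
    return selected
```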
>>>>>
>>>>>
>>>>> On Fri, Oct 7, 2022 at 10:30 AM Jared <osm at wuntu.org> wrote:
>>>>>
>>>>>> Elliott or Greg,
>>>>>>
>>>>>> Can you walk me through a real example so I can understand how you
>>>>>> would identify existing addresses?
>>>>>>
>>>>>> Let's take Addison, Vermont for example.
>>>>>>
>>>>>> The VCGI e911 dataset has 987 address points in Addison. Here's the
>>>>>> data file:
>>>>>>
>>>>>> https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_addison.geojson
>>>>>>
>>>>>> When I run an overpass query for all elements in Addison that have a
>>>>>> housenumber or street: https://overpass-turbo.eu/s/1mxX
>>>>>> I find that there are already a total of 142 nodes and ways with
>>>>>> address information in OSM.
>>>>>>
>>>>>> By looking at the overpass results, I can immediately see that 55 of
>>>>>> the existing OSM elements have a "ref:vcgi:esiteid" Key/Value pair.
>>>>>> Without any further queries, I have a high level of confidence that I can
>>>>>> remove all 55 address points from my import file, as they are not even
>>>>>> worth considering for an automated import. This seems like a safe and
>>>>>> efficient way of eliminating the chance of importing duplicate data.
>>>>>> Obviously the other data points need to be evaluated, but why not remove
>>>>>> the 55 for which I have high confidence?
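The shortcut described here amounts to a set-membership filter. A minimal Python sketch, assuming the Overpass results have been parsed into element dicts with a `tags` mapping, and that the import GeoJSON stores the site ID in a property named `ESITEID` (a hypothetical name). It only removes exact ID matches; everything that remains still needs the other checks.

```python
from typing import Any, Dict, List

OSM_KEY = "ref:vcgi:esiteid"
# Hypothetical name of the site-ID property in the VCGI GeoJSON.
IMPORT_PROP = "ESITEID"


def drop_already_mapped(
    import_features: List[Dict[str, Any]],
    osm_elements: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Remove import features whose esiteid already appears on an OSM element."""
    existing = {
        str(el["tags"][OSM_KEY]).strip()
        for el in osm_elements
        if OSM_KEY in el.get("tags", {})
    }
    # Compare as stripped strings so numeric and string IDs match.
    return [
        f for f in import_features
        if str((f.get("properties") or {}).get(IMPORT_PROP, "")).strip() not in existing
    ]
```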
>>>>>>
>>>>>> Thanks for helping walk me through how you would approach it, or
>>>>>> explain why my technique could be flawed.
>>>>>> Jared
>>>>>>
>>>>>> On Fri, Oct 7, 2022 at 9:50 AM Elliott Plack <elliott.plack at gmail.com> wrote:
>>>>>>
>>>>>>> Jared,
>>>>>>>
>>>>>>> This looks great! I want to thank you for the due diligence. The
>>>>>>> process looks sound.
>>>>>>>
>>>>>>> I do agree about the source tags on the nodes; they may not be as
>>>>>>> reliable. In my experience I check the editor/history of a node for
>>>>>>> authority and if I saw it was made via an import account, I might hold it
>>>>>>> to a different standard--not a bad thing. If you are concerned about
>>>>>>> downstream querying of previously imported addresses, you can query out
>>>>>>> things from the import using the changeset (keep a record), user, or
>>>>>>> version with overpass. I'd recommend looking at that option.
>>>>>>>
>>>>>>> Otherwise I applaud the effort.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Elliott Plack
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 7, 2022 at 8:33 AM Greg Troxel <gdt at lexort.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Jared <osm at wuntu.org> writes:
>>>>>>>>
>>>>>>>> > On Thu, Oct 6, 2022 at 8:13 AM Greg Troxel <gdt at lexort.com> wrote:
>>>>>>>> >
>>>>>>>> >> You are going to have to deal with matching addresses between import
>>>>>>>> >> source and OSM programmatically like in #1 above, once you move beyond
>>>>>>>> >> non-addressed towns. Once you do that, the ref won't help, as it won't
>>>>>>>> >> be 100% reliable. Therefore it's noise.
>>>>>>>> >
>>>>>>>> > I was thinking of using the foreign key for a different use case. I agree
>>>>>>>> > that relying on this key for *overwriting* OSM data does not seem safe.
>>>>>>>> > The scenario I'm thinking about is for NEW addresses that are added to the
>>>>>>>> > VCGI dataset. To determine if a NEW VCGI e911 address exists in OSM, the
>>>>>>>> > "ref:vcgi:esiteid" tag would seem to be very helpful. If an address in OSM
>>>>>>>> > already has that unique esiteid key, then we can be confident that it
>>>>>>>> > should be skipped. If the esiteid does not exist in OSM, then other
>>>>>>>> > signals should be evaluated (housenumber, street name, lat/long, etc.), but
>>>>>>>> > those can be less precise due to misspellings or slightly different
>>>>>>>> > coordinates.
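The two-stage check discussed here (exact `ref:vcgi:esiteid` match first, then fuzzier signals) could look roughly like the Python sketch below. The record keys, the 50 m distance threshold, and the 0.85 similarity cutoff from `difflib.SequenceMatcher` are illustrative assumptions, not values from the proposal. As Greg points out, the fuzzy stage has to be good in any case; the ID lookup only short-circuits the easy cases.

```python
import math
from difflib import SequenceMatcher
from typing import Any, Dict, List

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def _normalize(name: str) -> str:
    """Lowercase, drop periods, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def street_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1 for two normalized street names."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def classify(
    candidate: Dict[str, Any],
    osm_addresses: List[Dict[str, Any]],
    max_distance_m: float = 50.0,   # hypothetical threshold
    min_similarity: float = 0.85,   # hypothetical threshold
) -> str:
    """Return 'skip' (esiteid already in OSM), 'review' (fuzzy match) or 'new'."""
    esiteid = candidate.get("esiteid")
    if esiteid and any(o.get("ref:vcgi:esiteid") == esiteid for o in osm_addresses):
        return "skip"
    for o in osm_addresses:
        if o.get("addr:housenumber") != candidate["housenumber"]:
            continue
        similar = street_similarity(o.get("addr:street", ""), candidate["street"]) >= min_similarity
        near = haversine_m(o["lat"], o["lon"], candidate["lat"], candidate["lon"]) <= max_distance_m
        if similar or near:
            return "review"
    return "new"
```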
>>>>>>>>
>>>>>>>> I see where you're going but I think you need to get the fuzzy match
>>>>>>>> right anyway and it's not going to help that much to have a key.
>>>>>>>>
>>>>>>>> > I'd like to hear what negative impact a foreign key causes. There are other
>>>>>>>> > similar foreign keys (e.g., wikidata, wikipedia) and I've never found them to
>>>>>>>> > be detrimental to my work, but I don't want to cause issues for others. The
>>>>>>>> > 55,000 VT addresses that have been added using the Esri layer in the RapiD
>>>>>>>> > editor include this "ref:vcgi:esiteid" key, and I've found it to be useful.
>>>>>>>>
>>>>>>>> A fair question, and it may be that the RapiD stuff is out of line.
>>>>>>>>
>>>>>>>> I don't think the foreign keys really hurt. I just think that the
>>>>>>>> history is that they are less useful than everybody thinks they are going
>>>>>>>> to be.
>>>>>>>>
>>>>>>>> >> Wow. Are you saying that apartment buildings have coordinates of entry
>>>>>>>> >> doors within the building, or that they are artificially skewed to make
>>>>>>>> >> rendering non-overlapping, or ? Surely Vermont has at least some
>>>>>>>> >> multi-floor apartment buildings that have the same floor design and thus
>>>>>>>> >> multiple units that actually do have the same horizontal coordinates.
>>>>>>>> >
>>>>>>>> > I've asked my contact at VCGI for clarification on how multi-tenant
>>>>>>>> > buildings are addressed. From what I've seen, some multi-tenant buildings
>>>>>>>> > just have one e911 address associated with them. I have seen other
>>>>>>>> > buildings that have multiple addresses, but I've never seen them overlap.
>>>>>>>> > I'll keep a close eye out for this and will see what VCGI has to say. I do
>>>>>>>> > have the VT data in a PostGIS database, but don't have experience using the
>>>>>>>> > GIS functions, so I'll try it out.
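A quick way to answer the overlap question, even before learning the PostGIS functions, is to group address points by their rounded coordinates. A minimal Python sketch, assuming points are dicts with `id`, `lat`, and `lon` keys (hypothetical names); it reports points whose coordinates agree to 7 decimal places (about 1 cm in latitude).

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def shared_coordinates(
    points: List[Dict[str, Any]], decimals: int = 7
) -> Dict[Tuple[float, float], List[Any]]:
    """Group point ids by (lat, lon) rounded to `decimals`; keep groups of 2+ points."""
    groups = defaultdict(list)
    for p in points:
        key = (round(p["lat"], decimals), round(p["lon"], decimals))
        groups[key].append(p["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

An empty result would support the observation that multi-address buildings never overlap; any group returned is a candidate stacked-unit building worth checking.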
>>>>>>>>
>>>>>>>> Sounds good. There are hard questions about datasets and, as you can see,
>>>>>>>> my bias is to dig in and address them.
>>>>>>>> _______________________________________________
>>>>>>>> Imports-us mailing list
>>>>>>>> Imports-us at openstreetmap.org
>>>>>>>> https://lists.openstreetmap.org/listinfo/imports-us
>>>>>>>>
>>>>
>>>