[Imports-us] Fwd: Vermont, U.S. address import

Elliott Plack elliott.plack at gmail.com
Fri Oct 7 15:04:33 UTC 2022


Jared,

I see how the source tag could be useful with that specific Overpass query,
but there is a better option that will let you get more information from
Overpass. It is very simple.

Instead of using the command 'out body', use 'out meta'. The meta includes
all available metadata. In it I can see the changeset, user, and version of
every node. That should help you narrow it down.

Example query: https://overpass-turbo.eu/s/1my4
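
For reference, a query of that shape can be assembled with a few lines of
Python. This is only a minimal sketch: the function name and the area
filter (a place named Addison with admin_level 8) are assumptions for
illustration and may not match what the shared link contains. The part
that matters is the final 'out meta;' statement.

```python
def build_address_query(place_name: str, admin_level: str = "8") -> str:
    """Return an Overpass QL query for addressed elements in a named area.

    The area selection is an assumption for illustration; adjust it to
    however the town boundary is actually tagged in OSM.
    """
    return "\n".join([
        "[out:json][timeout:60];",
        f'area["name"="{place_name}"]["admin_level"="{admin_level}"]->.town;',
        "(",
        '  nwr["addr:housenumber"](area.town);',
        '  nwr["addr:street"](area.town);',
        ");",
        # 'out meta' (rather than 'out body') adds timestamp, version,
        # changeset, user, and uid to every returned element.
        "out meta;",
    ])


print(build_address_query("Addison"))
```

The printed text can be pasted into Overpass Turbo as-is.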

Example output:

      "type": "node",
      "id": 8825645506,
      "lat": 44.0520033,
      "lon": -73.3760089,
      "timestamp": "2021-06-11T16:30:27Z",
      "version": 1,
      "changeset": 106224744,
      "user": "jared",
      "uid": 3887,
      "tags": {
        "addr:city": "Addison",
        "addr:housenumber": "1245",
        "addr:postcode": "05491",
        "addr:state": "VT",
        "addr:street": "Jersey Street South",
        "source": "esri/USA_NAD_Addresses"
      }
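
With that metadata in hand, separating imported elements from hand-mapped
ones takes only a little scripting. A minimal sketch, assuming the Overpass
JSON response has already been parsed into a dict: the function name is
hypothetical, the first sample element is the node shown above, and the
second is made up purely for contrast.

```python
from typing import Any, Dict, Iterable, List, Optional

Element = Dict[str, Any]


def filter_by_metadata(
    elements: Iterable[Element],
    changesets: Optional[Iterable[int]] = None,
    users: Optional[Iterable[str]] = None,
) -> List[Element]:
    """Keep elements matching the given changeset ids and/or user names.

    A filter left as None is not applied.
    """
    wanted_changesets = None if changesets is None else set(changesets)
    wanted_users = None if users is None else set(users)
    return [
        el for el in elements
        if (wanted_changesets is None or el.get("changeset") in wanted_changesets)
        and (wanted_users is None or el.get("user") in wanted_users)
    ]


response = {
    "elements": [
        {   # the node from the example output
            "type": "node", "id": 8825645506, "version": 1,
            "changeset": 106224744, "user": "jared", "uid": 3887,
            "tags": {"addr:housenumber": "1245", "addr:city": "Addison"},
        },
        {   # hypothetical hand-mapped node, invented for this sketch
            "type": "node", "id": 1, "version": 3,
            "changeset": 1, "user": "example_mapper", "uid": 1,
            "tags": {"addr:housenumber": "7", "addr:city": "Addison"},
        },
    ]
}

# Keep a record of the import changeset ids and filter on them later.
from_import = filter_by_metadata(response["elements"], changesets=[106224744])
print([el["id"] for el in from_import])
```

Filtering on recorded changeset ids is likely the more robust choice, since
a user name can change while a changeset id does not.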

- Elliott


On Fri, Oct 7, 2022 at 10:30 AM Jared <osm at wuntu.org> wrote:

> Elliott or Greg,
>
> Can you walk me through a real example so I can understand how you would
> identify existing addresses?
>
> Let's take Addison, Vermont for example.
>
> The VCGI e911 dataset has 987 address points in Addison.  Here's the data
> file:
>
> https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_addison.geojson
>
> When I run an overpass query for all elements in Addison that have a
> housenumber or street: https://overpass-turbo.eu/s/1mxX
> I find that there are already a total of 142 nodes and ways with address
> information in OSM.
>
> By looking at the overpass results, I can immediately see that 55 of the
> existing OSM elements have a "ref:vcgi:esiteid" Key/Value pair.  Without
> any further queries, I have a high level of confidence that I can remove
> all 55 address points from my import file, as they are not even
> worth considering for an automated import.  This seems like a safe and
> efficient way of eliminating the chance of importing duplicate data.
> Obviously the other data points need to be evaluated, but why not remove
> the 55 for which I have high confidence?
>
> Thanks for helping walk me through how you would approach it, or explain
> why my technique could be flawed.
> Jared
>
> On Fri, Oct 7, 2022 at 9:50 AM Elliott Plack <elliott.plack at gmail.com>
> wrote:
>
>> Jared,
>>
>> This looks great! I want to thank you for the due diligence. The process
>> looks sound.
>>
>> I do agree about the source tags on the nodes; they may not be as
>> reliable. In my experience, I check the editor/history of a node for
>> authority, and if I see it was made via an import account, I might hold it
>> to a different standard, which is not a bad thing. If you are concerned
>> about downstream querying of previously imported addresses, you can query
>> out things from the import using the changeset (keep a record), user, or
>> version with Overpass. I'd recommend looking at that option.
>>
>> Otherwise I applaud the effort.
>>
>> Thanks,
>>
>> Elliott Plack
>>
>>
>> On Fri, Oct 7, 2022 at 8:33 AM Greg Troxel <gdt at lexort.com> wrote:
>>
>>>
>>> Jared <osm at wuntu.org> writes:
>>>
>>> > On Thu, Oct 6, 2022 at 8:13 AM Greg Troxel <gdt at lexort.com> wrote:
>>> >
>>> >> You are going to have to deal with matching addresses between import
>>> >> source and OSM programmatically like in #1 above, once you move beyond
>>> >> non-addressed towns.  Once you do that, the ref won't help, as it won't
>>> >> be 100% reliable.  Therefore it's noise.
>>> >
>>> > I was thinking of using the foreign key for a different use case.  I agree
>>> > that relying on this key for *overwriting* OSM data does not seem safe.
>>> > The scenario I'm thinking about is for NEW addresses that are added to the
>>> > VCGI dataset.  To determine if a NEW VCGI e911 address exists in OSM, the
>>> > "ref:vcgi:esiteid" tag would seem to be very helpful.  If an address in OSM
>>> > already has that unique esiteid key, then we can be confident that it
>>> > should be skipped.  If the esiteid does not exist in OSM, then other
>>> > signals should be evaluated (housenumber, streetname, lat/long, etc.), but
>>> > those can be less precise due to misspellings or slightly different
>>> > coordinates.
>>>
>>> I see where you're going but I think you need to get the fuzzy match
>>> right anyway and it's not going to help that much to have a key.
>>>
>>> > I'd like to hear what negative impact a foreign key causes.  There are other
>>> > similar foreign keys (e.g. wikidata, wikipedia) and I've never found them to
>>> > be detrimental to my work, but don't want to cause issues for others.  The
>>> > 55,000 VT addresses that have been added using the Esri layer in the RapiD
>>> > editor include this "ref:vcgi:esiteid" key, and I've found it to be useful.
>>>
>>> A fair question, and it may be that the RapiD stuff is out of line.
>>>
>>> I don't think the foreign keys really hurt.  I just think that the
>>> history is that they are less useful than everybody thinks they are going
>>> to be.
>>>
>>> >> Wow.  Are you saying that apartment buildings have coordinates of entry
>>> >> doors within the building, or that they are artificially skewed to make
>>> >> rendering non-overlapping, or ?  Surely Vermont has at least some
>>> >> multi-floor apartment buildings that have the same floor design and thus
>>> >> multiple units that actually do have the same horizontal coordinates.
>>> >
>>> > I've asked my contact at VCGI for clarification on how multi-tenant
>>> > buildings are addressed.  From what I've seen, some multi-tenant buildings
>>> > just have one e911 address associated with them.  I have seen other
>>> > buildings that have multiple addresses, but I've never seen them overlap.
>>> > I'll keep a close eye out for this and will see what VCGI has to say.  I do
>>> > have the VT data in a PostGIS database, but don't have experience using the
>>> > GIS functions, so I'll try it out.
>>>
>>> Sounds good.  There are hard questions about datasets and as you can see
>>> my bias is to dig in and address them.
>>> _______________________________________________
>>> Imports-us mailing list
>>> Imports-us at openstreetmap.org
>>> https://lists.openstreetmap.org/listinfo/imports-us
>>>
>>