[Imports-us] Fwd: Vermont, U.S. address import
Alex Hennings
blackboxlogic at gmail.com
Fri Oct 7 20:39:03 UTC 2022
If you want to hold foreign keys, or a list of IDs you've processed, that
could be a file outside of OSM mapping e911ID => {elementType, elementID,
version}. I agree with everyone that foreign keys are not as reliable as
you'd want them to be. You'd need to fall back on fuzzy matching for most
things anyways. When looking for new records added to your data source, you
might just be able to filter on the e911 date_created field for rows added
since the last import.
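A minimal Python sketch of the two ideas above, for illustration only: a processed-ID file kept outside OSM that maps each e911 ID to {elementType, elementID, version}, and a date_created filter for picking up new rows. The JSON file layout, the OsmRef record and all function names are hypothetical, and the date_created property is assumed to hold an ISO 8601 date or datetime string; the actual VCGI field format may differ.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from pathlib import Path


@dataclass
class OsmRef:
    """Where one e911 address ended up in OSM (hypothetical record shape)."""
    element_type: str  # "node" or "way"
    element_id: int
    version: int


def dumps_processed(processed: dict[str, OsmRef]) -> str:
    """Serialize the e911ID -> {elementType, elementID, version} mapping as JSON."""
    return json.dumps({k: asdict(v) for k, v in processed.items()}, indent=2, sort_keys=True)


def loads_processed(text: str) -> dict[str, OsmRef]:
    """Parse the mapping back into OsmRef records."""
    return {k: OsmRef(**v) for k, v in json.loads(text).items()}


def save_processed(path: Path, processed: dict[str, OsmRef]) -> None:
    path.write_text(dumps_processed(processed))


def load_processed(path: Path) -> dict[str, OsmRef]:
    # No file yet means nothing has been imported so far.
    return loads_processed(path.read_text()) if path.exists() else {}


def added_since(features: list[dict], last_import: date) -> list[dict]:
    """Keep GeoJSON features created after the last import.

    Assumes a `date_created` property holding an ISO 8601 date or datetime string.
    """
    return [
        f for f in features
        if date.fromisoformat(f["properties"]["date_created"][:10]) > last_import
    ]
```

A stored version can go stale as soon as someone edits the element, so a mismatch between the saved version and the live one is a cue to re-check that address by hand rather than trust the mapping.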
You keep asking "why not put them in?" It really comes down to what OSM is for:
"verifiable facts about the real world". A wikidata tag is something that a
data consumer could use; the IDs you're looking at are only useful to a
data editor, and only in narrow conditions. The downside of having them is
clutter. It makes the data larger. Every time a human makes a change they need
to work around another piece of trash. It can also confuse mappers who don't
know what to do with that tag (when merging an address with a building, where
does that tag go?).
But "clutter" isn't *that* bad, and if you really want to do it, don't let
anyone tell you otherwise. You're the one volunteering massive amounts of
your time for a public good.
> "But RapiD puts these IDs in"
As far as I can tell the people who made and maintain RapiD have never
attempted to meet the expectations of the community. RapiD doing something
is not a reason to follow suit. I think it's some combination of Esri and
Facebook? Example <https://github.com/facebookincubator/RapiD/issues/98>.
They don't validate the data they import, and even when a
human flags an element as wrong... they just go and suggest that same
element to the next person anyway. They say the data they're importing
isn't an import, so the import guidelines (and community feedback) don't
apply to them.</rant>
-Alex
On Fri, Oct 7, 2022 at 1:31 PM Jared <osm at wuntu.org> wrote:
> Adam,
>
> That'd be great. I've split the VCGI e911 dataset up for all towns here:
>
> https://github.com/JaredOSM/vermont-address-import/tree/main/town_e911_address_points
>
> The Middlebury file is here:
>
> https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_middlebury.geojson
>
> The "generate_osm_file_from_e911_geojsom.php" script in the repo will
> process the above file (clean up/expand street names and remove records
> that have a housenumber of 0). It can output OSM, tab-delimited, or GeoJSON.
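The repository's script is PHP and is not reproduced here. As a rough illustration of the two cleaning steps Jared describes, here is a Python sketch; the housenumber and street property names and the abbreviation tables are hypothetical stand-ins, not the script's actual logic.

```python
# Hypothetical, deliberately tiny lookup tables; real data needs far more entries.
SUFFIXES = {"RD": "Road", "ST": "Street", "AVE": "Avenue", "LN": "Lane", "DR": "Drive"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_street(name: str) -> str:
    """Naive word-by-word expansion, e.g. "JERSEY ST S" -> "Jersey Street South".

    Caveat: this is context-free, so "ST ALBANS RD" would wrongly become
    "Street Albans Road"; a real cleaner needs position-aware rules.
    """
    words = []
    for word in name.split():
        key = word.upper().rstrip(".")
        words.append(SUFFIXES.get(key) or DIRECTIONS.get(key) or word.capitalize())
    return " ".join(words)


def clean_features(features: list[dict]) -> list[dict]:
    """Drop records with housenumber 0 and expand street names.

    Assumes hypothetical `housenumber` and `street` properties on each feature.
    """
    cleaned = []
    for f in features:
        props = f["properties"]
        if str(props.get("housenumber", "")).strip() == "0":
            continue  # no usable house number
        cleaned.append({**f, "properties": {**props, "street": expand_street(props["street"])}})
    return cleaned
```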
>
> Let me know how it goes.
> Jared
>
> On Fri, Oct 7, 2022 at 12:19 PM Adam Franco <adamfranco at gmail.com> wrote:
>
>> Thanks for continuing with this, Jared. Would you be able to generate an
>> import file for Middlebury? Over the last few years I've mapped every
>> address in Middlebury by hand or using RAPID, referencing the VCGI E911
>> data. I'd be interested in comparing my manual and RAPID mapping with the
>> import to look for discrepancies. In the process I'll attempt a conflation
>> workflow in JOSM and see what I come up with.
>>
>> - Adam
>>
>> On Fri, Oct 7, 2022, 11:31 AM Jared <osm at wuntu.org> wrote:
>>
>>> Elliott,
>>>
>>> With the Addison, Vermont use case, I'm not talking about the source tag
>>> (I'm fine with only including source in the changeset tags, and have already
>>> updated the import proposal). I'm referring to the "ref:vcgi:esiteid" key
>>> that stores a unique ID for the e911 address from the source VCGI database.
>>> Greg was suggesting that this should also not be included, and is not
>>> useful. But it has already been useful to me for removing existing OSM
>>> addresses from my import files. So I'm trying to understand if my use of
>>> this ref:vcgi:esiteid tag is flawed, or if it causes harm to others. For
>>> what it's worth, the "ref:vcgi:esiteid" tag was modeled on the
>>> "nysgissam:nysaddresspointid" tag that was used for the recent "New York
>>> (state)/NYS GIS SAM Address Points Import":
>>> https://wiki.openstreetmap.org/wiki/New_York_(state)/NYS_GIS_SAM_Address_Points_Import
>>>
>>> Thanks,
>>> Jared
>>>
>>> On Fri, Oct 7, 2022 at 11:04 AM Elliott Plack <elliott.plack at gmail.com>
>>> wrote:
>>>
>>>> jared,
>>>>
>>>> I see how the source could be useful with that specific Overpass query
>>>> but I also have a better option that will let you get more information from
>>>> overpass. It is very simple.
>>>>
>>>> Instead of using the command 'out body', use 'out meta'. The meta
>>>> includes all available metadata. In it I can see the changeset, user, and
>>>> version of every node. That should help you narrow it down.
>>>>
>>>> Example query: https://overpass-turbo.eu/s/1my4
>>>>
>>>> Example output:
>>>>
>>>> "type": "node",
>>>> "id": 8825645506,
>>>> "lat": 44.0520033,
>>>> "lon": -73.3760089,
>>>> "timestamp": "2021-06-11T16:30:27Z",
>>>> "version": 1,
>>>> "changeset": 106224744,
>>>> "user": "jared",
>>>> "uid": 3887,
>>>> "tags": {
>>>>   "addr:city": "Addison",
>>>>   "addr:housenumber": "1245",
>>>>   "addr:postcode": "05491",
>>>>   "addr:state": "VT",
>>>>   "addr:street": "Jersey Street South",
>>>>   "source": "esri/USA_NAD_Addresses"
>>>> }
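To make Elliott's suggestion concrete: with out meta, each entry in the elements array of Overpass's JSON output carries changeset, user and version, as in the sample above. A minimal Python sketch, assuming you kept your own list of import changeset IDs; the function name is hypothetical.

```python
import json
from typing import Optional


def import_elements(overpass_json: str, import_changesets: set[int],
                    import_user: Optional[str] = None) -> list[dict]:
    """Return elements whose current version came from a recorded import changeset.

    `out meta` reports the changeset, user and version of each element's
    *current* version, so an element edited after the import no longer matches;
    those need a history lookup instead.
    """
    elements = json.loads(overpass_json).get("elements", [])
    return [
        e for e in elements
        if e.get("changeset") in import_changesets
        and (import_user is None or e.get("user") == import_user)
    ]
```

Because only the current version's metadata is returned, this finds imported elements nobody has touched since; it is a complement to keeping a changeset record, not a replacement for checking history.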
>>>>
>>>> - Elliott
>>>>
>>>>
>>>> On Fri, Oct 7, 2022 at 10:30 AM Jared <osm at wuntu.org> wrote:
>>>>
>>>>> Elliott or Greg,
>>>>>
>>>>> Can you walk me through a real example so I can understand how you
>>>>> would identify existing addresses?
>>>>>
>>>>> Let's take Addison, Vermont for example.
>>>>>
>>>>> The VCGI e911 dataset has 987 address points in Addison. Here's the
>>>>> data file:
>>>>>
>>>>> https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_addison.geojson
>>>>>
>>>>> When I run an overpass query for all elements in Addison that have a
>>>>> housenumber or street: https://overpass-turbo.eu/s/1mxX
>>>>> I find that there are already a total of 142 nodes and ways with
>>>>> address information in OSM.
>>>>>
>>>>> By looking at the overpass results, I can immediately see that 55 of
>>>>> the existing OSM elements have a "ref:vcgi:esiteid" Key/Value pair.
>>>>> Without any further queries, I have a high level of confidence that I can
>>>>> remove all 55 address points from my import file, as they are not even
>>>>> worth considering for an automated import. This seems like a safe and
>>>>> efficient way of eliminating the chance of importing duplicate data.
>>>>> Obviously the other data points need to be evaluated, but why not remove
>>>>> the 55 for which I have high confidence?
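Jared's shortcut amounts to a set-membership filter. A minimal Python sketch, assuming the Overpass result is saved as JSON and that the e911 GeoJSON stores the site ID in a property named ESITEID (a hypothetical name; check the actual VCGI schema).

```python
import json

ESITEID_TAG = "ref:vcgi:esiteid"


def existing_esiteids(overpass_json: str) -> set[str]:
    """Collect ref:vcgi:esiteid values from Overpass JSON output."""
    elements = json.loads(overpass_json).get("elements", [])
    return {e["tags"][ESITEID_TAG] for e in elements if ESITEID_TAG in e.get("tags", {})}


def drop_already_mapped(import_geojson: str, esiteids: set[str],
                        key: str = "ESITEID") -> dict:
    """Remove import features whose esiteid is already tagged on an OSM element.

    `key` is a hypothetical property name for the esiteid in the e911 GeoJSON.
    Everything left over still needs matching on housenumber/street/location.
    """
    collection = json.loads(import_geojson)
    remaining = [
        f for f in collection["features"]
        if str(f["properties"].get(key)) not in esiteids
    ]
    return {**collection, "features": remaining}
```

In the Addison case this step would be expected to drop the 55 already-tagged points; the other OSM addresses, which lack the tag, still need the comparison Jared mentions.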
>>>>>
>>>>> Thanks for helping walk me through how you would approach it, or
>>>>> explain why my technique could be flawed.
>>>>> Jared
>>>>>
>>>>> On Fri, Oct 7, 2022 at 9:50 AM Elliott Plack <elliott.plack at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Jared,
>>>>>>
>>>>>> This looks great! I want to thank you for the due diligence. The
>>>>>> process looks sound.
>>>>>>
>>>>>> I do agree about the source tags on the nodes, they may not be as
>>>>>> reliable. In my experience I check the editor/history of a node for
>>>>>> authority and if I saw it was made via an import account, I might hold it
>>>>>> to a different standard--not a bad thing. If you are concerned about
>>>>>> downstream querying of previously imported addresses, you can query out
>>>>>> things from the import using the changeset (keep a record), user, or
>>>>>> version with overpass. I'd recommend looking at that option.
>>>>>>
>>>>>> Otherwise I applaud the effort.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Elliott Plack
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 7, 2022 at 8:33 AM Greg Troxel <gdt at lexort.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Jared <osm at wuntu.org> writes:
>>>>>>>
>>>>>>> > On Thu, Oct 6, 2022 at 8:13 AM Greg Troxel <gdt at lexort.com> wrote:
>>>>>>> >
>>>>>>> >> You are going to have to deal with matching addresses between import
>>>>>>> >> source and OSM programmatically like in #1 above, once you move beyond
>>>>>>> >> non-addressed towns. Once you do that, the ref won't help, as it won't
>>>>>>> >> be 100% reliable. Therefore it's noise.
>>>>>>> >
>>>>>>> > I was thinking of using the foreign key for a different use case. I agree
>>>>>>> > that relying on this key for *overwriting* OSM data does not seem safe.
>>>>>>> > The scenario I'm thinking about is for NEW addresses that are added to the
>>>>>>> > VCGI dataset. To determine if a NEW VCGI e911 address exists in OSM, the
>>>>>>> > "ref:vcgi:esiteid" tag would seem to be very helpful. If an address in OSM
>>>>>>> > already has that unique esiteid key, then we can be confident that it
>>>>>>> > should be skipped. If the esiteid does not exist in OSM, then other
>>>>>>> > signals should be evaluated (housenumber, streetname, lat/long, etc.), but
>>>>>>> > those can be less precise due to misspellings or slightly different
>>>>>>> > coordinates.
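The fallback Jared describes (exact esiteid first, then house number, street name and position) could look roughly like this Python sketch. The record shape, the difflib-based street similarity and the 50 m / 0.85 thresholds are illustrative assumptions, not tested values; Greg's point that the fuzzy match has to be right anyway applies to whatever thresholds are chosen.

```python
import math
import re
from difflib import SequenceMatcher


def norm(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def likely_same(a: dict, b: dict, max_m: float = 50.0, min_ratio: float = 0.85) -> bool:
    """Fuzzy address match: same house number, similar street, nearby point.

    The 50 m and 0.85 thresholds are illustrative guesses, not tuned values.
    """
    if norm(a["housenumber"]) != norm(b["housenumber"]):
        return False
    similarity = SequenceMatcher(None, norm(a["street"]), norm(b["street"])).ratio()
    return similarity >= min_ratio and distance_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_m


def classify(candidate: dict, osm_esiteids: set[str], osm_addresses: list[dict]) -> str:
    """Exact esiteid check first, fuzzy signals as the fallback."""
    if candidate["esiteid"] in osm_esiteids:
        return "skip"
    if any(likely_same(candidate, o) for o in osm_addresses):
        return "possible_duplicate"
    return "import"
```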
>>>>>>>
>>>>>>> I see where you're going but I think you need to get the fuzzy match
>>>>>>> right anyway and it's not going to help that much to have a key.
>>>>>>>
>>>>>>> > I'd like to hear what negative impact a foreign key causes. There are other
>>>>>>> > similar foreign keys (e.g. wikidata, wikipedia) and I've never found them to
>>>>>>> > be detrimental to my work, but I don't want to cause issues for others. The
>>>>>>> > 55,000 VT addresses that have been added using the Esri layer in the RapiD
>>>>>>> > editor include this "ref:vcgi:esiteid" key, and I've found it to be useful.
>>>>>>>
>>>>>>> A fair question, and it may be that the RapiD stuff is out of line.
>>>>>>>
>>>>>>> I don't think the foreign keys really hurt. I just think that the
>>>>>>> history is that they are less useful than everybody thinks they are going
>>>>>>> to be.
>>>>>>>
>>>>>>> >> Wow. Are you saying that apartment buildings have coordinates of entry
>>>>>>> >> doors within the building, or that they are artificially skewed to make
>>>>>>> >> rendering non-overlapping, or something else? Surely Vermont has at least
>>>>>>> >> some multi-floor apartment buildings that have the same floor design and
>>>>>>> >> thus multiple units that actually do have the same horizontal coordinates.
>>>>>>> >
>>>>>>> > I've asked my contact at VCGI for clarification on how multi-tenant
>>>>>>> > buildings are addressed. From what I've seen, some multi-tenant buildings
>>>>>>> > just have one e911 address associated with them. I have seen other
>>>>>>> > buildings that have multiple addresses, but I've never seen them overlap.
>>>>>>> > I'll keep a close eye out for this and will see what VCGI has to say. I do
>>>>>>> > have the VT data in a PostGIS database, but don't have experience using the
>>>>>>> > GIS functions, so I'll try it out.
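Jared plans to check for overlaps in PostGIS; for a quick look at a single town file, a Python sketch that groups GeoJSON points sharing the same coordinates could serve as a first pass. The 6-decimal rounding is an assumption chosen for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def overlapping_points(features: list[dict], places: int = 6) -> dict[tuple, list[dict]]:
    """Group GeoJSON point features that share (rounded) coordinates.

    Rounding to 6 decimal places is roughly 0.1 m; points that fall on either
    side of a rounding boundary will be missed, so a distance-based check
    (e.g. PostGIS ST_DWithin) is the more robust option.
    """
    groups = defaultdict(list)
    for f in features:
        lon, lat = f["geometry"]["coordinates"][:2]
        groups[(round(lon, places), round(lat, places))].append(f)
    return {key: group for key, group in groups.items() if len(group) > 1}
```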
>>>>>>>
>>>>>>> Sounds good. There are hard questions about datasets and as you can see
>>>>>>> my bias is to dig in and address them.
>>>>>>> _______________________________________________
>>>>>>> Imports-us mailing list
>>>>>>> Imports-us at openstreetmap.org
>>>>>>> https://lists.openstreetmap.org/listinfo/imports-us
>>>>>>>
>>>
>