[Imports-us] Fwd: Vermont, U.S. address import

Adam Franco adamfranco at gmail.com
Mon Oct 10 03:29:23 UTC 2022


Hi Jared, I've submitted PR #2
<https://github.com/JaredOSM/vermont-address-import/pull/2> in which I've
updated your initial generation script to take command line arguments so
that it could be run against arbitrary input files, then added a second
script `generate_all.php` which will run the script for all town input
files and write those back to the draft folder.

These changes should make it easy to re-run the script as tweaks are made,
which in turn makes it easier to track changes to the output and to check
for unintended consequences of the tweaks.
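
For anyone following along, here is a minimal sketch of that kind of batch re-run loop, written in Python purely for illustration (the scripts in the PR are PHP). The directory names, the `.geojson`/`.osm` extensions, and the `command` argument are assumptions, not the repository's actual layout or interface.

```python
import subprocess
from pathlib import Path


def plan_runs(input_dir, output_dir, in_ext=".geojson", out_ext=".osm"):
    """Pair every town input file with the draft file it should produce."""
    out = Path(output_dir)
    return [
        (src, out / (src.stem + out_ext))
        for src in sorted(Path(input_dir).glob("*" + in_ext))
    ]


def run_all(command, input_dir, output_dir):
    """Run `command <input> <output>` once per town; stop on the first failure.

    `command` is a list such as ["php", "some_script.php"] (hypothetical name).
    """
    for src, dst in plan_runs(input_dir, output_dir):
        subprocess.run([*command, str(src), str(dst)], check=True)
```

If the draft folder is under version control, a diff after each run then shows exactly what a given tweak changed in the output.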

I hope that over the next few days I will be able to more deeply review the
addresses for additional towns that already have significant address
coverage and look for further discrepancies with what is coming from E911.
I'm giving myself until Wednesday to provide some more feedback. :-)

On Sat, Oct 8, 2022 at 10:43 PM Jared <osm at wuntu.org> wrote:

> Greg, Your outline seems reasonable, but is outside the scope of what I'm
> looking to tackle at the moment.  Let me know if you think you'd be
> interested in working on a phase two where a more sophisticated automated
> tool is developed to deal with the more complex towns.
>
> At the moment, I'd like to move forward with my mostly manual import of
> the towns with fewer than 100 existing OSM addresses.  I've now created
> draft import files for 12 towns.  See data here:
> https://github.com/JaredOSM/vermont-address-import/tree/main/data_files_to_import/draft
> The script I have is doing a good job of expanding street names, and I
> believe my manual review process is working.
>
> Those of you that have provided feedback, or anyone else, please let me
> know if you have any remaining concerns with me proceeding with the project
> outlined here:
> https://wiki.openstreetmap.org/wiki/VCGI_E911_address_points_import
>
> Thanks,
> Jared
>
>
> On Sat, Oct 8, 2022 at 9:16 AM Greg Troxel <gdt at lexort.com> wrote:
>
>>
>> Jared <osm at wuntu.org> writes:
>>
>> > Can you walk me through a real example so I can understand how you would
>> > identify existing addresses?
>> >
>> > Let's take Addison, Vermont for example.
>> >
>> > The VCGI e911 dataset has 987 address points in Addison.  Here's the
>> > data file:
>> > https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_addison.geojson
>> >
>> > When I run an overpass query for all elements in Addison that have a
>> > housenumber or street: https://overpass-turbo.eu/s/1mxX
>> > I find that there are already a total of 142 nodes and ways with address
>> > information in OSM.
>> >
>> > By looking at the overpass results, I can immediately see that 55 of the
>> > existing OSM elements have a "ref:vcgi:esiteid" Key/Value pair.  Without
>> > any further queries, I have a high level of confidence that I can remove
>> > all 55 address points from my import file, as they are not even
>> > worth considering for an automated import.  This seems like a safe and
>> > efficient way of eliminating the chance of importing duplicate data.
>> > Obviously the other data points need to be evaluated, but why not remove
>> > the 55 for which I have high confidence?
>>
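
The elimination step described in the paragraph above could be sketched roughly as follows. The GeoJSON property name `ESITEID` and the element shape (dicts with a `tags` mapping, as in Overpass JSON output) are assumptions for illustration, not something confirmed in this thread.

```python
def drop_already_imported(features, osm_elements, prop="ESITEID"):
    """Split E911 features into (remaining, skipped).

    A feature is skipped when some OSM element already carries its site ID
    in the ref:vcgi:esiteid tag. `prop` is an assumed GeoJSON property name.
    """
    known = {
        el["tags"]["ref:vcgi:esiteid"]
        for el in osm_elements
        if "ref:vcgi:esiteid" in el.get("tags", {})
    }
    remaining, skipped = [], []
    for feat in features:
        site_id = feat["properties"].get(prop)
        # OSM tag values are strings, so compare as strings.
        if site_id is not None and str(site_id) in known:
            skipped.append(feat)
        else:
            remaining.append(feat)
    return remaining, skipped
```

This only removes the high-confidence duplicates; everything in `remaining` still needs the evaluation discussed below.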
>> Were I doing this, I'd want to take each VCGI datapoint and sort it
>> into one of:
>>
>>   - address exists in OSM, all VCGI address fields are present and match,
>>     and location matches
>>   - address exists in OSM, not all VCGI address fields are present, and
>>     location matches
>>   - address exists in OSM, some VCGI fields do not match, and location
>>     matches
>>   - address exists in OSM, location does not match (>= 5m?)
>>   - address does not exist in OSM, but a previous VCGI import added it,
>>     and then it was manually deleted (must not be re-added by an
>>     automated process! **)
>>   - address does not exist in OSM, and no OSM address point is within 10m
>>   - address does not exist in OSM, but there is an OSM address point
>>     within 10m (this is "OSM and VCGI disagree on the address of a
>>     location", or it might be "OSM has building and VCGI also has unit
>>     addresses", or it might be something we don't understand yet)
>>
>> at least.  This needs looking at nodes and ways for address tags, and
>> probably the distances are not quite right, and may need to be bigger in
>> rural areas and smaller in more urban places.  Then, look at those bins
>> and see what's in them, figure out if they are correctly sorted, and
>> refine the rules and perhaps the categories.  This is why I keep talking
>> about programs, not using josm plugins.  In my view, the processing
>> method should not be constrained by what some existing tools do; it
>> needs to adapt to the realities of the data.
>>
>> For the above categories, some lead to "no action".  Some lead to "add
>> fields to existing object".  Some lead to "generate worklist for field
>> verification", and perhaps to "report bug to VCGI".
>>
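
The sorting into bins outlined above might be sketched as below. This is one possible reading, not a settled design: it assumes "address exists in OSM" means the same housenumber and street, uses the tentative 5m and 10m thresholds from the list, and invents a simple record shape (`pos`, `tags`) and bin names for illustration.

```python
import math
from enum import Enum


class Bin(Enum):
    MATCH = "exists, all fields match, location matches"
    OSM_INCOMPLETE = "exists, some VCGI fields missing in OSM, location matches"
    FIELD_CONFLICT = "exists, some fields differ, location matches"
    MOVED = "exists, location does not match"
    DELETED_BY_MAPPER = "previously imported, then manually deleted"
    NEW = "not in OSM, no OSM address nearby"
    NEARBY_CONFLICT = "not in OSM, but another OSM address is nearby"


def distance_m(a, b):
    """Great-circle (haversine) distance in metres between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def classify(vcgi, osm_addresses, deleted_keys, match_m=5, near_m=10):
    """Sort one VCGI point into a Bin.

    vcgi and each OSM entry: {"pos": (lat, lon), "tags": {...}}.
    deleted_keys: set of (housenumber, street) pairs known to have been
    imported earlier and then removed by a mapper.
    """
    key = (vcgi["tags"]["addr:housenumber"], vcgi["tags"]["addr:street"])
    same = [o for o in osm_addresses
            if (o["tags"].get("addr:housenumber"),
                o["tags"].get("addr:street")) == key]
    if same:
        osm = min(same, key=lambda o: distance_m(vcgi["pos"], o["pos"]))
        if distance_m(vcgi["pos"], osm["pos"]) >= match_m:
            return Bin.MOVED
        if any(k in osm["tags"] and osm["tags"][k] != v
               for k, v in vcgi["tags"].items()):
            return Bin.FIELD_CONFLICT
        if any(k not in osm["tags"] for k in vcgi["tags"]):
            return Bin.OSM_INCOMPLETE
        return Bin.MATCH
    if key in deleted_keys:
        return Bin.DELETED_BY_MAPPER
    if any(distance_m(vcgi["pos"], o["pos"]) <= near_m for o in osm_addresses):
        return Bin.NEARBY_CONFLICT
    return Bin.NEW
```

Each bin can then be mapped to one of the outcomes listed above ("no action", "add fields to existing object", a field-verification worklist, or a report to VCGI), and the thresholds tuned per town once the bin contents have been inspected.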
>> In the above, I'm not using a foreign key in OSM.  If the data is
>> present and matchable, great.  If it doesn't match (but should have)
>> you'll pick it up as "address point in OSM in same place but different
>> content".  But you need to pick that up with address points that
>> *weren't* imported, so the foreign key really doesn't help simplify the
>> processing.   And for things that really were imported before, the
>> matching will succeed easily.
>>
>> ** For this, the import needs to either search history to identify
>>    things added by a previous import changeset and then removed, or to
>>    keep a record of what was imported, and to skip processing in a new
>>    import of records that were previously imported -- whether or not they
>>    are still present.
>>
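
The second option in that footnote, keeping a record of what was imported, might look like the sketch below. The JSON ledger file and the per-record `id` key are assumptions for illustration; any stable identifier from the source data would do.

```python
import json
from pathlib import Path


def load_ledger(path):
    """Return the set of record IDs that earlier imports already uploaded."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def select_new(records, ledger, id_of=lambda r: r["id"]):
    """Keep only records never imported before.

    Records in the ledger are skipped whether or not they are still
    present in OSM, so a mapper's deletion is never undone.
    """
    return [r for r in records if id_of(r) not in ledger]


def save_ledger(path, ledger, imported, id_of=lambda r: r["id"]):
    """Add the just-imported records to the ledger and write it back."""
    updated = ledger | {id_of(r) for r in imported}
    Path(path).write_text(json.dumps(sorted(updated)))
    return updated
```

A ledger like this avoids having to search OSM history on every run, at the cost of needing to be kept alongside the import data.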
>> Also, it would be great to be able to identify things that were imported
>> but are no longer in the VCGI dataset, and sort those into substantially
>> manually modified vs not.
>>
>> I agree that it's reasonable to first find "address does not exist in
>> OSM and there is no nearby object with any address in OSM" and restrict
>> scope to that as long as there is near zero of "re-adding data that hand
>> mappers have found to be incorrect and removed".
>>
>> I see doing this as looping over the import data and doing a db query
>> for street name and number, and another for address objects near the
>> coordinates.
>>
>> I think this first requires data cleaning to expand acronyms and
>> transform street names to OSM capitalization etc.  That can be done as a
>> first-step match going from the set of street names in VCGI and in OSM
>> for a given town.  This all assumes municipal boundaries already in
>> place, or that all address points in OSM (that are in VT) have town
>> names.  Modulo issues of addresses that are not in towns of course, if
>> that is possible (it's not in MA, but I know we're odd).
>>
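
As a rough illustration of the cleaning and first-step match described above (this is not the script in the repository), street names could be expanded and then compared against the names already in OSM for the town. The abbreviation tables are deliberately tiny and hypothetical; a real list would be much longer and would need review for cases such as "ST" meaning "Saint".

```python
# Hypothetical, incomplete abbreviation tables for illustration only.
SUFFIXES = {"RD": "Road", "ST": "Street", "AVE": "Avenue",
            "LN": "Lane", "DR": "Drive"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_street(raw):
    """Expand a leading direction and trailing suffix, then fix capitalization."""
    words = raw.split()
    out = []
    for i, w in enumerate(words):
        up = w.upper()
        if i == 0 and len(words) > 1 and up in DIRECTIONS:
            out.append(DIRECTIONS[up])
        elif i == len(words) - 1 and i > 0 and up in SUFFIXES:
            out.append(SUFFIXES[up])
        elif w[0].isdigit():
            out.append(w)  # keep tokens like "22A" untouched
        else:
            out.append(w.capitalize())
    return " ".join(out)


def names_missing_from_osm(vcgi_names, osm_names):
    """Expanded VCGI street names that match no OSM street name in the town."""
    expanded = {expand_street(n) for n in vcgi_names}
    return sorted(expanded - set(osm_names))
```

The names returned by `names_missing_from_osm` are the ones worth a manual look: they are either streets OSM lacks or expansions that went wrong.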