[Imports-us] Fwd: Vermont, U.S. address import
Greg Troxel
gdt at lexort.com
Sat Oct 8 13:16:02 UTC 2022
Jared <osm at wuntu.org> writes:
> Can you walk me through a real example so I can understand how you would
> identify existing addresses?
>
> Let's take Addison, Vermont for example.
>
> The VCGI e911 dataset has 987 address points in Addison. Here's the data
> file:
> https://github.com/JaredOSM/vermont-address-import/blob/main/town_e911_address_points/e911_address_points_addison.geojson
>
> When I run an overpass query for all elements in Addison that have a
> housenumber or street: https://overpass-turbo.eu/s/1mxX
> I find that there are already a total of 142 nodes and ways with address
> information in OSM.
>
> By looking at the overpass results, I can immediately see that 55 of the
> existing OSM elements have a "ref:vcgi:esiteid" Key/Value pair. Without
> any further queries, I have a high level of confidence that I can remove
> all 55 address points from my import file, as they are not even
> worth considering for an automated import. This seems like a safe and
> efficient way of eliminating the chance of importing duplicate data.
> Obviously the other data points need to be evaluated, but why not remove
> the 55 for which I have high confidence?
Were I doing this, I'd want to take each VCGI datapoint and sort it
into one of:
- address exists in OSM, all VCGI address fields are present and match,
  and location matches
- address exists in OSM, not all VCGI address fields are present, and
  location matches
- address exists in OSM, some VCGI fields do not match, and location
  matches
- address exists in OSM, location does not match (>= 5m?)
- address does not exist in OSM, but a previous VCGI import added it,
  and then it was manually deleted (must not be re-added by an
  automated process! **)
- address does not exist in OSM, and no OSM address point is within 10m
- address does not exist in OSM, but there is an OSM address point
  within 10m (this is "OSM and VCGI disagree on the address of a
  location", or it might be "OSM has building and VCGI also has unit
  addresses", or it might be something we don't understand yet)
That is the minimum set of categories. The sorting needs to look at
both nodes and ways for address tags, and the distances are probably not
quite right; they may need to be bigger in rural areas and smaller in
more urban places. Then, look at those bins, see what's in them, figure
out whether they are correctly sorted, and refine the rules and perhaps
the categories. This is why I keep talking about programs, not using
JOSM plugins. In my view, the processing method should not be
constrained by what some existing tools do; it needs to adapt to the
realities of the data.
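A minimal sketch of the sorting described above, to make the bins
concrete. Everything named here is an assumption for illustration:
the AddressPoint type, the category labels, treating "address exists"
as "housenumber and street both match", and the 5m/10m thresholds
(which, as noted, probably need tuning per area). It assumes osm_points
already holds only nodes and way centroids that carry address tags.

```python
import math
from dataclasses import dataclass


@dataclass
class AddressPoint:
    # Hypothetical record type: a coordinate plus addr:* tags.
    lat: float
    lon: float
    tags: dict


def distance_m(a, b):
    """Haversine great-circle distance in metres."""
    r = 6371000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def same_address(vcgi, osm):
    # Assumption: "address exists in OSM" means number and street agree.
    keys = ("addr:housenumber", "addr:street")
    return all(vcgi.tags.get(k) == osm.tags.get(k) for k in keys)


def classify(vcgi, osm_points, deleted_after_import=False, match_m=5.0, near_m=10.0):
    """Sort one VCGI point into one of the bins; labels are illustrative."""
    matches = [o for o in osm_points if same_address(vcgi, o)]
    if matches:
        osm = min(matches, key=lambda o: distance_m(vcgi, o))
        if distance_m(vcgi, osm) >= match_m:
            return "exists_location_differs"
        shared = [k for k in vcgi.tags if k in osm.tags]
        if any(vcgi.tags[k] != osm.tags[k] for k in shared):
            return "exists_fields_conflict"
        if len(shared) < len(vcgi.tags):
            return "exists_fields_missing"
        return "exists_full_match"
    if deleted_after_import:
        # Must never be re-added automatically.
        return "deleted_after_import"
    if any(distance_m(vcgi, o) < near_m for o in osm_points):
        return "missing_but_neighbor_has_address"
    return "missing_no_neighbor"
```

Counting how many points land in each bin for one town, and then
inspecting a sample from each, is one way to check whether the rules
and thresholds are sorting correctly.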
For the above categories, some lead to "no action". Some lead to "add
fields to existing object". Some lead to "generate worklist for field
verification", and perhaps to "report bug to VCGI".
In the above, I'm not using a foreign key in OSM. If the data is
present and matchable, great. If it doesn't match (but should have)
you'll pick it up as "address point in OSM in same place but different
content". But you need to pick that up with address points that
*weren't* imported, so the foreign key really doesn't help simplify the
processing. And for things that really were imported before, the
matching will succeed easily.
** For this, the import needs to either search history to identify
things added by a previous import changeset and then removed, or to
keep a record of what was imported, and to skip processing in a new
import of records that were previously imported -- whether or not they
are still present.
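The second option (keeping a record of what was imported) could be as
simple as a file of source identifiers. This is only a sketch: the
JSON ledger file, the function names, and the "ESITEID" field name in
the source records are all assumptions, not a description of any
existing tooling.

```python
import json
from pathlib import Path


def load_ledger(path):
    """Return the set of source IDs handled by earlier imports."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text()))


def save_ledger(path, ids):
    """Persist the set of imported source IDs as a sorted JSON list."""
    Path(path).write_text(json.dumps(sorted(ids)))


def select_new_records(records, ledger, id_key="ESITEID"):
    """Skip every record imported before, whether or not it is still in OSM."""
    return [r for r in records if r[id_key] not in ledger]
```

After each import run, the IDs that were actually uploaded would be
added to the ledger and saved, so a later run skips them even if a
mapper has since deleted the object.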
Also, it would be great to be able to identify things that were imported
but are no longer in the VCGI dataset, and sort those into substantially
manually modified vs not.
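A rough sketch of that check, assuming previously imported objects
carry the "ref:vcgi:esiteid" tag mentioned earlier in the thread, and
assuming OSM elements are available as dicts with "tags" and
"version". Using version 1 as a stand-in for "not manually modified"
is a crude proxy chosen for illustration; judging whether an edit was
substantial would need the object history.

```python
def find_orphans(osm_elements, vcgi_ids, id_tag="ref:vcgi:esiteid"):
    """Split imported objects missing from the current VCGI data into
    (untouched, modified), using the element version as a rough proxy."""
    untouched, modified = [], []
    for el in osm_elements:
        ref = el["tags"].get(id_tag)
        if ref is None or ref in vcgi_ids:
            # Not from an import, or still present in the source data.
            continue
        if el["version"] == 1:
            untouched.append(el)
        else:
            modified.append(el)
    return untouched, modified
```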
I agree that it's reasonable to first find "address does not exist in
OSM and there is no nearby object with any address in OSM" and restrict
scope to that, as long as there is a near-zero rate of "re-adding data
that hand mappers have found to be incorrect and removed".
I see doing this as looping over the import data and doing a db query
for street name and number, and another for address objects near the
coordinates.
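As a sketch of what those two queries could look like, here is a
version using an in-memory SQLite table. The table name, column
names, and record keys are made up for illustration, and the "nearby"
query is only a bounding-box prefilter; a real setup would more likely
use a spatial database and an exact distance test.

```python
import math
import sqlite3


def build_db(osm_rows):
    """Load (osm_id, housenumber, street, lat, lon) rows into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE osm_addr "
        "(osm_id TEXT, housenumber TEXT, street TEXT, lat REAL, lon REAL)"
    )
    db.execute("CREATE INDEX idx_addr ON osm_addr (street, housenumber)")
    db.executemany("INSERT INTO osm_addr VALUES (?, ?, ?, ?, ?)", osm_rows)
    return db


def lookup(db, rec, radius_m=10.0):
    """Run the two queries for one import record."""
    # Query 1: same street name and house number.
    by_address = db.execute(
        "SELECT osm_id FROM osm_addr WHERE street = ? AND housenumber = ?",
        (rec["street"], rec["housenumber"]),
    ).fetchall()
    # Query 2: address objects in a small box around the coordinates.
    dlat = radius_m / 111_320.0  # roughly metres per degree of latitude
    dlon = dlat / math.cos(math.radians(rec["lat"]))
    nearby = db.execute(
        "SELECT osm_id FROM osm_addr "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (rec["lat"] - dlat, rec["lat"] + dlat,
         rec["lon"] - dlon, rec["lon"] + dlon),
    ).fetchall()
    return [r[0] for r in by_address], [r[0] for r in nearby]
```

The loop over the import data then calls lookup() once per record and
uses the two result lists to decide which bin the record belongs in.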
I think this first requires data cleaning to expand abbreviations and
transform street names to OSM capitalization etc. That can be done as a
first-step match between the set of street names in VCGI and the set in
OSM for a given town. This all assumes municipal boundaries are already
in place, or that all address points in OSM (that are in VT) have town
names. That is modulo addresses that are not in any town, of course, if
that is possible (it's not in MA, but I know we're odd).
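A small sketch of that cleaning step, assuming the source names are
upper case with abbreviated suffixes and directions. The lookup tables
are deliberately tiny and illustrative, not a complete list, and the
function names are made up. Names that still fail to match an OSM
street in the town come back as a list for a human to look at.

```python
import re

# Illustrative, incomplete tables; a real run would need many more entries.
SUFFIXES = {"RD": "Road", "ST": "Street", "AVE": "Avenue",
            "LN": "Lane", "DR": "Drive", "HWY": "Highway"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def normalize_street(raw):
    """Expand a trailing suffix and a leading direction, then fix case."""
    words = re.sub(r"[.,]", "", raw).split()
    out = []
    for i, w in enumerate(words):
        key = w.upper()
        if len(words) > 1 and i == len(words) - 1 and key in SUFFIXES:
            out.append(SUFFIXES[key])
        elif len(words) > 1 and i == 0 and key in DIRECTIONS:
            out.append(DIRECTIONS[key])
        elif any(c.isdigit() for c in w):
            out.append(key)  # keep route numbers such as 22A upper case
        else:
            out.append(w.capitalize())
    return " ".join(out)


def unmatched_streets(vcgi_names, osm_names):
    """Normalized VCGI street names with no exact match in OSM."""
    return sorted({normalize_street(n) for n in vcgi_names} - set(osm_names))
```

Simple rules like these get some names wrong (for example "VT" becomes
"Vt"), which is why the unmatched list per town matters: it shows
where the rules need refining before any address matching is done.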