[Imports-us] Fwd: Vermont, U.S. address import

Jared osm at wuntu.org
Fri Oct 7 02:28:12 UTC 2022


On Thu, Oct 6, 2022 at 8:13 AM Greg Troxel <gdt at lexort.com> wrote:

>
> > Concern 2: Should the "ref:vcgi:esiteid" tag be included or not?
> > While not a silver bullet, I find that having this unique key, which
> > connects a node back to the origin database, is helpful for building
> > confidence when evaluating whether an address exists in OSM or not.  If I
> > find a node in OSM that has this unique esiteid, I can be confident that
> > it already exists, and I can remove it from my list of items that need
> > manual consideration.  I personally find it helpful, and don't find it
> > obtrusive, but if there are prior discussions that you can point me to,
> > I'd be interested in learning more.
>
> I don't have handy links, but my impression from reading the import list
> for years is that it is broadly agreed that foreign keys don't belong.
> When you are doing a new import/conflation (say in 2 years when VT
> releases an update), you have to actually conflate and check.  Just
> because something has a key doesn't mean you can overwrite it.  Some
> human may have modified the data to fix it.  The only automatic
> overwrite that's ok is to check that the address data on a node matches
> exactly the data that was imported, and that the import source is now
> different.
>
> You are going to have to deal with matching addresses between import
> source and OSM programmatically like in #1 above, once you move beyond
> non-addressed towns.  Once you do that, the ref won't help, as it won't
> be 100% reliable.  Therefore it's noise.
>
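
The overwrite rule described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function names are
hypothetical, and it assumes a copy of the tags from the earlier import has
been kept so it can be compared against the current OSM node and the updated
source record.

```python
def address_tags(tags):
    """Keep only the addr:* tags, which are what the import wrote."""
    return {k: v for k, v in tags.items() if k.startswith("addr:")}


def safe_to_overwrite(osm_tags, imported_tags, new_source_tags):
    """Hypothetical check for the one automatic overwrite described above.

    True only when the node's address data still matches exactly what was
    imported AND the import source now says something different.
    """
    current = address_tags(osm_tags)
    original = address_tags(imported_tags)
    updated = address_tags(new_source_tags)
    return current == original and updated != original
```

Anything that fails this check (for example, a house number a mapper has
corrected by hand) would go to manual review rather than being overwritten.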

I was thinking of using the foreign key for a different use case.  I agree
that relying on this key for *overwriting* OSM data does not seem safe.
The scenario I'm thinking about is for NEW addresses that are added to the
VCGI dataset.  To determine if a NEW VCGI e911 address exists in OSM, the
"ref:vcgi:esiteid" tag would seem to be very helpful.  If an address in OSM
already has that unique esiteid key, then we can be confident that it
should be skipped.  If the esiteid does not exist in OSM, then other
signals should be evaluated (housenumber, streetname, lat/long, etc.), but
those can be less precise due to misspellings or slightly different
coordinates.
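
A minimal sketch of that check in Python, to make the scenario concrete.  The
function names and the dict-of-tags record shape are assumptions for
illustration, not part of my actual import script; only the tag keys
("ref:vcgi:esiteid", "addr:housenumber", "addr:street") are real.

```python
ESITEID_TAG = "ref:vcgi:esiteid"


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences still match."""
    return " ".join(str(text).lower().split())


def address_key(tags):
    """Fallback signal: (housenumber, street), used when no esiteid matches."""
    return (normalize(tags.get("addr:housenumber", "")),
            normalize(tags.get("addr:street", "")))


def classify_vcgi_records(vcgi_records, osm_nodes):
    """Sort VCGI records (dicts of tags) into 'skip', 'review' and 'import'."""
    known_ids = {n[ESITEID_TAG] for n in osm_nodes if ESITEID_TAG in n}
    known_keys = {address_key(n) for n in osm_nodes}
    buckets = {"skip": [], "review": [], "import": []}
    for rec in vcgi_records:
        if rec.get(ESITEID_TAG) in known_ids:
            buckets["skip"].append(rec)    # same esiteid is already in OSM
        elif address_key(rec) in known_keys:
            buckets["review"].append(rec)  # possible duplicate, needs a human
        else:
            buckets["import"].append(rec)  # no match on any signal
    return buckets
```

The fallback here only compares housenumber and street name; a real check
would probably also compare coordinates, since the text match is the less
precise signal.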

I'd like to hear what negative impact a foreign key causes.  There are other
similar foreign keys (e.g. wikidata, wikipedia) and I've never found them to
be detrimental to my work, but I don't want to cause issues for others.  The
55,000 VT addresses that have been added using the Esri layer in the RapiD
editor include this "ref:vcgi:esiteid" key, and I've found it to be useful.

> > Concern 3: Should the "source:VCGI/E911_address_points" be included on a
> > node?  Or only in a changeset comment?
> > If you have links to further docs/discussions about this, I'd like to make
> > sure I understand the current best practices. I agree that adding a source
> > to the changeset tag makes more sense.  I don't fully understand
> > the implications for future updates to imported nodes.
> > I have updated the import proposal wiki page by removing the source tag
> > from the individual node, and adding it to the changeset tag.
>
> This seems really well established on this list and I don't know where
> it's written down.  Having vast numbers of source keys on points is just
> noise, and they won't get reliably removed when the data is edited.  And
> there isn't anything really useful about it.  Future conflation needs to
> look at history to be sure if the data remains exactly what was
> imported; the source tag doesn't prove that the current data matches.
> (e.g. what if I change 5 to 7 on a house number because the import was
> wrong, and don't remove the source tag because a) I don't really
> understand it and b) the rest of the fields still came from there.)  And
> if you are just conflating as in 'find addresses in dataset that aren't
> in osm' then it doesn't help either.
>
> Anyone who thinks consensus includes source tags on nodes should speak
> up.  I posit that almost no one who has been on the import list for a
> year thinks that.
>

I've updated my script so the source tag is not included with each node. I
also updated the proposal page to indicate that the source tag will be
included with the changeset.


> > Concern 4: The "Conflation" section of the proposal is vague, and makes it
> > sound like the project could morph in potentially dangerous ways without
> > approval.
> > I've updated the section to read:
> > **
> > "For the scope of this particular import project, conflation will be
> > avoided/skipped. Any preexisting addresses will be left as-is. New
> > addresses will be imported as standalone nodes (not conflated with existing
> > building outlines).
> >
> > If addresses need to be conflated, they will be dealt with in an update to
> > this project, or as part of a separate project, either of which will get
> > reviewed and approved."
> > **
> > Let me know if it still needs further clarification. Basically, my
> > philosophy is to deal with the easy parts now, and anything that is more
> > complicated will be dealt with in a future project.
>
> I don't think it's ok for an import to add features that are duplicates
> of existing data.  You agree as you talk about towns with less than 100.
> Also conflation is itself a funny term as there are two separate issues:
>
>   1) Find subset of source dataset that is not already in OSM.  Generate
>   OSM-format file that would be uploaded.
>
>   1A) Like (1), but find objects like house outlines and add tags to
>   those instead.
>
>   2) For items in the source dataset that are already in OSM, figure out
>   what to do.   There can be a more complicated  merge where you add
>   some fields to partial matches.
>
>
> I think you are saying: "importing will be nodes only, avoiding building
> conflation.  In this stage, addresses are only imported for towns with
> <100 existing address points, and those will be manually removed from
> the upload file.  Thus no duplicate data will be introduced."
>
> If so that's ok, but I think it's good to say things extra clearly to
> make it clear that the things which shouldn't happen won't.


Thanks. I've updated the wording of the conflation section of the proposal
using your suggested language.
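
For illustration, the "towns with <100 existing address points" rule could be
sketched like this in Python.  The function name and the input shape (one
town name per address) are assumptions for the sketch, not taken from my
actual script.

```python
from collections import Counter

THRESHOLD = 100  # from the proposal wording: towns with <100 existing points


def towns_in_scope(vcgi_towns, osm_address_towns, threshold=THRESHOLD):
    """Return source towns with fewer than `threshold` addresses in OSM.

    vcgi_towns:        iterable of town names, one per source address
    osm_address_towns: iterable of town names, one per existing OSM address
    """
    existing = Counter(osm_address_towns)
    # Counter returns 0 for towns that have no addresses in OSM yet.
    return sorted(t for t in set(vcgi_towns) if existing[t] < threshold)
```

Towns at or above the threshold are left out of this stage entirely; within
the selected towns, any existing addresses still have to be removed from the
upload file by hand.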


> > Concern 5: Have you evaluated whether there are points in the database with
> > the same location, what you are doing about that, and why?
> > I have not done an exhaustive search, but in the 55,000ish addresses I've
> > added manually so far, I don't recall this being an issue with the VCGI
> > data.  But, I've primarily focussed my efforts on rural and
> > residential areas where the vast majority of addresses are for single
> > family dwellings, or occasionally a duplex with two distinct addresses.  Let
> > me know if you have suggestions about how to identify these.  Is searching
> > for points that share the same exact lat/long adequate?  Are you aware of a
> > script that already does this?
>
> Basically you should load this into postgis and there are queries to
> write to find points that are very close to each other.  something like
>
>   select  a, b where ST_Distance(a,b) < 2
>
> to find points within 2m of each other.
>
> In MA, we found tons of stacked points for multi-family dwellings.  I
> would expect the same (much fewer in number in VT I agree) in other
> states.
>
> One thing that could be done is to combine to one OSM object that has
>
> unit=1;2;3;4
>
> and I don't remember the addressing scheme consensus on that.
> My point really is that you should be super clear on whether this is an
> issue.  I think it's fine to skip importing points that are hard; your
> low-hanging fruit idea is fine, and you'll learn a lot and can then do a
> 2nd round.
>

Follow up email from Greg:

> Wow.  Are you saying that apartment buildings have coordinates of entry
> doors within the building, or that they are artificially skewed to make
> rendering non-overlapping, or ?  Surely Vermont has at least some
> multi-floor apartment buildings that have the same floor design and thus
> multiple units that actually do have the same horizontal coordinates.
>

I've asked my contact at VCGI for clarification on how multi-tenant
buildings are addressed.  From what I've seen, some multi-tenant buildings
just have one e911 address associated with them.  I have seen other
buildings that have multiple addresses, but I've never seen them overlap.
I'll keep a close eye out for this and will see what VCGI has to say.  I do
have the VT data in a postgis database, but don't have experience using the
GIS functions, so I'll try it out.
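
As a cross-check outside the database, here is a minimal Python sketch of the
same "points within 2m of each other" idea.  It assumes the points have been
exported as (id, lat, lon) tuples in WGS84 degrees; the function names are
hypothetical.  (One thing to watch on the PostGIS side: ST_Distance on
lat/long geometry returns degrees, so the 2m threshold needs the geography
type or a projected coordinate system.)

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; fine for a 2 m threshold


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def close_pairs(points, max_dist_m=2.0):
    """Return (id_a, id_b, metres) for every pair closer than max_dist_m.

    points: list of (id, lat, lon).  Points are sorted by latitude so each
    point is only compared with neighbours inside a narrow latitude band.
    """
    pts = sorted(points, key=lambda p: p[1])
    lat_window = math.degrees(max_dist_m / EARTH_RADIUS_M)
    pairs = []
    for i in range(len(pts)):
        id_a, lat_a, lon_a = pts[i]
        for j in range(i + 1, len(pts)):
            id_b, lat_b, lon_b = pts[j]
            if lat_b - lat_a > lat_window:
                break  # every later point is even farther north
            dist = haversine_m(lat_a, lon_a, lat_b, lon_b)
            if dist < max_dist_m:
                pairs.append((id_a, id_b, dist))
    return pairs
```

Exactly stacked points show up with a distance of 0, so this also answers the
"same exact lat/long" question without needing a separate check.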

Thanks again for the feedback.

-jared