[Imports] Address import from city government source

Greg Troxel gdt at lexort.com
Thu Aug 4 10:08:47 UTC 2022


Joe Rhodes <joe.rhodes at castlepinesco.gov> writes:

> The motive for this task is a practical one: Castle Pines is expected
> to more than double in population over the next several years, and we
> wish to establish processes to keep addresses up to date for the
> benefit of our residents (ensuring package delivery, navigation in
> consumer GIS apps, etc.). Once we get addresses sorted, we wish to
> evaluate the appropriateness and feasibility of updating other OSM
> features in Castle Pines, including parks, land use/land cover, and
> other features.

To first order, you should realize that "normal mapping" is the primary
path in OSM, and is distinct from "importing".  I have kept those sorts
of things (except broad landcover, which is messy in OSM) up to date for
my town.  That involves actually looking at the situation on the ground:
GPS tracks on trails in forest (that you can't even see a hint of in
aerials), businesses, etc.  If you are adding parks one at a time in
JOSM and actually know that they exist, that's editing, not importing,
and it is fine as long as you are using license-acceptable sources
(CC-BY and CC-BY-SA are not acceptable).

(MA content/refs is only because that's what I really know.)

BTW if some big company (perhaps google) is using your county data and
it is CC-BY-SA, then they are highly likely not following that license.
The government should be even-handed, and if they are going to grant
terms to for-profit companies that are essentially "PD, attribution
requested", they should offer those terms to OSM and the general
public.  See MassGIS's terms for an example of this done in a way that
works well in practice and is honorable from a good-government point of
view.

Perhaps the county picked CC-BY-SA because they guessed that is what
people wanted.

There are local mappers, so your first path should be to publish the
data under an acceptable license.  That lets other OSM people use it in
hand editing and to contemplate imports.  For example in MA we have
configured the tax parcels layer from MassGIS for OSM editing, and
people can see boundary lines and addresses.  And, some people use other
MassGIS data for reference (good license and easy shapefile download).

> In my estimation, using address points straight from Douglas County
> will probably be the best path. Their data is marked with a Creative
> Commons/CC-BY-SA 4.0 license, but it sounds like it’s preferable or
> necessary to get explicit permission from the county, which I can
> start on. (Regarding Mike’s comment on disclaimer vs. license on the
> city’s open data site, he is correct - that’s a quirk of ArcGIS Open
> Data; it feeds the hosted item’s “Terms of Use” to the license
> field. I’m no legal expert either, but I don’t think ‘terms of use’ ==
> ‘license’.)

The point is copyright law.  First, it's really unclear if copyright law
even applies, because an address data set has a definition of correct
and therefore is arguably not a creative work.  But OSM doctrine is to
ensure that we have permission under copyright law as if it applies.
"terms of use" is more or less a red herring relative to copyright, or
at best muddled.  What is needed is permission to copy, modify and
distribute modified copies, because under copyright law those rights are
reserved to the copyright holder.

Note that this means random OSM users will change data after it has been
imported if they think it is wrong (and there are surely errors).  So
what will be in OSM will diverge, and you'll have to be ok with that.

In any particular case, if you have a disagreement you can talk to the
person who edited, but that's peer to peer as an OSM editor, to
establish "is this fact true on the ground", with zero part of "you
changed my import and my data is authoritative".  Imports absolutely
cannot overwrite existing data -- but hand review to establish
correctness can be done based on automated diffing as a list of things
to look at.

(I have in fact found errors from MassGIS data and fixed them in OSM,
albeit very few, as in 1 street name and 1 phantom street in my town of
7000 people.)

> Regarding the boundary: If acceptable, we can also take this directly
> from the county (layer:
> municipality<https://gis-dougco.opendata.arcgis.com/datasets/municipality/explore?location=39.458592%2C-104.866197%2C13.30>),
> so there’s no need to dissolve parcels. To Greg’s comments: The half
> that is present is correct, but nothing west of I-25 shows as part of
> the Castle Pines administrative boundary. I don’t yet know enough
> about OSM to say whether there may be a difference between what’s
> rendered vs the database, but it certainly appears that half is

Before you can really contemplate imports you need to learn enough so
that you can dig in to the database and answer these questions and
really have enough expertise to guide new people who are confused.  In
OSM, importing is only for experts -- but we are happy to help anyone
who wants along the path to being an expert.

This is why I keep saying "ensure that the data is published along with
a license that is acceptable, where the license is visible to any random
person getting the data".   If you meet that interface, it is highly likely
that somebody who is an OSM expert in CO will add it.  It will take them
maybe an hour if they are super careful, maybe less.

> missing. So I don’t wish to correct any existing geometry, or posit
> the accuracy of any dataset’s features over any other, simply to get
> the western half of the city’s boundary added.

Sure, but what's in the db vs what's rendered is quite complicated.

> Further to the above, and perhaps because of the above, all of the
> addresses I’ve spot-checked so far west of I-25 show as being in
> unincorporated Douglas County. So once a conflation process is sorted,
> I’ll also want to add the correct town name to the many address points
> that do exist yet lack the correct town designation.

Have you looked at the address points in the db?  Often there is no
town and that is inferred when used from enclosing boundaries.
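
A minimal sketch of that inference, assuming a simplified model: each
boundary is a single closed ring of (lon, lat) pairs, whereas real OSM
administrative boundaries are multipolygon relations that can have
holes and several outer rings.  The function names and the rectangle
below are hypothetical, not real Castle Pines geometry.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) pairs?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def infer_city(tags, lon, lat, boundaries):
    """Use an explicit addr:city tag if present, else the enclosing boundary."""
    if "addr:city" in tags:
        return tags["addr:city"]
    for name, ring in boundaries.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None


# Hypothetical rectangle standing in for a city boundary.
boundaries = {
    "Castle Pines": [
        (-104.92, 39.44), (-104.85, 39.44),
        (-104.85, 39.49), (-104.92, 39.49),
    ]
}
```

An address point with no addr:city tag that falls inside the ring gets
the boundary's name; a point outside every ring gets None, which is the
"unincorporated" case from the consumer's point of view.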

(I live in MA, so I don't really know much about unincorporated areas
(really, we don't have any).)

> Lastly, to the question of conflation: It seems to me that scripting a
> comparison of addresses proposed to be imported against the existing
> database would not be an extraordinarily heavy lift. We’re fortunate
> that our street names are relatively unique (no 1st, Main, Lincoln,
> etc.), so I’m hopeful that I won’t find many matches outside the
> immediate area.

Yes, it should be doable fairly easily, and you have to check that
geometries are close too.  While it's not formally a requirement, this
should be done with code that is published and usable by other OSM
people, which means 1) open source license on the code and 2) no ESRI
dependencies, because the rest of OSM a) does not have ESRI licenses,
b) mostly does not know how to use ESRI tools, partly because they
don't have licenses and partly because they don't want to spend time
learning something they won't be able to use, and c) (fuzzy, not so
strongly) tends to want to avoid proprietary software.  So that leads
to python/gdal/postgis/qgis.
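
As a rough illustration of "match on address, then check that
geometries are close", here is a sketch in plain Python with no
dependencies.  The record layout (dicts with housenumber, street, lat,
lon), the function names, and the 50 m threshold are all assumptions
made for the example, not anything OSM prescribes.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def address_key(housenumber, street):
    """Normalize case and whitespace so trivially different spellings match."""
    return (housenumber.strip().upper(), " ".join(street.upper().split()))


def find_far_matches(county, osm, max_m=50.0):
    """Return (county_rec, osm_rec, meters) where the address text matches
    but the two points are more than max_m apart, for manual review."""
    index = {}
    for rec in osm:
        index.setdefault(address_key(rec["housenumber"], rec["street"]), []).append(rec)
    suspicious = []
    for rec in county:
        for cand in index.get(address_key(rec["housenumber"], rec["street"]), []):
            d = haversine_m(rec["lat"], rec["lon"], cand["lat"], cand["lon"])
            if d > max_m:
                suspicious.append((rec, cand, d))
    return suspicious
```

The output is only a list of things for a human to look at; nothing
here decides which side is right.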

> Thanks again for the insight and guidance. To my understanding, the next steps are:
>
>   1.  Get explicit written permission from Douglas County. If the
> Creative Commons license is sufficient, I’ll skip this, but I
> certainly don’t mind reaching out to the county if necessary or
> preferable.

Nope, CC-BY is not ok, because it's incompatible with the OSM license.
When you download the entire db (planet file), it can't possibly
attribute every contributor, of which there are millions.

>   2.  Reach out to local mappers for input
>   3.  Familiarize myself with OSM tools and retrieve an OSM address list for comparison

Yes, and this means 1) sign up for a mapper account 2) learn JOSM and do
some hand mapping of a few objects.  But you can also grab data and look
at it in ESRI or qgis.
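
One way to grab existing OSM addresses for comparison is an Overpass
API query.  The sketch below only builds the Overpass QL text; it does
not send it anywhere.  The function name is hypothetical and the
bounding box is whatever you pass in, in Overpass order (south, west,
north, east).

```python
def overpass_address_query(south, west, north, east, timeout_s=60):
    """Build an Overpass QL query for nodes and ways that carry
    addr:housenumber inside the given bounding box."""
    bbox = f"({south},{west},{north},{east})"
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        "(\n"
        f'  node["addr:housenumber"]{bbox};\n'
        f'  way["addr:housenumber"]{bbox};\n'
        ");\n"
        # "center" adds a representative point for ways such as buildings.
        "out center;\n"
    )
```

The resulting text can be pasted into Overpass Turbo or posted to a
public Overpass endpoint, and the result loaded into qgis or a script.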

>   4.  Document a conflation process and write a script to compare
> county addresses to the existing OSM list, then manually review
> matches outside the immediate area, if any

Yes, and this is a fair bit of work.  It would be really great if there
were software that could do this in general, because you aren't the only
one with address data!   I see the pipeline as:

  transform various formats (surely you have a database table with a
  bunch of fields, and your fields are probably not quite the same as
  some other people's data, even though they are similar because they
  are trying to represent similar semantics) to OSM format, as points.
  This regularizes the data syntactically and semantically.

  Create conflation code that takes an input file and the OSM database
  and produces output datasets for:
    points not in OSM, perhaps split:
      object in OSM that is a good candidate to get the address*
      no object in OSM => bare point
    points in OSM but different location
    points in OSM with substantially different tags
    points in OSM with close tags
    points in OSM exactly
    
* In MA, addresses are generally put on the largest building on a
  parcel, which is usually right.  The state's focus is on E-911 and
  matching address to "where should PD/FD show up when called".
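
The two pipeline stages above can be sketched roughly as follows.
Everything specific here is an assumption for illustration: the county
column names in FIELD_MAP, the 30 m "same location" threshold, the
bucket names, and the rule that "substantially different" means a tag
present on both sides with a different value.  The split of "not in
OSM" into "candidate object" versus "bare point" is left out because it
needs building geometry.

```python
import math

# Hypothetical county column names mapped to OSM address tags.
FIELD_MAP = {
    "HOUSE_NUM": "addr:housenumber",
    "STREET": "addr:street",
    "ZIP": "addr:postcode",
}


def to_osm_point(row):
    """Stage 1: turn one source row into an OSM-style point with tags."""
    tags = {
        osm_key: str(row[src]).strip()
        for src, osm_key in FIELD_MAP.items()
        if row.get(src) not in (None, "")
    }
    return {"lat": float(row["LAT"]), "lon": float(row["LON"]), "tags": tags}


def approx_distance_m(a, b):
    """Equirectangular approximation, adequate over tens of meters."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return 6371000.0 * math.hypot(dx, dy)


def addr_key(point):
    t = point["tags"]
    return (t.get("addr:housenumber", "").upper(), t.get("addr:street", "").upper())


def classify(point, osm_points, near_m=30.0):
    """Stage 2: put one candidate point into a review bucket."""
    matches = [o for o in osm_points if addr_key(o) == addr_key(point)]
    if not matches:
        return "not_in_osm"
    nearest = min(matches, key=lambda o: approx_distance_m(point, o))
    if approx_distance_m(point, nearest) > near_m:
        return "different_location"
    if nearest["tags"] == point["tags"]:
        return "exact"
    # Tags present on both sides whose values differ beyond letter case.
    conflicting = [
        k for k, v in point["tags"].items()
        if k in nearest["tags"] and nearest["tags"][k].upper() != v.upper()
    ]
    return "different_tags" if conflicting else "close_tags"
```

Each bucket would then be written out as its own dataset for review;
none of them implies overwriting what is already in OSM.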

> Let me know if I’m missing or misunderstanding anything. City of
> Castle Pines is a member of DRCOG, so let me know how that might
> impact the process.

It might impact permissions on your end -- perhaps you can get the
overall license fixed -- but OSM deals with individuals.  You have no
special standing to import or edit because of your job.  That doesn't
mean we are at all hostile to people that work for the government, just
that the considerations are the same as if say I was going to import
address points for my town from MassGIS (I am not an employee of my town
or of MassGIS).

Greg