[Imports] City of Seattle imports
Paul Norman
penorman at mac.com
Thu Dec 6 13:04:41 UTC 2012
Apologies for the length, but there are quite a few points to "address,"
some of a specific nature and others more general. Because I'm replying
to points across several messages and the formatting in this thread has
become screwed up, I'll be reformatting messages and re-ordering them so
that they make more sense. Imports, as you can probably tell, are an
area I care about.
Full disclosure: I maintain ogr2osm, the software used for converting
geometries in the proposed import. See https://github.com/pnorman/ogr2osm
for more information.
While I support address imports in principle, it is important that they
are done right. Back in 2010 I did an address import and, although on
the whole it was successful, I learned some lessons.
Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001602.html
> From this page: http://wiki.osm.org/wiki/Import/Guidelines#A_checklist,
> can you advise as to the checklist steps we are not following, or which
> steps should be added to this checklist?
http://wiki.osm.org/wiki/Import/Guidelines#Discuss_import_with_community
requires that consultation be done with imports@ and the appropriate local
communities. This certainly includes talk-us@.
Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001598.html
> The current plan is to focus on addresses and building outlines.
> Sources are currently suggested to be tagged as:
> source:addr=data.seattle.gov
> source:path=data.seattle.gov
> so as to accommodate other sourcing information for those points and ways,
> as appropriate.
The value of source=* tags on objects is debatable. Source tags on
changesets are a very good idea, but it's not clear whether they're a
good idea for objects. There are arguments either way. I am working on
an as-yet-unproposed address import, and the initial version I propose
will not add source tags to objects. If after consideration you do
decide that source tags are worthwhile, I would recommend just
source=data.seattle.gov. This corresponds with the general practice for
the use of source tags. Remember that source tags exist for the
convenience of mappers, and if they become inconvenient they are not
worthwhile. Source tags that are long and cryptic do not help mappers.
Some imports have used source tags so long and complicated that it was
necessary to look up their exact value.
In my own address import I quickly found that it wasn't clear what to do
with the source tags when editing the addresses, primarily to merge them
with buildings or POIs. If it wasn't clear to me, the importer, I'm
pretty sure it was unclear to other editors.
Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001602.html
> > How will you handle object conflation?
> Manually and methodically.
Although it is not a trivial problem, there is work underway on code that
will handle the address-address conflation
(https://github.com/pnorman/addressmerge).
Address-POI and address-building conflation remains a purely manual job.
It shouldn't be too hard to merge addresses with the buildings they are
within when the building has only one address within it and the building
does not itself have an address. Addresses placed by building doors
outside the building itself add complications, but I expect they are
solvable. Having said that it shouldn't be too hard, it's still not
trivial.
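
To make the merge rule above concrete, here is a rough sketch in Python.
It is not part of addressmerge or any existing tool. The data layout
(dicts with 'pos', 'ring' and 'tags' keys) and the function names are
assumptions made up for illustration, and it assumes planar coordinates
and non-overlapping buildings. Addresses at doors outside the footprint
are simply left unmerged.

```python
from collections import defaultdict


def point_in_polygon(point, ring):
    """Ray-casting test. ring is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point
        # can be crossed by a ray going right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def merge_addresses_into_buildings(addresses, buildings):
    """Merge an address into a building only when the building contains
    exactly one address and has no addr:* tags of its own.

    addresses: list of {'pos': (x, y), 'tags': {...}}   (hypothetical layout)
    buildings: list of {'ring': [(x, y), ...], 'tags': {...}}
    Returns (new_buildings, leftover_addresses). Inputs are not modified.
    """
    hits = defaultdict(list)  # building index -> indices of addresses inside
    for ai, addr in enumerate(addresses):
        for bi, building in enumerate(buildings):
            if point_in_polygon(addr['pos'], building['ring']):
                hits[bi].append(ai)
                break  # assumes buildings do not overlap

    result = [{'ring': b['ring'], 'tags': dict(b['tags'])} for b in buildings]
    merged = set()
    for bi, ais in hits.items():
        has_address = any(k.startswith('addr:') for k in result[bi]['tags'])
        if len(ais) == 1 and not has_address:
            result[bi]['tags'].update(addresses[ais[0]]['tags'])
            merged.add(ais[0])

    leftover = [a for i, a in enumerate(addresses) if i not in merged]
    return result, leftover
```

Anything returned in the leftover list (two addresses in one building, a
building that already has an address, an address outside every
footprint) would still need a human to look at it.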
Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001602.html
> > Where is the source of your transformation scripts?
> From the email above: "some translation instructions Cliff has put
> together."
> We will include the specific translation code either at a github page or
> on the http://wiki.osm.org/wiki/Seattle page.
> > Where are the specific data files you're transforming?
> They are at data.seattle.gov & we will provide links to the sources.
> We will also consider posting separate snapshots of these source
> datafiles if we can figure out where to host them.
To be able to sensibly comment on the tagging and data quality we really
need a sample .osm showing what the data is like. One option for hosting
it is an account on the dev server. See
http://wiki.osm.org/wiki/Dev_Server_Account
for more information on this. If this is a problem you could email me a
file and I could host it on one of my servers.
Because I maintain ogr2osm I'm comfortable reading very complex
translations and determining the results without actually running the
code. Most people aren't, and a .osm file is a good way for people to
assess the tagging and data quality. Another option for documenting
tagging is an appropriate page on the wiki. When I was working on the
Alaska county import I documented the tagging I would be using at
http://wiki.osm.org/wiki/Alaska/TIGER_Counties#Tagging before I had
written the translation file. The correct tagging for the county data
was pretty obvious. It also gave me a chance to document the changeset
tagging.
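
For anyone who hasn't seen a translation before, the tagging part is
usually a small function like the sketch below, written in the style of
the filterTags(attrs) hook that ogr2osm translation files define. The
source field names ADDR_NUM and STREET_NAME are placeholders I made up
for illustration. I have not checked what the Seattle files actually
call their columns, and a real translation would also need to expand
street-name abbreviations.

```python
def filterTags(attrs):
    """Map source attributes to OSM tags.

    ADDR_NUM and STREET_NAME are hypothetical field names, not taken
    from the Seattle data.
    """
    if not attrs:
        return {}
    tags = {}
    number = attrs.get('ADDR_NUM', '').strip()
    street = attrs.get('STREET_NAME', '').strip()
    if number:
        tags['addr:housenumber'] = number
    if street:
        tags['addr:street'] = street
    return tags
```

A wiki table listing each source field next to the OSM tag it becomes
says the same thing for people who would rather not read code.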
Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001606.html
> We identified these files and developed preliminary scripts a couple days
> ago.
> Here are a couple - these may not be the only sources considered for
> import:
> https://data.seattle.gov/dataset/Master-Address-File/3vsa-a788
> https://data.seattle.gov/dataset/Street-Network-Database/afip-2mzr
> https://data.seattle.gov/dataset/2009-Building-Outlines/y7u8-vad7
I would recommend holding off on the streets data and restricting this to
addresses and building outlines. Virtually every aspect of an import
involving street data is significantly harder than one involving only
buildings and addresses. I strongly recommend you start with the
buildings and addresses so you gain some experience first. Even writing
a good translation for streets data is generally harder.
When you've finished with the addresses and buildings you could then
propose a new import for the streets. We are also likely to have better
tools for dealing with the streets data in the medium term.
Jeff Meyer @
http://lists.osm.org/pipermail/imports/2012-December/001606.html
> I'm not sure why you are asking if I'll be doing this myself. This
> email thread contains several references to people who will be
> assisting. Cliff Snow and others attending the meeting to discuss this
> import. We hope to have a team of local mappers focus on imports where
> they have local expertise.
CanVec, the French cadastre and other imports where the data is made
available to multiple people (e.g. through a website) show that without a
QA process the quality of imports varies drastically from person to
person, and that you'll see some really bad imports. That's not to say
that it can't work, and in many ways it's a preferable process, but one
bad importer can easily create a mess in minutes that takes others ages
to clean up. One way to mitigate this is to do as much post-processing as
possible before releasing the files. In the context of addresses this
would mean removing the addresses that duplicate existing OSM data, or
for buildings, removing buildings that intersect existing buildings in
OSM.
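
As a rough illustration of that kind of filtering, here is a Python
sketch. The function names and data layout are assumptions of mine, not
an existing tool. The address check is an exact match on normalised
housenumber and street, so it will miss abbreviation differences, and
the building check compares bounding boxes only, which is coarser than a
true intersection test and will drop some buildings that merely sit
close to an existing one.

```python
def address_key(tags):
    """Normalise housenumber and street for comparison."""
    number = tags.get('addr:housenumber', '').strip().lower()
    street = ' '.join(tags.get('addr:street', '').lower().split())
    return (number, street)


def drop_existing_addresses(candidates, existing):
    """Keep only candidate tag dicts whose address is not already in OSM."""
    existing_keys = {address_key(t) for t in existing}
    existing_keys.discard(('', ''))  # objects without any address tags
    return [t for t in candidates if address_key(t) not in existing_keys]


def bbox(ring):
    """Bounding box (minx, miny, maxx, maxy) of a list of (x, y) points."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def drop_overlapping_buildings(candidates, existing):
    """Keep only candidate rings whose bounding box touches no existing one."""
    existing_boxes = [bbox(ring) for ring in existing]
    return [
        ring for ring in candidates
        if not any(bboxes_overlap(bbox(ring), box) for box in existing_boxes)
    ]
```

Erring on the side of dropping too much is deliberate here: whatever is
filtered out can still be added by hand later, while a duplicate that
gets uploaded has to be found and cleaned up by someone.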
As the word count is telling me this message is over 1000 words, I will
leave aside some concerns I have with updating and conflation for another
message later.