[Imports] import of data from LINZ

Robin Paulson robin.paulson at gmail.com
Mon Aug 27 11:31:39 BST 2012


On 21 August 2012 14:06, Paul Norman <penorman at mac.com> wrote:
>> Note those (the area previously cited) are from the old dataset, the new
>> ones are much cleaned up.
>
> How can we see these new areas to review them?

i.e. the upstream data provider now ships far fewer superfluous data
fields. explore the "stats" pages here for an idea of what is available
to us:
  http://linz2osm.openstreetmap.org.nz/data_dict/layer/lake_poly/stats/

for example, the fields that held cartographic rendering hints for
printing paper maps have been removed, but the majority of the changes
are things like the temperature field being renamed to "temperatur".
stupid little changes like that, and *not* entirely new tagging from
scratch since our extensive 2010-2011 tag-matching effort. It has been more
to do with proof-reading and double-checking than with new problem
solving.
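
a rough sketch of what that kind of change amounts to, in Python. this
is not the linz2osm code: the rule table, the function names and the OSM
key on the right-hand side are placeholders made up for illustration.
only the "temperature" -> "temperatur" rename comes from the real data.

```python
# Hypothetical rule table: upstream LINZ field name -> OSM tag key.
# The OSM keys here are placeholders, not the real linz2osm rules.
OLD_RULES = {"temperature": "temperature", "name": "name"}

# Upstream field renames between the old and the new LINZ release.
RENAMES = {"temperature": "temperatur"}


def migrate_rules(rules, renames):
    """Re-key each rule to the new upstream field name; the OSM side is untouched."""
    return {renames.get(field, field): osm_key for field, osm_key in rules.items()}


def apply_rules(rules, record):
    """Turn one upstream record (field -> value) into OSM tags, skipping empty values."""
    return {
        rules[field]: value
        for field, value in record.items()
        if field in rules and value not in (None, "")
    }


NEW_RULES = migrate_rules(OLD_RULES, RENAMES)
print(apply_rules(NEW_RULES, {"temperatur": "hot", "name": ""}))
# prints {'temperature': 'hot'}
```

the point being that only the left-hand (upstream) side of a rule moves;
the OSM tags that come out are the same as before.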

as to public access to see how we're tagging those, we're working on it:
  https://github.com/opennewzealand/linz2osm/issues/9

and you can review our discussions for each layer in the "Show Check outs"
pages' "Notes" section. for example:

http://linz2osm.openstreetmap.org.nz/workslices/create/layer_in_dataset/9/

to get an idea of our process. More extensive discussion about complicated
layers can be found in the nzogis mailing list archives.

> The point is to make sure that each mapping from a LINZ tag to an OSM tag
> has been verified, not each individual object. For example, with the US NHD
> the feature code 46006 was imported as waterway=river. This was reasonable
> based on the written description but in practice >95% of these "rivers" were
> really waterway=stream.

at a big-picture level, one might question why osm makes the
distinction between a river and a stream. what does it actually mean,
or relate to? it's highly qualitative, interpretative,
culturally-constructed and difficult-to-use data. deciding whether
some flowing water is one or the other is difficult, and i'm not sure
it gets us anywhere that 'width=xxx' or 'flow_rate=yyy' wouldn't be
better for.

this is a problem for our "rivers" too: one size fits all.
I don't see anything in the "Notes" for river_cl, but I'm sure we
discussed it on the list, check in the archives, early May 2010.

our current method is to fix it by hand using local knowledge at the
same time as fixing river direction; possibly automate it using GIS
analysis; and finally use an Xapi download to find nearby description
tags containing the stream/river name, then check by hand &/or with
local knowledge which nearby waterway each one refers to.

http://wiki.openstreetmap.org/wiki/LINZ_revisit_list#Correct_Taggings
http://wiki.openstreetmap.org/wiki/Script_for_finding_nearby_nodes
http://wiki.openstreetmap.org/wiki/LINZ/Howto#Fixing_bulk_tagging_mistakes_after_upload
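
the last step of that method (working out which nearby waterway a
description node refers to) could look roughly like the sketch below.
it is not the wiki script linked above; the function names, the data
layout and the 500 m cut-off are assumptions, and it measures distance
to way vertices only, so a human still has to confirm each candidate.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def candidate_waterways(label, waterways, max_m=500.0):
    """List (distance_m, way_id) for ways with a vertex within max_m of the label node.

    label     -- dict with "lat" and "lon" of the description node
    waterways -- dict of way_id -> list of (lat, lon) vertices
    Nearest candidates come first; the final choice is left to a human.
    """
    hits = []
    for way_id, coords in waterways.items():
        d = min(haversine_m(label["lat"], label["lon"], lat, lon) for lat, lon in coords)
        if d <= max_m:
            hits.append((round(d), way_id))
    return sorted(hits)


# Made-up coordinates, for illustration only.
label = {"lat": -45.0, "lon": 170.0}
waterways = {
    "A": [(-45.0, 170.001), (-45.001, 170.002)],
    "B": [(-45.1, 170.0)],
}
print(candidate_waterways(label, waterways))
```

with the made-up data above only way "A" is within range; way "B" is
roughly 11 km away and is dropped.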

>> We know there is a general leaning towards hand edits are best and this
>> works well where there is a dense population base. It will be far easier
>> for us to engage the wider mapping community once we have a base layer
>> of decent data that requires smaller amounts of hand editing rather than
>> an empty match. We believe the trade off in this instance is worth the
>> small amount of incorrect data.
>
> Just to clarify, my comments were not about the value of imports for use in
> mapping remote regions, it was about the process of writing the conversions
> from data source attributes to OSM tags which experience has shown cannot be
> done accurately purely based on the descriptions from the data source.

we are applying local knowledge and reasonable estimates based upon
aerial photographs and any other information we can infer/find out. we
are using the (quite good) LINZ data definitions as a guide to their
intention, not as absolute Truth.

example: shingle used both for beaches and for rivers:
http://apps.linz.govt.nz/topo-data-dictionary/index.aspx?page=class-shingle_poly
   & our in-app tagging discussion:
http://linz2osm.openstreetmap.org.nz/workslices/create/layer_in_dataset/9/

>> On a related note, our road data will be coming from the NZ Open GPS
>> project which has taken the LINZ road data and added a lot of meta data,
>> removed paper roads and generally made it good enough for GPS units.
>
> Is this part of this import, or a new import that will be proposed at a
> later date?

part of the import, although due to its sensitivity and the merging
issues we are taking special care with it, including extra community QA
steps that the other layers don't get.

>> We will look into how we tag attribution and source. A source tag is
>> important for us for the update process but we can move the attribution
>> somewhere else if we can automate its insertion so it doesn't
>> accidentally get left off.
>
> Although it is possible for tags to be accidentally left off of a changeset,
> attribution=* can be edited and the data source would have to be okay with
> attribution from the history in that case.

AFAIK they are aware of the problem, but respect our best effort of having
it there in version 1 of the feature edit.

> I have also seen the attribution tag inspire reluctance in editing by newer
> users who are not sure if they should edit the tag when combining multiple
> sources of information.

Good! If it stops new users from "correcting" good LINZ data with
crappily aligned Bing imagery (we've found numerous mistakes in New
Zealand) it will make me very happy. source=LINZ is a mark of quality.
This has happened to me a number of times, in areas around various
mappers which we know very well, and it is quite frustrating to have
good data replaced by bad by someone half a world away who thinks the
satellite imagery is somehow more correct than a high-res
'source=GPS' trace. </grumble>

>> It is important to emphasize that the LINZ:layer and LINZ:dataset tags
>> are what will make later data releases from the gov't able to be
>> incorporated (pre-per-feature ID tags), and bulk corrections to already
>> uploaded data possible.
>
> Are there features with the same OSM tags that are found in multiple LINZ
> layers? If so then I can see the need, but if not then the LINZ layer can be
> inferred by the tagging of the object.

yes there are, for example the expansion of the descriptive text layer and
building codes.
http://wiki.openstreetmap.org/wiki/LINZ_geo_name_matching
http://wiki.openstreetmap.org/wiki/Script_for_cleaning_up_the_descriptive_text_LINZ_layer

also, it makes later corrections and evolutions of tagging possible:
http://wiki.openstreetmap.org/wiki/LINZ/Howto#Fixing_bulk_tagging_mistakes_after_upload
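
to show why the LINZ:layer tag matters for that, here is a minimal
sketch of a bulk correction limited to one imported layer. it is not
the procedure from the Howto page: the function name and the data
layout are invented, and the river -> stream retag is only an example
of the kind of fix being discussed, not something we plan to apply
blindly.

```python
def bulk_retag(features, layer, old_tag, new_tag):
    """Swap old_tag for new_tag on features imported from one LINZ layer.

    features -- list of dicts with "id" and "tags" (assumed layout)
    layer    -- value of the LINZ:layer tag to restrict the change to
    old_tag, new_tag -- (key, value) pairs
    Returns the ids of the features that were changed.
    """
    (old_key, old_value), (new_key, new_value) = old_tag, new_tag
    changed = []
    for feat in features:
        tags = feat["tags"]
        if tags.get("LINZ:layer") == layer and tags.get(old_key) == old_value:
            del tags[old_key]
            tags[new_key] = new_value
            changed.append(feat["id"])
    return changed


# Illustrative data: one imported way and one hand-mapped way.
features = [
    {"id": 1, "tags": {"waterway": "river", "LINZ:layer": "river_cl"}},
    {"id": 2, "tags": {"waterway": "river"}},
]
print(bulk_retag(features, "river_cl", ("waterway", "river"), ("waterway", "stream")))
# prints [1]
```

the hand-mapped way (id 2) has the same OSM tags but no LINZ:layer, so
it is left alone, which is exactly what can't be done if the layer has
to be inferred from the tagging.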

>> > It looks like some display information from their database made it in.
>>
>> Yes, that was done on purpose in case someone wanted to use it when
>> cleaning up/merging the tags to see what/where offsets vs the nearby map
>> features and how important the thing was (size). Also if anyone wanted
>> to use it for cartography.
>>
>> Those fields are gone now in the new release.
>
> Good to hear. Can you post a new .osm file of an area with the new tagging?

for the new available data fields see:
http://linz2osm.openstreetmap.org.nz/data_dict/layer/descriptive_text/stats/

but that one (with all the rendering hints) is a really bad example,
since it relies on a post-processing script and as such is being put
off until the very end to be done by hand:
http://wiki.openstreetmap.org/wiki/Script_for_cleaning_up_the_descriptive_text_LINZ_layer

again, actual export of tagging rules from the app is coming soon.
(issue #9 URL above)

>> > http://www.openstreetmap.org/browse/node/767114113 has tags which
>> > belong on the nearby waterway=river
>>
>> This is part of a post import cleanup process that will be undertaken.
>> There is a description layer which labels some things like rivers (some
>> rivers also have tags directly). Post import we plan to have a few check
>> scripts that look for hotspots to review and hand edit. We expect the
>> whole import process to take months once we get it up and running but we
>> will get there.
>>
>> > http://maps.paulnorman.ca/imports/review/streamconverge.png is a point
>> > where three waterways converge on one spot. It appears that some
>> > waterways are reversed.
>>
>> Yes - We can't guarantee the direction of rivers right now. LINZ are
>> working on fixing this in the base data. One of the notes on importing
>> further rivers is to check direction and fix if we can work out which
>> way is downhill...
>
> Ah, I didn't realize it was an issue inherited from the source. There's not
> really a great way to handle directionality of waterways in OSM. The default

we're treating them so that the arrows in JOSM flow down to the sea.

> assumption is that they're directional but there's no widely accepted way to
> indicate the directionality is unknown or that it has no directionality.
> There's the directional=* proposal but that hasn't seen much usage.
>
> I'm going to have to tackle this issue with the US NHD data where some
> waterways are indicated as having no direction so I'd be interested in what
> you work out.

we are a very mountainous country and there are very few places where
there's a stagnant, reversing, or ambiguous river flow. Where they do
exist (e.g. central-western Stewart Island) we'll make our best guess
based on the available digital elevation models and local knowledge.
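
as a sketch of how the elevation-model part of that guess could work:
compare the height at each end of a way and reverse it if it runs
uphill, leaving near-flat ways for a human. the function name, the
elevation lookup and the 1 m threshold are all assumptions for
illustration, not something we have settled on.

```python
def orient_downhill(node_ids, elevation, min_drop_m=1.0):
    """Order a waterway's nodes so that it runs downhill, if the DEM is clear enough.

    node_ids   -- node ids in the way's current order
    elevation  -- dict of node id -> elevation in metres sampled from a DEM (assumed input)
    min_drop_m -- below this end-to-end height difference we do not trust the DEM
    Returns (node_ids_in_flow_order, status).
    """
    first = elevation[node_ids[0]]
    last = elevation[node_ids[-1]]
    if last - first > min_drop_m:
        # The way currently runs uphill, so flip it.
        return list(reversed(node_ids)), "reversed"
    if first - last > min_drop_m:
        return list(node_ids), "ok"
    # Too flat to call: needs local knowledge.
    return list(node_ids), "check by hand"


# Made-up elevations, for illustration only.
elevation = {"n1": 12.0, "n2": 40.0, "n3": 95.0}
print(orient_downhill(["n1", "n2", "n3"], elevation))
# prints (['n3', 'n2', 'n1'], 'reversed')
```

only the two end nodes are compared, so a way that crosses a noisy patch
of the elevation model can still come out wrong; the "check by hand"
status is there for the flat cases.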

>> We have a post import task to review these. Interestingly the point
>> description layer has a tag called beach which will allow us to zoom on
>> quite a few of these areas.
>>
>> Also: The sat imagery is misleading, we believe they are tall (10m+)
>> dunes and not a classic beach front.
>
> I guess then you could tag them either way then, OSM tagging being what it
> is :)
>
>> > The display information on some nodes is the most serious issue, but
>> > it should be easy to fix.
>>
>> We will look at the tagging of the multi-polygons and review the
>> attribution issues.
>>
>> > Overall it looks not bad, but I look forward to seeing an updated .osm
>> > file with these issues fixed.
>>
>> We will be doing some more targeted imports to check tags on the layers
>> over the next week. We will keep the group posted.
>
> You should not be doing any imports while you are still developing the

the vast majority of the tagging was developed in a formal way over
the course of a couple of years. Most of the new data effort is just
remapping field names like "runway_surface" -> "surface" for the
upstream "runway" layer. It's just adjusting the upstream side of the
tagging rule to the simplified or abbreviated name; the OSM side of
the tagging rule is staying the same in nearly all cases.

> tagging. If you need to test scripts you can do it on the dev APIs, but the
> live API is not for testing imports.

we have actually been doing that where appropriate:
  http://wiki.openstreetmap.org/wiki/LINZ/Howto#General
  http://joerichards.dev.openstreetmap.org/index-new.html

but in general we are *not* uploading using scripts. we are uploading
and merging by hand using humans working in JOSM. Probably a better
way of saying it is that we are testing our now collaborative
management tool with these targeted desert island uploads, using
long-tested and matured tag-matching developed over many years.

-- 
OSM NZ mappers


