[Talk-us] Imports information on the wiki

Thu Jan 5 20:02:13 GMT 2012

On Tue, Jan 3, 2012 at 5:59 PM, the Old Topo Depot
<oldtopos at novacell.com> wrote:
>>
>> Good point. Fixing it now is not a permanent solution. We will have to
>> keep monitoring. Which brings me to the question of responsibility: to
>> what extent do we - as the OSM US community - want or need to keep the
>> data consistent by running monitoring bots? I believe that a certain
>> degree of inconsistency is to be expected from a crowdsourced dataset
>> and we should be very careful about antagonizing mappers with data
>> monitoring bots.
>> --
>
>
> Certainly inconsistency appears in any large dataset that's touched by a
> large number of editors.  I don't see why expecting contributors to try to
> comply with reasonable data entry/quality guidelines is unreasonable.  JOSM
> has a suite of validity checkers to try to apply consistency guidelines to
> new/edited data.  Why might running clean-up bots antagonize mappers
> (particularly if their updates can be filtered from edit lists ;-)) any more
> than validation checks before updates?

I don't think it's unreasonable, but I do believe it's unrealistic.
There being very few technical restrictions on tagging means that
there will always be people who won't adhere to guidelines, either
because they think they know better or because they don't know about
the guidelines. The validity checkers in JOSM are great, but
intimidating for new mappers. Even for me with 5 years of mapping
experience, it's not always clear how to fix the validation check
errors.

I think mappers may feel antagonized when their hard manual work is
reverted, corrected, or changed by an invisible hand that they cannot
control. This is most likely to happen with novice mappers, who think
they are joining a human collaboration effort. That is what OSM is
and, for better or worse, should remain. That's why I think every
decision to let a bot loose on the data should be considered
extremely carefully.

That said, this would be one of those cases where we should do it.

> Very high quality map renderings depend upon some sort of reasonable
> consistency across feature types, not only to minimize the complexity of
> style sheets but also to guarantee that a river in New York renders like a
> river in Alaska.

They may, for traditional geographic data. For OSM however, you will
need to consider the crowdsourced nature, the open data model, and the
resulting heterogeneity of the data. That goes for map rendering, but
also for routing, GIS analysis tasks - any usage of OSM data really.

>> Nominatim should still continue to handle both cases.  Although it has been
>> said that map rendering is the place to abbreviate the full name, no map
>> renderer currently does this as far as I know.  If there is ever a custom US
>> style OSM renderer, that would be a logical feature.
>
> IMHO, renderers render, i.e. transform vector/image data into map image
> pixels.  If a use case exists for abbreviated and long street name
> components then the data should be stored in the underlying DB to enable
> transforms and validation to run off-line.  Another possibility is a
> modified data store optimized for rendering (meta-tile sized data chunks or
> cached labeling performed using very sophisticated placement algorithms, as
> a couple examples) that the renderers use.

The discussion kind of veers off into the interesting direction of
where - if at all - consistency checks belong in the OSM
infrastructure. A few options come to mind:
* The data model itself
* The input interface (API / editors)
* The database
* The output interface (planet extracts / API / renderer / Nominatim)
* Outside of the system (third party value added products / individual
consumers)

I lean strongly toward the last option. OSM should continue to rely on
its informal data model as reflected by the map features page and the
tag voting and supporting discussion on the mailing lists. The
openness has allowed and continues to allow for the rich, organically
growing data that we have now. I would hate to see mappers' creativity
constrained by too many consistency checks.
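
To make the last option a bit more concrete, here is a minimal sketch
in Python of how a data consumer could normalize street names on its
own side, leaving the OSM data untouched. The suffix table and the
function names are made up for illustration and are not part of any
OSM tool; a real consumer would need a much fuller table and rules
for prefixes and directionals.

```python
# Hypothetical, deliberately tiny suffix table (an assumption for this sketch).
SUFFIXES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Blvd": "Boulevard"}
ABBREVIATIONS = {full: abbr for abbr, full in SUFFIXES.items()}


def expand_suffix(name):
    """Expand an abbreviated trailing street type; leave other names alone."""
    words = name.split()
    if not words:
        return name
    last = words[-1].rstrip(".")  # accept both "Ave" and "Ave."
    if last in SUFFIXES:
        words[-1] = SUFFIXES[last]
    return " ".join(words)


def abbreviate_suffix(name):
    """Abbreviate a spelled-out trailing street type, e.g. for map labels."""
    words = name.split()
    if words and words[-1] in ABBREVIATIONS:
        words[-1] = ABBREVIATIONS[words[-1]]
    return " ".join(words) if words else name
```

Only the last word is touched, so a leading "St" as in "St James St"
is left alone. A consumer could then accept both spellings on input
and pick whichever form suits its output, without a bot ever editing
the database.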

Getting back to why I started this thread, this particular case is an
inconsistency created by a broken-off automated attribute change. It
bugged me for that reason and because the process was never
documented, which is why I started this US imports and automated edits
page. I think it will help newcomers and people external to the
project use the data on a US scale, be it for map rendering or any
other purpose. What are your thoughts on that initiative?
-- 
martijn van exel
geospatial omnivore
1109 1st ave #2
salt lake city, ut 84103
801-550-5815
http://oegeo.wordpress.com