[Talk-us] A Friendly Guide to 'Bots and Imports

Greg Troxel gdt at ir.bbn.com
Sat Aug 7 01:19:31 BST 2010

Serge Wroclawski <emacsen at gmail.com> writes:

> Moving away from discussions of specific imports, I'd like to explore
> what people think about a few areas of this discussion:
> 1) When someone says "I want to import X", what should our first response be?

I think your reaction of pointing out the danger is fair.  Living in
an area with a lot of high-quality data that has been imported rather
well, I'm not anti-import, but I am in the "imports should be
exceedingly well thought out" camp.

> 2) When someone points out a widespread problem (such as the Salt Lake
> City addresses), how do we want to proceed?

Some things need automated edits to fix.  I would like to see safe
frameworks for this in osm svn/git/whatever, and more or less require
that the code to be run for fixups be stored as part of the community
history.  It's clear that things need to be fixed, and the challenge is
to make the fixes be net positive.

> 3) Is it better to discourage bots and imports (as we do currently) or
> better to heavily document bots and set up standardized methods? (and
> do people think those methods will be used?)

I think most people doing automated imports are doing so because they
want to fix something that's broken, and most are patient.  If we
provide skeleton code and especially a way to see how the fix works
before it's really committed, I think most people would be cooperative.
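
To make "see how the fix works before it's really committed" concrete,
here is a minimal sketch of what skeleton code with a dry-run mode might
look like.  Everything in it is an assumption for illustration: the
Element class, plan_retag, apply_changes and the example tag values are
hypothetical names, not an existing OSM tool or API, and no upload is
performed.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Simplified stand-in for an OSM element (hypothetical, not a real API)."""
    id: int
    tags: dict = field(default_factory=dict)


def plan_retag(elements, key, old_value, new_value):
    """Return the proposed changes as tuples, without modifying anything."""
    changes = []
    for el in elements:
        if el.tags.get(key) == old_value:
            changes.append((el.id, key, old_value, new_value))
    return changes


def apply_changes(elements, changes, dry_run=True):
    """Print every proposed change; modify the data only if dry_run is False."""
    by_id = {el.id: el for el in elements}
    for el_id, key, old, new in changes:
        print(f"element {el_id}: {key}={old} -> {key}={new}")
        if not dry_run:
            by_id[el_id].tags[key] = new
    return len(changes)


if __name__ == "__main__":
    # Placeholder data; a real run would load a reviewed extract instead.
    data = [
        Element(1, {"example_key": "old_value"}),
        Element(2, {"example_key": "something_else"}),
    ]
    proposed = plan_retag(data, "example_key", "old_value", "new_value")
    apply_changes(data, proposed, dry_run=True)  # review only, no edits
```

The point of splitting planning from applying is that the list of
proposed changes can be shared for review before anyone flips dry_run
to False.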

In my case, I've thought about several automated edits (and done zero):

  duplicate nodes at town boundaries in roads due to massgis highway
  layer.  I wrote on talk-us about what I think ought to be done, in
  terms of outlining a precondition for "two nodes on same place,
  massgis tags, each the end node in a highway way with massgis tags".
  Somehow, most of this got fixed, and I don't know if it was part of
  the general de-dupe rampage or someone doing a more targeted edit.
  But as far as I can tell it was done right, and a good outcome.

  In MA, landuse=reservoir is on lots that are really "reservoir
  protection".  They render blue, and I think they should be retagged.
  Or maybe mapnik and the tagging rules fixed.  So I haven't gotten
  around to this - I have gotten the clue to tread lightly and I've been

  fuzzy matching on GNIS vs massgis points, and merging them, taking
  massgis locations, in cases where no human has edited the GNIS points.
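
As an illustration of the duplicate-node precondition in the first item
above ("two nodes on same place, massgis tags, each the end node in a
highway way with massgis tags"), here is a rough sketch.  The data
layout and the has_massgis_tags check are assumptions made up for the
example; in particular, how imported objects are actually tagged would
need to be verified against real data.  It only finds candidate pairs
and edits nothing.

```python
from collections import defaultdict
from itertools import combinations


def has_massgis_tags(tags):
    # ASSUMPTION: imported objects can be recognized by "massgis" in a tag
    # key or in the source tag. Replace with the real check for the import.
    in_key = any(k.lower().startswith("massgis") for k in tags)
    in_source = "massgis" in tags.get("source", "").lower()
    return in_key or in_source


def find_mergeable_pairs(nodes, ways):
    """Find node pairs that satisfy the precondition.

    nodes: {node_id: (lat, lon, tags)}
    ways:  {way_id: (node_ids, tags)}
    """
    # Nodes that are the first or last node of a massgis-tagged highway way.
    end_nodes = set()
    for node_ids, tags in ways.values():
        if node_ids and "highway" in tags and has_massgis_tags(tags):
            end_nodes.add(node_ids[0])
            end_nodes.add(node_ids[-1])

    # Group those nodes by exact coordinates, keeping only massgis-tagged ones.
    # ASSUMPTION: duplicates share identical coordinates, so no tolerance.
    by_place = defaultdict(list)
    for node_id in end_nodes:
        lat, lon, tags = nodes[node_id]
        if has_massgis_tags(tags):
            by_place[(lat, lon)].append(node_id)

    # Every pair of nodes sharing a place is a candidate for merging.
    pairs = []
    for ids in by_place.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)
```

This sketch does not check whether a candidate node is also a mid-way
node of some other way, or whether a human has touched it since the
import; a real fixup would want both checks before merging anything.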

Bots are another story; that's a long-term running process that does
automated edits whenever preconditions are satisfied.  Those are scarier
than someone grabbing a state extract, running an automated edit,
reviewing the results, maybe sharing them for review by others, and
choosing to push upload.

For imports, I've thought about several, and the common theme is
ENOSPARTIME, but the list is

  parcel data, but not imported because a) I'm not sure what I think is
  right, and b) I'm not sure what community consensus is.

  merging updates to massgis highway data, but this is hard

  importing NHD or massgis hydro

  importing more massgis rails/trails/etc.

  importing the towns w/o highway data, but there's a lot of manual
  merging (e.g. gloucester).  This leads to thoughts of writing code to
  auto-merge, which leads to it not happening due to not enough time.

> 4) In the US, what (if any) role should OSM US play in imports?

Perhaps helping with the above, and being elder statesmen about advice.

So all in all, my level of restraint, but a higher level of spare time,
is probably where we want people to be.  One thought is that someone
wanting to import should probably have done some manual mapping first,
to get their head around the norms and community.