[OSM-dev] Data source for robot

Tue Oct 12 11:56:52 BST 2010

On Tue, Oct 12, 2010 at 4:36 AM, Peter Budny <peterb at gatech.edu> wrote:

> If route relations are not required, then what are
> http://wiki.openstreetmap.org/wiki/Relation:route#Road_Routes for?

"Not required" and "don't exist" aren't quite the same thing.

One major issue with relations in general is that very little software
knows how to handle them, and that's especially true of routing
software. But that's not at the core of my concerns, which I'll
elaborate on later in this mail.

> They /are/ required, because roads may be discontiguous in various ways:
> a road may change names (e.g. Main Street North becomes Main Street
> South, but to a driver or pedestrian, both are just one continuous Main
> Street), or even be physically discontiguous (some state and even US
> Highways do this).

I'm a little confused by this example.

"Main Street North becomes Main Street South" - how would you handle this?
What specifically would you do? Add a relation? What tags would you
add, or remove, from the individual ways?

>  Using TIGER data, we can automate the
> process, but the bot's work will not be perfect; humans will still have
> to check it and make a few corrections.  Still, if it does 95% of the
> work for them correctly, this is pretty good IMO.  (After all, TIGER data
> itself is not even close to 95% correct.)

You've identified several issues in this paragraph, and I'd like to
flesh them out:

1) Your data source, TIGER, is by your own admission not accurate. I
don't want to get into a discussion about TIGER (that may be best left
for osm-us), but when you start with a dataset as, let's say,
"controversial" as TIGER, you can expect a lot of concerns from the
community.

2) You say that humans will have to check it and make corrections.
What mechanism do you propose to build into your mass edits to
provide human validation? In other words, how do you plan on
accomplishing the human validation step before modifying the database?

Now some concerns that the community probably has, but isn't articulating.

3) The road to hell in OSM is paved with bot intentions.

OSM has a long, negative history with bots. We have a very small
number of good imports, and dozens (if not more) of bad imports. Bad
imports are so commonplace that within the OSM community, bots of any
sort are discouraged, especially imports, and most of all (as you
appear to be proposing) merging existing data with imported data.

This is a recipe for disaster.

4) How well do you know OSM?

Elaborating on my previous point, OSM is a very attractive project and
very smart folks come to it all the time with a great idea about how a
bot or an import could be very beneficial. Unfortunately, while these
people may understand the data, and maybe the representation, unless
they're familiar with OSM, they don't know the pitfalls that cause
the most problems.

Here's a small but real example: Let's say your import is chugging
along, and then it comes across an area where someone's already done
the work. How will it react? Would it overwrite the contributor's
work? Would it stop? If it stops, would it know which segments have
been committed to the DB and which haven't (i.e., would it be able to
prevent duplicates)? Would your bot handle tags which users may have
added to the way or relation? And so on...
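
To make those questions concrete, here is a rough Python sketch of the
per-way decision a bot would have to get right. It is only an
illustration under stated assumptions, not a recommended design: the
Way class, the plan_edit function, and the action names are all
hypothetical, and nothing here talks to the real OSM API.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    """Hypothetical stand-in for an OSM way: an id plus its current tags."""
    way_id: int
    tags: dict = field(default_factory=dict)


def plan_edit(way, proposed_tags, committed_ids):
    """Decide what to do with one way. Returns (action, resulting_tags).

    committed_ids is the set of way ids this bot has already written,
    e.g. loaded from a local log, so a restart does not create duplicates.
    """
    if way.way_id in committed_ids:
        return ("skip-already-committed", way.tags)

    # A proposed tag that disagrees with an existing value is a conflict:
    # never overwrite a contributor's work, hand it to a human instead.
    conflicts = [k for k, v in proposed_tags.items()
                 if k in way.tags and way.tags[k] != v]
    if conflicts:
        return ("flag-for-human-review", way.tags)

    # Only add tags that are missing; tags users already added are kept.
    merged = {**proposed_tags, **way.tags}
    if merged == way.tags:
        return ("skip-no-change", way.tags)
    return ("apply", merged)
```

For example, plan_edit(Way(1, {"name": "Main Street"}), {"ref": "US 1"},
set()) would return an "apply" action with both tags, while proposing
{"name": "Main St"} for the same way would be flagged for review. Even
this toy version ignores geometry, relations, and concurrent edits,
which is where the real pitfalls are.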

This is why the observation we have about bots is that no one who has
been with the project less than a year should run them. And most people
who suggest making bots have been with the project less than 6 months.

5) Academic Research

I think that it's great that academics are interested in using OSM for
their research. But at the same time, I've worked in academic
computing for most of my professional career, surrounded by some of
the smartest people in their fields, at both NIH and NASA. These are
the best of the best.

And my view of much of what's produced by academics who write software
is that it's poo-poo.

My specific concern here is that your research is focused in a very
narrow way, which is understandable for a school project, but has
implications for the larger project which might not work out.

My suggestion to you is that you take the planet.osm and write your
code for school using your own OSM sandbox. Then publish the results,
and work with the community later on regarding applying your research
to the live map.

- Serge