[Talk-us] Address improvement through imports?

Toby Murray toby.murray at gmail.com
Thu Nov 3 04:13:53 GMT 2011


On Wed, Nov 2, 2011 at 5:12 PM, Frederik Ramm <frederik at remote.org> wrote:
> Hi,
>
> On Tue, 1 Nov 2011 17:14:03 -0600
> Martijn van Exel <m at rtijn.org> wrote:
>> But let's discuss: are
>> address imports useful (I say yes, for geocoding and routing they're
>> indispensable), necessary (I say yes, potential OSM data users will
>> want to be able to do these things) and feasible (I say yes, if
>> there's local mappers to oversee it)? Best,
>
> They are useful if you want OSM to be your quick fix to some itch you
> want to scratch. If you cannot be bothered to process the freely
> available house number datasets properly yourself but would rather
> abuse OSM as your free data processor where you dump in whatever you
> have and whoopsie, magically it becomes useful in MapQuest's Nominatim,
> then yeah, sure, go ahead, import until the shit comes out of
> everyone's ears - why learn from past mistakes. You probably think
> that OSM in the US is so broken, it cannot get worse no matter how
> much additional data sources you dump onto OSM. You know as well as I do
> that your "local mappers to oversee it" is a fig leaf!
>
> Importing more and more data will not make OSM strong. It might make
> OSM look useful in the short term but that's cheap usefulness - the
> same usefulness could be produced by just importing all your free
> sources into some other consolidated data set, something that is not
> unique to OSM, something that anyone can do at any time in their
> basement without the help of a crowd-sourced project. And for this
> cheap usefulness you are ruining the chances of there ever being a
> strong community - instead you'll have a few people acting as funnels
> for data dumped in from whatever sources. This is not the way to
> achieve a community that owns the map. And you know that and *still*
> you're happy to do it. OSM will never get anywhere in the States if
> people think like this. And that from someone who only just moved over!

I know the "population density" argument isn't always recognized as
being valid but I just ran some numbers that some might find
interesting.

Germany has a pretty active OSM community, right? Let's suppose that
we could achieve a similar level of community in western Kansas. There
are currently about 44,000 unique user IDs in the Germany extract from
Geofabrik. Out of 81,880,000 Germans, that puts mappers at about
0.054% of the population.

Population of western Kansas: 438,000

Assuming the same 0.054% level of OSM participation, that gives us
about 230 mappers to cover 1/3 the land mass of Germany.

So, start with a blank map of Germany. Remember those 44,000 users?
Now you only get about 700 (230 times 3, since Germany has three times
the area). Report back when you have finished mapping all
the addresses. I'll even let you skip the big cities. My point is
about area, not volume of data. Remember, these are average mappers so
80% (80? 90? I know I've seen this statistic but don't recall it off
the top of my head) of them will make one edit and never come back.
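
The arithmetic above can be reproduced with a short Python sketch. The input figures are the ones quoted in this message; the function names and the AREA_RATIO constant are hypothetical labels chosen for illustration. The unrounded results come out slightly above the rounded figures used in the text.

```python
# Inputs as quoted in the message (2011 figures).
GERMANY_USERS = 44_000          # unique user IDs in the Geofabrik Germany extract
GERMANY_POPULATION = 81_880_000
W_KANSAS_POPULATION = 438_000
AREA_RATIO = 3                  # assumption from the text: Germany is ~3x the area of western Kansas


def mapper_share(unique_users: int, population: int) -> float:
    """Fraction of the population that has ever edited."""
    return unique_users / population


def expected_mappers(population: int, share: float) -> float:
    """Mappers expected if `share` of `population` participates."""
    return population * share


share = mapper_share(GERMANY_USERS, GERMANY_POPULATION)
kansas_mappers = expected_mappers(W_KANSAS_POPULATION, share)
# Same mapper density per unit area, applied to all of Germany.
germany_equivalent = kansas_mappers * AREA_RATIO

print(f"participation rate: {share:.3%}")
print(f"western Kansas mappers: {kansas_mappers:.0f}")
print(f"Germany at Kansas density: {germany_equivalent:.0f}")
```

This ignores the one-edit-and-gone mappers mentioned above, so the number of active mappers would be lower still.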

This is the BEST case scenario for western Kansas and probably most of
the interior of the country from Nevada to the Mississippi River,
except for the urban pockets here and there.

It is a 9 hour drive from Topeka to Denver and I think you go past a
total of 3 cities with a population of over 10,000. In fact, out of
the 54 counties west of Wichita, only 7 have a population for the
whole county of over 10,000. So while we might be able to start OSM
communities in some of the larger cities, vast stretches of the
country would remain *completely* empty. How long do you want us to
wait? 5 years? 10 years?

Now I guess you could say that where there are few people, there is no
need for maps. But isn't our goal to map the entire planet? If OSM
wants to be taken seriously as a global dataset, I don't think that
argument is valid.

All that to say what has already been said by others. While I am
not a big fan of imports, I see address data as:
1) important for the general usefulness of our data
2) one of the easier things that can be imported with a low likelihood
of things going horribly wrong
3) freely available with acceptable accuracy from our local governments.

And the "accuracy" part is important. Notice how no one wants to touch
TIGER address data even though that would be the easiest solution for
a nationwide data set if we were just rabidly importing things. So I
would say we HAVE learned from past mistakes. (Not saying that the
TIGER import was a *complete* mistake, mind you... but it obviously
does have a lot of problems)

Toby


