[OSM-dev] OSM StreetDensityMap
npl
npl at gmx.de
Tue Mar 6 12:31:31 GMT 2012
> Cool. I've been wanting to give hadoop / map/reduce a try with OSM data
> but the wiki does not offer much. It would be nice if someone with some
> experience would create a wiki page. I'm sure it would be interesting
> for the community as well as GIScience folks to have a place to start.
If I have some time, I'll create a wiki page for "osm on hadoop" (and
post it here).
> It also gives a sort-of "osm activity map".
>
> Well, it does and it doesn't. You'd have to compare it to a reference
> road network density map to appreciate the activity of the OSM community
> in representing reality in OSM.
That's right.
> I see a lot of potential for this beyond 'simple' visualisation. Systems
> like TagInfo and OWL could benefit, maybe? Does your framework lend
> itself for (near) real time processing of OSM data, or does it only work
> with snapshot data?
MapReduce itself is a programming model: you process data by defining
map and reduce functions, which makes it quite easy to learn. Hadoop
implements it as a distributed batch-processing framework that lets you
process TBs of data on a cluster of up to hundreds of nodes. The real
benefit of such a system is that it scales roughly linearly (strictly
speaking, between O(n) and O(n log n)), whereas single systems (like
relational DBs) can't scale that far.
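To make the model concrete, here is a minimal single-process sketch of
map and reduce functions (no Hadoop involved) that counts street nodes
per map tile, roughly what a street-density map needs. The record
format, tile size, and sample data are illustrative assumptions, not
the actual pipeline:

```python
from itertools import groupby
from operator import itemgetter

def map_way(way):
    """Emit one (tile, 1) pair per node of a highway-tagged way."""
    if "highway" not in way["tags"]:
        return
    for lon, lat in way["nodes"]:
        # 0.1-degree tiles, keyed by their rounded coordinates
        yield (round(lon, 1), round(lat, 1)), 1

def reduce_tile(tile, counts):
    """Sum the counts for one tile -> street density of that tile."""
    return tile, sum(counts)

# Two toy ways; only the highway-tagged one contributes to the map.
ways = [
    {"tags": {"highway": "residential"},
     "nodes": [(8.41, 49.01), (8.42, 49.01)]},
    {"tags": {"building": "yes"}, "nodes": [(8.41, 49.02)]},
]

# The "shuffle" phase: sort mapper output by key, then group by key.
pairs = sorted((kv for w in ways for kv in map_way(w)), key=itemgetter(0))
density = dict(reduce_tile(k, [v for _, v in g])
               for k, g in groupby(pairs, key=itemgetter(0)))
print(density)  # {(8.4, 49.0): 2}
```

The framework's only job beyond this is distribution: it runs many
mapper and reducer instances in parallel and does the grouping (the
shuffle) across machines.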
Our cluster had around 10 nodes, and it took us about 3-4 hours to
create the map and store it in HBase (although the cluster was not busy
the whole time); for reference, the uncompressed planet file is about
200 GB.
That said, you can run small jobs on hadoop/mapreduce in a few minutes
(=> near realtime), so it would be possible to
* process the planet file once and store the results in a DB
* process the planet-file diffs (e.g. hourly) and update the DB
TagInfo-like systems (aggregating big data and creating statistics)
could definitely be built using hadoop/mapreduce.
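As a sketch of such a TagInfo-like statistic, here is a Hadoop
Streaming-style mapper/reducer pair that counts how often each tag key
occurs. The input format (one "key=value" tag per line) is an
illustrative assumption; a real job would extract the tags from the
planet file in the mapper:

```python
import sys
from collections import Counter

def mapper(lines):
    """Emit 'tagkey<TAB>1' for every tag seen."""
    for line in lines:
        key = line.strip().split("=", 1)[0]
        if key:
            yield f"{key}\t1"

def reducer(lines):
    """Sum the counts per tag key (Hadoop hands them over grouped/sorted)."""
    counts = Counter()
    for line in lines:
        key, n = line.rsplit("\t", 1)
        counts[key] += int(n)
    for key, n in sorted(counts.items()):
        yield f"{key}\t{n}"

if __name__ == "__main__":
    # Hadoop Streaming runs the same script as mapper or reducer,
    # talking plain text over stdin/stdout.
    step = sys.argv[1] if len(sys.argv) > 1 else "mapper"
    fn = mapper if step == "mapper" else reducer
    for out in fn(sys.stdin):
        print(out)
```

With Hadoop Streaming this would be submitted as one job with the
script given as both -mapper and -reducer (input/output paths and the
script name here are hypothetical).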
- npl
More information about the dev mailing list