[OSM-dev] OSM with Hadoop

Mon Jan 12 19:50:55 UTC 2015

Stephen,

   Previous discussions of combining NoSQL *or* massively parallel
storage with OSM were often less driven by the approach "let's
investigate solid future storage models for OSM" but rather by "hey
there's a cool new technology I'd like to play with and I'm sure it can
somehow work with OSM".

The results were often, if there were any at all, along the lines of
"hey this particular very specific use case is now 20000% faster than
before", but looking closer you'd see that the same speedup could have
been had with an old-fashioned un-sexy "create index" statement if the
author had known anything about PostgreSQL/PostGIS (*), or maybe that
the data import took five weeks unless you had massive hardware, or
somesuch.

I was therefore a bit skeptical reading your message, but relieved when
I found that you're keeping an open mind about the results and plan to
thoroughly analyse whether using a massively parallel storage will
indeed perform better than plain old PostgreSQL/PostGIS for what are
OSM's everyday use cases.

(I'd like to see the word "cost-effective" thrown in somewhere - and for
reading data we already have a data replication mechanism in place that
scales sufficiently well. As far as the central database is concerned,
OSM is very interested in making it easy for everyone to run their own
local copy.)

Bye
Frederik

(*) It is an often overlooked fact that the amount of actual geo
information in the central database is small - just the node coordinates
- everything else is plain old relational stuff. Therefore the OSM
database doesn't even need or use the PostGIS spatial extensions - but
they are often used for analysing OSM data after importing it into a
separate database.

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"