[OSM-dev] OSM with Hadoop

Wed Jan 14 10:16:58 UTC 2015

Playing devil's advocate for a minute, honestly I'm not sure I would be
as content with the current OSM data storage and processing
architecture. A main central server with a single multi-TB-sized database
somehow screams single point of failure to me... Add to that the growth
rate (which is pretty crazy when you look at it closely) and I'd be
worried whether such a setup is future-proof enough to justify brushing
off Hadoop topics with "you just don't know how to use Postgres/PostGIS
properly".

I love Postgres as much as the next guy, hell, I'm actively trying to
get my code working with a FULL HISTORY database and Postgres does have a
lot of features that make this easier/possible. But at some point you
need to look at the big picture: where will the infrastructure be in
1, 2 and 5 years' time?

Yeah I know it's just talk and no solutions but for now I don't have any
to this particular problem :P

Paweł

On Mon, Jan 12, 2015, at 20:50, Frederik Ramm wrote:
> Stephen,
> 
>    previous discussions of combining NoSQL *or* massively parallel
> storage with OSM were often less driven by the approach "let's
> investigate solid future storage models for OSM" but rather by "hey
> there's a cool new technology I'd like to play with and I'm sure it can
> somehow work with OSM".
> 
> The results were often, if there were any at all, along the lines of
> "hey this particular very specific use case is now 20000% faster than
> before", but looking closer you'd see that the same speedup could have
> been had with an old-fashioned un-sexy "create index" statement if the
> author had known anything about PostgreSQL/PostGIS (*), or maybe that
> the data import took five weeks unless you had massive hardware, or
> somesuch.
> 
> I was therefore a bit skeptical reading your message, but relieved when
> I found that you're keeping an open mind about the results and plan to
> thoroughly analyse whether using a massively parallel storage will
> indeed perform better than plain old PostgreSQL/PostGIS for what are
> OSM's everyday use cases.
> 
> (I'd like to see the word "cost-effective" thrown in somewhere - and for
> data reading we have sufficiently well-scaling data replication in
> place already. As far as the central database is concerned, OSM is very
> interested in making it easy for everyone to run their own local copy.)
> 
> Bye
> Frederik
> 
> (*) It is an often overlooked fact that the amount of actual geo
> information in the central database is small - just the node coordinates
> - everything else is plain old relational stuff. Therefore the OSM
> database doesn't even need or use the PostGIS spatial extensions - but
> they are often used for analysing OSM data after importing it into a
> separate database.
> 
> -- 
> Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"