[OSM-dev] New database server

Tue May 21 09:49:08 UTC 2013

On 21 May 2013 02:32, Jason Remillard <remillard.jason at gmail.com> wrote:

> The server that we are planning on purchasing is a monster. Very
> complicated and expensive. I am concerned that this might not be the
> best way to go.

Indeed, it might not be the best way to go, and any thoughts and
brainpower applied to the problem are always very welcome.

Of course, at OWG we *do* think it's the best approach, given all the
trade-offs involved, and it's a decision that hasn't been taken
lightly.

> We have a google summer of code proposal to write an edge proxy server
> for the OSM API. I don't know if the project will be accepted, but it

It's worth bearing in mind that we don't have any places on this
year's GSoC, and even if OSGeo decided to go for this project on our
behalf, it could be months before we even have any idea whether or not
the implementation is feasible. I don't mean to be negative, but
weighing up the "here and now" requirements against a hypothetical
alternative at some point in the future is one of the trade-offs
that we routinely have to make at OWG.

> For the money we are planning on spending on the big
> server, we could get several of these smaller edge servers
> with flash disks and a less expensive redundant write/history database
> server.

Well, we could certainly spend the money on small edge servers, but
it's not clear to me why you think that would make the central server
less expensive. I think this proposal may be worthwhile but it's
somewhat orthogonal to the goals of the new server.

At the moment we have two osm-database-class machines, the older of
which (smaug) is no longer capable of handling the full load on its
own, but is still useful as a database-level read slave. The newer
machine (ramoth) can handle the load entirely on its own, but is
approaching the limits of dealing with the full read+write load.

When it comes to the master database, we need certain characteristics:
A) To be able to handle the write load (and the associated reads
involved in checking constraints etc)
B) To be able to store the entire database
C) To have more than one such machine, for failover

Smaug most likely doesn't fulfil A, and so currently we don't really
fulfil C. So we need a new machine that can do A+B, and these are
unfortunately expensive. In order to last more than 6 months, the new
machine also needs plenty of space (B) on fast disks (A) which is
where most of the money goes.
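
To make the A/B/C reasoning concrete, here is a minimal Python sketch. The
`Machine` class, its two flags and the "new-server" entry are illustrative
assumptions, not anything we actually run. The values for smaug and ramoth
just restate the paragraphs above, with smaug's A flag being the "most
likely" guess and its B flag assumed from its role as a read slave.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    handles_write_load: bool  # requirement A (assumed flag, not a measurement)
    stores_full_db: bool      # requirement B (assumed flag, not a measurement)


def master_capable(machine: Machine) -> bool:
    """A machine can act as master only if it satisfies both A and B."""
    return machine.handles_write_load and machine.stores_full_db


def has_failover(machines) -> bool:
    """Requirement C: at least two machines must be master-capable."""
    return sum(master_capable(m) for m in machines) >= 2


# Current situation as described above; smaug's flags are a guess.
current = [
    Machine("smaug", handles_write_load=False, stores_full_db=True),
    Machine("ramoth", handles_write_load=True, stores_full_db=True),
]

# Hypothetical new machine that can do A+B.
with_new = current + [
    Machine("new-server", handles_write_load=True, stores_full_db=True)
]

print(has_failover(current))
print(has_failover(with_new))
```

Under these assumptions, C only holds once a second machine that does A+B
is added, which is the point of the purchase.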

Having map-call-proxies, as you discuss, doesn't solve any of A, B or
C for the master database. Sharing out the read-only load is a good
idea, but it's not clear to me whether it is better done with
postgres-level database replication (as we have been doing),
proxy-level replication (as per this GSoC idea), or even just
examining logs and ban-hammering people scraping the map call (my
personal favourite!).
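
As a rough illustration of the log-examining option, the sketch below
counts map calls per client. The log line format (client address first,
request line in double quotes) and the threshold are assumptions made for
the example, and the function names are hypothetical. It is not a
description of what we actually run.

```python
import re
from collections import Counter

# Assumed format: client address first, request line in double quotes.
# Only map calls (GET /api/0.6/map?bbox=...) are counted.
MAP_CALL = re.compile(r'^(\S+) .*"GET /api/0\.6/map\?bbox=')


def map_calls_per_client(lines):
    """Count map calls per client address in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = MAP_CALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def heavy_clients(lines, threshold):
    """Return the sorted clients with at least `threshold` map calls."""
    counts = map_calls_per_client(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


# Made-up sample lines using documentation-only addresses.
sample = [
    '192.0.2.1 - - [21/May/2013:09:00:00 +0000] "GET /api/0.6/map?bbox=0,0,0.1,0.1 HTTP/1.1" 200 1234',
    '192.0.2.1 - - [21/May/2013:09:00:01 +0000] "GET /api/0.6/map?bbox=0.1,0,0.2,0.1 HTTP/1.1" 200 1234',
    '192.0.2.1 - - [21/May/2013:09:00:02 +0000] "GET /api/0.6/map?bbox=0.2,0,0.3,0.1 HTTP/1.1" 200 1234',
    '192.0.2.1 - - [21/May/2013:09:00:03 +0000] "GET /api/0.6/node/1 HTTP/1.1" 200 321',
    '198.51.100.7 - - [21/May/2013:09:00:04 +0000] "GET /api/0.6/map?bbox=1,1,1.1,1.1 HTTP/1.1" 200 1234',
]

print(heavy_clients(sample, threshold=3))
```

A real threshold would have to be picked by looking at actual traffic, so
that normal editors are not caught.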

> As we need to scale

It's best in these conversations to be precise about what we mean by
"to scale". Scaling read-requests is only one aspect, and we have a
variety of feasible (and active) options. Long-term, we may[1] need to
work around the need for all these machines to store the entire
database (B), and that's Hard. We may[2] also need to figure out how
to solve A, and that's Hard too.
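
The "may" in both cases is a race between two growth rates. A minimal
sketch of that arithmetic follows. All the numbers are made up for
illustration, and `years_until_outgrown` is a hypothetical helper, not a
forecast for OSM.

```python
def years_until_outgrown(db_size, capacity, db_growth, capacity_growth,
                         horizon=50):
    """Return the first whole year in which the database no longer fits
    on one machine, or None if it still fits after `horizon` years.

    db_growth and capacity_growth are yearly multipliers (1.5 = +50%).
    """
    for year in range(1, horizon + 1):
        db_size *= db_growth
        capacity *= capacity_growth
        if db_size > capacity:
            return year
    return None


# Made-up case 1: the database grows faster than affordable capacity.
outpaced = years_until_outgrown(db_size=1.0, capacity=2.0,
                                db_growth=1.5, capacity_growth=1.4)

# Made-up case 2: capacity grows faster, i.e. "happy days".
happy_days = years_until_outgrown(db_size=1.0, capacity=2.0,
                                  db_growth=1.3, capacity_growth=1.4)

print(outpaced, happy_days)
```

The same comparison applies to write activity against disk-io and network
bandwidth: only the multipliers change.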

Like I said at the start, thoughts and brainpower are always welcome!

Cheers,
Andy

[1] If we grow OSM faster than Moore's law, otherwise: happy days
[2] If db-write activity outpaces disk-io and/or network bandwidth
increases, otherwise: happy days