[OSM-dev] New database server

Jason Remillard remillard.jason at gmail.com
Tue May 21 21:25:38 UTC 2013


Hi Andy,

Thank you for the detailed reply.

I have no issues with how the money is being handled, nor with
spending this kind of money on hardware.

OSM growth is pretty amazing right now. Over the lifetime of this
server we should be planning for at least 15x-20x more traffic (for
instance, traffic that doubles annually over a four-year service life
grows 16x). If this server can handle 15x-20x more traffic than the
current site generates, then we are good, case closed. If we can't get
to those traffic levels without large architectural changes, we should
start steering ourselves toward a new architecture now. Standard proxy
servers will not work well with our API, and it will be painful if we
are not proactive about it. It seems like some kind of geographically
distributed caching edge server will eventually be needed. The OWG
obviously knows all of this.
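
To make that concrete, here is a rough Python sketch of why a stock
URL-keyed proxy gets almost no cache hits on the map call, and the
kind of bbox normalization an edge server would need. The grid size
and key scheme below are invented for illustration, not a design:

    # Purely illustrative: the map call takes an arbitrary bbox, so a
    # proxy keyed on the raw URL almost never sees the same request
    # twice. An edge server would have to normalize requests, e.g. by
    # snapping bboxes to a fixed grid.
    import math

    GRID = 0.01  # hypothetical cache-tile size, in degrees

    def cache_keys(left, bottom, right, top):
        """Map a bbox onto grid-aligned tiles, so the many slightly
        different bboxes clients send collapse onto a small set of
        cacheable keys."""
        x0, x1 = math.floor(left / GRID), math.ceil(right / GRID)
        y0, y1 = math.floor(bottom / GRID), math.ceil(top / GRID)
        return [("map", x, y) for x in range(x0, x1) for y in range(y0, y1)]

    # Two editors viewing almost the same area produce different URLs
    # but (mostly) identical tile keys, so the second request can hit.
    print(cache_keys(-71.1091, 42.3012, -71.0957, 42.3130))
    print(cache_keys(-71.1093, 42.3011, -71.0955, 42.3129))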

Really the bottom line of my email is as follows:

The Google Summer of Code project (if it moves forward) will not be
that hard to integrate into the current server infrastructure, and it
has a reasonable chance of being stood up by this fall. I just wanted
to ensure that the OWG was aware of the project and considered it in
the resource planning for our new servers. Concretely, this means we
have the option of configuring the new database server for only
history and write requests rather than all of the API.
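
To sketch what that split could look like (the request classification
below is my guess at the shape of it, not a settled design):

    # Illustrative only: how API traffic might be partitioned if the
    # new database server handled just writes and history. The paths
    # and the split itself are assumptions about how the proxy would
    # slot in, not an agreed architecture.
    WRITE_METHODS = {"PUT", "POST", "DELETE"}

    def backend_for(method, path):
        if method in WRITE_METHODS:
            return "master-db"    # all edits must reach the master
        if "/history" in path or path.startswith("/api/0.6/changeset"):
            return "master-db"    # history stays on the authoritative DB
        return "edge-cache"       # map call and other reads go to edges

    assert backend_for("GET", "/api/0.6/map") == "edge-cache"
    assert backend_for("PUT", "/api/0.6/node/123") == "master-db"
    assert backend_for("GET", "/api/0.6/way/42/history") == "master-db"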

Thanks
Jason.

On Tue, May 21, 2013 at 5:49 AM, Andy Allan <gravitystorm at gmail.com> wrote:
> On 21 May 2013 02:32, Jason Remillard <remillard.jason at gmail.com> wrote:
>
>> The server that we are planning on purchasing is a monster. Very
>> complicated and expensive. I am concerned that this might not be the
>> best way to go.
>
> Indeed, it might not be the best way to go, and any thoughts and
> brainpower applied to the problem are always very welcome.
>
> Of course, at OWG we *do* think it's the best approach, given all the
> trade-offs involved, and it's a decision that hasn't been taken
> lightly.
>
>> We have a Google Summer of Code proposal to write an edge proxy server
>> for the OSM API. I don't know if the project will be accepted, but it
>
> It's worth bearing in mind that we don't have any places on this
> year's GSoC, and even if OSGeo decided to go for this project on our
> behalf, it could be months before we even have any idea whether or not
> the implementation is feasible. I don't mean to be negative, but
> weighing up the "here and now" requirements against a hypothetical
> alternative at some point in the future is one of these trade-offs
> that we routinely have to make at OWG.
>
>> For the money we are planning on spending on the big
>> server, we could get several of these smaller edge servers
>> with flash disks and a less expensive redundant write/history database
>> server.
>
> Well, we could certainly spend the money on small edge servers, but
> it's not clear to me why you think that would make the central server
> less expensive. I think this proposal may be worthwhile but it's
> somewhat orthogonal to the goals of the new server.
>
> At the moment we have two osm-database-class machines, the older of
> which (smaug) is no longer capable of handling the full load on its
> own, but is still useful as a database-level read slave. The newer
> machine (ramoth) can handle the load entirely on its own, but is
> approaching the limits of dealing with the full read+write load.
>
> When it comes to the master database, we need certain characteristics:
> A) To be able to handle the write load (and the associated reads
> involved in checking constraints etc)
> B) To be able to store the entire database
> C) To have more than one such machine, for failover
>
> Smaug most likely doesn't fulfil A, and so currently we don't really
> fulfil C. So we need a new machine that can do A+B, and these are
> unfortunately expensive. In order to last more than 6 months, the new
> machine also needs plenty of space (B) on fast disks (A), which is
> where most of the money goes.
>
> Having map-call-proxies, as you discuss, doesn't solve any of A, B or
> C for the master database. Sharing out the read-only load is a good
> idea, but it's not clear to me whether it is better done with
> postgres-level database replication (as we have been doing),
> proxy-level replication (as per this GSoC idea), or even just
> examining logs and ban-hammering people scraping the map call (my
> personal favourite!).
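>
> To illustrate that last option: a rough sketch of the sort of log
> pass I have in mind, assuming combined-format access logs (the file
> name and the cut-off below are invented):
>
>     # Count map-call requests per client IP; in a combined-format log
>     # line, field 0 is the client IP and field 6 is the request path.
>     from collections import Counter
>
>     hits = Counter()
>     with open("access.log") as log:  # hypothetical log location
>         for line in log:
>             parts = line.split()
>             if len(parts) > 6 and parts[6].startswith("/api/0.6/map"):
>                 hits[parts[0]] += 1
>
>     for ip, count in hits.most_common(10):
>         if count > 10000:  # hypothetical "scraping" threshold
>             print("ban-hammer candidate: %s (%d calls)" % (ip, count))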
>
>> As we need to scale
>
> It's best in these conversations to be precise about what we mean by
> "to scale". Scaling read-requests is only one aspect, and we have a
> variety of feasible (and active) options. Long-term, we may[1] need to
> work around the need for all these machines to store the entire
> database (B), and that's Hard. We may[2] also need to figure out how
> to solve A, and that's Hard too.
>
> Like I said at the start, thoughts and brainpower are always welcome!
>
> Cheers,
> Andy
>
> [1] If we grow OSM faster than Moore's law, otherwise: happy days
> [2] If db-write activity outpaces disk-io and/or network bandwidth
> increases, otherwise: happy days


