[OSM-dev] OSM Database Infrastructure

Tue Jan 15 00:53:08 GMT 2008

In message <20BCA5B2-D79A-40DA-915C-A98DC9CAF900 at asklater.com>
          SteveC <steve at asklater.com> wrote:

> I think the single-server still works very well. It does so by
> ignoring doctrine on postgis and gis extensions and a bunch of scary C
> extensions thought up by Nick Hill and implemented by TomH. jburgess
> is doing similarly scary things with apache modules for shipping out
> tiles.

The C extensions were only really used during the update. For
day-to-day use that work is done in Rails rather than in the
database. There is a C version of the code for Ruby, but the Ruby
code works fine as well.

> > The reason I'm e-mailing you is to ask about the server and database
> > infrastructure. After showing the project to a couple of my computer
> > science professors, they were surprised that there is one database
> > server that handles such a large load. Has there been any research
> > into splitting the load up between several region-specific database
> > servers? I would think that since editors usually don't need to
> > cross continental boundaries, the data for continents could be split
> > up between several servers, distributing the load among the machines
> > and making the end user experience faster. I would propose that each
> > regional (or "tier 2") server would be the only address allowed to
> > make changes on the global (or "tier 1") server.

The basic principle that we work on is "if it ain't broke, don't
fix it", and in this case the current setup is working fine, so we
don't see a huge need to make things more complicated at the
moment.

I'm not aware of any significant performance issues with the API at
the moment anyway, but maybe you've seen something I haven't?

A split along the lines you suggest would be very hard anyway
because of the problem of objects which span more than one
server. Even if you split entirely at sea (which is hard and
gives you irregular borders to split on) you still have things
like ferry routes as somebody has already pointed out.
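
To make the spanning problem concrete, here is a rough sketch in
Python. It is not how the API or the database works: the three
longitude bands, the region names, the helper functions and the
sample coordinates are all invented for illustration. It only shows
that once nodes are assigned to servers by position, a single way
can end up needing more than one server.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical split into three longitude bands, one per regional server.
# The band edges and names are assumptions made up for this sketch.
def region_for(lon: float) -> str:
    if lon < -30.0:
        return "americas"
    if lon < 60.0:
        return "europe_africa"
    return "asia_pacific"

def servers_for_way(way_nodes: List[int],
                    nodes: Dict[int, Tuple[float, float]]) -> Set[str]:
    """Return every regional server that holds at least one node of the way."""
    return {region_for(nodes[node_id][1]) for node_id in way_nodes}

# Approximate (lat, lon) pairs, illustrative only.
nodes = {
    1: (51.13, 1.31),    # Dover
    2: (50.97, 1.85),    # Calais
    3: (50.90, -1.40),   # Southampton
    4: (40.70, -74.00),  # New York
}

channel_ferry = [1, 2]    # stays inside one band
atlantic_route = [3, 4]   # crosses a band edge

print(sorted(servers_for_way(channel_ferry, nodes)))   # ['europe_africa']
print(sorted(servers_for_way(atlantic_route, nodes)))  # ['americas', 'europe_africa']
```

Any edit to the second way would have to be coordinated between two
servers, which is exactly the complication described above.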

The more common suggestion is to have read-only slave servers,
but we don't really have a need for them at the moment.

Clustering the data geographically (but all in one database) is
something I have considered for when we need to extract some more
performance from the database, but even then it is hard to come
up with a good split - most simple solutions leave you with some
partitions with very little data.
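
As a rough illustration of the uneven split, the sketch below
buckets points into a fixed 90 degree grid and counts how many land
in each cell. The grid size, the function names and the eight sample
points are assumptions chosen for the example, not real OSM data or
an actual partitioning scheme.

```python
from collections import Counter
from typing import Iterable, Tuple

def grid_cell(lat: float, lon: float, cell_deg: float = 90.0) -> Tuple[int, int]:
    """Map a (lat, lon) pair to a (row, col) cell of a regular grid."""
    max_row = int(180.0 // cell_deg) - 1
    max_col = int(360.0 // cell_deg) - 1
    # Clamp so that lat == 90 and lon == 180 fall in the last cell.
    row = min(int((lat + 90.0) // cell_deg), max_row)
    col = min(int((lon + 180.0) // cell_deg), max_col)
    return (row, col)

def partition_counts(points: Iterable[Tuple[float, float]],
                     cell_deg: float = 90.0) -> Counter:
    """Count how many points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)

# Made-up sample, deliberately weighted towards Europe.
sample_points = [
    (48.86, 2.35), (52.52, 13.40), (41.90, 12.50),
    (52.37, 4.90), (48.21, 16.37), (48.14, 11.58),
    (40.70, -74.00),   # New York
    (-33.87, 151.21),  # Sydney
]

counts = partition_counts(sample_points)
total_cells = int(180.0 // 90.0) * int(360.0 // 90.0)
for cell, n in sorted(counts.items()):
    print(cell, n)
print("empty cells:", total_cells - len(counts))  # empty cells: 5
```

With this sample one cell holds six of the eight points and five of
the eight cells are empty, which is the kind of imbalance a simple
regular split tends to produce.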

Tom

-- 
Tom Hughes (tom at compton.nu)
http://www.compton.nu/