[OSM-dev] scaling
Kai Krueger
kakrueger at gmail.com
Mon Jan 10 00:31:39 GMT 2011
SteveC-2 wrote:
>
> So, what do you think? And if you agree it's worth doing, how do we
> achieve it either as individuals or the board or companies supporting it?
>
Ignoring the social aspects for now.
Depending on how far you really want to scale, I think a lot of the
necessary components are already in place. Even if none of the "out of the
box" solutions such as the new PostgreSQL 9.0 replication mechanism work, we
could probably get a fair distance by splitting reads and writes onto
separate db servers.
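Roughly what I mean (just a sketch; the paths and host names here are made
up, and this is not how the rails port actually routes things):

    # Hypothetical front-end routing sketch: send anything that modifies
    # data to the master db, spread everything else across read-only mirrors.
    import random

    MASTER = "db-master.example"
    READ_MIRRORS = ["db-ro1.example", "db-ro2.example", "db-ro3.example"]

    def pick_backend(http_method, path):
        # Uploads, changeset creation etc. mutate data -> master only.
        if http_method in ("PUT", "POST", "DELETE"):
            return MASTER
        # map calls, object reads, history etc. can go to any mirror.
        return random.choice(READ_MIRRORS)

    print(pick_backend("GET", "/api/0.6/map"))                # some read mirror
    print(pick_backend("POST", "/api/0.6/changeset/create"))  # master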
If the main db servers only handled data uploads and all other requests were
served from read-only mirrors, that might gain us a fair amount (although
this would need to be verified somehow). The read-only mirrors can then
easily scale horizontally. With the minutely diffs, we already have a
mechanism that can replicate the relevant data onto read-only mirrors. Just
add more read-only mirrors feeding off those replication diffs. The minutely
diffs could probably be provided more often than once a minute to get the
replication delay down.
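To illustrate what a mirror feeding off the replication diffs would have to
do (the URL layout and the apply_changes() helper are assumptions here; in
practice tools like Osmosis already cover this):

    # Sketch of a read-only mirror catching up from replication diffs.
    import time
    import urllib.request

    STATE_URL = "https://planet.openstreetmap.org/replication/minute/state.txt"

    def remote_sequence():
        # the state file contains lines like "sequenceNumber=123456"
        state = urllib.request.urlopen(STATE_URL).read().decode()
        for line in state.splitlines():
            if line.startswith("sequenceNumber="):
                return int(line.split("=")[1])

    def apply_changes(sequence):
        # Hypothetical: fetch the osmChange file for this sequence number
        # and replay it against the local read-only database.
        pass

    local_seq = 4000000  # wherever this mirror last stopped (made-up number)
    while True:
        target = remote_sequence()
        while local_seq < target:
            local_seq += 1
            apply_changes(local_seq)
        time.sleep(30)  # poll more often than once a minute to keep lag low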
Compared to database-level replication, this wouldn't guarantee that all
read-only mirrors are fully up to date, so one could occasionally download
"old" data. But as long as the replication delay can be kept below a minute
or two, I don't think that would matter.
A typical JOSM session probably lasts a lot longer than a minute or two
between download and upload. Furthermore, a slightly larger map call can
occasionally take several minutes on its own during busy times. So the
editing clients already need to be able to deal with stale data and with
retrying in case of conflict. The optimistic locking through version numbers
should mostly safeguard against problems there.
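For what it's worth, the version check boils down to something like this (a
simplified sketch, not the actual rails code):

    # Optimistic locking on upload: the server only accepts a change if the
    # client edited the version it currently considers latest.
    class Conflict(Exception):
        pass

    current = {"id": 1234, "version": 7, "tags": {"highway": "residential"}}

    def upload(element_id, based_on_version, new_tags):
        if based_on_version != current["version"]:
            # Someone else uploaded in between -> the client has to
            # re-fetch, merge and retry, which editors already cope with.
            raise Conflict("version %d is stale, current is %d"
                           % (based_on_version, current["version"]))
        current["version"] += 1
        current["tags"] = new_tags
        return current["version"]

    upload(1234, 7, {"highway": "unclassified"})  # ok, becomes version 8
    # a second upload(1234, 7, ...) would now raise Conflict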
The tiles-at-home project has basically already done this and provides a
pretty powerful infrastructure to support the hundreds of t@h clients
hitting the API simultaneously with large requests. It has a load balancer
that distributes requests across multiple read-only mirrors, automatically
handling failing machines. Servers automatically drop out of the balancer if
they fall behind on the diff imports, to ensure they don't serve stale data.
As far as I can tell, the t@h setup has not been without issues, but many of
those could probably be solved if it were under central control and not
distributed across continents and amongst various people.
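The "drop out when falling behind" logic is basically just a health check
the balancer polls; a rough sketch with made-up names and thresholds:

    # Per-mirror health check: report unhealthy once the mirror's
    # replication lag exceeds a threshold, so stale servers stop
    # receiving read traffic.
    MAX_LAG_SECONDS = 120

    def replication_lag(local_state_timestamp, upstream_state_timestamp):
        return upstream_state_timestamp - local_state_timestamp

    def healthy(local_state_timestamp, upstream_state_timestamp):
        return replication_lag(local_state_timestamp,
                               upstream_state_timestamp) <= MAX_LAG_SECONDS

    # The balancer keeps only healthy mirrors in rotation:
    mirrors = {"ro1": 30, "ro2": 600}   # lag in seconds (example data)
    in_rotation = [name for name, lag in mirrors.items()
                   if lag <= MAX_LAG_SECONDS]
    print(in_rotation)                  # ['ro1']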
Of course, implementing and testing it to make sure it is all rock solid
would still be quite an amount of work, but at least it probably wouldn't
need too much new development. (Apart from perhaps rewriting the API in
something other than Ruby.)
But I am sure the TWG will correct me on the above... ;-)
Kai
--