[OSM-dev] scaling

SteveC steve at asklater.com
Sun Jan 9 21:38:25 GMT 2011


The amusing recent FakeSteveC post (I guess I will call it a LOLSCALE) got me thinking about what people actually think of the board's comment on scaling:

http://fakestevec.blogspot.com/2011/01/know-your-osm-memes-2.html

Tempting as a dialogue with my fake self is, I think a discussion of the thrust of the argument is merited.

I think scaling is the number one issue OSM should tackle technically.

The days of just 'buy a bigger database server' are, I think, over. It's not very elegant and it's just too damn expensive. Perhaps we could manage one more iteration, but if OSM's bandwidth keeps outpacing Moore's law and donations, then it just doesn't work.

So that means scaling horizontally to more than one machine. And if you're doing that, you may as well do more than 2 machines, or more than 20, or whatever figure you have in your head.

I think this is number one because I think the amount of data OSM has to deal with is going to explode on a fairly short timescale. I don't mean just another big import. Sadly I can't go public with this, but over a year ago I had a conversation with a large company (no, it's not MS or CM) that speculated about putting OSM on the front page of their maps product, which would roughly turn all of our yearly statistics into daily or weekly numbers. We went through a decision tree about how that could happen, and every leaf node on that tree came back as, basically, "we couldn't do it."

Could we accept the edit traffic? No, far too much. Could we provide a good user experience? Clearly no. Could they help us scale? No; on any kind of timescale they needed, it would be viewed as a takeover. Could they host us? Again no: it would be too slow a process, it would be a takeover, and the community would probably reject it.

I could continue, but you can imagine the basic direction. Imagine you had millions of daily users and you wanted to use OSM in a respectable, community-driven way. And let's say you get past the 4chan rhetoric over on the talk list. If you think it through, within any reasonable time frame (say, 6-12 months) it's very hard to make that happen, and so you may as well go and build your own thing. That, I think, sucks, and is a loss for OSM.

This conversation has come up a few more times recently with other large mapping companies, and each time I feel like I'm rehashing the one above. I'd love to be public about it, but those companies aren't ready to talk yet.

Even if nobody were privately proposing to notch up our traffic by a few orders of magnitude, it would still make a lot of sense to figure out how to scale.

Back to FakeSteveC and the negative, eye-rolling comment about us having thought about this for only a few seconds. Well, it turns out we have thought about it. The board deliberately didn't list any technical measures; that's not its job. But supporting and encouraging basic things like scaling is, I think, well within its bounds.

I haven't a clue what we should use to scale horizontally. There are a few major architectural choices, and within each of those there are lots of implementations. Some are too new and buggy, some are in the wrong language... it's clearly a bit of a mess out there right now. There are also a bunch of religious beliefs around how you do this stuff.

So, how do we get from here to there? Speaking strictly personally, I think bug bounties have been one of the best uses of funds, in or out of OSM. I think putting up some bounties for demoing either architectures or implementations is a good idea, because we all know it comes down to working code. Something like "$1,000 to the first person who demonstrates OSM's DB running on more than one machine", then another $1,000 for proving it can handle a certain throughput, and so on, is one way to get there. That's how I'd personally like to encourage it to happen, but it has neither been agreed by the board nor is it something MS is immediately going to do. It's just an idea, and one that I like.

There is clearly a lot of work to do just fleshing out options and trying things.

There is an alternative, which is to just give up on scaling. That works, but it means OSM fractures into multiple datasets, and I envisage OSM becoming the Debian of maps while someone else (there are several candidates) becomes the Canonical or Ubuntu. I don't much like that scenario, but it's there as a possibility.

So, what do you think? And if you agree it's worth doing, how do we achieve it either as individuals or the board or companies supporting it?

PS: If it looks weird that I respond to some emails and not others, that's because messages to, from, or cc'ing some of the trolls are automatically deleted and I never see them. So even if you just cc them, I won't see your email. I highly recommend doing this.

Steve

stevecoast.com

