[OSM-dev] scaling

SteveC steve at asklater.com
Mon Jan 10 01:45:06 GMT 2011


On Jan 9, 2011, at 3:31 PM, Nic Roets wrote:

> Hello Steve,
> 
> I have ideas for a super fast API mirror with the capability of
> answering historic queries i.e. contents of bbox l,t,r,b at time h.
> But it can't be the main server because it ignores changesets and
> other info.
> 
> If we moved all the load that is not directly related to editing to
> such a server, would it make a massive difference ? Get rid of all the
> scraping and spidering ?

So what you're effectively proposing is to move certain things to something that looks like XAPI?
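
To make the idea concrete, here is a minimal Python sketch of the kind of historic query Nic describes (contents of bbox l,t,r,b at time h). Everything in it is an assumption for illustration: the NodeVersion record, the in-memory history list and the nodes_in_bbox_at function are hypothetical names, not part of the OSM API or XAPI. It only handles nodes; a real mirror would also need ways, relations and a spatial index rather than a linear scan.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeVersion:
    """One stored version of a node (hypothetical record layout)."""
    node_id: int
    version: int
    timestamp: int   # seconds since epoch when this version was saved
    lon: float
    lat: float
    visible: bool    # False means this version records a deletion


def nodes_in_bbox_at(history: List[NodeVersion],
                     l: float, t: float, r: float, b: float,
                     h: int) -> List[NodeVersion]:
    """Return the nodes inside bbox (l, t, r, b) as they existed at time h."""
    latest: Dict[int, NodeVersion] = {}
    for v in history:
        if v.timestamp > h:
            continue  # this version did not exist yet at time h
        current = latest.get(v.node_id)
        if current is None or v.version > current.version:
            latest[v.node_id] = v
    # Keep only nodes that were not deleted and lay inside the box at time h.
    return sorted(
        (v for v in latest.values()
         if v.visible and l <= v.lon <= r and b <= v.lat <= t),
        key=lambda v: v.node_id,
    )
```

The sketch picks, for each node, the newest version saved at or before h and only then applies the bbox filter, so a node that later moved out of the box or was deleted is still reported for earlier values of h.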

> 
> http://www.google.co.za/search?q=edna+site%3Awww.openstreetmap.org&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
> 
> On Mon, Jan 10, 2011 at 12:27 AM, Igor Brejc <igor.brejc at gmail.com> wrote:
>> I'll play a heretic here, but my feeling is that "openness" in OSM will more
>> and more come under question, and the reason is scaling. Yes, OSM can
>> proclaim the access to its data is open, but in reality only someone (or
>> better some organization/company) with enough HW resources to be able to
>> process planetary OSM data can actually make use of it.
>> 
>> In reality most potential users of OSM data don't really need global data,
>> they want easy access to OSM data that's local to them. And that's what OSM
>> infrastructure does not provide. XAPI and CloudMade and Geofabrik extracts
>> are just poor workarounds (that's not to say that they aren't valuable).
>> 
>> A simple example: if I need OSM data for European highways, why do I need to
>> process the whole planet file? Europe is physically separated from America
>> and I see very few reasons for having to share OSM data across continents in
>> a single planetary file.
>> 
>> Separating data in vertical layers could help too: country borders certainly
>> belong to a completely different level than, say, park benches. And they
>> change a bit less often. Why keep them in the same store?
>> 
>> Igor
>> 
>> On 9.1.2011 22:38, SteveC wrote:
>>> 
>>> The amusing recent FakeSteveC ... I guess I will call it a LOLSCALE got me
>>> thinking about what people actually think of the board's comment on scaling:
>>> 
>>> http://fakestevec.blogspot.com/2011/01/know-your-osm-memes-2.html
>>> 
>>> As much as I want a dialogue with my fake self, a discourse on the thrust
>>> of the argument is I think merited.
>>> 
>>> I think scaling is the number one issue OSM should tackle technically.
>>> 
>>> The days of just 'buy a bigger database server' are I think over. It's not
>>> very elegant and it's just too damn expensive. Perhaps we could do another
>>> iteration, but if OSM bandwidth continues to outpace Moore's law and
>>> donations then it just doesn't work.
>>> 
>>> So that means scaling horizontally to more than one machine. And if you're
>>> doing that, you may as well do more than 2 machines, or more than 20, or
>>> whatever figure you have in your head.
>>> 
>>> I think this is number one because I think the amount of data OSM is going
>>> to have to deal with is going to explode in a fairly short time scale. I
>>> don't mean just another big import. Sadly I can't be public but I had a
>>> conversation with a large company over a year ago (no, it's not MS or CM)
>>> who speculated about putting OSM on the front page of their maps product,
>>> which would approximately turn all of our yearly statistics to daily or
>>> weekly numbers. We went through a decision tree about how that could happen.
>>> Every leaf node on that tree came back as basically we couldn't do it.
>>> 
>>> Could we accept the edit traffic? No, far too much. Could we provide a
>>> good user experience? Clearly not. Could they help us scale? No, they would
>>> be viewed as taking over on any kind of timescale they needed. Could they
>>> host us? Again no, it would be too slow a process, it would be a takeover,
>>> and the community would probably reject it.
>>> 
>>> I could continue, but the basic direction you can imagine. Imagine you had
>>> millions of daily users and you wanted to use OSM in a respectable,
>>> community-driven way. And let's say you get over the 4chan
>>> rhetoric over on talk at . If you think through it, within any reasonable time
>>> frame (like 6-12 months) it's very hard to make that happen, and so you may
>>> as well go build your own things. Which I think sucks and is a loss for OSM.
>>> 
>>> Now this conversation has come up a few more times recently with other
>>> large mapping companies. And I feel like I'm rehashing those conversations
>>> above. I'd love to be public about it, but those companies aren't ready to
>>> talk yet.
>>> 
>>> Even if people weren't privately proposing notching up our traffic a few
>>> orders of magnitude, it would still make a lot of sense to figure out how to
>>> scale.
>>> 
>>> Back to FakeSteveC and the negative eye-rolling comment on thinking about
>>> this for a few seconds. Well it turns out we have. The board specifically
>>> didn't list any technical measure on purpose, that's not its job. But the
>>> direction of supporting and encouraging basic things like scaling is I think
>>> well within the bounds.
>>> 
>>> I haven't a clue what we should use to scale horizontally. There are a few
>>> major architectural choices and then within those there are lots of
>>> implementations. Some are too new and buggy, some are in the wrong language
>>> ... it's clearly a bit of a mess out there right now. There are also a bunch
>>> of religious beliefs around how you do this stuff too.
>>> 
>>> So, how do we get from here to there? Speaking strictly personally, I
>>> think one of the best uses of funds in or out of OSM has been bug bounties.
>>> Personally, I think putting up some bounties on demoing either architectures
>>> or implementations is a good idea, because we all know it comes down to
>>> working code. Something like "$1,000 to the first person who demonstrates
>>> OSM's DB running on more than one machine", then another $1,000 for proving
>>> it can handle a certain throughput, and so on, is one way to get there.
>>> That's the way I'd personally like to encourage it to happen, but it has
>>> neither been agreed by the board nor is it something MS is immediately
>>> going to do. It's
>>> just an idea and one that I like.
>>> 
>>> There is clearly a lot of work to do just fleshing out options and trying
>>> things.
>>> 
>>> There is an alternative, which is to just give up on scaling. That works,
>>> but it means OSM fractures into multiple datasets and I envisage OSM
>>> becoming the Debian of maps and someone else (there are several candidates)
>>> becoming the Canonical or Ubuntu. I don't much like that scenario, but it's
>>> there as a possibility.
>>> 
>>> So, what do you think? And if you agree it's worth doing, how do we
>>> achieve it either as individuals or the board or companies supporting it?
>>> 
>>> PS if it looks weird that I respond to certain emails and not others then
>>> that's because messages to, from or cc some of the trolls are automatically
>>> deleted and I don't see them. So even if you just cc them, I won't see your
>>> email. I highly recommend doing this.
>>> 
>>> Steve
>>> 
>>> stevecoast.com
>>> _______________________________________________
>>> dev mailing list
>>> dev at openstreetmap.org
>>> http://lists.openstreetmap.org/listinfo/dev
>> 
>> 
>> --
>> http://igorbrejc.net
>> 
>> 
>> 
> 

Steve

stevecoast.com



