[OSM-dev] Distributed Data Store

Thu Jan 22 07:58:32 GMT 2009

Stefan,
My thoughts are below.

Stefan de Konink wrote:
> Hey,
>
> Scott Shawcroft wrote:
>> My friend Jason (cced) and I are seniors at the University of 
>> Washington in Computer Science and Engineering.  On your FAQ you say 
>> people interested in distributing the database across multiple 
>> computers should email the list.  Well, here we are.  We are 
>> currently in a distributed systems capstone course during which we 
>> need to spend the quarter (until mid March) on a single substantial 
>> project.
>
> Sounds fun :) There are a lot of 'ideas' around here: geographical 
> balancing, the standard divide-and-conquer methods in databases, 
> etc. The main problems in OSM:
>
> - Admins don't want to maintain multiple systems
> - The fear of anything new not developed by the devs (especially if it 
> is not built in Ruby)
Who are the admins for the systems?  We're open to particular solutions 
and if there is a bias towards Ruby we'd look closer at it.  However, it 
may be that there is a better solution.  Who are the designated devs?

Also, Amazon Web Services could be used to run virtual machines instead 
of real ones, which need maintenance.
>
>
> Technical problems might be more interesting:
>
> - Synchronization issues, even for a proxy solution; single or 
> multiple write databases should distribute their data. Out of sync 
> scenarios etc.
> - Especially geo related issues, how to distribute a real geoquery.
Totally, synchronization is important.  Simple partitioning wouldn't 
have this problem but if multiple copies will be shared then we could 
get into trouble.

I think the geo element is what makes this more interesting than the 
standard data storage issue.
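
To illustrate why a geoquery is harder to distribute than a key lookup, 
here is a rough sketch in Python.  It assumes a made-up scheme where the 
globe is split into equal longitude bands, one per server; NUM_SERVERS 
and both function names are hypothetical and nothing here reflects how 
OSM actually stores data.  A bounding-box query then has to go to every 
server whose band overlaps the box.

```python
NUM_SERVERS = 8  # hypothetical cluster size, not an OSM figure


def band(lon, n=NUM_SERVERS):
    """Map a longitude in [-180, 180] to a server index (equal-width bands)."""
    return min(int((lon + 180.0) / 360.0 * n), n - 1)


def servers_for_bbox(min_lon, max_lon, n=NUM_SERVERS):
    """Return every server whose band overlaps [min_lon, max_lon].

    Assumes min_lon <= max_lon, i.e. the box does not wrap across the
    antimeridian; a real system would need to handle that case too.
    """
    return list(range(band(min_lon, n), band(max_lon, n) + 1))


# A small box inside one band touches a single server...
print(servers_for_bbox(-122.4, -122.2))
# ...but a box of the same size straddling a band edge touches two.
print(servers_for_bbox(-90.5, -89.5))
```

Latitude is ignored here only to keep the sketch short; a grid or 
quadtree split would show the same fan-out effect at tile boundaries.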
>
>> We're interested in trying our hand at creating a better system for 
>> storing OSM data.  We're interested in what kind of computing 
>> resources to design for (how many machines) and whether we can get 
>> access logs in order to test our implementation against.
>
> Regarding access logs, I ran into a long brick wall; it might be 
> better to use a requester that just makes random requests. Sources 
> are available for that.
Well, randomness is probably not the best model.  I imagine that the 
server's traffic patterns are also geo related.  For example, people are 
more likely to work on areas they are near and areas on the earth in 
daylight or evening are more likely to have those people accessing the 
site.   Or perhaps a mapping party has a number of people working on the 
same area all at once.  A simple geo partitioning would drive all of 
this traffic to one particular server.  This simple access does work 
better when retrieving data because it will utilize all the different 
machines.
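
As a rough illustration of the difference between the two traffic 
models, here is a small Python sketch.  Everything in it is an 
assumption made for the example: the eight-server cluster, the 
longitude-band partitioning, the request counts and the "mapping party" 
clustered around roughly Seattle's coordinates.  It is not based on real 
OSM access logs.

```python
import random
from collections import Counter

NUM_SERVERS = 8  # hypothetical cluster size


def server_for(lon, num_servers=NUM_SERVERS):
    """Simple geo partitioning: equal-width longitude bands, one per server."""
    index = int((lon + 180.0) / 360.0 * num_servers)
    return min(index, num_servers - 1)  # lon == 180 falls in the last band


def load(points):
    """Count how many requests each server would receive."""
    return Counter(server_for(lon) for lon, _lat in points)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Model 1: uniformly random requests over the whole globe.
uniform = [(rng.uniform(-180, 180), rng.uniform(-90, 90))
           for _ in range(10000)]

# Model 2: a mapping party, everyone working close to one spot
# (illustrative coordinates near Seattle: lon -122.3, lat 47.6).
party = [(rng.gauss(-122.3, 0.05), rng.gauss(47.6, 0.05))
         for _ in range(10000)]

uniform_load = load(uniform)
party_load = load(party)

print("uniform:", sorted(uniform_load.items()))
print("party:  ", sorted(party_load.items()))
```

With the random model every server gets a similar share of requests, 
while the mapping party lands entirely on one server, so a test 
harness that only issues random requests would not exercise that case.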
>
>> Also, we'd love to have OSM community members involved since we're 
>> new to the organization.
>>
>> Lastly, I think we plan to donate our code to the community with the 
>> hope that it is useful.
>>
>> What do you think?
>
> I'd love to brainstorm with you :) I want to spend the next month on 
> my MSc thesis about improving native geospatial support in MonetDB, 
> and the OSM data in it. It would of course be great if the ideas coming 
> out of such a session could make it to State of the Map 2009.
>
> It would be good to point you at DBslayer (the standard implementation 
> or the Cherokee one); it will balance requests, but with a better 
> balancer it could do geobalancing too :)
I'll have to take a look at it.  Existing solutions are good, but I think 
we are really looking at laying down some code too.

~Scott
>
>
> Yours Sincerely,
>
> Stefan de Konink
>