[OSM-dev] Distributed Data Store
Stefan de Konink
stefan at konink.de
Thu Jan 22 10:54:52 GMT 2009
Hi Scott,
Scott Shawcroft wrote:
> Stefan de Konink wrote:
>> - Admins don't want to maintain multiple systems
>> - The fear of anything new not developed by the devs (especially if it
>> is not built in Ruby)
> Who are the admins for the systems?
Tom Hughes is a factor to take into account. All your base...
> We're open to particular solutions
> and if there is a bias towards Ruby we'd look closer at it. However, it
> may be that there is a better solution. Who are the designated devs?
That is basically a 'free for all'. Read the history of SVN and/or this
list to find out which people are working on OSM. Personally I am
working on a C implementation of the API. Other people tend to work on
the official RubyOnRails one.
> Also, Amazon WebServices could be used to have virtual machines instead
> of real ones which need maintenance.
If Amazon wants to sponsor OSM, that is a great thing ;)
>> Technical problems might be more interesting:
>>
>> - Synchronization issues, even for a proxy solution: single or
>> multiple write databases should distribute their data; out-of-sync
>> scenarios, etc.
>> - Especially geo-related issues: how to distribute a real geoquery.
> Totally, synchronization is important. Simple partitioning wouldn't
> have this problem but if multiple copies will be shared then we could
> get into trouble.
>
> I think the geo element is what makes this more interesting than the
> standard data storage issue.
The main point is that OSM is, by design, not a GIS database. We can
make it one, but the current features approach the dataset in a
'traditional' way. This is not bad per se, though some problems would
lend themselves well to GIS solutions.
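To make "how to distribute a real geoquery" concrete, here is a minimal scatter-gather sketch in Python. It assumes a hypothetical layout in which the world is split into fixed longitude strips, one backend per strip; the strip edges, `Node`, and all function names are illustrative and are not part of the OSM schema or API. A bounding-box query goes to every strip it overlaps, and the partial answers are merged, dropping nodes that are stored twice because they lie on a strip boundary.

```python
from dataclasses import dataclass

# Hypothetical layout: the world is cut into longitude strips, one backend each.
# Edges are illustrative; a node lying exactly on an edge is stored in both strips.
STRIP_EDGES = [-180.0, -60.0, 0.0, 60.0, 180.0]


@dataclass(frozen=True)
class Node:
    id: int
    lat: float
    lon: float


def strips_overlapping(left, right):
    """Indices of the strips whose longitude range touches [left, right]."""
    return [i for i in range(len(STRIP_EDGES) - 1)
            if left <= STRIP_EDGES[i + 1] and right >= STRIP_EDGES[i]]


def build_partitions(nodes):
    """Place every node in each strip that contains its longitude."""
    partitions = [[] for _ in range(len(STRIP_EDGES) - 1)]
    for node in nodes:
        for i in strips_overlapping(node.lon, node.lon):
            partitions[i].append(node)
    return partitions


def scatter_gather(bbox, partitions):
    """Run a bbox query on every relevant partition and merge the answers."""
    left, bottom, right, top = bbox
    merged = {}
    for i in strips_overlapping(left, right):
        for node in partitions[i]:  # a real system would send one query per backend
            if left <= node.lon <= right and bottom <= node.lat <= top:
                merged[node.id] = node  # keying by id drops boundary duplicates
    return sorted(merged.values(), key=lambda n: n.id)


nodes = [Node(1, 51.50, -0.10), Node(2, 52.37, 4.90),
         Node(3, 40.70, -74.00), Node(4, 51.48, 0.00)]
partitions = build_partitions(nodes)
print([n.id for n in scatter_gather((-1.0, 51.0, 5.0, 53.0), partitions)])  # [1, 2, 4]
```

Node 4 sits on the 0° edge, so it is returned by two backends but appears once in the merged answer; a GIS-aware store would do the same for ways and relations, which is where it gets harder.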
>>> We're interested in trying our hand at creating a better system for
>>> storing OSM data. We're interested in what kind of computing
>>> resources to design for (how many machines) and whether we can get
>>> access logs in order to test our implementation against.
>>
>> Regarding access logs, I ran into a long brick wall; it might be
>> better to use a requester that just makes random requests. Sources are
>> available for that.
> Well, randomness is probably not the best model. I imagine that the
> server's traffic patterns are also geo-related. For example, people are
> more likely to work on areas they are near and areas on the earth in
> daylight or evening are more likely to have those people accessing the
> site. Or perhaps a mapping party has a number of people working on the
> same area all at once. A simple geo partitioning would drive all of
> this traffic to one particular server. This simple access does work
> better when retrieving data because it will utilize all the different
> machines.
As Erik pointed out, the diffs will give you the writes. I think the
reads are more interesting.
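The difference between a "random requester" and geo-skewed read traffic can be sketched in a few lines of Python. This is a hypothetical load model, not derived from real OSM access logs: four servers own fixed longitude strips (a "simple geo partitioning"), one generator spreads bounding-box requests uniformly over the map, and another clusters them around a single point as a mapping party might. The centre, spread and box size are made-up parameters.

```python
import random
from collections import Counter

# Hypothetical 4-server "simple geo partitioning" by longitude strip.
STRIP_EDGES = [-180.0, -60.0, 0.0, 60.0, 180.0]


def server_for(lon):
    """Index of the strip (server) that owns a longitude."""
    for i in range(len(STRIP_EDGES) - 1):
        if lon < STRIP_EDGES[i + 1]:
            return i
    return len(STRIP_EDGES) - 2  # lon == 180.0


def bbox_around(lon, lat, half=0.05):
    return (lon - half, lat - half, lon + half, lat + half)


def uniform_requests(n, rng):
    """The 'random requester' model: box centres spread uniformly over the map."""
    return [bbox_around(rng.uniform(-179.9, 179.9), rng.uniform(-84.9, 84.9))
            for _ in range(n)]


def party_requests(n, rng, centre=(4.90, 52.37), spread=0.2):
    """A mapping party: box centres clustered around one point (hypothetical)."""
    return [bbox_around(rng.gauss(centre[0], spread), rng.gauss(centre[1], spread))
            for _ in range(n)]


def load_per_server(requests):
    """Count requests per server, routing each box by its centre longitude."""
    return Counter(server_for((left + right) / 2) for left, _, right, _ in requests)


rng = random.Random(2009)
print(load_per_server(uniform_requests(1000, rng)))  # roughly proportional to strip width
print(load_per_server(party_requests(1000, rng)))    # Counter({2: 1000})
```

Under the clustered model every request lands on the same server, which is exactly the hotspot a purely random requester would hide.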
>>> Also, we'd love to have OSM community members involved since we're
>>> new to the organization.
>>>
>>> Lastly, I think we plan to donate our code to the community with the
>>> hope that it is useful.
>>>
>>> What do you think?
>>
>> I would love to brainstorm with you :) I want to spend the next month
>> on my MSc thesis about improving native geospatial support in MonetDB,
>> with the OSM data in it. It would of course be great if the ideas
>> coming out of such a session could make it to State of the Map 2009.
>>
>> It would be good to point you at DBslayer (the standard implementation
>> or the Cherokee one); it balances requests, and with a better
>> balancer it could do geobalancing too :)
> I'll have to take a look at it. Existing solutions are good but we are
> really looking at laying down some code too I think.
Last night I was thinking about, for example, creating a specific
SQL-based scheduler that can handle partitions:
http://code.google.com/p/cherokee/issues/detail?id=328
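One possible shape for such a partition-aware scheduler is sketched here in Python. It is not DBslayer or Cherokee code, and the cluster layout, the server names, and the `lon BETWEEN` predicate it looks for are all assumptions made for illustration. The scheduler reads the longitude range from the SQL, sends the query to every partition that range overlaps, and within each partition picks the replica with the fewest outstanding requests.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cluster: two longitude partitions, each with one or more
# replica database servers. Names and ranges are illustrative only.


@dataclass
class Partition:
    lon_min: float
    lon_max: float
    replicas: list
    in_flight: dict = field(default_factory=dict)

    def pick_replica(self):
        """Least-outstanding-requests choice among this partition's replicas."""
        return min(self.replicas, key=lambda r: self.in_flight.get(r, 0))


PARTITIONS = [
    Partition(-180.0, 0.0, ["db-west-1", "db-west-2"]),
    Partition(0.0, 180.0, ["db-east-1"]),
]

# Assumes (hypothetically) that spatial queries carry a predicate like
#   ... WHERE lon BETWEEN <a> AND <b> ...
LON_RANGE = re.compile(r"\blon\s+BETWEEN\s+(-?[\d.]+)\s+AND\s+(-?[\d.]+)", re.IGNORECASE)


def route(sql):
    """Pick one replica per partition the query can touch."""
    m = LON_RANGE.search(sql)
    if m is None:
        targets = PARTITIONS  # no recognised spatial predicate: ask every partition
    else:
        lo, hi = float(m.group(1)), float(m.group(2))
        targets = [p for p in PARTITIONS if lo <= p.lon_max and hi >= p.lon_min]
    chosen = []
    for p in targets:
        replica = p.pick_replica()
        p.in_flight[replica] = p.in_flight.get(replica, 0) + 1
        chosen.append(replica)
    return chosen


def done(replica):
    """Call when a backend has answered, so its load counter drops again."""
    for p in PARTITIONS:
        if p.in_flight.get(replica, 0) > 0:
            p.in_flight[replica] -= 1
```

On a fresh scheduler, a query with `WHERE lon BETWEEN 4.8 AND 5.0` is routed only to `db-east-1`, while a query without a recognised spatial predicate goes to one replica of each partition; a geo-aware balancer in front of DBslayer could apply the same rule.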
Stefan