[OSM-dev] OSM and CouchDB/GeoCouch

Serge Wroclawski emacsen at gmail.com
Sat Jul 3 21:40:37 BST 2010


On Sat, Jul 3, 2010 at 8:17 PM, Nolan Darilek <nolan at thewordnerd.info> wrote:

>> Is this work available anywhere? How did you find performance to be, and
>> to what uses did you put it?

It's on Ian's and my GitHub accounts, and you can download it from there, but:

1) Right now the only hardware we've tested on is somewhat dated.

2) We don't really have a universal benchmark.

3) We don't have a great dataset to test.

But since you asked, here's the situation as it is now:

Ian wrote some code to do the initial slurping of the data in from a
planet.osm (or extract) dump, as well as a simple frontend using
django.
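
To make the slurping step concrete, here is a minimal sketch in Python. It is
not Ian's actual code: the function name, the document layout, and the "loc"
field are assumptions made for illustration. It streams OSM XML and turns
each node, way, and relation into a dict that a document store could take.

```python
import io
import xml.etree.ElementTree as ET


def iter_osm_documents(source):
    """Yield one dict per node/way/relation from an OSM XML stream.

    Hypothetical helper: the document layout below is an assumption,
    not the schema used in the real project.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in ("node", "way", "relation"):
            continue
        doc = {
            "type": elem.tag,
            "id": int(elem.get("id")),
            "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
        }
        if elem.tag == "node":
            # Stored as [lon, lat]; the ordering is an assumption.
            doc["loc"] = [float(elem.get("lon")), float(elem.get("lat"))]
        elif elem.tag == "way":
            doc["nodes"] = [int(nd.get("ref")) for nd in elem.findall("nd")]
        else:
            doc["members"] = [
                {"type": m.get("type"), "ref": int(m.get("ref")),
                 "role": m.get("role")}
                for m in elem.findall("member")
            ]
        yield doc
        # Drop the element's children and attributes once it is converted.
        elem.clear()


SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.1"><tag k="name" v="A"/></node>
  <node id="2" lat="51.6" lon="-0.2"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""

docs = list(iter_osm_documents(io.BytesIO(SAMPLE)))
```

Streaming with iterparse avoids loading the whole file at once, which matters
for something the size of planet.osm.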

We discussed his schema and I found a few bugs, so I've begun
rewriting the thing. I've done this a few times to try to optimize,
but with the hardware we have, we've found it takes days to process
planet.osm.

I'm rewriting the code again to make it testable (the original code
got the job done but wasn't very modular) and also writing code to
let you apply the published diffs (dailies, hourlies, etc.) to the
database.
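
As a rough illustration of the update step, the sketch below applies an
osmChange-style diff (create/modify/delete blocks) to a store. Everything here
is an assumption for the example: the function name is hypothetical, a plain
dict keyed by (type, id) stands in for the database, and only the id, version,
and tags are kept.

```python
import xml.etree.ElementTree as ET


def apply_osmchange(store, xml_bytes):
    """Apply create/modify/delete actions from an osmChange document.

    `store` is a plain dict standing in for the real database.
    """
    root = ET.fromstring(xml_bytes)
    for action in root:
        for elem in action:
            key = (elem.tag, int(elem.get("id")))
            if action.tag == "delete":
                # Deleting something we never saw is not an error here.
                store.pop(key, None)
            elif action.tag in ("create", "modify"):
                store[key] = {
                    "version": int(elem.get("version", 0)),
                    "tags": {t.get("k"): t.get("v")
                             for t in elem.findall("tag")},
                }
    return store


DIFF = b"""<osmChange version="0.6">
  <create><node id="3" version="1" lat="1.0" lon="2.0"/></create>
  <modify><node id="1" version="2" lat="1.0" lon="2.0"><tag k="name" v="B"/></node></modify>
  <delete><node id="2" version="3"/></delete>
</osmChange>"""

store = {
    ("node", 1): {"version": 1, "tags": {"name": "A"}},
    ("node", 2): {"version": 2, "tags": {}},
}
apply_osmchange(store, DIFF)
```

Treating create and modify the same way (an upsert) keeps the update safe to
re-run if a diff is applied twice.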

As for the choice to use Mongo... I'm not married to Mongo, but:

1) In my limited experience with Couch, inserts were very slow (up to
2 seconds per insert!). That's not going to work well.

2) I really like the idea of being able to shard the database.

3) I like Mongo's ability to do map/reduce. This means we can do
complex queries and hopefully not pay too much of a performance
penalty.

4) MongoDB does have a Geo library (even though it's in early stages).
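
For anyone unfamiliar with the map/reduce idea in point 3, here is a minimal
sketch in plain Python. It does not use MongoDB's API; the function names and
the document layout are assumptions for illustration. The map step emits
key/value pairs per document and the reduce step folds the values for each key.

```python
from collections import defaultdict


def map_highway(doc):
    """Map step: emit (highway value, 1) for every way tagged highway=*."""
    if doc.get("type") == "way" and "highway" in doc.get("tags", {}):
        yield doc["tags"]["highway"], 1


def reduce_count(key, values):
    """Reduce step: add up the counts emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group the mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


docs = [
    {"type": "way", "id": 10, "tags": {"highway": "residential"}},
    {"type": "way", "id": 11, "tags": {"highway": "primary"}},
    {"type": "way", "id": 12, "tags": {"highway": "residential"}},
    {"type": "way", "id": 13, "tags": {"building": "yes"}},
    {"type": "node", "id": 1, "tags": {"highway": "crossing"}},
]

counts = map_reduce(docs, map_highway, reduce_count)
```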


I know Ian has talked with the MongoDB developers about some potential
optimizations.


I think if there are several projects working on document-based
approaches to storing OSM data, or at least other approaches in
general, we should get together and discuss some standard test suite
of data and maybe of calls that we should support. I know Ian and I
were targeting XAPI. I don't know what other folks are working on.
And I think we could have a standard dataset to use for our testing.
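
One possible shape for such a shared test suite is sketched below: the same
named calls run against every backend, with a best-of-N timing recorded for
each pair. The names are all hypothetical, and the backends here are plain
dicts standing in for real databases.

```python
import time


def run_benchmark(backends, calls, repeat=3):
    """Time every call against every backend; keep the best of `repeat` runs.

    `backends` maps a name to a backend object and `calls` maps a name to a
    function that takes a backend. Returns {(backend, call): seconds}.
    """
    results = {}
    for backend_name, backend in backends.items():
        for call_name, call in calls.items():
            best = float("inf")
            for _ in range(repeat):
                start = time.perf_counter()
                call(backend)
                best = min(best, time.perf_counter() - start)
            results[(backend_name, call_name)] = best
    return results


# Stand-in backend: a dict of node id -> tags.
backends = {"in_memory": {i: {"name": str(i)} for i in range(1000)}}
calls = {
    "lookup_by_id": lambda db: db[500],
    "scan_all": lambda db: [tags for tags in db.values() if "name" in tags],
}

results = run_benchmark(backends, calls)
```

Keeping the calls separate from the backends means each project would only
need to supply an adapter for its own store to be compared on the same data.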

- Serge



