[OSM-dev] MongoDB possibly unsuitable for OSM data

Fri Jul 16 20:50:39 BST 2010

On Fri, Jul 16, 2010 at 2:34 PM, Nolan Darilek <nolan at thewordnerd.info> wrote:

>
> Hey, folks. I suppose that I might be wrong about this, and if so then
> I'd love to be, but I thought that I'd share my recent findings here
> either to inspire work in different directions if possible, or to close
> a possibly useless avenue of work if not.
>
> I have a real-time navigation app that stores its data in MongoDB, using
> its geospatial queries. During the course of my work, I routinely found
> that some queries were reliably slow while others were reliably fast.
> We're talking differences of seconds, some taking 5, some 30, while
> others were completed in under a second. Naturally, this is unsuitable
> for an app that needs to provide near real-time feedback. I opened an
> issue here, including my dump of an import of Texas' OSM nodes:
>
> http://jira.mongodb.org/browse/SERVER-1392
>
> It seems that I'm running up against limitations in MongoDB's
> geohash-based mechanism. It's probably perfectly suitable for most
> average geospatial-based searches, but not so much for the case of
> rendering OSM data in reliably short bursts of time. The issue has been
> marked wontfix.
>
> I'm open to the possibility that I'm missing something, but have long
> suspected that the geospatial support wasn't up to something of this
> magnitude. I suppose that it might work as a data storage and
> replication system, but if you need to get back data quickly then
> MongoDB likely isn't a good fit.
>
> Anyhow, I thought that I'd share, especially as some of us were
> discussing use of MongoDB here a few weeks back.
>
>
I was seeing quite the opposite results with smallish (city-sized) bounding
boxes: I was getting very fast responses (much faster than a second or two).
I was definitely running into limitations of the Python
serializer/deserializer before I was running into limitations of Mongo. This
was after inserting most of an entire planet dump.

However, it does seem like his explanation is valid: geohashing creates
buckets, and when those buckets are too big, they fill up and make for slow
queries. The nice thing about geohashing is that you can have
arbitrarily sized buckets, so I always assumed that they were picking the
bucket size based on how many points they saw. Maybe not.
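The bucket behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not MongoDB's actual index code: `geohash_encode` follows the standard public geohash scheme (interleaved longitude/latitude bits, base-32 alphabet), while `bucket_points`, `adaptive_buckets` and the `max_per_bucket` threshold are hypothetical names made up for this example. `adaptive_buckets` is one possible take on the "pick the size based on how many points they saw" idea.

```python
from collections import defaultdict

# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a point as a geohash; a longer hash means a smaller cell (bucket)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, bit_count, use_lon = 0, 0, True  # bits alternate, starting with longitude
    chars = []
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def bucket_points(points, precision):
    """Fixed-size buckets: every point goes into the cell named by its hash prefix."""
    buckets = defaultdict(list)
    for lat, lon in points:
        buckets[geohash_encode(lat, lon, precision)].append((lat, lon))
    return dict(buckets)


def adaptive_buckets(points, max_per_bucket, max_precision=12):
    """Hypothetical adaptive scheme: lengthen the prefix only where a bucket is too full."""
    hashed = [(geohash_encode(lat, lon, max_precision), (lat, lon)) for lat, lon in points]
    result = {}

    def split(items, depth):
        groups = defaultdict(list)
        for h, p in items:
            groups[h[:depth]].append((h, p))
        for prefix, members in groups.items():
            if len(members) > max_per_bucket and depth < max_precision:
                split(members, depth + 1)  # too many points: subdivide this cell
            else:
                result[prefix] = [p for _, p in members]

    split(hashed, 1)
    return result


# Usage: a dense cluster of 20 points plus one distant point.
pts = [(30.27 + i * 0.001, -97.74 + i * 0.001) for i in range(20)] + [(42.6, -5.6)]
fixed = bucket_points(pts, 2)        # the whole cluster lands in one big bucket
adaptive = adaptive_buckets(pts, 4)  # the cluster is split until buckets hold <= 4 points
```

With a fixed, coarse precision, a dense area ends up in a single overfull bucket, which matches the slow-query explanation above; the adaptive version only pays for finer cells where the data is actually dense.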