[OSM-dev] Fwd: Re: OSM and MongoDB

Ian Dees ian.dees at gmail.com
Wed Apr 13 20:44:13 BST 2011


On Wed, Apr 13, 2011 at 2:35 PM, Andreas Scheucher <
andreas.scheucher at gmail.com> wrote:

> hi,
>
> Some weeks ago I got interested in NoSQL database products. I had no
> experience with them until now, but as it was a requirement for a job, I
> started to read about Apache Cassandra and thought this would be
> interesting for OpenStreetMap.
>
>
Yep, Cassandra would be an interesting option to try. In fact many moons ago
I spoke with the folks at SimpleGeo about attempting to host some OSM data
there in their infrastructure. At the time they didn't support anything but
point features (and had no other way of dealing with metadata) so I haven't
pursued it.

Additionally, this talk they gave was quite informative about how they
store their location data in Cassandra:
http://www.youtube.com/watch?v=7J61pPG9j90


> Up to now my findings are only theoretical, but I would like to dig
> deeper when I find time.
>
> But one thing I wonder about is that you tested it on one machine. Isn't it
> the case that you need several nodes and loads of data to really benefit from
> NoSQL databases? At least this was my understanding of the whole thing...
>

The purpose of multiple machines in this case is to have relatively reliable
storage and multiple copies of the data on different machines, not
necessarily an increase in read speed (Greg, maybe you could correct me?).
Last time I looked at MongoDB seriously for OSM I imported an entire planet,
so it was "loads of data" :). I have not tried a whole planet with the more
recent versions, though.


>
> greets,
> Andreas
>
> 2011/4/13 Ian Dees <ian.dees at gmail.com>
>
>>
>> On Tue, Apr 12, 2011 at 3:56 PM, Steve Coast <steve at asklater.com> wrote:
>>
>>>  Interesting.
>>>
>>> How efficient is the (big)int indexing and/or masking?
>>>
>>
>> I haven't had a chance to look at the integer indexing/masking. If I
>> remember correctly from discussions on dev a long while ago, I think it's
>> very close to geohashes.
>>
>>
>>>
>>> Was this all on a single machine?
>>>
>>
>> Yes.
>>
>>
>>>
>>>
>>>
>>>
>>> On 4/12/2011 1:52 PM, Ian Dees wrote:
>>>
>>> Yep.
>>>
>>> On Tue, Apr 12, 2011 at 3:51 PM, Steve Coast <steve at asklater.com> wrote:
>>>
>>>>  and using the builtin spatial index?
>>>>
>>>>
>>>>
>>>> On 4/12/2011 1:50 PM, Ian Dees wrote:
>>>>
>>>> Yes, one document per node/way/relation.
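Storing one document per node, way and relation might look roughly like this sketch. The field names (`loc`, `nodes`, `members`, `tags`) and the [lon, lat] ordering are assumptions for illustration, not the schema used in mongosm. Copying each way's node coordinates into the way document is what would let a multi-location spatial index cover the way.

```python
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float]  # (lon, lat)


def node_document(node_id: int, lat: float, lon: float,
                  tags: Dict[str, str]) -> dict:
    # Hypothetical schema: coordinates stored as [lon, lat].
    return {"_id": node_id, "loc": [lon, lat], "tags": dict(tags)}


def way_document(way_id: int, node_refs: Sequence[int],
                 tags: Dict[str, str], node_coords: Dict[int, Coord]) -> dict:
    # Keep the ordered node references and also copy each node's
    # coordinates, so the way carries one location per node.
    return {
        "_id": way_id,
        "nodes": list(node_refs),
        "loc": [list(node_coords[ref]) for ref in node_refs],
        "tags": dict(tags),
    }


def relation_document(rel_id: int,
                      members: Sequence[Tuple[str, int, str]],
                      tags: Dict[str, str]) -> dict:
    # Each member is (type, ref, role), e.g. ("way", 10, "outer").
    return {
        "_id": rel_id,
        "members": [{"type": t, "ref": r, "role": role}
                    for t, r, role in members],
        "tags": dict(tags),
    }
```

OSM IDs are only unique within one element type, so the sketch assumes a separate collection each for nodes, ways and relations.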
>>>>
>>>> On Tue, Apr 12, 2011 at 3:47 PM, Steve Coast <steve at asklater.com> wrote:
>>>>
>>>>>  how was the data put in the db though? 1 document per node?
>>>>>
>>>>>
>>>>> On 4/12/2011 1:39 PM, Nolan Darilek wrote:
>>>>>
>>>>> Oops, meant for this to go to the whole list.
>>>>>
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [OSM-dev] OSM and MongoDB
>>>>> Date: Tue, 12 Apr 2011 15:26:41 -0500
>>>>> From: Nolan Darilek <nolan at thewordnerd.info>
>>>>> To: Ian Dees <ian.dees at gmail.com>
>>>>>
>>>>> I had/am having a somewhat bad experience storing OSM data in MongoDB.
>>>>>
>>>>> Initially I stored all map data in MongoDB, but queries took ages. The
>>>>> same queries that now complete in 100-200 ms often took nearly a second.
>>>>> Some took upwards of 5 seconds, and I even found spots on my map that were
>>>>> sparsely populated with points but where the queries I need reliably took
>>>>> 30+ seconds.
>>>>>
>>>>> I filed a thorough bug report in their tracker, including a dataset and
>>>>> queries that reliably reproduced the issue. It was marked wontfix, I
>>>>> abandoned MongoDB, and it was apparently re-opened and fixed several months
>>>>> later. So perhaps it's a non-issue now.
>>>>>
>>>>> I'm still using MongoDB for part of my current project, user POI
>>>>> storage. It does indeed use geohashes, and I'm experiencing strange accuracy
>>>>> issues. My platform is pedestrian navigation with many small distance
>>>>> queries. Points in the non-MongoDB dataset are reliably detected in a radius
>>>>> roughly 100 meters around the traveler. Points in MongoDB queried with the
>>>>> same bounding boxes don't appear until they're within 30-40 meters. I
>>>>> recently updated from an older version to a new build of 1.8. The older
>>>>> version widely varied the detection range. Some points were detected 100 or
>>>>> so meters out, while others weren't picked up until 30 or so. It was always
>>>>> the same points, too. The point for my apartment remains reliably visible
>>>>> for ~100 meters or so, while the corner store and restaurant didn't appear
>>>>> until I was very close. 1.8 at least appears to be consistent, always
>>>>> detecting at 30 meters or so. I can only assume that this is a geohash
>>>>> oddity that only appears for very small differences, something that works
>>>>> out to rounding error for larger values.
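Pedestrian queries like these usually start from a radius of about 100 m. That radius has to become a latitude/longitude bounding box before it can be handed to a spatial index. The following sketch makes that conversion with a local equirectangular approximation, then confirms each candidate with a haversine distance. It assumes a spherical Earth with a mean radius of 6,371 km. It does not diagnose the MongoDB behaviour described above, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption


def bbox_around(lat: float, lon: float, radius_m: float):
    """Return (min_lon, min_lat, max_lon, max_lat) enclosing a circle of
    radius_m metres, using a local equirectangular approximation."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def within_radius(center, points, radius_m):
    """Bounding-box prefilter on (lat, lon) points, then an exact distance check."""
    lat0, lon0 = center
    min_lon, min_lat, max_lon, max_lat = bbox_around(lat0, lon0, radius_m)
    return [
        (lat, lon) for lat, lon in points
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        and haversine_m(lat0, lon0, lat, lon) <= radius_m
    ]
```

The corners of a box built for a 100 m radius lie about 141 m from the centre, so a box query alone can return points beyond the intended radius; the distance check trims them. The approximation degrades near the poles, where `cos(lat)` approaches zero, and it does not handle boxes that cross the antimeridian.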
>>>>>
>>>>> I like MongoDB for many things, but not for geospatial data more
>>>>> complicated than a series of points. I'm working on migrating user/POI
>>>>> storage to a geospatial store.
>>>>>
>>>>>
>>>>> On 04/12/2011 01:20 PM, Ian Dees wrote:
>>>>>
>>>>> Yep, and I think Mongo uses geohashes as its index behind the scenes.
>>>>> One of the problems with that, though, is that they compute the geohash
>>>>> to some arbitrary fixed length, and when you have lots of points (as OSM
>>>>> data does) the buckets they're searching are very full.
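The following sketch illustrates why a fixed-length geohash can leave buckets very full. It implements standard geohash encoding and counts how many points share each fixed-length cell. It is not MongoDB's internal implementation, and the precision values are only illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat: float, lon: float, precision: int) -> str:
    """Encode (lat, lon) as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, nbits = 0, 0
    use_lon = True  # geohash alternates bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def bucket_sizes(points: Iterable[Tuple[float, float]], precision: int) -> Counter:
    """Count how many (lat, lon) points share each fixed-length geohash cell."""
    return Counter(geohash(lat, lon, precision) for lat, lon in points)
```

For example, ten points spaced 0.001° apart all land in the single 4-character cell `s000`. That cell spans roughly 0.35° of longitude by 0.18° of latitude. At 8 characters the same points fall into ten separate cells. A search that has to scan a coarse cell pays for every point stored in it, which is the cost described above for dense OSM data.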
>>>>>
>>>>> On Tue, Apr 12, 2011 at 1:00 PM, Steve Coast <steve at asklater.com> wrote:
>>>>>
>>>>>>  bbox queries using the built-in spatial indexing, presumably? OSM has
>>>>>> its own magical bitmask for that, which may also be as fast in Mongo, who
>>>>>> knows.
>>>>>>
>>>>>>
>>>>>> On 4/11/2011 5:58 PM, Ian Dees wrote:
>>>>>>
>>>>>>  On Mon, Apr 11, 2011 at 6:36 PM, Sergey Galuzo <sergal at microsoft.com> wrote:
>>>>>>
>>>>>>>  Hi,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am evaluating MongoDB for several storage solutions
>>>>>>> at hand. Some of them resemble the current OSM editing database. I have
>>>>>>> heard that OSM dev is/was also evaluating MongoDB. I was wondering
>>>>>>> whether it is possible to share the findings?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>  In my experimentation with MongoDB (seen here:
>>>>>> https://github.com/iandees/mongosm/) I found it to be very slow.
>>>>>> Inserts were speedy, but bounding-box queries took a long time.
>>>>>>
>>>>>>  The most recent dev version of MongoDB includes "multi-location
>>>>>> documents" support:
>>>>>>
>>>>>> http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-MultilocationDocuments
>>>>>>
>>>>>>  This would allow a single way document to be indexed at multiple
>>>>>> locations and vastly speed up the map query.
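With multi-location documents, a 2d index built on an array of coordinate pairs indexes every pair. A bounding-box query then matches the document if any of its locations falls inside the box. The pure-Python sketch below emulates those semantics for a map-style query, so it runs without MongoDB. The `loc` field name is an assumption, and a real deployment would let the index do this filtering instead of scanning every document.

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def point_in_box(lon: float, lat: float, box: Box) -> bool:
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def way_matches(way_doc: dict, box: Box) -> bool:
    # Multi-location semantics: one matching location is enough.
    return any(point_in_box(lon, lat, box) for lon, lat in way_doc["loc"])


def map_query(way_docs: Iterable[dict], box: Box) -> List[int]:
    """Return the _id of every way with at least one node inside box."""
    return [doc["_id"] for doc in way_docs if way_matches(doc, box)]
```

Matching on "at least one node inside the box" mirrors how the OSM API's map call selects ways: it returns every way that references a node in the box. A way whose segment crosses the box without having a node inside it is returned by neither.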
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev at openstreetmap.org
>>>>>> http://lists.openstreetmap.org/listinfo/dev
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>

