[OSM-dev] Fwd: Re: OSM and MongoDB

Greg Studer greg at 10gen.com
Wed Apr 13 21:41:38 BST 2011


Agreed; I think the issue in this case definitely wasn't related to
multiple machines.  In general, though, you can often do much better
performance-wise on large data sets by running queries on data subsets
across multiple systems, whatever software you use.  Most NoSQL databases
try to make this particularly easy.
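Greg's point about running a query over data subsets on several systems can be sketched as scatter-gather. This is a toy under stated assumptions: points are held in memory and split into hypothetical longitude stripes, the function names are made up for illustration, and threads only stand in for separate machines, so it shows the query pattern rather than a real speed-up. It is not MongoDB's sharding API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]               # (lon, lat)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def partition_by_longitude(points: List[Point], n_parts: int) -> List[List[Point]]:
    """Split points into n_parts equal-width longitude stripes (hypothetical scheme)."""
    parts: List[List[Point]] = [[] for _ in range(n_parts)]
    width = 360.0 / n_parts
    for lon, lat in points:
        # lon == 180.0 would land one past the end, so clamp to the last stripe.
        idx = min(int((lon + 180.0) // width), n_parts - 1)
        parts[idx].append((lon, lat))
    return parts


def bbox_query(points: List[Point], bbox: BBox) -> List[Point]:
    """Linear scan of one subset; a real node would use its own index here."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [p for p in points
            if min_lon <= p[0] <= max_lon and min_lat <= p[1] <= max_lat]


def scatter_gather(parts: List[List[Point]], bbox: BBox) -> List[Point]:
    """Send the bbox query to every stripe that can overlap it, then merge."""
    width = 360.0 / len(parts)
    min_lon, _, max_lon, _ = bbox
    # Prune stripes whose longitude range cannot intersect the box.
    targets = [part for i, part in enumerate(parts)
               if -180.0 + i * width <= max_lon and -180.0 + (i + 1) * width >= min_lon]
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        results = pool.map(lambda part: bbox_query(part, bbox), targets)
    return [p for chunk in results for p in chunk]
```

Because the router knows each stripe's longitude range, subsets that cannot overlap the box are never asked; on real hardware that pruning, plus each node scanning only its own share, is where any gain would come from.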

On Wed, 2011-04-13 at 14:44 -0500, Ian Dees wrote: 
> On Wed, Apr 13, 2011 at 2:35 PM, Andreas Scheucher
> <andreas.scheucher at gmail.com> wrote:
>         hi, 
>         
>         
>         Some weeks ago I got interested in NoSQL database products.
>         I had no experience with them until now, but as it was a
>         requirement for a job, I started to read about Apache
>         Cassandra and thought this would be interesting for
>         OpenStreetMap. 
>         
>         
> 
> 
> Yep, Cassandra would be an interesting option to try. In fact many
> moons ago I spoke with the folks at SimpleGeo about attempting to host
> some OSM data there in their infrastructure. At the time they didn't
> support anything but point features (and had no other way of dealing
> with metadata) so I haven't pursued it.
> 
> 
> Additionally, this talk they gave was quite informative about how
> they store their location data in Cassandra:
> http://www.youtube.com/watch?v=7J61pPG9j90
>  
>         
>         So far my findings are only theoretical, but I would like
>         to dig deeper when I find time. 
>         
>         
>         But one thing I wonder about is that you tested it on one
>         machine. Don't you need several nodes and lots of data to
>         really benefit from NoSQL databases? At least that was my
>         understanding of the whole thing... 
> 
> 
> The purpose of multiple machines in this case is to have relatively
> reliable storage and multiple copies of the data on different
> machines, not necessarily an increase in read speed (Greg, maybe you
> could correct me?). Last time I looked at MongoDB seriously for OSM I
> imported an entire planet, so it was "loads of data" :). I have not
> tried a whole planet with the more recent versions, though.
>  
>         
>         
>         greets, 
>         Andreas 
>         
>         
>         2011/4/13 Ian Dees <ian.dees at gmail.com> 
>                 
>                 
>                 On Tue, Apr 12, 2011 at 3:56 PM, Steve Coast
>                 <steve at asklater.com> wrote: 
>                         Interesting.
>                         
>                         How efficient is the (big)int indexing and/or
>                         masking?
>                         
>                 
>                 
>                 I haven't had a chance to look at the integer
>                 indexing/masking. If I remember correctly from
>                 discussions on dev a long while ago, I think it's very
>                 close to geohashes. 
>                   
>                         
>                         Was this all on a single machine? 
>                 
>                 
>                 Yes. 
>                 
>                   
>                         
>                         
>                         
>                         
>                         
>                         On 4/12/2011 1:52 PM, Ian Dees wrote: 
>                         > Yep.
>                         > 
>                         > On Tue, Apr 12, 2011 at 3:51 PM, Steve Coast
>                         > <steve at asklater.com> wrote: 
>                         >         and using the built-in spatial
>                         >         index? 
>                         >         
>                         >         
>                         >         
>                         >         On 4/12/2011 1:50 PM, Ian Dees wrote: 
>                         >         > Yes, one document per
>                         >         > node/way/relation.
>                         >         > 
>                         >         > On Tue, Apr 12, 2011 at 3:47 PM, Steve Coast
>                         >         > <steve at asklater.com> wrote: 
>                         >         >         how was the data put in
>                         >         >         the db though? 1 document
>                         >         >         per node? 
>                         >         >         
>                         >         >         
>                         >         >         On 4/12/2011 1:39 PM, Nolan Darilek wrote: 
>                         >         >         > Oops, I meant for this to go to the whole list.
>                         >         >         > 
>                         >         >         > 
>                         >         >         > 
>                         >         >         > -------- Original Message --------
>                         >         >         >    Subject: Re: [OSM-dev] OSM and MongoDB
>                         >         >         >       Date: Tue, 12 Apr 2011 15:26:41 -0500
>                         >         >         >       From: Nolan Darilek <nolan at thewordnerd.info>
>                         >         >         >         To: Ian Dees <ian.dees at gmail.com>
>                         >         >         > 
>                         >         >         > 
>                         >         >         > I had/am having a somewhat bad experience storing OSM
>                         >         >         > data in MongoDB.
>                         >         >         >
>                         >         >         > Initially I stored all map data in MongoDB, but queries
>                         >         >         > took ages. The same queries that now complete in 100-200 ms
>                         >         >         > often took nearly a second. Additionally, some took upwards
>                         >         >         > of 5 seconds, and I even found spots on my map, sparsely
>                         >         >         > populated with points, where the queries I need reliably
>                         >         >         > took 30+ seconds.
>                         >         >         >
>                         >         >         > I filed a thorough bug in their tracker, including a dataset
>                         >         >         > and queries that reliably reproduced the issue. It was
>                         >         >         > marked wontfix, I abandoned MongoDB, and it was apparently
>                         >         >         > re-opened and fixed several months later. So perhaps it's a
>                         >         >         > non-issue now.
>                         >         >         >
>                         >         >         > I'm still using MongoDB for part of my current project, user
>                         >         >         > POI storage. It does indeed use geohashes, and I'm
>                         >         >         > experiencing strange accuracy issues. My platform is
>                         >         >         > pedestrian navigation with many small distance queries.
>                         >         >         > Points in the non-MongoDB dataset are reliably detected
>                         >         >         > within a radius of roughly 100 meters around the traveler.
>                         >         >         > Points in MongoDB queried with the same bounding boxes don't
>                         >         >         > appear until they're within 30-40 meters. I recently updated
>                         >         >         > from an older version to a new build of 1.8. In the older
>                         >         >         > version the detection range varied widely: some points were
>                         >         >         > detected 100 or so meters out, while others weren't picked
>                         >         >         > up until 30 or so. It was always the same points, too. The
>                         >         >         > point for my apartment stayed reliably visible from ~100
>                         >         >         > meters, while the corner store and restaurant didn't appear
>                         >         >         > until I was very close. 1.8 at least appears to be
>                         >         >         > consistent, always detecting at 30 meters or so. I can only
>                         >         >         > assume that this is a geohash oddity that only appears for
>                         >         >         > very small differences, something that works out to
>                         >         >         > rounding error for larger values.
>                         >         >         >
>                         >         >         > I like MongoDB for many things, but not for geospatial data
>                         >         >         > more complicated than a series of points. I'm working on
>                         >         >         > migrating user/POI storage to a geospatial store.
>                         >         >         > 
>                         >         >         > 
>                         >         >         > On 04/12/2011 01:20 PM, Ian Dees wrote: 
>                         >         >         > > Yep, and I think Mongo uses geohashes as its index behind
>                         >         >         > > the scenes. One of the problems with that, though, is that
>                         >         >         > > they compute the geohash to some arbitrary fixed length, and
>                         >         >         > > when you have lots of points (as OSM data does) the buckets
>                         >         >         > > they're searching are very full.
>                         >         >         > > 
>                         >         >         > > On Tue, Apr 12, 2011 at 1:00 PM, Steve Coast
>                         >         >         > > <steve at asklater.com> wrote: 
>                         >         >         > >         bbox queries using the built-in spatial indexing,
>                         >         >         > >         presumably? OSM has its own magical bitmask for that,
>                         >         >         > >         which may also be as fast in Mongo, who knows. 
>                         >         >         > >         
>                         >         >         > >         
>                         >         >         > >         On 4/11/2011 5:58 PM, Ian Dees wrote: 
>                         >         >         > >         > On Mon, Apr 11, 2011 at 6:36 PM, Sergey Galuzo
>                         >         >         > >         > <sergal at microsoft.com> wrote: 
>                         >         >         > >         >         Hi,
>                         >         >         > >         >         
>                         >         >         > >         >          
>                         >         >         > >         >         
>                         >         >         > >         >         I am working on an evaluation of MongoDB for several
>                         >         >         > >         >         storage solutions at hand. Some of them resemble the
>                         >         >         > >         >         current OSM editing database. I have heard that OSM dev
>                         >         >         > >         >         is/was also evaluating MongoDB. Would it be possible to
>                         >         >         > >         >         share the findings?
>                         >         >         > >         >         
>                         >         >         > >         >          
>                         >         >         > >         >         
>                         >         >         > >         >         
>                         >         >         > >         > 
>                         >         >         > >         > 
>                         >         >         > >         > In my experimentation with MongoDB (seen here:
>                         >         >         > >         > https://github.com/iandees/mongosm/) I found it to be very
>                         >         >         > >         > slow. Inserts were speedy, but bounding-box queries took a
>                         >         >         > >         > long time. 
>                         >         >         > >         > 
>                         >         >         > >         > 
>                         >         >         > >         > The most recent dev version of MongoDB includes
>                         >         >         > >         > "multi-location documents" support:
>                         >         >         > >         > http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-MultilocationDocuments 
>                         >         >         > >         > 
>                         >         >         > >         > 
>                         >         >         > >         > This would allow a single way document to be indexed at
>                         >         >         > >         > multiple locations and vastly speed up the map query. 
>                         >         >         > >         > 
>                         >         >         > >         > _______________________________________________
>                         >         >         > >         > dev mailing list
>                         >         >         > >         > dev at openstreetmap.org http://lists.openstreetmap.org/listinfo/dev 
>                         >         >         > >         
>                         >         >         > >         
>                         >         >         > > 
>                         >         >         > > 
>                         >         >         > 
>                         >         >         > 
>                         >         >         
>                         >         >         
>                         >         > 
>                         > 
>                 
>                 
>                 
>         
>         
> 
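Ian's and Nolan's comments in the thread both turn on MongoDB's geohash-based index and its fixed precision. The sketch below is the standard public base32 geohash algorithm, not MongoDB's internal encoding (whose bit depth and layout it does not try to reproduce); `cell_size_degrees` is a hypothetical helper showing how large one cell is at a given precision.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Standard base32 geohash: alternately bisect longitude and latitude."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0      # bits collected for the current base32 character
    n_bits = 0
    even = True    # geohash bit 0 is a longitude bit
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        even = not even
        n_bits += 1
        if n_bits == 5:
            chars.append(BASE32[value])
            value, n_bits = 0, 0
    return "".join(chars)


def cell_size_degrees(precision: int) -> tuple:
    """(lat_height, lon_width) in degrees of one geohash cell at this precision."""
    total = 5 * precision
    lon_bits = (total + 1) // 2   # longitude gets the extra bit when total is odd
    lat_bits = total // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits
```

For example, `geohash_encode(42.6, -5.6, 5)` gives `"ezs42"`, a cell about 0.044 degrees on each side. Points sharing a prefix fall in the same cell, so a bounding-box query becomes a few prefix scans followed by an exact coordinate filter. If the precision is coarse relative to how densely the data is packed, every scanned cell holds many candidates, which matches the "very full" buckets Ian describes; a query near a cell edge also has to look at the neighboring cells.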

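Steve's question in the thread about OSM's "(big)int indexing and/or masking", and Ian's recollection that it is close to geohashes, can be illustrated with an interleaved-bit integer key. This sketch assumes a QuadTiles-style scheme (longitude and latitude each quantized to 16 bits, then bit-interleaved into one 32-bit integer); the quantization, the bit order and all helper names are assumptions for illustration, not code taken from OSM.

```python
def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map value in [lo, hi] onto an integer grid of 2**bits steps."""
    return round((value - lo) / (hi - lo) * (2 ** bits - 1))


def interleave(x: int, y: int, bits: int = 16) -> int:
    """Morton/Z-order key: x bits go to odd positions, y bits to even positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)
        key |= ((y >> i) & 1) << (2 * i)
    return key


def tile_for(lat: float, lon: float) -> int:
    """Hypothetical 32-bit tile key for one coordinate."""
    return interleave(quantize(lon, -180.0, 180.0), quantize(lat, -90.0, 90.0))


def tile_range(key: int, level_bits: int) -> tuple:
    """Inclusive key range of the coarser tile that shares the top `level_bits` bits.

    Masking off the low bits gives the start of that tile; every key inside it
    lies in one contiguous integer range, so it can be fetched with a single
    range scan (for example BETWEEN start AND end on an ordinary index).
    """
    low = 32 - level_bits
    start = (key >> low) << low
    return start, start | ((1 << low) - 1)
```

A bounding box can then be covered by a handful of such ranges plus an exact coordinate filter. It is the same prefix idea as a geohash, expressed as integer ranges instead of string prefixes.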

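The multi-location documents Ian mentions would let a way be found directly by any of its node locations. A plain-Python sketch of why that helps, assuming a hypothetical way document that stores its node ids in `nds` and a copy of each node's (lon, lat) in `loc`: without it, a map query first finds the nodes in the box and then looks up the ways that reference them; with it, the ways match in one pass. In MongoDB the one-pass version would rely on a geospatial index over the `loc` array; here both versions are linear scans.

```python
from typing import Dict, List, Tuple

LonLat = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def in_bbox(p: LonLat, bbox: BBox) -> bool:
    return bbox[0] <= p[0] <= bbox[2] and bbox[1] <= p[1] <= bbox[3]


def way_document(way_id: int, node_ids: List[int], nodes: Dict[int, LonLat]) -> dict:
    """Hypothetical way document: node refs plus a copy of every node location."""
    return {"_id": way_id, "nds": node_ids, "loc": [nodes[n] for n in node_ids]}


def ways_two_step(ways: List[dict], nodes: Dict[int, LonLat], bbox: BBox) -> set:
    """Without multi-location indexing: find nodes first, then ways that use them."""
    hit_nodes = {nid for nid, p in nodes.items() if in_bbox(p, bbox)}
    return {w["_id"] for w in ways if hit_nodes.intersection(w["nds"])}


def ways_one_step(ways: List[dict], bbox: BBox) -> set:
    """With every node location stored on the way, the way matches directly."""
    return {w["_id"] for w in ways if any(in_bbox(p, bbox) for p in w["loc"])}
```

Both versions still miss a way whose segment crosses the box without any vertex inside it, so a real map query would need to handle that case separately.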

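Nolan's detection-range problem involves turning a walking radius of about 100 meters into a bounding box. A common approach, sketched here under a spherical-Earth assumption (mean radius 6,371,008.8 m) with hypothetical function names, is a bounding-box prefilter followed by an exact great-circle distance check. It does not describe what MongoDB or Nolan's application does internally.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical-Earth assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bbox_around(lat, lon, radius_m):
    """(min_lon, min_lat, max_lon, max_lat) enclosing a circle of radius_m.

    Latitude half-height is the angular radius; the longitude half-width
    asin(sin(r/R) / cos(lat)) widens with latitude. Assumes the circle does
    not reach a pole and does not cross the antimeridian.
    """
    ang = radius_m / EARTH_RADIUS_M
    dlat = math.degrees(ang)
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def within_radius(points, lat, lon, radius_m):
    """Cheap bbox prefilter on (lat, lon) points, then an exact distance check."""
    min_lon, min_lat, max_lon, max_lat = bbox_around(lat, lon, radius_m)
    candidates = [(plat, plon) for plat, plon in points
                  if min_lat <= plat <= max_lat and min_lon <= plon <= max_lon]
    return [(plat, plon) for plat, plon in candidates
            if haversine_m(lat, lon, plat, plon) <= radius_m]
```

A box's corners lie roughly 1.41 times farther from the center than the midpoints of its edges, so a box-only query and a true radius query disagree near the corners; the second filter removes that difference.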

