[OSM-dev] Fwd: Re: OSM and MongoDB
Greg Studer
greg at 10gen.com
Wed Apr 13 21:41:38 BST 2011
Agreed, I think the issue in this case definitely wasn't related to
multiple machines. In general, though, you can often do much better
performance-wise on large data sets by running queries on data subsets
across multiple systems, whatever software you use. Most NoSQL dbs try
to make this particularly easy.
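The pattern Greg describes (run the same query on each data subset, then combine the answers) is usually called scatter-gather. Below is a minimal sketch. It assumes the points have already been split by a made-up rule: equal longitude bands stand in for shards on separate machines. The helper names are hypothetical, and this is not how MongoDB's sharding actually routes queries.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]               # (lon, lat)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def partition_by_lon(points: List[Point], n_shards: int) -> List[List[Point]]:
    """Hypothetical sharding rule: equal-width longitude bands."""
    shards: List[List[Point]] = [[] for _ in range(n_shards)]
    width = 360.0 / n_shards
    for lon, lat in points:
        # min() keeps lon == 180.0 inside the last band
        idx = min(int((lon + 180.0) // width), n_shards - 1)
        shards[idx].append((lon, lat))
    return shards


def query_shard(shard: List[Point], bbox: BBox) -> List[Point]:
    """Linear scan; a real node would consult its own spatial index."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [(lon, lat) for lon, lat in shard
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat]


def scatter_gather(shards: List[List[Point]], bbox: BBox) -> List[Point]:
    """Query every subset concurrently (threads stand in for remote
    machines here), then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, bbox), shards))
    return sorted(p for part in partials for p in part)


if __name__ == "__main__":
    pts = [(-93.2, 44.9), (-93.3, 45.0), (2.35, 48.85), (139.7, 35.7)]
    print(scatter_gather(partition_by_lon(pts, 4), (-94.0, 44.0, -93.0, 46.0)))
```

Running the example prints `[(-93.3, 45.0), (-93.2, 44.9)]`. The threads only illustrate the control flow. A real speed-up comes when each subset lives on its own machine or process, so the per-subset scans actually run in parallel.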
On Wed, 2011-04-13 at 14:44 -0500, Ian Dees wrote:
> On Wed, Apr 13, 2011 at 2:35 PM, Andreas Scheucher
> <andreas.scheucher at gmail.com> wrote:
> Hi,
>
> Some weeks ago I got interested in NoSQL database products.
> I had no experience with them up to now, but as it was a
> requirement for a job, I started to read about Apache
> Cassandra and thought it would be interesting for
> OpenStreetMap.
>
> Yep, Cassandra would be an interesting option to try. In fact many
> moons ago I spoke with the folks at SimpleGeo about attempting to host
> some OSM data there in their infrastructure. At the time they didn't
> support anything but point features (and had no other way of dealing
> with metadata) so I haven't pursued it.
>
>
> Additionally, this talk they gave was quite informative about how they
> store their location data in Cassandra:
> http://www.youtube.com/watch?v=7J61pPG9j90
>
>
> Up to now my findings are only theoretical, but I would like
> to dig deeper when I find time.
>
>
> But one thing I wonder about is that you tested it on one machine.
> Isn't it the case that you need several nodes and loads of data
> to really benefit from NoSQL databases? At least that was my
> understanding of the whole thing...
>
>
> The purpose of multiple machines in this case is to have relatively
> reliable storage and multiple copies of the data on different
> machines, not necessarily an increase in read speed (Greg, maybe you
> could correct me?). Last time I looked at MongoDB seriously for OSM I
> imported an entire planet, so it was "loads of data" :). I have not
> tried a whole planet with the more recent versions, though.
>
>
>
> greets,
> Andreas
>
>
> 2011/4/13 Ian Dees <ian.dees at gmail.com>
>
>
> On Tue, Apr 12, 2011 at 3:56 PM, Steve Coast
> <steve at asklater.com> wrote:
> Interesting.
>
> How efficient is the (big)int indexing and/or
> masking?
>
>
>
> I haven't had a chance to look at the integer
> indexing/masking. If I remember it from discussions on
> dev a long while ago I think it's very close to
> geohashes.
>
>
> Was this all on a single machine?
>
>
> Yes.
>
> On 4/12/2011 1:52 PM, Ian Dees wrote:
> > Yep.
> >
> > On Tue, Apr 12, 2011 at 3:51 PM, Steve Coast <steve at asklater.com>
> > wrote:
> > and using the builtin spatial index?
> >
> >
> >
> > On 4/12/2011 1:50 PM, Ian Dees wrote:
> > > Yes, one document per node/way/relation.
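The sketch below shows one way to picture "one document per node/way/relation". The field names (`_id`, `loc`, `nodes`, `members`, `tags`) are hypothetical and not necessarily mongosm's schema. The three element types are assumed to live in separate collections, because OSM ids are only unique within one element type. Storing `loc` as `[lon, lat]` is also an assumption: a 2d index only needs every document to use the same order.

```python
from typing import Any, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def node_doc(node_id: int, lat: float, lon: float, tags: Dict[str, str]) -> Doc:
    # 'loc' (assumed name) is the field a geospatial 2d index would cover.
    return {"_id": node_id, "loc": [lon, lat], "tags": dict(tags)}


def way_doc(way_id: int, node_ids: Iterable[int], tags: Dict[str, str]) -> Doc:
    # A way stores only node references, so it has no coordinates of its
    # own and a spatial query must go through the nodes first.
    return {"_id": way_id, "nodes": list(node_ids), "tags": dict(tags)}


def relation_doc(rel_id: int,
                 members: Iterable[Tuple[str, int, str]],
                 tags: Dict[str, str]) -> Doc:
    # members are (type, ref, role) triples, e.g. ("way", 10, "outer").
    return {
        "_id": rel_id,
        "members": [{"type": t, "ref": ref, "role": role}
                    for t, ref, role in members],
        "tags": dict(tags),
    }


# Separate collections, since node 1, way 1 and relation 1 can all exist.
collections: Dict[str, List[Doc]] = {
    "nodes": [node_doc(1, 44.95, -93.30, {}), node_doc(2, 44.96, -93.25, {})],
    "ways": [way_doc(1, [1, 2], {"highway": "footway"})],
    "relations": [relation_doc(1, [("way", 1, "")], {"type": "route"})],
}
```

Only node documents carry a location in this layout. That is why a bounding-box map query has to find nodes first and then the ways and relations that reference them.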
> > >
> > > On Tue, Apr 12, 2011 at 3:47 PM, Steve Coast <steve at asklater.com>
> > > wrote:
> > > how was the data put in the db though? 1 document per node?
> > >
> > >
> > > On 4/12/2011 1:39 PM, Nolan Darilek wrote:
> > > > Oops, meant for this to go to the whole list.
> > > >
> > > > -------- Original Message --------
> > > > Subject: Re: [OSM-dev] OSM and MongoDB
> > > > Date: Tue, 12 Apr 2011 15:26:41 -0500
> > > > From: Nolan Darilek <nolan at thewordnerd.info>
> > > > To: Ian Dees <ian.dees at gmail.com>
> > > >
> > > > I had (and am still having) a somewhat bad experience storing OSM
> > > > data in MongoDB.
> > > >
> > > > Initially I stored all map data in MongoDB, but queries took ages.
> > > > The same queries that now complete in 100-200 ms often took nearly
> > > > a second. Additionally, some took upwards of 5 seconds, and I even
> > > > found spots on my map, sparsely populated with points, where the
> > > > queries I need reliably took 30+ seconds.
> > > >
> > > > I filed a thorough bug in their tracker, including a dataset and
> > > > queries that reliably reproduced the issue. It was marked wontfix,
> > > > I abandoned MongoDB, and it was apparently re-opened and fixed
> > > > several months later. So perhaps it's a non-issue now.
> > > >
> > > > I'm still using MongoDB for part of my current project, user POI
> > > > storage. It does indeed use geohashes, and I'm experiencing strange
> > > > accuracy issues. My platform is pedestrian navigation with many
> > > > small distance queries. Points in the non-MongoDB dataset are
> > > > reliably detected within a radius of roughly 100 meters around the
> > > > traveler. Points in MongoDB queried with the same bounding boxes
> > > > don't appear until they're within 30-40 meters. I recently updated
> > > > from an older version to a new build of 1.8. The older version
> > > > varied the detection range widely: some points were detected 100
> > > > or so meters out, while others weren't picked up until 30 or so.
> > > > It was always the same points, too. The point for my apartment
> > > > remained reliably visible from ~100 meters or so, while the corner
> > > > store and restaurant didn't appear until I was very close. 1.8 at
> > > > least appears to be consistent, always detecting at 30 meters or
> > > > so. I can only assume that this is a geohash oddity that only
> > > > appears for very small differences, something that works out to
> > > > rounding error for larger values.
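Some back-of-envelope arithmetic helps check the distances in this report. The sketch does two things. First, it builds the bounding box a client might send for a ~100 m radius, using an equirectangular approximation of ~111,320 m per degree. Second, it computes how wide one cell of a geohash-style index is when a coordinate range is split into 2**bits steps. Both helpers are hypothetical. Current MongoDB documentation gives 26 bits as the default precision for 2d indexes; whether 1.8 used the same default is an assumption. Comparing the two numbers is only a sanity check on whether index quantization alone could explain a tens-of-metres discrepancy, not a diagnosis.

```python
import math
from typing import Tuple

# Approximate metres per degree on a spherical Earth (equirectangular
# assumption; adequate for ~100 m radii away from the poles).
M_PER_DEG = 111_320.0


def bbox_around(lat: float, lon: float, radius_m: float) -> Tuple[float, float, float, float]:
    """(min_lon, min_lat, max_lon, max_lat) enclosing a circle of radius_m."""
    dlat = radius_m / M_PER_DEG
    # Degrees of longitude shrink with cos(latitude).
    dlon = radius_m / (M_PER_DEG * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def cell_width_m(bits: int, lat: float = 0.0, coord_range: float = 360.0) -> float:
    """East-west width of one index cell when coord_range degrees are split
    into 2**bits steps (hypothetical helper; 26 bits is an assumed default)."""
    deg = coord_range / (2 ** bits)
    return deg * M_PER_DEG * math.cos(math.radians(lat))


if __name__ == "__main__":
    print(bbox_around(45.0, -93.0, 100.0))  # box for a ~100 m radius
    for bits in (16, 20, 26):
        print(bits, round(cell_width_m(bits, lat=45.0), 2), "m")
```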
> > > >
> > > > I like MongoDB for many things, but not for geospatial data more
> > > > complicated than a series of points. I'm working on migrating
> > > > user/POI storage to a geospatial store.
> > > >
> > > >
> > > > On 04/12/2011 01:20 PM, Ian Dees wrote:
> > > > > Yep, and I think Mongo uses geohashes as their index behind the
> > > > > scenes. One of the problems with that, though, is that they
> > > > > compute the geohash to some arbitrary length, and when you have
> > > > > lots of points (as OSM data does) the buckets they're searching
> > > > > are very full.
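This sketch illustrates the "very full buckets" point. A geohash interleaves longitude and latitude bits and base32-encodes them, so all points that share a prefix fall in the same cell. The code is a standard geohash encoder plus a hypothetical `bucket_counts` helper that counts points per cell at a fixed length. It shows the bucketing idea only, not MongoDB's actual index internals.

```python
from collections import Counter
from typing import Iterable, Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat: float, lon: float, length: int = 12) -> str:
    """Encode a point by repeatedly halving the lon/lat ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, n_bits = [], 0, 0
    even = True  # geohash starts with a longitude bit
    while len(chars) < length:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        even = not even
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def bucket_counts(points: Iterable[Tuple[float, float]], length: int) -> Counter:
    """How many (lat, lon) points land in each geohash cell of a given length."""
    return Counter(geohash(lat, lon, length) for lat, lon in points)


if __name__ == "__main__":
    print(geohash(42.6, -5.6, 5))  # ezs42
    # 100 points ~10 m apart: a handful of cells at length 5, one each at length 9.
    pts = [(44.97 + i * 1e-4, -93.26 + j * 1e-4) for i in range(10) for j in range(10)]
    print(len(bucket_counts(pts, 5)), len(bucket_counts(pts, 9)))
```

With a short, fixed geohash length, dense OSM data piles many points into each cell, and every one of them has to be checked against the query box.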
> > > > >
> > > > > On Tue, Apr 12, 2011 at 1:00 PM, Steve Coast <steve at asklater.com>
> > > > > wrote:
> > > > > bbox queries using the built-in spatial indexing presumably? OSM
> > > > > has its own magical bitmask for that, which may also be as fast
> > > > > in Mongo, who knows.
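For comparison, this sketch shows the general kind of integer key Steve refers to. Lon and lat are quantized to 16 bits each and interleaved in Z-order, so a single 32-bit integer identifies a tile and nearby points share leading bits. The 16-bit quantization is loosely modelled on OSM's quadtile idea, but the exact constants are an assumption. The function names are hypothetical.

```python
from typing import Tuple


def quantize(lat: float, lon: float, bits: int = 16) -> Tuple[int, int]:
    """Map lon/lat onto integer grids of 2**bits steps (assumed scheme)."""
    scale = (1 << bits) - 1
    x = round((lon + 180.0) * scale / 360.0)
    y = round((lat + 90.0) * scale / 180.0)
    return x, y


def interleave(x: int, y: int, bits: int = 16) -> int:
    """Z-order key: x and y bits alternate, most significant first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return key


def tile_for_point(lat: float, lon: float, bits: int = 16) -> int:
    x, y = quantize(lat, lon, bits)
    return interleave(x, y, bits)


def same_coarse_tile(a: int, b: int, prefix_bits: int, bits: int = 16) -> bool:
    """Masking step: compare only the top prefix_bits of two keys, i.e. ask
    whether both points fall in the same larger tile."""
    shift = 2 * bits - prefix_bits
    return (a >> shift) == (b >> shift)


if __name__ == "__main__":
    a = tile_for_point(44.970, -93.260)
    b = tile_for_point(44.971, -93.261)
    print(a, b, same_coarse_tile(a, b, 20))
```

Every sub-tile shares its parent's prefix, so a bounding box can be covered by a small number of integer ranges. An ordinary B-tree index can search those ranges, which is the appeal of this approach over a dedicated spatial index.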
> > > > >
> > > > >
> > > > > On 4/11/2011 5:58 PM, Ian Dees wrote:
> > > > > > On Mon, Apr 11, 2011 at 6:36 PM, Sergey Galuzo
> > > > > > <sergal at microsoft.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am working on an evaluation of MongoDB for several storage
> > > > > > solutions at hand. Some of them resemble the current OSM editing
> > > > > > database. I have heard that OSM dev is/was evaluating MongoDB
> > > > > > also. I was wondering whether it is possible to share the
> > > > > > findings?
> > > > > >
> > > > > > In my experimentation with MongoDB (seen here:
> > > > > > https://github.com/iandees/mongosm/) I found it to be very slow.
> > > > > > Inserts were speedy, but bounding-box queries took a long time.
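This sketch shows the kind of query being timed. Against a 2d index on a point field, MongoDB releases of this period accepted a bounding-box query written with `$within` and `$box`; later releases spell the operator `$geoWithin`. The function only builds the filter document, so it runs without a server. The `loc` field name and the `bbox_filter` helper are assumptions.

```python
from typing import Any, Dict


def bbox_filter(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                field: str = "loc", legacy: bool = True) -> Dict[str, Any]:
    """Build a MongoDB filter for points inside a box on a 2d-indexed field.

    legacy=True uses $within (MongoDB 1.x-era spelling); newer servers use
    $geoWithin. 'field' is an assumed name.
    """
    op = "$within" if legacy else "$geoWithin"
    # $box takes the bottom-left and top-right corners, in the same
    # coordinate order the documents were stored in (lon, lat assumed here).
    return {field: {op: {"$box": [[min_lon, min_lat], [max_lon, max_lat]]}}}


if __name__ == "__main__":
    print(bbox_filter(-93.3, 44.9, -93.2, 45.0))
```

With PyMongo and a running server, you would pass the result to `collection.find(...)` after creating the index with `collection.create_index([("loc", pymongo.GEO2D)])`. That server-side step is not exercised here.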
> > > > > >
> > > > > >
> > > > > > The most recent dev version of MongoDB includes "multi-location
> > > > > > documents" support:
> > > > > > http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-MultilocationDocuments
> > > > > >
> > > > > > This would allow a single way document to be indexed at multiple
> > > > > > locations and vastly speed up the map query.
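This sketch shows why multi-location documents help. Suppose each way document carries the coordinates of all its nodes in one array (here a hypothetical `locs` field). An index over that array can then match the way directly whenever any of its points falls in the box, with no separate step of finding nodes and then the ways that reference them. The pure-Python function below mimics that "any point inside" match. It illustrates the semantics described in the linked page and is not MongoDB's implementation.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def denormalize_way(way: Doc, nodes_by_id: Dict[int, Tuple[float, float]]) -> Doc:
    """Copy every referenced node's (lon, lat) into a 'locs' array
    (hypothetical field) so the way itself becomes spatially indexable."""
    return {**way, "locs": [list(nodes_by_id[n]) for n in way["nodes"]]}


def ways_in_bbox(ways: List[Doc], bbox: BBox) -> List[int]:
    """Ids of ways with at least one stored point inside bbox, mimicking how
    an index over a multi-location field matches documents."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [w["_id"] for w in ways
            if any(min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
                   for lon, lat in w["locs"])]
```

One design consequence: matching works on stored points, not on segments. A long way that crosses the box without any vertex inside it would not be returned.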
> > > > > >
> > > > > > _______________________________________________
> > > > > > dev mailing list
> > > > > > dev at openstreetmap.org
> > > > > > http://lists.openstreetmap.org/listinfo/dev