<div class="gmail_quote">On Wed, Apr 13, 2011 at 2:35 PM, Andreas Scheucher <span dir="ltr"><<a href="mailto:andreas.scheucher@gmail.com">andreas.scheucher@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
hi,<div><br></div><div>Some weeks ago, I got interested in NoSQL database products. I had no experience with them until now, but as it was a requirement for a job, I started to read about Apache Cassandra and thought this would be interesting for OpenStreetMap.</div>
<div><br></div></blockquote><div><br></div><div>Yep, Cassandra would be an interesting option to try. In fact many moons ago I spoke with the folks at SimpleGeo about attempting to host some OSM data there in their infrastructure. At the time they didn't support anything but point features (and had no other way of dealing with metadata) so I haven't pursued it.</div>
<div><br></div><div>Additionally, this talk they gave was quite informative about how they store their location data in Cassandra: <a href="http://www.youtube.com/watch?v=7J61pPG9j90">http://www.youtube.com/watch?v=7J61pPG9j90</a></div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div></div><div>Up to now my findings are only theoretical, but I would like to dig deeper when I find time.</div>
<div><br></div><div>But one thing I wonder about is that you tested it on one machine. Don't you need several nodes and loads of data to really benefit from NoSQL databases? At least this was my understanding of the whole thing...</div>
</blockquote><div><br></div><div>The purpose of multiple machines in this case is to have relatively reliable storage and multiple copies of the data on different machines, not necessarily an increase in read speed (Greg, maybe you could correct me?). Last time I looked at MongoDB seriously for OSM I imported an entire planet, so it was "loads of data" :). I have not tried a whole planet with the more recent versions, though.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div><br></div><div>greets,</div><div>Andreas</div><div><div></div><div class="h5"><br><div class="gmail_quote">2011/4/13 Ian Dees <span dir="ltr"><<a href="mailto:ian.dees@gmail.com" target="_blank">ian.dees@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><br></div><div class="gmail_quote"><div>On Tue, Apr 12, 2011 at 3:56 PM, Steve Coast <span dir="ltr"><<a href="mailto:steve@asklater.com" target="_blank">steve@asklater.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#ffffff">
Interesting.<br>
<br>
How efficient is the (big)int indexing and/or masking?<br></div></blockquote><div><br></div></div><div>I haven't had a chance to look at the integer indexing/masking. If I remember it from discussions on dev a long while ago I think it's very close to geohashes.</div>
<div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#ffffff">
<br>
Was this all on a single machine?</div></blockquote><div><br></div></div><div>Yes.</div><div><div></div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#ffffff">
<div><div></div><div><br>
<br>
<br>
<br>
On 4/12/2011 1:52 PM, Ian Dees wrote:
<blockquote type="cite">Yep.<br>
<br>
<div class="gmail_quote">On Tue, Apr 12, 2011 at 3:51 PM, Steve
Coast <span dir="ltr"><<a href="mailto:steve@asklater.com" target="_blank">steve@asklater.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> and using the builtin
spatial index?
<div>
<div><br>
<br>
<br>
On 4/12/2011 1:50 PM, Ian Dees wrote:
<blockquote type="cite">Yes, one document per
node/way/relation.<br>
<br>
<div class="gmail_quote">On Tue, Apr 12, 2011 at 3:47
PM, Steve Coast <span dir="ltr"><<a href="mailto:steve@asklater.com" target="_blank">steve@asklater.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> how was the
data put in the db though? 1 document per node?
<div>
<div><br>
<br>
On 4/12/2011 1:39 PM, Nolan Darilek wrote:
<blockquote type="cite"> Oopse, meant for
this to go to the whole list.<br>
<br>
<br>
<br>
-------- Original Message --------
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<th valign="BASELINE" align="RIGHT" nowrap>Subject: </th>
<td>Re: [OSM-dev] OSM and MongoDB</td>
</tr>
<tr>
<th valign="BASELINE" align="RIGHT" nowrap>Date: </th>
<td>Tue, 12 Apr 2011 15:26:41 -0500</td>
</tr>
<tr>
<th valign="BASELINE" align="RIGHT" nowrap>From: </th>
<td>Nolan Darilek <a href="mailto:nolan@thewordnerd.info" target="_blank"><nolan@thewordnerd.info></a></td>
</tr>
<tr>
<th valign="BASELINE" align="RIGHT" nowrap>To: </th>
<td>Ian Dees <a href="mailto:ian.dees@gmail.com" target="_blank"><ian.dees@gmail.com></a></td>
</tr>
</tbody>
</table>
<br>
<br>
I had/am having a somewhat bad experience
storing OSM data in MongoDB.<br>
<br>
                            Initially I stored all map data in
                            MongoDB, but queries took ages. The same
                            queries that now run in 100-200 ms
                            often took nearly a second. Additionally,
                            some took upwards of 5 seconds, and I even
                            found spots on my map, sparsely populated
                            with points, where the queries I need
                            reliably took 30+ seconds.<br>
<br>
I filed a thorough bug in their tracker,
including a dataset and queries that
reliably duplicated the issue. It was
marked wontfix, I abandoned MongoDB, and
it was apparently re-opened and fixed
several months later. So perhaps it's a
non-issue now.<br>
<br>
I'm still using MongoDB for part of my
current project, user POI storage. It does
indeed use geohashes, and I'm experiencing
strange accuracy issues. My platform is
pedestrian navigation with many small
distance queries. Points in the
non-MongoDB dataset are reliably detected
in a radius roughly 100 meters around the
traveler. Points in MongoDB queried with
the same bounding boxes don't appear until
they're within 30-40 meters. I recently
updated from an older version to a new
build of 1.8. The older version widely
varied the detection range. Some points
were detected 100 or so meters out, while
others weren't picked up until 30 or so.
It was always the same points, too. The
point for my apartment remains reliably
visible for ~100 meters or so, while the
corner store and restaurant didn't appear
until I was very close. 1.8 at least
appears to be consistent, always detecting
at 30 meters or so. I can only assume that
this is a geohash oddity that only appears
for very small differences, something that
works out to rounding error for larger
values.<br>
<br>
I like MongoDB for many things, but not
for geospatial data more complicated than
a series of points. I'm working on
migrating user/POI storage to a geospatial
store.<br>
<br>
<br>
On 04/12/2011 01:20 PM, Ian Dees wrote:
<blockquote type="cite">Yep, and I think
Mongo uses geohashes as their index
behind the scenes. One of the problems
with that, though, is they have some
arbitrary length that they compute the
geohash to and when you have lots of
points (as OSM data does) the buckets
they're searching are very full.<br>
<br>
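<div>To make the bucket-size point concrete, here is a minimal Python sketch of plain geohash encoding. It is the standard public algorithm, not MongoDB's actual index code, and the function name <code>geohash_encode</code> and the sample coordinates are assumptions for illustration only. The shorter the hash, the larger the cell, so in dense data many points end up sharing one prefix.</div>

```python
# Illustrative geohash encoder (standard public algorithm), not MongoDB's
# internal implementation. All names here are hypothetical.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Return a geohash string of `precision` characters for (lat, lon)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0         # bits accumulated for the current character
    bit_count = 0
    use_lon = True   # bits are interleaved, starting with longitude
    while len(chars) != precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = bits * 2 + 1
                lon_lo = mid
            else:
                bits = bits * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:   # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Two nearby points fall into the same 5-character cell.
print(geohash_encode(57.64911, 10.40744, 5))
print(geohash_encode(57.64920, 10.40750, 5))
```

<div>If the index stops at a fixed hash length, every point in such a cell has to be scanned and filtered, which is where densely mapped areas would hurt.</div>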
<div class="gmail_quote">On Tue, Apr 12,
2011 at 1:00 PM, Steve Coast <span dir="ltr"><<a href="mailto:steve@asklater.com" target="_blank">steve@asklater.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div text="#000000" bgcolor="#ffffff"> bbox queries
using the built in spatial
                                        indexing presumably? OSM has its
own magical bitmask for that, that
may also be as fast in mongo, who
knows.
<div>
<div><br>
<br>
On 4/11/2011 5:58 PM, Ian Dees
wrote: </div>
</div>
<blockquote type="cite">
<div>
<div>
<div class="gmail_quote">On
Mon, Apr 11, 2011 at 6:36
PM, Sergey Galuzo <span dir="ltr"><<a href="mailto:sergal@microsoft.com" target="_blank">sergal@microsoft.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span>Hi,</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span>I
am working on
evaluation of
MongoDB for
several storage
solutions at
hand. Some of
them resemble
current OSM
editing
database. I have
heard that OSM
dev is/was
evaluating
MongoDB also. I
was wondering
                                                      whether it is
possible to
share the
findings?</span></p>
<p class="MsoNormal"><span> </span></p>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>In my experimentation
with MongoDB (seen
here: <a href="https://github.com/iandees/mongosm/" target="_blank">https://github.com/iandees/mongosm/</a>)
I found it to be very
slow. Inserts were
speedy, but bounding-box
queries took a long
time.</div>
<div><br>
</div>
<div>The most recent dev
version of MongoDB
includes "multi-location
documents" support:</div>
<div> <a href="http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-MultilocationDocuments" target="_blank">http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-MultilocationDocuments</a></div>
<div><br>
</div>
<div>This would allow a
single way document to
be indexed at multiple
locations and vastly
speed up the map query.</div>
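<div>As a rough sketch of the idea, the pure-Python snippet below builds one document per way that carries all of its node positions, and treats the way as a bounding-box hit if any of those positions is inside the box. The field names (<code>locs</code>, <code>tags</code>) and helper names are assumptions for illustration, not the mongosm schema, and no database calls are made.</div>

```python
# Hypothetical document shape; field and function names are assumptions,
# not the mongosm schema. Pure Python, no database involved.

def way_document(way_id, node_coords, tags):
    """Build one document per way, carrying every node position."""
    return {
        "_id": way_id,
        "tags": dict(tags),
        "locs": [[lon, lat] for lon, lat in node_coords],
    }


def in_bbox(doc, min_lon, min_lat, max_lon, max_lat):
    """True if any of the document's locations falls inside the box."""
    return any(
        max_lon >= lon >= min_lon and max_lat >= lat >= min_lat
        for lon, lat in doc["locs"]
    )


way = way_document(
    1,
    [(-93.30, 44.90), (-93.20, 45.00)],
    {"highway": "residential"},
)
print(in_bbox(way, -93.25, 44.95, -93.15, 45.05))
```

<div>One caveat with this approach: a way that crosses the box without having a node inside it would not match.</div>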
</div>
</div>
</div>
<pre><fieldset></fieldset>
_______________________________________________
dev mailing list
<div><a href="mailto:dev@openstreetmap.org" target="_blank">dev@openstreetmap.org</a>
<a href="http://lists.openstreetmap.org/listinfo/dev" target="_blank">http://lists.openstreetmap.org/listinfo/dev</a>
</div></pre>
</blockquote>
</div>
<br>
<br>
</blockquote>
</div>
<br>
</blockquote>
<br>
</blockquote>
</div>
</div>
</div>
<br>
<br>
</blockquote>
</div>
<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div></div></div>
</blockquote></div></div></div><br>
<br></blockquote></div><br>
</div></div></blockquote></div><br>