[OSM-dev] on-disk indexing of geodata
Sascha Silbe
sascha-ml-gis-osm-dev at silbe.org
Fri Oct 17 15:40:45 BST 2008
On Fri, Oct 17, 2008 at 03:25:59PM +0200, Freek wrote:
> Generally you keep a bit of free space in all nodes to accommodate a
> number of insertions and deletions, and only split or merge nodes when
> they overflow or underflow so that you don't need to move around all
> kinds of elements all the time.
But how do you do those splits, i.e. node inserts? Just moving
everything behind the insertion point by one block would be way too
expensive for large trees, so there's probably a better method.
>> I haven't done any research on that topic, though, since I don't need
>> it yet (bulk
>> imports are quite fast - ~25 minutes for the 865MB europe.osm.bz2).
> That's not bad, I suppose your index still fits in memory?
Yup. I mmap() the whole database. For europe it's less than twice the
size of my physical memory (5.3GB vs. 3.5GB), so most of it fits into
memory. For whole planet imports it gets much slower (5.6h instead of
2.5h if it were scaling linearly). R-Tree generation scales fine (still
fits into memory), 1D index sorting is O(2n) and way processing
(including MBR calculation) is O(5n). That's probably due to the nodes
being sorted by id in the database so spatial locality is reduced.
Quicksort (used for sorting 1D indices) also doesn't use locality for
the outer iterations, but isn't a bottleneck yet.
Fitting as much as possible into physical RAM to avoid disk accesses was
the primary design goal (the last design was severly I/O-constrained).
There's still some room for improvement, but the performance gain for
the router is already quite impressive (compared to the previous
Python+PostgreSQL solution), though not entirely fair yet (only way type
and in-builtup-area are used, oneway and maxspeed are ignored; only
minimal postprocessing).
> I think that in most cases the top part of the tree should fit in
> memory and only the leaves (containing pointers/addresses to the
> actual items) will stay on disk during operation.
With the current node size of 8 entries, the overhead
(#non-leafs/#leafs) is about 14%, with all R-Tree structures
(leaf+non-leaf MBR, leaf pointers) totalling 128+32=160MB (space is
allocated exponentially) for europe, so it fits into memory quite nicely
(given "modern" desktop memory sizes, that is).
CU Sascha
--
http://sascha.silbe.org/
http://www.infra-silbe.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 481 bytes
Desc: Digital signature
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20081017/841b8679/attachment.pgp>
More information about the dev
mailing list