[OSM-dev] on-disk indexing of geodata

Sascha Silbe sascha-ml-gis-osm-dev at silbe.org
Fri Oct 17 15:40:45 BST 2008

On Fri, Oct 17, 2008 at 03:25:59PM +0200, Freek wrote:

> Generally you keep a bit of free space in all nodes to accommodate a 
> number of insertions and deletions, and only split or merge nodes when 
> they overflow or underflow so that you don't need to move around all 
> kinds of elements all the time.
But how do you do those splits, i.e. node inserts? Just moving 
everything behind the insertion point by one block would be way too 
expensive for large trees, so there's probably a better method.
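For what it's worth, the standard answer for on-disk B-trees/R-Trees is that a split never shifts the rest of the file: nodes are fixed-size blocks addressed by block number, so a split just allocates one fresh block (typically at end-of-file or from a free list) and moves half the entries there, then adds a single routing entry to the parent. A minimal sketch, with the block list standing in for the file and the 8-entry capacity borrowed from later in this mail (everything else is made up for illustration):

```python
CAPACITY = 8  # entries per node, matching the node size mentioned below

def insert(blocks, block_no, key):
    """Insert into one sorted leaf block; split on overflow.
    Returns the (separator, new_block_no) pair the parent would record,
    or None if no split was needed. Cost is O(CAPACITY), not O(file size),
    because no other block moves."""
    leaf = blocks[block_no]
    leaf.append(key)
    leaf.sort()
    if len(leaf) <= CAPACITY:
        return None
    mid = len(leaf) // 2
    new_no = len(blocks)        # allocate a fresh block at "end-of-file"
    blocks.append(leaf[mid:])   # upper half moves to the new block
    del leaf[mid:]              # lower half stays where it is
    return leaf[-1], new_no     # separator key + pointer for the parent

blocks = [[]]
for k in range(8):
    insert(blocks, 0, k)        # fills the block exactly
res = insert(blocks, 0, 8)      # ninth key forces a split
```

If the parent overflows in turn, the same split propagates one level up, so an insert touches at most O(height) blocks.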

>> I haven't done any research on that topic, though, since I don't need 
>> it yet (bulk
>> imports are quite fast - ~25 minutes for the 865MB europe.osm.bz2).
> That's not bad, I suppose your index still fits in memory?
Yup. I mmap() the whole database. For europe it's less than twice the 
size of my physical memory (5.3GB vs. 3.5GB), so most of it fits into 
memory. For whole planet imports it gets much slower (5.6h instead of 
2.5h if it were scaling linearly). R-Tree generation scales fine (it still 
fits into memory), but 1D index sorting takes about 2x and way processing 
(including MBR calculation) about 5x what linear scaling would predict. 
That's probably because the nodes are stored sorted by id in the 
database, which reduces spatial locality. 
Quicksort (used for sorting 1D indices) also doesn't use locality for 
the outer iterations, but isn't a bottleneck yet.
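To illustrate the locality point: with fixed-size records, an mmap() lookup by id is pure pointer arithmetic into the mapping, so an id-ordered pass touches consecutive OS pages, while a spatially-ordered pass (as during way processing) jumps all over the file. A rough sketch of that access pattern; the record layout and sizes here are invented for illustration, not the actual database format:

```python
import mmap
import struct
import tempfile

# Hypothetical fixed-width node record: id, lat, lon (16 bytes total).
RECORD = struct.Struct('<qii')

def node_by_id(mm, node_id):
    """Fetch one node record from the mapping by direct offset calculation."""
    return RECORD.unpack_from(mm, node_id * RECORD.size)

# Build a tiny database of 1000 nodes, stored sorted by id.
with tempfile.TemporaryFile() as f:
    for i in range(1000):
        f.write(RECORD.pack(i, i * 10, -i * 10))
    f.flush()
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Sequential ids hit sequential pages; ids in spatial order would not.
    rec = node_by_id(mm, 42)
    mm.close()
```

Once the file no longer fits in RAM, each of those scattered lookups can become a page fault, which would explain the super-linear slowdown on planet imports.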
Fitting as much as possible into physical RAM to avoid disk accesses was 
the primary design goal (the last design was severely I/O-constrained). 
There's still some room for improvement, but the performance gain for 
the router is already quite impressive compared to the previous 
Python+PostgreSQL solution, though the comparison isn't entirely fair 
yet (only way type and in-builtup-area are used, oneway and maxspeed 
are ignored; only minimal postprocessing is done).

> I think that in most cases the top part of the tree should fit in 
> memory and only the leaves (containing pointers/addresses to the 
> actual items) will stay on disk during operation.
With the current node size of 8 entries, the overhead 
(#non-leaves/#leaves) is about 14%, with all R-Tree structures 
(leaf and non-leaf MBRs, leaf pointers) totalling 128+32=160MB (space 
is allocated exponentially) for europe, so it fits into memory quite 
nicely (given "modern" desktop memory sizes, that is).
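One way to see where the ~14% comes from: with 8 entries per node, each level of inner nodes is 1/8 the size of the level below it, so the inner-to-leaf ratio is the geometric series 1/8 + 1/64 + ... = 1/7, i.e. about 14.3%:

```python
# Sum enough terms of 1/8 + 1/8^2 + ... to converge; the closed form is 1/7.
overhead = sum(8 ** -k for k in range(1, 30))
print(f"{overhead:.1%}")  # -> 14.3%
```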

CU Sascha
