[OSM-dev] spatial index - B-Tree over z-order curves vs R-Tree over GIST

Thu Apr 9 12:00:20 BST 2009

Hi,

Matt Amos wrote:
> as marcus has pointed out the performance of 1-d indexes for 2-d data
>  is often poor. for further reading i recommend "Foundations of 
> Multidimensional and Metric Data Structures" by H. Samet 
> (http://www.cs.umd.edu/~hjs/) which is pretty comprehensive.

We had that discussion in breadth and length in late 2006/early 2007, 
where Nick Hill initially came up with the tiling stuff,

http://lists.openstreetmap.org/pipermail/dev/2006-September/002059.html

and was then reminded of the quadtile concept,

http://lists.openstreetmap.org/pipermail/dev/2006-September/002062.html

and so on. Nick seemed to have done a number of tests comparing B-tree
and R-tree performance (because at the time we were constantly told how
PostGIS would solve our performance issues), and at the time he wrote:

> I therefore contend: 
>
> 1) The data types for postgis are uneconomical.
> I contend that point data types using 1/4 of the storage can perform
> adequately, with +/- 5mm global accuracy.
> 
> 2) R-tree indexes, although theoretically being close to ideal for
> the geo problem domain have problems with their implementation.
> Lookups on R-tree appear to be much more fragmented than look-ups on
> b-tree, resulting in lots of costly disk seeks, or requiring them to
> be cached in RAM. (Both above issues are soluble).
> 
> I also contend that
> 
> 3) B-tree lookups on lat/lon are theoretically inefficient. Only one
> of either lat/lon are used as an index range lookup. The second is
> looked up through a brute force search. However, the look-up on the
> first column is extremely efficient. In practice, the records
> narrowed from the first column needing brute force search to narrow
> the second column, are actually performed quickly, with few
> additional disk seeks. I don't have an explanation for it's apparent 
> speed apart from the maturity of b-tree and widespread efforts to
> counteract the shortcomings of b-tree with clever optimisations for
> 2-column arrangements. Ideally, we need to dispose of that
> requirement to brute force search without introducing unacceptable
> overheads, and the brute force search does impose scalability
> concerns.
> 
> In summary, the postGIS system appears to have a lot going for it,
> and feel there are opportunities being lost with OSM not sharing the
> same data format with other free GIS initiatives. At the same time,
> my tests using MySQL have shown that many of the theoretical
> performance benefits of the postGIS system are just that -
> theoretical, and genuinely look forward to being proven wrong on 
> this. I also think that the theory and practice of PostGIS can be 
> brought closer together with further development and refinement. If
> OSM used postGIS, that could help development of postGIS. On the
> other hand, if OSM used postgis, it may delay or prevent better
> systems developed through OSM seeing the light of day.

(This is from 
http://lists.openstreetmap.org/pipermail/talk/2007-January/010166.html).

Bye
Frederik