[OSM-talk] Server slowness

Nick Hill nick at nickhill.co.uk
Mon Jan 15 11:45:42 GMT 2007


I echo Richard's post.

But I would add the rider that the current OSM set-up is by no means optimised.
I estimate that organising the data on disc by geographic location and
partitioning the database could improve node look-ups by an order of magnitude.
A direct comparison should therefore be made against a partitioned set-up.
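
As an illustration of the partitioning idea, here is a minimal in-memory
sketch. It assumes a fixed grid of 1-degree cells as the partition key; the cell
size and the names cell_of, build_partitions and query_bbox are hypothetical and
not part of the OSM set-up. A bounding-box query then touches only the cells it
overlaps rather than the whole node table.

```python
import math
from collections import defaultdict

CELL_DEG = 1.0  # assumed partition size in degrees, chosen only for illustration


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Return the (row, column) grid cell that contains a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_partitions(nodes, cell_deg=CELL_DEG):
    """Group (node_id, lat, lon) tuples by grid cell."""
    parts = defaultdict(list)
    for node_id, lat, lon in nodes:
        parts[cell_of(lat, lon, cell_deg)].append((node_id, lat, lon))
    return parts


def query_bbox(parts, min_lat, min_lon, max_lat, max_lon, cell_deg=CELL_DEG):
    """Return sorted ids of nodes inside the box, visiting only overlapping cells."""
    lo_row, lo_col = cell_of(min_lat, min_lon, cell_deg)
    hi_row, hi_col = cell_of(max_lat, max_lon, cell_deg)
    hits = []
    for row in range(lo_row, hi_row + 1):
        for col in range(lo_col, hi_col + 1):
            # .get() avoids creating empty partitions for cells with no nodes
            for node_id, lat, lon in parts.get((row, col), ()):
                if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                    hits.append(node_id)
    return sorted(hits)
```

On disc, the same key could decide which table partition or file a node is
stored in, so that nearby nodes end up physically close together.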

Notwithstanding, I fully concur with previous posts that if the OSM data model
were shared with the rest of the open GIS community, many benefits would be
derived.

Everything else being equal, even if adopting a standard GIS model would cause a 
slight degradation in performance, the cost would be worthwhile.

Other free and open GIS systems seem to be designed for problem spaces
different to OSM's, not in terms of data description but in terms of mass
distribution of GIS data. The design is by no means economical. When I tested
the MySQL implementation, a node consisted of a 64-byte bounding box. When I
queried areas for nodes, performance was low and I/O was high.

This led me to conclude that the R-tree lookup path is not as optimised as the
B-tree one, B-tree being much more mature and having many optimisations which
R-tree doesn't have. (At this point, I can imagine many people who know about
the problem domains R-tree and B-tree indexes are supposed to solve pointing out
to me how an R-tree is more appropriate for geo data. I agree. But that is not
my contention.)

I therefore contend:
1) The data types for PostGIS are uneconomical. I contend that point data types 
using 1/4 of the storage can perform adequately, with +/- 5mm global accuracy.

2) R-tree indexes, although theoretically close to ideal for the geo problem
domain, have problems with their implementation. Look-ups on an R-tree appear to
be much more fragmented than look-ups on a B-tree, resulting in lots of costly
disk seeks, or requiring the index to be cached in RAM.
(Both of the above issues are soluble.)
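
To make the accuracy figure in point 1 concrete, here is a minimal sketch of one
possible compact point type. The post does not say which encoding was intended,
so this is an assumption: each coordinate is stored as a signed 32-bit
fixed-point integer scaled so that 180 degrees spans 2**31 units. The names and
the rough figure of 111,320 metres per degree are for illustration only.

```python
import struct

UNITS_PER_DEGREE = 2**31 / 180.0  # assumed scaling: 180 degrees spans 2**31 units
METRES_PER_DEGREE = 111_320.0     # rough equatorial figure, illustration only


def encode(deg):
    """Convert degrees to a signed 32-bit fixed-point integer."""
    value = round(deg * UNITS_PER_DEGREE)
    # Clamp so that exactly +180 degrees still fits in a signed 32-bit field.
    return max(-2**31, min(2**31 - 1, value))


def decode(value):
    """Convert a fixed-point integer back to degrees."""
    return value / UNITS_PER_DEGREE


def pack_point(lat, lon):
    """Pack a point into 8 bytes: two little-endian signed 32-bit integers."""
    return struct.pack("<ii", encode(lat), encode(lon))


def unpack_point(blob):
    """Unpack 8 bytes back into (lat, lon) in degrees."""
    lat_i, lon_i = struct.unpack("<ii", blob)
    return decode(lat_i), decode(lon_i)


def worst_case_error_m():
    """Largest rounding error per coordinate, in metres at the equator."""
    return 0.5 / UNITS_PER_DEGREE * METRES_PER_DEGREE
```

Under these assumptions one unit is about 9.3 mm at the equator, so rounding to
the nearest unit is off by at most about 4.7 mm per coordinate, and a packed
lat/lon pair takes 8 bytes.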

I also contend that

3) B-tree look-ups on lat/lon are theoretically inefficient. Only one of lat or
lon is used for the index range look-up. The other is resolved through a
brute-force search. However, the look-up on the first column is extremely
efficient. In practice, the brute-force search over the records narrowed by the
first column is actually performed quickly, with few additional disk seeks. I
don't have an explanation for its apparent speed apart from the maturity of
B-tree and widespread efforts to counteract the shortcomings of B-tree with
clever optimisations for 2-column arrangements. Ideally, we need to dispose of
that requirement to brute-force search without introducing unacceptable
overheads, as the brute-force search does impose scalability concerns.
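
The two-step look-up described in point 3 can be sketched as follows. This is an
illustration only, not the MySQL implementation: a sorted Python list stands in
for a two-column (lat, lon) B-tree, and the function names are hypothetical. The
latitude range is found by binary search, and the longitude test is then a
linear scan over the rows in that range.

```python
from bisect import bisect_left, bisect_right


def build_index(nodes):
    """Sort (node_id, lat, lon) tuples into (lat, lon, node_id) index order."""
    return sorted((lat, lon, node_id) for node_id, lat, lon in nodes)


def range_query(index, min_lat, max_lat, min_lon, max_lon):
    """Return (matching ids, number of rows scanned) for a bounding box."""
    # Step 1: range look-up on the leading column (lat) by binary search.
    start = bisect_left(index, (min_lat,))
    stop = bisect_right(index, (max_lat, float("inf")))
    candidates = index[start:stop]
    # Step 2: brute-force filter on the second column (lon).
    hits = [node_id for lat, lon, node_id in candidates
            if min_lon <= lon <= max_lon]
    return hits, len(candidates)
```

The scan count grows with the width of the latitude band regardless of how
narrow the longitude range is, which is the scalability concern. In this sketch
the scanned rows are contiguous in index order, which may be part of why the
scan is cheap in practice, though the tests above do not establish that.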


In summary, the PostGIS system appears to have a lot going for it, and I feel
there are opportunities being lost with OSM not sharing the same data format
with other free GIS initiatives. At the same time, my tests using MySQL have
shown that many of the theoretical performance benefits of the PostGIS system
are just that - theoretical. I genuinely look forward to being proven wrong on
this. I also think that the theory and practice of PostGIS can be brought closer
together with further development and refinement. If OSM used PostGIS, that
could help development of PostGIS. On the other hand, if OSM used PostGIS, it
may delay or prevent better systems developed through OSM seeing the light of day.

Richard Fairhurst wrote:
> Quoting Martin Spott <Martin.Spott at mgras.net>:
> 
>> You probably should have a closer look at PostGIS, especially at the
>> capabilities regarding geospatial queries, and you're likely to be
>> pleased. PostGIS' strength lies in much more than just serving as data
>> exchange and storage
> 
> Most of the OSM data is available (planet.osm), as is all of the  
> source. Many OSM developers are busy with other parts of the project  
> at the moment, so if you can provide some benchmarks to show that OSM  
> really does run faster on a PostGIS setup, and show what changes you  
> made to achieve this, we'd be all ears.
> 
> cheers
> Richard