[OSM-talk] Re: Data types; indexes etc.

Thu Apr 27 16:47:38 BST 2006

dblasby at openplans.org wrote:
> Nice wrote:
> 
>>I have just performed tests on a 1M point field for the three data
>>types. The averaged results for 25,600 point tiles as follows:
>>
>>Double float 64    0.51s
>>Single float 32    0.33s
>>Integer32          0.237s
> 
> 
> What does 25,600 point tiles mean?  Does that mean you're storing 40
> points in each row of your table?

Given a random bit field of x points, each query selects a tile covering 
25600/x of the entire field. Each query therefore returns 25,600 points 
on average.
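
A minimal sketch of that setup, assuming the points are uniformly random in a unit square and the tile is a square whose area is 25600/x of the whole field. The linear scan and the names `make_points`, `tile_side`, `query_tile` and `average_hits` are hypothetical, chosen for illustration; they are not the actual test harness.

```python
import math
import random

def make_points(n, seed=0):
    """Return n uniformly random (x, y) points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def tile_side(n_points, target=25600):
    """Side of a square tile whose area is target/n_points of the unit square."""
    return math.sqrt(target / n_points)

def query_tile(points, x0, y0, side):
    """Linear-scan range query: points inside [x0, x0+side) x [y0, y0+side)."""
    x1, y1 = x0 + side, y0 + side
    return [(x, y) for (x, y) in points if x0 <= x < x1 and y0 <= y < y1]

def average_hits(points, side, queries=5, seed=1):
    """Average number of points returned over several randomly placed tiles."""
    rng = random.Random(seed)
    total = 0
    for _ in range(queries):
        # Keep the whole tile inside the field so every query covers equal area.
        x0 = rng.uniform(0, 1 - side)
        y0 = rng.uniform(0, 1 - side)
        total += len(query_tile(points, x0, y0, side))
    return total / queries
```

For the 1M point field, `tile_side(1_000_000)` gives a side of 0.16, i.e. a tile covering 2.56% of the field, and `average_hits(make_points(1_000_000), 0.16)` should come out close to 25,600.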

> 
> I'm still surprised that int32s are >30% faster than float32s - you'd
> think the math processor would evaluate a<b in 1 cycle no matter what
> the data type was. 

Look at it another way: if one operation takes 10 logical steps and the 
other takes 3, it would be bad processor design to waste 7 clocks just so 
that they take equal time. IMHO, integers will always be faster than 
floats unless speed and thermodynamic efficiency cease to be factors in 
processor design.

> It makes sense that the 64bit datatypes are slower
> just because you're going to have to fetch twice as much data from
> memory (not to mention your fast CPU cache memory will only hold 1/2
> the data, and it will take twice as long to load 64bits into a
> register).
> 
> Still, the difference is only 0.1 seconds.

Absolute timing is irrelevant.

What matters is the amount of machine time used to serve a given request. 
If something takes 0.01 seconds instead of 0.1, that is far better, as you 
would need 1/10th of the number of machines to serve the same requests.

When OSM really takes off (like Wikipedia), we will possibly need 
hundreds of processors, and these economies will make a vast difference.
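
The arithmetic behind that can be sketched as follows. The request rate is a made-up figure (the thread gives none), and the model assumes each machine supplies exactly one CPU-second per second with no other overhead; `machines_needed` is a hypothetical helper.

```python
def machines_needed(requests_per_second, cpu_ms_per_request):
    """Machines needed, assuming each supplies 1000 ms of CPU time per second."""
    total_ms = requests_per_second * cpu_ms_per_request
    return -(-total_ms // 1000)  # integer ceiling division

# Averaged timings quoted earlier in the thread, in milliseconds.
TIMINGS_MS = {"double": 510, "single": 330, "int32": 237}

# Hypothetical load; the thread gives no request rate.
REQUESTS_PER_SECOND = 100

for name, ms in TIMINGS_MS.items():
    print(name, machines_needed(REQUESTS_PER_SECOND, ms))
```

Under this model the machine count scales linearly with the time per request, so a tenfold cut in query time is a tenfold cut in hardware.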

>>I wouldn't recommend using single floats for geographic
>>representations; the rounding errors would be intolerable. Literally
>>miles out.
>>http://docs.sun.com/source/806-3568/ncg_goldberg.html#689
> 
> 
> I think you misunderstood me -- I was just talking about using float32
> bounding boxes in your index (although you could also make a
> BOX2D_INT32SCALED to use in the index).  I'd suggest you use doubles
> (or your *10,000,000 integer representation) for your actual data.

I do understand. I am just making the point that although I ran the 
tests using the single float type, I don't recommend using it for 
real-life geographic data.
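
For anyone who wants to put a number on the storage side of this, here is a rough sketch that compares rounding one coordinate to float32 with the *10,000,000 integer representation mentioned above. It only measures the error of storing a single value; errors that build up through arithmetic on such values are not modelled. The example longitude and the figure of 111,320 metres per degree (roughly one degree at the equator) are assumptions.

```python
import struct

SCALE = 10_000_000          # the "*10,000,000" integer representation
METRES_PER_DEGREE = 111_320  # rough length of one degree at the equator (assumption)

def as_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def as_scaled_int32(deg):
    """Encode degrees as a signed 32-bit count of 1e-7 degree units."""
    value = round(deg * SCALE)
    if not -2**31 <= value < 2**31:
        raise OverflowError("coordinate does not fit in a signed 32-bit integer")
    return value

def storage_error_metres(deg, stored_deg):
    """Approximate ground distance between the true and the stored coordinate."""
    return abs(deg - stored_deg) * METRES_PER_DEGREE

lon = 179.1234567  # arbitrary example longitude
f32_err = storage_error_metres(lon, as_float32(lon))
i32_err = storage_error_metres(lon, as_scaled_int32(lon) / SCALE)
print(f"float32: {f32_err:.4f} m, scaled int32: {i32_err:.6f} m")
```

Any longitude in the range -180 to 180 fits, since 180 * 10,000,000 is below the int32 limit of 2,147,483,647.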

I understand that using the single float type for bounding boxes is very 
different to using it for the actual data in the leaf nodes.
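
One detail worth sketching for float32 index boxes: if the box is rounded to nearest, it can end up slightly smaller than the true extent, and a query could then miss a leaf. A possible approach, assuming the index only needs a conservative box, is to round the minimum corner down and the maximum corner up. The helper names below are hypothetical and the code handles finite values only; this is an illustration, not how any particular database implements it.

```python
import struct

def as_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def _f32_bits(f):
    return struct.unpack("<I", struct.pack("<f", f))[0]

def _bits_f32(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def next_up_f32(f):
    """Smallest float32 strictly greater than the finite float32 value f."""
    if f == 0.0:
        return _bits_f32(1)  # smallest positive subnormal
    bits = _f32_bits(f)
    # Positive values grow with the bit pattern; negative values shrink.
    return _bits_f32(bits + 1 if f > 0 else bits - 1)

def next_down_f32(f):
    """Largest float32 strictly less than the finite float32 value f."""
    return -next_up_f32(-f)

def floor_f32(x):
    """Largest float32 that is <= x."""
    f = as_float32(x)
    return f if f <= x else next_down_f32(f)

def ceil_f32(x):
    """Smallest float32 that is >= x."""
    f = as_float32(x)
    return f if f >= x else next_up_f32(f)

def outward_box_f32(min_x, min_y, max_x, max_y):
    """Float32 bounding box guaranteed to contain the double-precision box."""
    return (floor_f32(min_x), floor_f32(min_y), ceil_f32(max_x), ceil_f32(max_y))
```

The leaf data stays in doubles (or scaled integers); only the index entry is widened, at the cost of an occasional false-positive candidate that the exact comparison then rejects.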

> As I asked before - why are you storing each point in the database?  Why
> don't you store edges (homogeneous lines)?  It seems this would make
> your database much more efficient if you did this.  But, I don't really
> know what your datamodel is.

Steve answered this, although I don't understand the rationale either way.