[OSM-talk] Re: Data types; indexes etc.
Nick Hill
nick at nickhill.co.uk
Thu Apr 27 16:47:38 BST 2006
dblasby at openplans.org wrote:
> Nick wrote:
>
>>I have just performed tests on a 1M point field for the three data
>>types. The averaged results for 25,600 point tiles as follows:
>>
>>Data type      Bits   Avg. time
>>Double float   64     0.51s
>>Single float   32     0.33s
>>Integer        32     0.237s
>
>
> What does 25,600 point tiles mean? Does that mean you're storing 40
> points in each row of your table?
Given a random field of x points, each query selects a tile covering
25,600/x of the entire field. Each query therefore returns an average
of 25,600 points.
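
To make the tile sizing concrete, here is a minimal sketch, not the actual
test harness. It assumes the points are uniformly random in a unit square
and that tiles are square; the names make_field, tile_side and query_tile
are made up for illustration.

```python
import random

TARGET_POINTS = 25_600  # average points per query, as in the tests above


def make_field(n, seed=0):
    """Return n uniformly random points in the unit square (assumed layout)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def tile_side(n, target=TARGET_POINTS):
    """Side length of a square tile covering target/n of the unit square."""
    return (target / n) ** 0.5


def query_tile(points, x0, y0, side):
    """Return the points inside the tile [x0, x0+side) x [y0, y0+side)."""
    return [(x, y) for (x, y) in points
            if x0 <= x < x0 + side and y0 <= y < y0 + side]


if __name__ == "__main__":
    field = make_field(1_000_000)
    side = tile_side(len(field))
    hits = query_tile(field, 0.25, 0.25, side)
    # The count varies with the seed and tile position but stays near 25,600.
    print(len(hits))
```

The tile covers a fixed fraction of the field's area, so the expected
result size stays the same however many points the field holds.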
>
> I'm still surprised that int32s are >30% faster than float32s - you'd
> think the math processor would evaluate a<b in 1 cycle no matter what
> the data type was.
Look at it another way: if one function takes 10 logical steps and the
other 3, it would be a bad processor design to waste 7 clocks just so
they take equal time. IMHO, integers will always be faster than floats
unless speed and thermodynamic efficiency cease to be a factor in
processor design.
> It makes sense that the 64bit datatypes are slower
> just because you're going to have to fetch twice as much data from
> memory (not to mention your fast CPU cache memory will only hold 1/2
> the data, and it will take twice as long to load 64bits into a
> register).
>
> Still, the difference is only 0.1 seconds.
The absolute timing is not the point. What matters is the amount of
machine time used to serve a given request. If something takes 0.01
seconds rather than 0.1, that is far better, as you would need 1/10th
of the number of machines to serve the same requests.
When OSM really takes off (like Wikipedia), we will need possibly
hundreds of processors, and these economies will make a vast difference.
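
The arithmetic behind that, as a rough sketch. It assumes each machine
serves one request at a time and that the measured query time is all
machine time. The load of 1,000 requests per second is a made-up figure
for illustration, not a measurement, and machines_needed is a
hypothetical helper.

```python
def machines_needed(requests_per_second, ms_per_request):
    """Machines required to keep up, assuming one request at a time each.

    Times are integer milliseconds so the ceiling division stays exact.
    """
    busy_ms_per_second = requests_per_second * ms_per_request
    return -(-busy_ms_per_second // 1000)  # ceiling division


if __name__ == "__main__":
    load = 1000  # hypothetical requests per second
    for name, ms in [("double", 510), ("single", 330), ("int32", 237)]:
        print(name, machines_needed(load, ms))
    # A tenfold cut in time per request cuts the machine count tenfold.
    print(machines_needed(load, 100), machines_needed(load, 10))
```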
>>I wouldn't recommend using single floats for geographic
>>representations; the rounding errors would be intolerable. Literally
>>miles out.
>>http://docs.sun.com/source/806-3568/ncg_goldberg.html#689
>
>
> I think you misunderstood me -- I was just talking about using float32
> bounding boxes in your index (although you could also make a
> BOX2D_INT32SCALED to use in the index). I'd suggest you use doubles
> (or your *10,000,000 integer representation) for your actual data.
I do understand. I am just making the point that although I ran the
tests using the single float type, I don't recommend using it for
real-life geographic data.
I understand that using the single float type for bounding boxes is very
different to using it for the actual data in the leaf nodes.
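
For anyone who wants to check the storage error themselves, here is a small
sketch comparing a coordinate stored as a single float with the same
coordinate in the *10,000,000 integer form. It measures only the error of
storing one value and says nothing about errors that build up during
arithmetic. The sample longitude and the helper names are made up for
illustration.

```python
import struct

SCALE = 10_000_000  # the *10,000,000 integer scaling mentioned above


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def to_scaled_int(degrees):
    """Store degrees as an integer count of 1e-7 degree units."""
    return round(degrees * SCALE)


def from_scaled_int(stored):
    """Convert the stored integer back to degrees."""
    return stored / SCALE


if __name__ == "__main__":
    lon = 179.1234567  # hypothetical sample longitude
    err_float32 = abs(as_float32(lon) - lon)
    err_scaled = abs(from_scaled_int(to_scaled_int(lon)) - lon)
    print("float32 storage error (degrees):", err_float32)
    print("scaled int storage error (degrees):", err_scaled)
    # +/-180 degrees scaled by 1e7 still fits in a signed 32-bit integer.
    print(to_scaled_int(180.0) < 2**31)
```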
> As I asked before - why are you storing each point in the database? Why
> don't you store edges (homogeneous lines)? It seems this would make
> your database much more efficient if you did this. But, I don't really
> know what your datamodel is.
Steve answered this, although I don't understand the rationale either way.