[OSM-dev] Further Database optimisation and data size

Thu Apr 20 18:03:11 BST 2006

It seems the data is currently stored in double precision floating 
point. This takes 8 bytes of data for each axis of the co-ordinate 
making 16 bytes of 2d positional data for each point.

If we were to express latitude and longitude as an integer on the scale 
of either -1,790,000,000 to +1,800,000,000 or 0 to 3,599,999,999 we 
could halve the data necessary to store each co-ordinate. The conversion 
would simply be multiplying or dividing by 10,000,000 to give a degree 
co-ordinate. The database will store positions in 10 millionths of a 
degree intervals as an integer.

The down side of ths is that our precision will slip from double 
precision 64 bit resolution to 32 bit integer resolution.

The worst effect of this will be at the equator, where the circumfrence 
of the earth is approximately 40million meters.

If we sub-divide 40million meters with a resolution of 3,600,000,000 
sub-units, we end up with a sample interval of 1.1 centimeters. This 
results in a maximum sample error of +/- 5.05 millimeters instead of a 
few nanometers.

I tentatively suggest that for the sake of halving our co-ordinate data 
set size, we
1) Sample to the nearest 5.05 millimeters. I cannot imagine this level 
of inaccuracy being a problem for the time being.

2)Move to an integer representation of the co-ordinates within the 
database then carry out any necessary conversion either within the SQL 
select query itself, or at the API level.

I will perform some timings of SQL selects on a randomised data set to 
check the effect on MySQL.

I also have an additional database optimisation, which could potentially 
reduce on-line  database memory needs by an order of magnitude.