[OSM-dev] Further Database optimisation and data size
nick at nickhill.co.uk
Thu Apr 20 18:03:11 BST 2006
It seems the data is currently stored in double precision floating
point. This takes 8 bytes of data for each axis of the co-ordinate
making 16 bytes of 2d positional data for each point.
If we were to express latitude and longitude as an integer on the scale
of either -1,790,000,000 to +1,800,000,000 or 0 to 3,599,999,999 we
could halve the data necessary to store each co-ordinate. The conversion
would simply be multiplying or dividing by 10,000,000 to give a degree
co-ordinate. The database will store positions in 10 millionths of a
degree intervals as an integer.
The down side of ths is that our precision will slip from double
precision 64 bit resolution to 32 bit integer resolution.
The worst effect of this will be at the equator, where the circumfrence
of the earth is approximately 40million meters.
If we sub-divide 40million meters with a resolution of 3,600,000,000
sub-units, we end up with a sample interval of 1.1 centimeters. This
results in a maximum sample error of +/- 5.05 millimeters instead of a
I tentatively suggest that for the sake of halving our co-ordinate data
set size, we
1) Sample to the nearest 5.05 millimeters. I cannot imagine this level
of inaccuracy being a problem for the time being.
2)Move to an integer representation of the co-ordinates within the
database then carry out any necessary conversion either within the SQL
select query itself, or at the API level.
I will perform some timings of SQL selects on a randomised data set to
check the effect on MySQL.
I also have an additional database optimisation, which could potentially
reduce on-line database memory needs by an order of magnitude.
More information about the dev