[OSM-dev] Proposal: Database accelerator and dirty tile marker based on simple algorithm.

Nick Hill nick at nickhill.co.uk
Sat Sep 16 16:40:06 BST 2006


I see a need for two additional dimensions to the underlying data 
representation.

1) A system to quickly determine if an area needs to be re-rendered. For 
example, if an update has been made to an area.

2) A system to speed the collection of data from the database.

Both of these goals can be achieved by representing the globe as virtual 
tiles.

The tile which any object sits on can easily be determined as follows, 
using only a few cheap CPU cycles, assuming lat and lon are 32 bit integers:

tile=(lat & 4294901760)+(lon>>16)

where & is bitwise and, and >> is right shift.

This will divide the world up into 6553.6uD tiles. On the equator, these 
would be 655x655meters. If the whole planet were to be represented, this 
would be 3.0174851x10E9 tiles of varying size. In reality, we'd probably 
ever get to a few tens of million tiles given that much of the earth is 
sea, arctic tundra and desert.

How will this help mark areas as updated?

Any given upload session for the API is likely to only affect a few 
tiles. Therefore, each change need update a few elements of an array. 
Once the editing session is over, these changes are flushed to just a 
few table rows with the new time stamp. (SQL definition and code below). 
A very cheap update.

When a bounding box is chosen to be rendered (either to vector or 
bitmap), the code can generate a list of tiles for the area and retrieve 
the latest update time. This will have a list of a few to a few dozen 
tiles which will be searched against an index. The lookup time will be 
in the order of milliseconds as no brute force searching will be 
required. The index will also be very small as it will only need to 
index one 32-bit int. Again, a very cheap SQL operation.

I have written a piece of PERL code which takes the bounding box as 4 
integer lat/lon values then returns an array listing which tiles are 
part of the bounding box. Takes around 7ms to perform the function:

Let's assume a box of:
float			INT (100nD)
51.42148913142311	514214891 (a)
51.45456595917263	514545659 (b)
-0.023866995603162036	-238669   (c)
0.010981117450765077	109811    (d)


---------------------
#!/usr/bin/perl
my $a=514214891;
my $b=514545659;
my $c=-238669;
my $d=109811;
my @tiles;

sub rounddown;
#we use divide rather than shift for the loop constructor to maintain 
sign bit
for (my 
$lat_top_16=rounddown($a/65536);$lat_top_16<=rounddown($b/65536);$lat_top_16++){
	for (my 
$lon_top_16=rounddown($c/65536);$lon_top_16<=rounddown($d/65536); 
$lon_top_16++){
		#reconstruct top 16 bits of lat and lon. Add lon with bits shifted down 16
		push (@tiles,($lat_top_16*65536)+(($lon_top_16*65536)>>16));
	}
}

sub rounddown{
#the int function rounds towards zero. We want to always round down.
my $value=shift;
if ($value < 0){$value-=0.9999999999;}
return int($value);
}
#@tiles is an array containing a list of tiles in the bounding box.
------------------
A simple SQL statement can present the most recent update in the area. 
If this most recent update is before the tile was rendered, no 
re-rendering is necessary.

+=+=+=+=
Collecting together data in database.
____________________________________

Either a 1:1 or 1:many relationship of object:tile could potentially 
speed up data retrieval. In a 1:1 relationship, an object could include 
a tile as an indexed field. In a 1:many relationship, we could have a 
table indexed by tile with a reference to an object on the other side.

All tiles a segment or way covers would be referenced with a single 
indexed look-up.


SQL Schemas for marking tiles dirty:

In the API after each update session
Before the connection closes

foreach $_ (keys %dirty){
execute_sql ("INSERT INTO dirty_tile (tile) VALUES ($currenttile) ON 
DUPLICATE KEY UPDATE tile=$currenttile;" )
}

Where dirty_tile table is defined by:
CREATE TABLE `dirty_tile` (
   `tile` int(11) unsigned NOT NULL default '0',
   `update_time` timestamp NOT NULL default CURRENT_TIMESTAMP on update 
CURRENT_TIMESTAMP,
   PRIMARY KEY  (`tile`),
   KEY `time_index` (`update_time`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;










More information about the dev mailing list