[OSM-dev] tiles@home disk usage

Frederik Ramm frederik at remote.org
Wed May 2 10:54:53 BST 2007


> Server code is all in SVN if people want to see what's
> behind-the-scenes.
> Tile metadata and access stats available to download if you want to
> model different storage strategies.

I think the general way to go is this:

* Store only tiles that have something on them as individual PNG files.
* For "empty" tiles, store only the land/sea information.
* If a tile is requested, return an individual PNG file if one is  
there, or a generic "empty" tile, either blue or white, otherwise.
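A minimal sketch of the request-time lookup order described above. The root/z/x/y.png directory layout, the file names `empty_sea.png` and `empty_land.png`, and the `is_sea` callback (standing in for the stored land/sea information) are all assumptions for illustration, not the actual server layout.

```python
import os

# Hypothetical names for the two generic tiles served in place of empty ones.
EMPTY_SEA_TILE = "empty_sea.png"
EMPTY_LAND_TILE = "empty_land.png"


def resolve_tile(root, z, x, y, is_sea):
    """Return the path of the PNG to serve for tile (z, x, y).

    Assumes a hypothetical root/z/x/y.png layout. `is_sea(z, x, y) -> bool`
    stands in for the stored land/sea information and is only consulted
    when no individual PNG exists for the tile.
    """
    path = os.path.join(root, str(z), str(x), f"{y}.png")
    if os.path.isfile(path):
        return path  # a tile that actually has something on it
    generic = EMPTY_SEA_TILE if is_sea(z, x, y) else EMPTY_LAND_TILE
    return os.path.join(root, generic)
```

Only tiles with content cost a file; every other request is answered with one of two shared files.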

The questions remaining are:

1. How do we store blue-empty and white-empty information on the
server and deliver appropriate PNG files?
2. How do we determine whether a tile is blue-empty or white-empty?

Storing blue-empty or white-empty information does of course happen  
in the database - we will need an extra column that can handle one of  
three values: "normal tile", "empty sea tile", "empty land tile".  
Retrieval of this information is possible in two ways:

1a. create symlinks in the directory structure linking to the empty  
blue or empty white tiles where appropriate; no further work required  
(enable symlinks in Apache; take care when overwriting tiles: delete
first, otherwise you overwrite the link destination). Drawback: a
potentially huge number of symlinks in the file system (roughly 70% of
all entries).

1b. use Apache's "ErrorDocument" directive to execute a CGI script  
whenever a requested tile is not found. Have that script query the  
database and return a "Location" header pointing either to the empty  
blue or empty white tile. If the requested tile is on a level greater  
than 12, and the database does not contain a record for it, answer  
based on the information for the enclosing level-12 tile. Drawback:  
potential performance problems (the old "too many MySQL connections"
problem if someone surfs across the Atlantic at level 12).
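A sketch of the 1b script under stated assumptions: tile URLs are assumed to end in /z/x/y.png, the empty-tile URLs are hypothetical, and `lookup` is a placeholder for the database query. Apache does pass the originally requested path to an ErrorDocument script in the `REDIRECT_URL` environment variable. The enclosing level-12 tile of a deeper tile is found by right-shifting x and y by (z - 12), since each zoom level doubles the tile count per axis. Defaulting unknown tiles to the white tile is an assumption.

```python
import os
import re

# Hypothetical URLs of the generic empty tiles.
EMPTY_SEA_URL = "/empty_sea.png"
EMPTY_LAND_URL = "/empty_land.png"
BASE_ZOOM = 12

# Assumes tile URLs end in /z/x/y.png.
TILE_RE = re.compile(r"/(\d+)/(\d+)/(\d+)\.png$")


def parse_tile_path(url):
    """Extract (z, x, y) from a tile URL, or return None."""
    m = TILE_RE.search(url)
    return None if m is None else tuple(int(g) for g in m.groups())


def enclosing_tile(z, x, y, base_zoom=BASE_ZOOM):
    """Return the tile on base_zoom containing tile (z, x, y), for z >= base_zoom."""
    shift = z - base_zoom
    return base_zoom, x >> shift, y >> shift


def fallback_location(z, x, y, lookup):
    """Choose the empty tile to redirect to for a missing tile.

    lookup(z, x, y) -> "sea", "land" or None is a placeholder for the
    database query. Unknown tiles default to the land (white) tile.
    """
    status = lookup(z, x, y)
    if status is None and z > BASE_ZOOM:
        status = lookup(*enclosing_tile(z, x, y))
    return EMPTY_SEA_URL if status == "sea" else EMPTY_LAND_URL


if __name__ == "__main__":
    # The lambda below stands in for the real database lookup.
    tile = parse_tile_path(os.environ.get("REDIRECT_URL", ""))
    if tile is None:
        target = EMPTY_LAND_URL
    else:
        target = fallback_location(*tile, lambda z, x, y: None)
    print(f"Location: {target}")
    print()
```

Because the Location value is a local path, a CGI-conforming server can serve the empty tile through an internal redirect rather than a round trip to the client.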

We can even combine 1a and 1b, so that for some tiles near the coast  
we provide symlinks (as they will be viewed often - fast access, less  
strain on the server), and use the database lookup as a last resort.
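Both 1b and the combined scheme depend on the three-valued column described earlier. A minimal sketch of that column and its lookup, using Python's built-in sqlite3 so that it runs stand-alone; the table and column names are hypothetical, and on MySQL an `ENUM('normal','sea','land')` column would be a natural equivalent.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table; the CHECK constraint allows only the three states.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tiles (
               z INTEGER NOT NULL,
               x INTEGER NOT NULL,
               y INTEGER NOT NULL,
               status TEXT NOT NULL DEFAULT 'normal'
                   CHECK (status IN ('normal', 'sea', 'land')),
               PRIMARY KEY (z, x, y)
           )"""
    )


def set_status(conn, z, x, y, status):
    """Insert or overwrite the status of one tile."""
    conn.execute(
        "INSERT OR REPLACE INTO tiles (z, x, y, status) VALUES (?, ?, ?, ?)",
        (z, x, y, status),
    )


def get_status(conn, z, x, y):
    """Return 'normal', 'sea', 'land', or None if the tile has no record."""
    row = conn.execute(
        "SELECT status FROM tiles WHERE z = ? AND x = ? AND y = ?", (z, x, y)
    ).fetchone()
    return None if row is None else row[0]
```

Returning None for tiles without a record lets the caller fall back to the enclosing level-12 tile, as 1b proposes.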

Determining whether a tile is empty is something that can be done by
the program creating the tile.

2a. tilesGen.pl, which creates all level-12 and higher tiles, already  
detects empty-land tiles and uploads a 67-byte dummy PNG instead. It  
should be improved to upload 69-byte dummy PNGs for empty-sea tiles as
well, so that the server can recognize that it is an empty sea tile.
(Instead of communicating via byte sizes, an XML "meta data file"
could be envisaged, but that can also be done as a tidy-up step later.)

2b. lowzoom.pl, which creates all level-11 and lower tiles, needs to  
be beefed up to detect empty tiles as well, and upload appropriate  
dummy PNGs. It would be desirable to have some sort of "database  
access", so that lowzoom.pl could, before it commences download of  
individual tiles for constructing lowzoom tiles, download the meta  
information for the tiles it is about to process, and then only  
download those tiles that carry information (not the empty-blue or  
empty-white tiles).
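A sketch of the filtering step in 2b, assuming the downloaded meta information arrives as a mapping from (z, x, y) to "normal", "sea" or "land" (a hypothetical format). A tile (z, x, y) has four sub-tiles on level z+1: (2x, 2y), (2x+1, 2y), (2x, 2y+1) and (2x+1, 2y+1), so a level-z tile built from level-12 tiles covers 4^(12-z) of them.

```python
def children(z, x, y):
    """The four sub-tiles of (z, x, y) on level z + 1."""
    return [(z + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


def plan_downloads(z, x, y, source_zoom, meta):
    """Split the source_zoom tiles under (z, x, y) into downloads and fills.

    meta maps (z, x, y) -> "normal", "sea" or "land". Tiles without an
    entry are treated as "normal" and downloaded, to be safe.
    Returns (tiles to download, {tile: "sea" or "land"} to paint flat).
    """
    level = [(z, x, y)]
    for _ in range(source_zoom - z):
        level = [child for tile in level for child in children(*tile)]
    fetch = [t for t in level if meta.get(t, "normal") == "normal"]
    fill = {t: meta[t] for t in level if meta.get(t, "normal") != "normal"}
    return fetch, fill
```

Over open ocean `fetch` is empty, so such a low-zoom tile can be composed without a single download.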

The server side script that accepts uploads would have to be modified:

3a. for 67-byte or 69-byte tiles, delete the existing tile and
replace it with a symlink according to 1a; possibly, if a mix of 1a and
1b is to be run, apply some logic to determine whether a symlink should
be created.
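A sketch of how the upload handler in 3a might combine the size convention from 2a with the "delete first" rule from 1a. The `db` dict stands in for the status column, and `want_symlink` is a hypothetical policy hook for deciding, when 1a and 1b are mixed, which empty tiles get a symlink (for example based on access stats). The tile's parent directory is assumed to exist.

```python
import os

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Dummy sizes from 2a: 67 bytes = empty land, 69 bytes = empty sea.
DUMMY_STATUS = {67: "land", 69: "sea"}


def handle_upload(tile_path, data, key, db, shared_tiles, want_symlink):
    """Store one uploaded tile and record its status.

    key: (z, x, y); db: dict standing in for the status column.
    shared_tiles: {"sea": path, "land": path} of the generic empty tiles.
    want_symlink(key) -> bool: hypothetical policy when mixing 1a and 1b.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("upload is not a PNG file")
    status = DUMMY_STATUS.get(len(data), "normal")
    db[key] = status
    # Delete first: writing through an existing symlink would overwrite
    # the shared empty tile it points to. os.remove deletes only the link.
    if os.path.lexists(tile_path):
        os.remove(tile_path)
    if status == "normal":
        with open(tile_path, "wb") as f:
            f.write(data)
    elif want_symlink(key):
        os.symlink(shared_tiles[status], tile_path)
    # Otherwise no file is left, so requests fall through to the 1b lookup.
    return status
```

A genuinely rendered tile that happened to be exactly 67 or 69 bytes would be misclassified here; the XML metadata file mentioned in 2a would remove that ambiguity.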

When we implement these mechanisms, we need to clean up the tile  
database and also generate some information:

4a. check the database for all existing empty blue tiles, delete the  
PNGs (optionally replacing them by a symlink) and flag them "empty  
sea" in the database.
4b. the same for existing empty white tiles (are there any?)
4c. determine all empty sea tiles from Martijn's level-12 index, and  
add "empty sea" entries to the database for them if they do not exist  
already. Also, add "empty sea" entries for all ocean tiles on lower  
levels, as computed from existing data (a tile on level n is "empty
sea" if all four of its sub-tiles on level n+1 are).
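The rule in 4c can be applied bottom-up, one level at a time. A minimal sketch, assuming the level-12 index has already been read into a set of (x, y) pairs (the index's actual file format is not described here): a level-n tile (px, py) is kept only if its four sub-tiles (2px+dx, 2py+dy) are all empty sea on level n+1.

```python
def propagate_empty_sea(sea_tiles, base_zoom=12, min_zoom=0):
    """Return {zoom: set of (x, y) empty-sea tiles} from base_zoom down to min_zoom.

    sea_tiles holds the (x, y) pairs that are empty sea on base_zoom.
    A tile on level n is empty sea iff all four sub-tiles on n + 1 are.
    """
    result = {base_zoom: set(sea_tiles)}
    for z in range(base_zoom - 1, min_zoom - 1, -1):
        below = result[z + 1]
        # Only parents of at least one empty-sea tile can qualify.
        parents = {(x >> 1, y >> 1) for x, y in below}
        result[z] = {
            (px, py)
            for px, py in parents
            if all((2 * px + dx, 2 * py + dy) in below
                   for dx in (0, 1) for dy in (0, 1))
        }
    return result
```

Each level only examines parents of tiles already known to be sea, so the work is proportional to the number of sea tiles rather than to the full 4^12 grid.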

Step 4a will free something like 7 GB of data currently used by empty  
sea tiles. Step 4c will create an extra 10m records in the database  
(currently containing about 27m records).

Any comments on this, or can I implement it?


Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'
