[Tilesathome] Upload permissions for distributed server

Lars Aronsson lars at aronsson.se
Sat Jul 21 02:12:42 BST 2007


Frederik Ramm wrote:

> > (You also can't locate their other uploads, so you can't do a 
> > "re-render where user=5")
> 
> How does Wikipedia deal with this problem?

In many different ways.  First, they have mechanisms in place to 
block a username and an IP address, to stop further vandalism 
from the same source.

Second, we must remember that the unit of change in Wikipedia is a 
single page edit or a single file upload.  Hopefully a user can 
only vandalize a few items before being discovered and blocked, so 
there is a limited amount to clean up.  In t@h, a single upload (as 
shown in the contribution log) contains thousands of tiny images 
and vandalism can be hidden in a few tiles down at zoom=15.  It is 
highly unlikely that such vandalism can be spotted, unless it 
appears in every tile of the upload.

Third, in Wikipedia every page edit or uploaded file has a record 
in the version history, which is indexed by page/file name, by 
timestamp (for the Recent Changes log) and by the user who 
contributed that change.  The way tiles are stored on the t@h 
server, each user's contributions are spread out over very many 
places.  This makes the upload slow, since many indexes have to be 
updated.  It will also make clean-up slow, since vandalism has to 
be removed or reverted in many places, for every single upload.

There is an inherent complexity in the way t@h is designed, and 
that hurts us whatever we do.

I think what it boils down to is that tile files should be stored 
differently on the t@h server.  Today tiles are addressed so that 
a z12 tile with x or y coordinate 200 is divided into z13 tiles 
400 and 401 (the z12 address is shifted left, with 0 or 1 appended 
in the least significant bit position) and z14 tiles 800, 801, 802 
and 803.  Likewise, z14 tile 200 is a child of z13 tile 100 and 
z12 tile 50.  With increasing zoom depth, the number ranges get 
larger, but they are right-justified (just shift right as you zoom 
out).  Tiles that are geographically close, and that are rendered 
and uploaded together, therefore have very different x and y 
coordinates at different zoom levels.  Yet these x values are 
still used as subdirectory names.  If instead the x and y 
coordinates were left-justified, z14 tile 200 would be a child of 
z13 tile 200 and of z12 tile 200 (don't shift, just mask out the 
least significant bit positions).

Today:

 z12   50                  200
 z13   100     101         400     401
 z14   200 201 202 203     800 801 802 803
 
Another way:

 z12   200                 800
 z13   200     202         800     802
 z14   200 201 202 203     800 801 802 803

In this "other way", there is no 801 tile at z13 or z12, because 
the increments are powers of two.  The benefit would be that all 
"2xx" tiles are uploaded in the same batch and can be removed in 
one strike.
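
As a rough illustration of the two schemes, here is a minimal 
sketch in shell arithmetic.  The function names are made up for 
this example, and the deepest zoom level is assumed to be 14 only 
so that the numbers match the tables above.

```shell
# Hypothetical helpers, for illustration only.

# Today's scheme: parent of coordinate $1 (at zoom $2) at zoom $3.
# Zooming out shifts the coordinate right.
parent_today() {
    echo $(( $1 >> ($2 - $3) ))
}

# Left-justify coordinate $1 (at zoom $2) relative to max zoom $3.
# Assumption: max zoom is 14 in the usage below.
left_justify() {
    echo $(( $1 << ($3 - $2) ))
}

# "Other way": parent of left-justified coordinate $1 at zoom $2,
# given max zoom $3.  No shift, just mask out the low bits.
parent_left() {
    echo $(( $1 & ~((1 << ($3 - $2)) - 1) ))
}
```

For example, "parent_today 800 14 12" prints 200, while 
"parent_left 803 13 14" prints 802 and "parent_left 803 12 14" 
prints 800, matching the right-hand columns of the two tables.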

It might be a little late to redefine x and y tile coordinates, 
since this is built into OpenLayers and everything.  However, we 
don't have to redefine them.  We can achieve the same benefit by 
just introducing another subdirectory level.  This is based on the 
fact that t@h focuses on zoom level 12 and deeper.  For every tile 
file access at z12..z18 (read or write), first compute what the 
parent x and y are at level z12.  Call these values x12 and y12.  
Now access x12/y12/z/x/y.png.  All files belonging to the same 
upload batch will be found under x12/y12/.
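
The path computation might look something like the following 
sketch.  The function name tile_path is hypothetical, and it 
assumes ordinary (right-justified) x and y coordinates as used 
today, with z between 12 and 18.

```shell
# Hypothetical helper: print the storage path for tile z/x/y.
# Usage: tile_path z x y
tile_path() {
    # Only z12..z18 are handled in this sketch.
    if [ "$1" -lt 12 ] || [ "$1" -gt 18 ]; then
        echo "tile_path: zoom must be 12..18" >&2
        return 1
    fi
    # The z12 parent is found by shifting right once per zoom level.
    x12=$(( $2 >> ($1 - 12) ))
    y12=$(( $3 >> ($1 - 12) ))
    echo "$x12/$y12/$1/$2/$3.png"
}
```

With this, "tile_path 14 800 803" prints 200/200/14/800/803.png, 
and a z12 tile maps to itself: "tile_path 12 50 60" prints 
50/60/12/50/60.png.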

An upload:

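# $zipfile must be an absolute path, since we change directory
d="$x12/$y12.tmp"
mkdir -p "$d"                    # $x12 may not exist yet
cd "$d"
unzip "$zipfile"
cd ..
rm -rf "$y12.previous"           # keep only one older generation
[ -d "$y12" ] && mv "$y12" "$y12.previous"
mv "$y12.tmp" "$y12"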

A revert:

cd "$x12/"
[ -d "$y12.previous" ] || exit 1    # nothing to revert to
mv "$y12" "$y12.vandalism"
mv "$y12.previous" "$y12"
rm -rf "$y12.vandalism" &    # background job


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se



