[Tilesathome] Upload permissions for distributed server
Lars Aronsson
lars at aronsson.se
Sat Jul 21 02:12:42 BST 2007
Frederik Ramm wrote:
> > (You also can't locate their other uploads, so you can't do a
> > "re-render where user=5")
>
> How does Wikipedia deal with this problem?
In many different ways. First, they have mechanisms in place to
block a username and an IP address, to stop further vandalism
from the same source.
Second, we must remember that the unit of change in Wikipedia is a
single page edit or a single file upload. Hopefully a user can
only vandalize a few items before being discovered and blocked, so
there is a limited amount to clean up. In t@h, by contrast, a
single upload (as shown in the contribution log) contains thousands
of tiny images, and vandalism can be hidden in a few tiles down at
zoom=15. Such vandalism is highly unlikely to be spotted unless it
appears in every tile of the upload.
Third, in Wikipedia every page edit or uploaded file has a record
in the version history, which is indexed by page/file name, by
timestamp (for the Recent Changes log) and by the user who
contributed that change. The way tiles are stored on the t@h
server, each user's contributions are spread out over very many
places. This makes uploads slow, because the indexes must be
updated in all of those places. It will also make clean-up slow,
because removing or reverting vandalism means touching many places
for every single upload.
There is an inherent complexity in the way t@h is designed, and
that hurts us whatever we do.
I think what it boils down to is that tile files should be stored
differently on the t@h server. Today tiles are addressed so that
a z12 tile with x or y coordinate 200 is divided into z13 tiles
400, 401 (the z12 address is shifted left, with 0 or 1 appended in
the least significant bit position) and z14 tiles 800, 801, 802,
803. Likewise, z14 tile 200 is a child of z13 tile 100 and z12
tile 50. With increasing zoom depth, the number ranges get larger,
but they are right-justified (just shift right as you zoom out).
Tiles that are geographically close, and are rendered and uploaded
together, have very different x and y coordinates at different
zoom levels. And yet these x coordinates are used as subdirectory
names.
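The current right-justified numbering can be sketched with shell
arithmetic. This is only an illustration: the function names are
made up here and are not part of any t@h tool, and nothing beyond
POSIX arithmetic expansion is assumed.

```shell
# Hypothetical helpers illustrating the current, right-justified numbering.

# Parent coordinate one zoom level up: drop the least significant bit.
parent_coord() {
    echo $(( $1 >> 1 ))
}

# The two child coordinates one zoom level down:
# shift left, then append 0 or 1 in the least significant bit.
child_coords() {
    echo "$(( $1 << 1 )) $(( ($1 << 1) | 1 ))"
}

# Ancestor at zoom $3 of coordinate $1, which is given at zoom $2.
ancestor_coord() {
    echo $(( $1 >> ($2 - $3) ))
}

child_coords 200            # z12 200 -> prints "400 401" (its z13 children)
ancestor_coord 200 14 12    # z14 200 -> prints "50" (its z12 ancestor)
```

The same tile family therefore appears under 50, 100/101 and
200..203 depending on the zoom level, which is why one batch ends
up in unrelated subdirectories.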
If instead x and y coordinates were left-justified, z14 tile 200
would be a child of z13 tile 200 and of z12 tile 200 (don't shift,
just mask out the least significant bit positions).
Today:

  z12         50                    200
  z13     100     101           400     401
  z14   200 201 202 203       800 801 802 803

Another way:

  z12        200                    800
  z13     200     202           800     802
  z14   200 201 202 203       800 801 802 803
In this "other way", there is no 801 tile at z13 or z12, because
the increments are powers of two. The benefit would be that all
"2xx" tiles are uploaded in the same batch and can be removed in
one stroke.
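A minimal sketch of the left-justified numbering, assuming that
coordinates are expressed in units of the deepest zoom level (z14
in the example above). The function name and its arguments are
hypothetical, chosen only for this illustration.

```shell
# Hypothetical helper: left-justified coordinate of a tile's ancestor.
# $1 = coordinate at the deepest zoom
# $2 = the deepest zoom level (assumed to be 14 in the example)
# $3 = target zoom level (must not be deeper than $2)
left_justified() {
    # Mask out the least significant bits instead of shifting them away.
    mask=$(( (1 << ($2 - $3)) - 1 ))
    echo $(( $1 & ~mask ))
}

left_justified 203 14 13    # prints "202"
left_justified 203 14 12    # prints "200"
left_justified 801 14 13    # prints "800"
```

With this numbering every tile in a z12 family keeps the same
leading digits, so the family can be found without knowing the
zoom level.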
It might be a little late to redefine x and y tile coordinates,
since this is built into OpenLayers and everything. However, we
don't have to redefine them. We can achieve the same benefit by
just introducing another subdirectory level. This is based on the
fact that t@h focuses on zoom=12 and deeper. For every tile
file access at z12..z18 (read or write), first compute what the
parent x and y are at level z12. Call these values x12 and y12.
Now access x12/y12/z/x/y.png. All files belonging to the same
upload batch will be found under x12/y12/.
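The path computation could look roughly like this. It is a sketch
under the assumptions stated in the text (unchanged x and y
numbering, zoom 12 or deeper); the function name tile_path is
invented for this example.

```shell
# Hypothetical helper: build the proposed path x12/y12/z/x/y.png.
# $1 = zoom (12 or deeper), $2 = x, $3 = y, in the usual numbering.
tile_path() {
    z=$1 x=$2 y=$3
    if [ "$z" -lt 12 ]; then
        echo "tile_path: zoom must be 12 or deeper" >&2
        return 1
    fi
    # The z12 ancestor is found by shifting away the extra zoom bits.
    x12=$(( x >> (z - 12) ))
    y12=$(( y >> (z - 12) ))
    echo "$x12/$y12/$z/$x/$y.png"
}

tile_path 12 200 300     # prints "200/300/12/200/300.png"
tile_path 14 803 1201    # prints "200/300/14/803/1201.png"
```

Both example tiles land under 200/300/, which is the directory
that an upload or a revert would then replace as a whole.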
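An upload (assuming $zipfile is an absolute path):
    mkdir -p "$x12/$y12.tmp"         # unpack into a temporary directory
    cd "$x12/$y12.tmp"
    unzip "$zipfile"
    cd ..
    rm -rf "$y12.previous"           # keep only one older generation
    mv "$y12" "$y12.previous"        # fails harmlessly on a first upload
    mv "$y12.tmp" "$y12"             # switch the new batch into place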
A revert:
    cd "$x12/"
    mv "$y12" "$y12.vandalism"       # take the bad batch out of service
    mv "$y12.previous" "$y12"        # restore the previous generation
    rm -rf "$y12.vandalism" &        # background job
--
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se