[Tilesathome] Tiles at home unavailable all Monday

OJW streetmap at blibbleblobble.co.uk
Thu May 24 18:55:57 BST 2007


Last weekend, the dev server was unresponsive because it physically couldn't 
write information to the disk quickly enough.  The code was running fine, but 
it was handling 20,000,000 images per day inbound and 3,800,000 images per day 
outbound, both of which can require database connections.

So far, the only suggestions for dealing with this have been to "record less 
data", which LA2 and I did on Sunday/Monday.  It didn't have any effect, 
because the POST file-size limit is currently too small to upload a full 
tileset anyway.

Does anyone even understand the complexity we're introducing into the system 
here?  Tiles can now come from any number of places (disk, old disk, 
database, recursive search in the database). 

Meta-information will become unreliable (are you looking at a real tile, or at 
a blank tile, or is it the result of a recursive search?)  Anyone trusting 
the "_details" query will probably get the wrong answer, and the _details 
page will probably get about 3x more complex before it's working again. 
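To make the meta-information problem concrete, here is a rough Python sketch (not the real server code, which is PHP) of a lookup that tries the sources listed above in turn and reports where the answer actually came from. Everything in it is hypothetical and only for illustration: the `find_tile` name, passing each source in as a function, the lookup order, and the z12 floor for the recursive search.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

# Hypothetical signature for one tile source: (z, x, y) -> PNG bytes, or None if absent.
TileSource = Callable[[int, int, int], Optional[bytes]]


class Origin(Enum):
    DISK = "disk"
    OLD_DISK = "old disk"
    DATABASE = "database"
    BLANK_SEARCH = "recursive search found a blank tile"
    MISSING = "nothing found"


@dataclass
class TileResult:
    origin: Origin
    data: Optional[bytes]  # None when there is no image (blank or missing)
    zoom: int              # zoom level at which the answer was found


def find_tile(z: int, x: int, y: int,
              disk: TileSource, old_disk: TileSource, database: TileSource,
              is_blank: Callable[[int, int, int], bool],
              min_zoom: int = 12) -> TileResult:
    # Try each stored copy in a fixed (assumed) order, remembering which one answered.
    for origin, source in ((Origin.DISK, disk),
                           (Origin.OLD_DISK, old_disk),
                           (Origin.DATABASE, database)):
        data = source(z, x, y)
        if data is not None:
            return TileResult(origin, data, z)

    # Recursive search: check this tile and then its ancestors for a blank-tile
    # record, stopping at min_zoom (z12 is only an assumption here).
    # In slippy-map numbering the parent of (x, y) at zoom z is (x // 2, y // 2) at z - 1.
    pz, px, py = z, x, y
    while pz >= min_zoom:
        if is_blank(pz, px, py):
            return TileResult(Origin.BLANK_SEARCH, None, pz)
        pz, px, py = pz - 1, px // 2, py // 2

    return TileResult(Origin.MISSING, None, z)
```

Even in this simplified form, anything that only looks at the image (the _details page, the exported tile lists, the error-message tiles) cannot tell a blank tile from a missing one; it has to carry `origin` along as well.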

Someone added code which assumes that _details can look at z12 for 
information. Is that right? Who knows.  Can blank tiles be overridden?  Can 
they be merged together?  How high do you search in the zoom levels? Is the 
_details page being updated with the latest "emergency fixes" to mod_rewrite 
rules, so that you know the author/date returned is correct for the file?

Is the downloadable list of tiles going to know any of this? Is someone going 
to write an exporter for the list of blank tiles? Do we have enough CPU to 
run it?  Do all the scripts which use those lists know that meta-information 
now [sometimes] applies to all sub-tiles?

Oh, and we don't yet delete images when someone uploads a blank tile.  Yet 
another script to write and maintain.

Since we started rendering sea tiles, the load on the dev server has increased 
massively.  Not only that, the number of tiles forced the "emergency" disk 
move, with all its added complications.

And even debugging information is starting to annoy people.  

Does anyone understand the difference between a white tile which was rendered 
as such, a white tile that wasn't loaded yet, a white tile because it's blank 
land, a white tile that was a 404, a white tile because the database 
connection wasn't available, and a white tile because some recursive search 
returned blank land?  Or was it imported locally from oceantiles.dat, without 
any usernames or dates?

Unless we know every single detail of that process for sure, the system will 
give us incorrect results.  Take tilesgen uploading blank tiles in the middle 
of the sea: is anyone debugging that?  Would it be any easier to debug if we 
removed the error-message tiles, or displayed out-of-date tiles from the old 
disk?

So in summary, it's getting pretty hectic, mostly because we're implementing 
all sorts of optimisations that sound good when you suggest them in a quick 
"why don't you..." email, but whose consequences are much more difficult to 
analyse.  Especially when you've spent all evening implementing the fix, 
while the person who suggested it is busy with their next email.

"Why don't you have a list of trusted users, and not bother recording details 
from their uploads?"  Great.  How will the exported list of tile timestamps 
cope with that?

"Why don't you _just_ let people FTP tilesets in, and scan the directory?" 
Again, a great idea, but show us the code, or stop complaining when I don't 
write it in the evening between answering emails.  Has anyone written the 
security for that system yet?  How do I know for sure who uploaded it?  Last 
time I checked, we didn't even have the PHP code in SVN to scan directories 
for such uploads.  And do you even know that it will work?  The disk is 
already running as fast as it physically can, and a queue just means 
backlogs that grow indefinitely.

I'm not entirely sure how all this will change when we have 5 admins, but 
problems caused by this complexity aren't going to get any easier while 
people keep adding things in the hope of making it go faster at the expense 
of a correct result.

Regards,

OJW