[Tilesathome] Proposal: New T at H Server structure

Florian Lohoff flo at rfc822.org
Sun Jun 1 19:13:24 BST 2008


On Sun, Jun 01, 2008 at 07:17:03PM +0200, Gerhard Schmidt wrote:
> > Every request for a tile going through a database sounds horrible from
> > a performance standpoint, in my eyes.
> 
> It is not if some basic rules are followed. As stated in some other
> mails, I've done something like that already with very high performance.
> 
> 1. You have to pool your MySQL connections (opening a connection and
> authenticating takes the most time)
> 2. All columns in the WHERE clause have to be covered by one index
> 3. Keep it to one, at most two, simple queries (no joins or views)
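As a hedged sketch of those three rules - using Python's stdlib sqlite3 as a
stand-in for MySQL, and a hypothetical tiles table, since the thread names no
schema - a pooled, indexed lookup could look like:

```python
import queue
import sqlite3

class ConnectionPool:
    """Reuse open connections so each request skips the connect/auth cost (rule 1)."""
    def __init__(self, factory, size=4):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self):
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)

# sqlite3 stands in for MySQL here; the pooling pattern is the same.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:"))
conn = pool.acquire()
conn.execute("CREATE TABLE tiles (z INTEGER, x INTEGER, y INTEGER, path TEXT)")
# Rule 2: every column used in the WHERE clause is covered by one index.
conn.execute("CREATE INDEX idx_zxy ON tiles (z, x, y)")
conn.execute("INSERT INTO tiles VALUES (12, 2136, 1350, '/tiles/12/2136/1350.png')")
# Rule 3: a single simple query, no joins or views.
row = conn.execute("SELECT path FROM tiles WHERE z = ? AND x = ? AND y = ?",
                   (12, 2136, 1350)).fetchone()
pool.release(conn)
```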

The point is: Why do so when it is not necessary?

> > I'd wish the redirector could easily determine the location by e.g.
> > hashing the tile's x and y, taking the first 2 bytes (2^16 chunks
> > of data), and deciding which tileserver contains this chunk of tiles.
> > The mechanism could even be embedded into the map JavaScript code,
> > so there would be no need for a redirector.
> 
> You break every use of the tiles without JavaScript this way.

Why? Why not provide a central proxy serving the complete content by
doing the URL transformation for non-hash-aware clients?
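Such a proxy only has to rewrite the URL. A minimal sketch - the thXXXX
hostname scheme is taken from the quoted mail, while MD5 over "x,y" is an
assumption, not a settled choice:

```python
import hashlib

def tile_redirect(z, x, y):
    """Rewrite a plain /z/x/y.png request into the hash-addressed server URL,
    so non-hash-aware clients can keep using one central hostname."""
    digest = hashlib.md5(f"{x},{y}".encode()).hexdigest()
    chunk = digest[:4]  # first 2 bytes of the hash -> one of 2^16 chunks
    return f"http://th{chunk}.ts.informationfreeway.org/{z}/{x}/{y}.png"

url = tile_redirect(12, 2136, 1350)
```

The same few lines would translate directly into the map JavaScript for
hash-aware clients, so proxy and client stay consistent.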

> > So in the end you split the 2^16 chunks of tiles across n machines - you
> > now only need a map with 2^16 entries describing where to find each
> > chunk. The point of using a hash is better load and data distribution -
> > or, even more hackish, put the map into the DNS, e.g.
> > 
> > th0000.ts.informationfreeway.org
> > th0001.ts.informationfreeway.org
> > ...
> > thffff.ts.informationfreeway.org
> 
> Updates to DNS take ages to be distributed, as some DNS servers ignore
> the given TTL and cache far longer than that (most of the large ISPs do
> that). I have done something like that too and hit the wall quite fast.

I am working for the 2nd largest ISP in Germany and we don't do this. I
do have a cut-off on the TTL to stop cache pollution with bogus records
for too long. I know that nscd, for example, ignores TTLs completely,
which is the reason its DNS part is disabled most of the time.

But again - serving unknown tiles (read: tiles not matching "our" hash) 
by redirecting would be an option.
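A tile server that only holds some chunks could do exactly that: serve what
matches "our" hash, redirect the rest. A sketch under the same assumed MD5
scheme - the chunk set and local path layout are hypothetical:

```python
import hashlib

OUR_CHUNKS = {"0000", "0001", "00ff"}  # hypothetical: chunks stored on this server

def chunk_of(x, y):
    return hashlib.md5(f"{x},{y}".encode()).hexdigest()[:4]

def handle_request(z, x, y):
    """Serve tiles matching 'our' hash locally; redirect everything else."""
    chunk = chunk_of(x, y)
    if chunk in OUR_CHUNKS:
        return "serve", f"/tiles/{chunk}/{z}/{x}/{y}.png"
    return "redirect", f"http://th{chunk}.ts.informationfreeway.org/{z}/{x}/{y}.png"

action, target = handle_request(12, 2136, 1350)
```

This keeps every tile reachable through every server, at the cost of one
extra HTTP round trip for non-local tiles.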

> > Now requesting the tile is done by the JavaScript code by running e.g.
> > MD5 or an even cheaper hash over the tile's x and y and using the result
> > to construct the URL. Adding more machines would mean putting them into
> > the cluster, deciding which chunks of data to move, copying the data,
> > and once done switching the DNS over to the new machine. Adding more
> > power to a single chunk could be done with multiple IP addresses on the
> > DNS entry. DNS is most likely one of the oldest distributed data systems
> > and has proven to work very reliably.
> 
> Why hashing? x/y/z is already a hash for each tile. You don't gain
> anything by running MD5 over them.

No - it's a geographical index, so adjacent tiles end up on the same
server. That is bad: a hotspot like "Germany" after the recent press
coverage will likely hit only one or a few servers. Hashing will
distribute the tiles across ALL machines, as evenly as the hash's
distribution allows.
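That difference is easy to check: hash a "hotspot" block of adjacent tiles
and count which of n servers each one lands on. A small sketch - MD5 and
8 servers are arbitrary choices for illustration:

```python
import hashlib
from collections import Counter

def server_for(x, y, n_servers=8):
    """Pick a server from the first 2 bytes of the tile hash."""
    digest = hashlib.md5(f"{x},{y}".encode()).digest()
    return int.from_bytes(digest[:2], "big") % n_servers

# A hotspot: a 32x32 block of adjacent tiles (think "Germany" at one zoom).
counts = Counter(server_for(x, y) for x in range(32) for y in range(32))
# Geographically indexed, these 1024 tiles would sit on one or two servers;
# hashed, each of the 8 servers ends up with roughly 1024/8 = 128 of them.
```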

> As noted by someone, OSM doesn't have spare servers in the attic, so
> the new system has to run on the given hardware but must be able to
> scale when needed and new hardware is available (which my system will
> do), as all 3 parts can run on the same machine at first, while still
> solving the upload problem by eliminating the need to replace tiles
> in place.

The hash/DNS stuff could run on the same machine, too.
 
> > From what I understood, the bottleneck is currently disk I/O, or rather
> > metadata I/O, as the files get unpacked. This could be solved either
> > by distributing over different filesystems or even different machines.
> 
> I think the main problem is replacing one file in a directory populated
> by a large number of files, which my system will solve because every
> upload gets its own directory (and n subdirectories to prevent them
> from getting too large).

Metadata is always expensive, as it costs valuable seeks - unless all
your metadata fits into memory. Large directories have been a killer
for nearly all Unix filesystems built in the last 35 years.

Shifting the metadata update from the unix filesystem into a database 
will probably not gain anything.
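Either way, it is the directory fan-out itself that keeps lookups cheap. A
sketch of the per-upload layout quoted above - the hashed two-level naming
is hypothetical, not the actual t@h scheme:

```python
import hashlib
import os

def tile_path(upload_root, z, x, y, levels=2):
    """Spread one upload's tiles over hashed subdirectories so that no single
    directory grows large enough to make metadata updates expensive."""
    digest = hashlib.md5(f"{z}/{x}/{y}".encode()).hexdigest()
    # two levels of 256-way fan-out: at most 65536 leaf directories per upload
    subdirs = [digest[i * 2:(i + 1) * 2] for i in range(levels)]
    return os.path.join(upload_root, *subdirs, f"{z}_{x}_{y}.png")

path = tile_path("/uploads/00042", 12, 2136, 1350)
```

Since each upload writes only into its own fresh tree, nothing ever has to
be replaced in place.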

Flo
-- 
Florian Lohoff                  flo at rfc822.org             +49-171-2280134
	Those who would give up a little freedom to get a little 
          security shall soon have neither - Benjamin Franklin