[OSM-talk] osmarender tile server broken?

Nick Hill nick at nickhill.co.uk
Sat Feb 24 19:29:44 GMT 2007


In general, I agree with using a function to map x,y,z to an int. However, the 
cardinality of the tiles index doesn't seem too bad, given that x appears to be 
an integer below 130,000.
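
For illustration, here is a minimal sketch of one such reversible mapping. It 
assumes the usual tile pyramid, where zoom level z holds 2^z by 2^z tiles; the 
function names are hypothetical and this is not necessarily the scheme T@H or 
mapnik would use. All tiles of lower zoom levels are numbered first, then the 
tiles of level z row by row.

```python
def tile_to_index(x, y, z):
    """Map (x, y, z) to a single non-negative integer (assumed pyramid layout)."""
    n = 1 << z                      # tiles per side at zoom z
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("x and y must lie in [0, 2**z)")
    offset = (4 ** z - 1) // 3      # number of tiles in all zoom levels below z
    return offset + y * n + x


def index_to_tile(i):
    """Inverse of tile_to_index: recover (x, y, z) from the integer."""
    if i < 0:
        raise ValueError("index must be non-negative")
    z = 0
    # Advance z until i falls inside the index range of that zoom level.
    while (4 ** (z + 1) - 1) // 3 <= i:
        z += 1
    n = 1 << z
    y, x = divmod(i - (4 ** z - 1) // 3, n)
    return x, y, z
```

If tiles go up to zoom 17 (which would be consistent with x staying below 
131,072), the largest index under this scheme is above the 32-bit range, so the 
column would need to be a 64-bit integer.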

For large numbers of small files, databases are probably efficient so long as 
the record size is fixed.
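
To show why fixed-size records are cheap, here is a small sketch. The record 
layout (an 8-byte tile index plus a 4-byte timestamp) and the function names are 
assumptions made up for the example, not the T@H schema. Record number n always 
starts at byte n * record size, so a read or an in-place overwrite is a single 
seek and nothing else has to move.

```python
import io
import struct

# Hypothetical fixed-size record: 8-byte tile index + 4-byte timestamp = 12 bytes.
RECORD = struct.Struct("<QI")


def write_record(f, slot, tile_index, stamp):
    """Write (or overwrite) the record in the given slot."""
    f.seek(slot * RECORD.size)      # position follows directly from the slot number
    f.write(RECORD.pack(tile_index, stamp))


def read_record(f, slot):
    """Read back the record stored in the given slot."""
    f.seek(slot * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


# Usage: an in-memory file stands in for a data file on disk.
store = io.BytesIO()
for slot in range(3):
    write_record(store, slot, 100 + slot, 0)
write_record(store, 1, 101, 1172345384)   # overwrite in place; file size unchanged
```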

The biggest problem I see with the current T@H set-up is that the database 
records are variable length, and the length of each record changes regularly. 
That creates a computational problem of how to store and retrieve such data 
efficiently, and adds workload when a larger record replaces a smaller one.

We also need to consider how different file systems behave with large numbers of 
small files, with changes in file size, and with the resulting possibility of 
fragmentation.

As an example of different file system approaches: some file systems use linked 
lists for directory look-ups. These are efficient for small numbers of 
directory entries but inefficient for large numbers of entries. Other file 
systems index directories using B-trees or B+ trees, providing good performance 
even with very large numbers of files per directory. Some file systems store 
small files completely in metadata rather than assigning wasteful clusters.

My tests on file systems show JFS and ReiserFS to be much better than either 
ext2 or ext3 when large numbers of small files are being stored and retrieved.

In general, both database and file system approaches have benefits and 
drawbacks. There is a balancing act.

Currently, the T@H tiles database has 23 million records. I haven't tried 
copying or backing up 23 million files. I wonder whether most tools will cope.

SteveC wrote:
> mapnik is entirely in the db without trouble. From talking to ppl on irc 
> it seems that t@h breaks when lots of people are uploading as they hog a 
> mysql handle(s).

It could be that MySQL is taking a long time to store records when the record 
length changes, because of fragmentation or many levels of redirection.

> 
> I'd first reduce the http keepalive to 1 second or something so the 
> handles aren't open for too long. Second instead of uploads going straight 
> to the db, throw them to disk and process them with a cron job so that 
> insertion only ever takes one mysql handle.
> 
> Third thing is don't use x,y,z columns as there's not any cardinality 
> between them, use a single int column with some function(x,y,z) => i 
> that can go both ways. Will make the index far more useful. This is what 
> I'll do with mapnik.
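
A rough sketch of the spool-then-insert idea, for illustration only. The file 
naming, the `tiles` table and both function names are assumptions, and the 
standard-library sqlite3 module stands in for MySQL so that the example is 
self-contained (`INSERT OR REPLACE` is SQLite syntax). The upload handler only 
writes a file; a separate job, run from cron, drains the spool directory through 
a single database connection.

```python
import os
import sqlite3


def spool_upload(spool_dir, tile_index, data):
    """Called by the upload handler: write the tile to disk, no database access."""
    final = os.path.join(spool_dir, f"{tile_index}.tile")
    tmp = final + ".part"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, final)          # atomic rename, so the job never sees half a file


def process_spool(spool_dir, conn):
    """Called by the periodic job: insert every spooled tile over one connection."""
    count = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".tile"):
            continue                # skip uploads that are still being written
        path = os.path.join(spool_dir, name)
        with open(path, "rb") as f:
            data = f.read()
        tile_index = int(name[: -len(".tile")])
        conn.execute(
            "INSERT OR REPLACE INTO tiles (idx, data) VALUES (?, ?)",
            (tile_index, data),
        )
        conn.commit()               # commit before deleting, so no tile is lost
        os.remove(path)
        count += 1
    return count
```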



