[OSM-dev] Pluggable storage backends for mod_tile / renderd
kakrueger at gmail.com
Sun Mar 24 01:30:21 UTC 2013
I just wanted to let you know that I have committed a new feature to
mod_tile / renderd that might be of interest to some.
So far, both mod_tile and renderd had the assumption baked deeply into their
architecture that (meta)tiles would be stored on a POSIX filesystem. The POSIX
filesystem itself is a rather flexible, pluggable storage system with many
different backends to choose from (e.g. memory-based temporary filesystems,
traditional local filesystems, network filesystems, cluster filesystems and
arbitrary user-mode filesystems), but with the growing diversity in OSM-based
tileserver setups, this might no longer be enough.
I have therefore refactored the code of mod_tile and renderd to cleanly
separate out all storage access routines into a separate API, which can then
easily be implemented by various different storage backends. In the initial
commit I have implemented three backends so far:
1) POSIX filesystem storage backend: This backend is equivalent to how
mod_tile and renderd have worked so far. It uses any POSIX-compliant
filesystem to store metatiles.
2) RADOS (Reliable Autonomic Distributed Object Store): RADOS is a
cluster-based, large-scale, distributed and redundant object store, which is
part of the Ceph project. It stores objects across many servers, allowing it
to scale to many terabytes of data at very high performance on commodity
hardware.
3) memcached: Memcached is a purely memory-based storage system which allows
for very high performance. As it can also cluster many servers together, it
can provide large-scale in-memory storage.
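To illustrate the idea of the separated storage API, here is a simplified sketch of what a pluggable backend interface can look like: a table of function pointers that each backend fills in, plus a toy in-memory backend. The struct, function names and signatures below are illustrative only; the actual API committed to mod_tile differs in names and detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified sketch of a pluggable storage API: each backend fills in
 * this table of function pointers. Names are hypothetical, not the
 * actual mod_tile interface. */
struct storage_backend {
    int  (*tile_write)(struct storage_backend *store, const char *xmlconfig,
                       int x, int y, int z, const char *buf, size_t len);
    int  (*tile_read)(struct storage_backend *store, const char *xmlconfig,
                      int x, int y, int z, char *buf, size_t maxlen);
    void (*close)(struct storage_backend *store);
    void *state; /* backend-private data */
};

/* Toy in-memory backend for illustration: a single slot. */
struct mem_state { char data[4096]; size_t len; };

static int mem_write(struct storage_backend *s, const char *xml,
                     int x, int y, int z, const char *buf, size_t len) {
    struct mem_state *m = s->state;
    (void)xml; (void)x; (void)y; (void)z;
    if (len > sizeof(m->data)) return -1;
    memcpy(m->data, buf, len);
    m->len = len;
    return (int)len;
}

static int mem_read(struct storage_backend *s, const char *xml,
                    int x, int y, int z, char *buf, size_t maxlen) {
    struct mem_state *m = s->state;
    (void)xml; (void)x; (void)y; (void)z;
    if (m->len > maxlen) return -1;
    memcpy(buf, m->data, m->len);
    return (int)m->len;
}

static void mem_close(struct storage_backend *s) {
    free(s->state);
    free(s);
}

/* Backend "constructor": the daemon would pick one of these based on
 * its configured storage location. */
struct storage_backend *init_memory_backend(void) {
    struct storage_backend *s = malloc(sizeof *s);
    s->tile_write = mem_write;
    s->tile_read  = mem_read;
    s->close      = mem_close;
    s->state      = calloc(1, sizeof(struct mem_state));
    return s;
}
```

The point of this shape is that the POSIX, RADOS and memcached backends can each provide their own function table, and the rest of mod_tile/renderd only ever calls through the pointers.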
I very much suspect that the majority of single-server tileserver setups
will continue to use the POSIX filesystem backend, as it is the easiest
to use, the most robust, and for local setups possibly also the fastest,
thanks to built-in OS-level caching. However, for setups that need to
scale to multiple tileservers, be it for performance or for redundancy /
high-availability reasons, other backends like the cluster-based RADOS
backend might be preferable.
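Selecting a backend could then be a matter of configuration rather than code. The fragment below is purely illustrative; the key names and URI schemes are assumptions, so check the documentation shipped with the commit for the exact syntax.

```
; Hypothetical renderd.conf fragment -- key names and URI schemes are
; illustrative, not necessarily the committed syntax.
[renderd]
socketname=/var/run/renderd/renderd.sock
num_threads=4
; POSIX filesystem backend (the traditional default):
tile_dir=/var/lib/mod_tile
; ... or a RADOS-backed store:
;tile_dir=rados://tiles//etc/ceph/ceph.conf
; ... or memcached:
;tile_dir=memcached://localhost
```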
Renderd has supported multi-server distributed rendering for a long time
now: a central renderd performs queueing and request de-duplication and
then passes those requests on to an arbitrary number of slave renderers
that do the actual rendering. Recently mod_tile has gained the ability to
talk to renderd via a TCP socket (previously it was limited to a unix
domain socket), so mod_tile can now also live on a different server than
renderd, and multiple mod_tile servers can connect to the distributed
renderd infrastructure. This should make all components of the tile
rendering stack arbitrarily and separately scalable and redundant,
allowing for very large-scale deployments serving many thousands of
requests per second with mod_tile.
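In such a split setup, the Apache side would point mod_tile at a remote renderd instead of the local unix socket. The directive names and port below are assumptions for illustration; consult the mod_tile documentation for the actual ones.

```
# Hypothetical Apache fragment -- directive names and port are
# illustrative, check the mod_tile documentation for the exact ones.
LoadModule tile_module modules/mod_tile.so

# Instead of the local unix domain socket ...
#ModTileRenderdSocketName /var/run/renderd/renderd.sock

# ... point mod_tile at a renderd instance on another host via TCP:
ModTileRenderdSocketAddr renderd.example.com 7653
```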
However, the setup requires all components to have a shared view of the
storage backend. While this was previously already possible using
cluster-based POSIX filesystems like GlusterFS or central network-attached
storage, hooking more directly into a cluster store like RADOS might be
beneficial.
In the future, hopefully further backends will be implemented. I can
think of a couple of potentially interesting candidates. For example, one
could implement an MBTiles backend, a backend that stores tiles on
Amazon S3, or a backend that combines the other backends hierarchically,
e.g. memcached for local caching in front of Amazon S3 for large-scale
storage.
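The hierarchical idea amounts to a read-through cache: try the fast tier first, fall back to the durable tier on a miss, and warm the cache on the way back. The sketch below is a self-contained toy where two fixed arrays stand in for memcached and S3; everything in it is hypothetical illustration, not mod_tile code.

```c
#include <assert.h>
#include <string.h>

/* Sketch of a hypothetical hierarchical backend: a fast cache tier
 * (standing in for memcached) in front of a slow durable tier
 * (standing in for Amazon S3). Both tiers here are just fixed arrays. */

#define SLOTS 16
struct tier { int present[SLOTS]; char data[SLOTS][64]; };

static int tier_get(struct tier *t, int key, char *out) {
    if (key < 0 || key >= SLOTS || !t->present[key]) return -1;
    strcpy(out, t->data[key]);
    return 0;
}

static void tier_put(struct tier *t, int key, const char *val) {
    if (key < 0 || key >= SLOTS) return;
    t->present[key] = 1;
    strncpy(t->data[key], val, sizeof t->data[key] - 1);
    t->data[key][sizeof t->data[key] - 1] = '\0';
}

/* Read-through lookup: serve from the cache tier when possible,
 * otherwise fall back to durable storage and populate the cache. */
static int tiered_get(struct tier *cache, struct tier *durable,
                      int key, char *out) {
    if (tier_get(cache, key, out) == 0)
        return 0;                  /* cache hit */
    if (tier_get(durable, key, out) != 0)
        return -1;                 /* miss in both tiers */
    tier_put(cache, key, out);     /* warm the cache for next time */
    return 0;
}
```

A real combined backend would also need an eviction/expiry policy for the cache tier and a write path that updates both tiers, which this sketch leaves out.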
To remain compatible with Tirex, I hope to port this feature over to
Tirex as well, at least for the Mapnik render backend.
For the moment the rados and memcached backends should still be
considered somewhat experimental until more thorough testing can be
conducted, but they should already be fully functional.
Any comments, suggestions, improvements or bug reports are welcome.
More information about the dev