[OSM-dev] OSM Operations Challenge: Tile CDN QoS

Grant Slater openstreetmap at firefishy.com
Wed Oct 15 22:53:18 UTC 2014

Hi All,

I am part of the OpenStreetMap sysadmin team...

I am involved in running the tile.openstreetmap.org CDN.

Currently we have 2 perpetually overloaded rendering servers (orm & yevaud).
The render servers are fronted by a collection of caching servers
distributed all over the globe. See:
Clients are directed to the cache servers by "Geo" DNS (PowerDNS geo backend).
We use Squid 2.7 (ancient) for caching, with Squid delay pools (a per-IP
token bucket) to slow down mass-downloaders / abusers who would
otherwise degrade the service for everyone.

Squid delay pools are a basic token bucket implementation: each client
IP is allocated a bucket, and the client's downloads drain the bucket
while it is topped up at a slow rate. Once the bucket is drained, a
client IP cannot exceed the top-up rate.
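
To make the accounting above concrete, here is a minimal sketch of a
per-IP token bucket in Python. It is not Squid's code: the class name,
the capacity and refill numbers, and the allow/refuse return value are
all assumptions for illustration, and it only models the bookkeeping,
not how the response is actually slowed down.

```python
import time


class TokenBucket:
    """Illustrative token bucket; names and units are assumptions."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = float(capacity)        # maximum tokens (e.g. bytes)
        self.refill_rate = float(refill_rate)  # tokens added per second
        self.tokens = self.capacity            # every bucket starts full
        self.last = time.monotonic() if now is None else now

    def _refill(self, now):
        # Top the bucket up for the time elapsed, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last = now

    def consume(self, amount, now=None):
        """Drain `amount` tokens; return False if the bucket is too empty."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


# One bucket per client IP, created on first sight (hypothetical limits).
buckets = {}


def allow(ip, nbytes, now=None, capacity=1_000_000, refill_rate=10_000):
    bucket = buckets.get(ip)
    if bucket is None:
        bucket = buckets[ip] = TokenBucket(capacity, refill_rate, now=now)
    return bucket.consume(nbytes, now=now)
```

A client can burst until its bucket is empty; after that its sustained
rate is bounded by refill_rate, which is the "cannot exceed the top-up
rate" behaviour.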

I am working on a rewrite of the CDN caching layer to use a modern
Varnish Cache (4.0).

I have also been looking at moving the token bucket abuse management
to Linux QoS.

I'm working toward something like:

1) Varnish + libvmod-vsthrottle (
https://github.com/daghf/libvmod-vsthrottle ) triggers a log event for
a client with an excessive request rate.
2) A log monitor fires off tc to switch the client to a rate-limited tc qdisc.
3) After x minutes the log monitor resets the client back to the default qdisc.

Basic tc script example: https://gist.github.com/Firefishy/9464cfa4f6b8bec2644a
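
The bookkeeping for steps 2 and 3 could look roughly like the Python
sketch below. Everything in it is hypothetical: the class name, the
penalty length, and the throttle/unthrottle callbacks, which stand in
for whatever tc invocations the real log monitor would run. The tc
commands themselves are deliberately left out, since they depend on the
qdisc layout.

```python
class ThrottleMonitor:
    """Tracks which client IPs are rate limited and when to release them."""

    def __init__(self, penalty_seconds, throttle, unthrottle):
        self.penalty_seconds = penalty_seconds
        self.throttle = throttle      # callback: move the IP to the slow qdisc
        self.unthrottle = unthrottle  # callback: move the IP back to default
        self.expiry = {}              # ip -> time at which the penalty ends

    def on_log_event(self, ip, now):
        # Step 2: a logged excessive-rate event throttles the client.
        # Repeat offences only extend the penalty; throttle() runs once.
        if ip not in self.expiry:
            self.throttle(ip)
        self.expiry[ip] = now + self.penalty_seconds

    def tick(self, now):
        # Step 3: release every client whose penalty has run out.
        expired = [ip for ip, t in self.expiry.items() if t <= now]
        for ip in expired:
            del self.expiry[ip]
            self.unthrottle(ip)
        return expired
```

The monitor would call on_log_event() for each throttle line it reads
from the Varnish log and tick() periodically, for example once a
second. Passing the clock in as `now` keeps the logic easy to test.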

Anyone with deep knowledge of Varnish and/or Linux QoS?
Any suggestions? Tips?

Happy to discuss here on list or on IRC (Firefishy in #osm-dev on OFTC).

Kind regards,

