[Tile-serving] Benchmarking osm.xml

Wed May 15 00:45:19 UTC 2013

This message is to set out my methodology for the upcoming carto results
(which are now running, with real numbers!)

*** If all you care about is osm-carto, you can ignore this message. ***

I have been working on benchmarking osm.xml, osm-carto and some potential
optimizations, and I now have some results. They don't involve osm-carto
yet, but are an xfs vs ext4 baseline.

I first tried my benchmarking on errol, but got wildly inconsistent numbers.
It did allow me to refine my methods.

I took a list of approximately 18k meta-tiles rendered by yevaud during 1h
of peak load and split it into two lists of 1k meta-tiles and two lists of
9k meta-tiles.

I wrote a script that takes two sets of tiles and alternates between them 6
times, feeding them to render_list, with the first two passes used to warm
up the cache to a consistent state. The reason for using two sets was to
have a workload that would get some other data into the cache between runs
of the benchmark set. The script reported the last line of the render_list
output for the render time. See
https://github.com/pnorman/renderd-benchmark for scripts.
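
A minimal sketch of the alternating schedule described above, not the actual
script from the repository. It assumes that "6 times" means six rounds, that
each round runs the benchmark set and then the cache-flushing set, and that
the first two rounds are warm-up. The function name and the order of the two
sets are assumptions made for illustration.

```python
def build_schedule(rounds=6, warmup_rounds=2,
                   sets=("benchmark", "cache-flushing")):
    """Return (round_number, set_name, is_warmup) tuples in run order.

    Hypothetical helper: the round count and warm-up count follow the
    description in this message, the set order is an assumption.
    """
    schedule = []
    for round_number in range(1, rounds + 1):
        # The first warmup_rounds rounds only bring the cache to a steady state.
        is_warmup = round_number <= warmup_rounds
        for set_name in sets:
            schedule.append((round_number, set_name, is_warmup))
    return schedule


schedule = build_schedule()
# Only the non-warm-up runs would contribute to the reported rates.
measured = [entry for entry in schedule if not entry[2]]

for round_number, set_name, is_warmup in schedule:
    label = "warm-up" if is_warmup else "measured"
    print(f"round {round_number}: {set_name} ({label})")
```

Each entry would correspond to one render_list invocation over the named tile
list; the invocation itself is left out here because its exact arguments live
in the scripts linked above.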

I did my tests on an Amazon EC2 m2.2xlarge instance which has 4 cores, 34.2
GB of RAM and 850 GB of on-instance storage which I used for the DB. The
on-instance storage peaked at 1.6k read IOPS during my tests. Tiles were
saved to an EBS volume.

Rates are given as meta-tiles/second, with +/- values being one standard
deviation. Multiply by 8*8 to get tiles/second.
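
As a rough illustration of how such a figure can be produced, the sketch
below computes a mean and one standard deviation from a list of per-run
rates and converts meta-tiles/second to tiles/second. The sample rates are
made up, the helper names are hypothetical, and the use of the sample
standard deviation (rather than the population one) is an assumption.

```python
import statistics

# One meta-tile is an 8x8 block of tiles.
TILES_PER_META_TILE = 8 * 8


def summarize(rates):
    """Return (mean, sample standard deviation) of per-run rates."""
    return statistics.mean(rates), statistics.stdev(rates)


def to_tiles_per_second(meta_tiles_per_second):
    """Convert a meta-tile rate to a tile rate."""
    return meta_tiles_per_second * TILES_PER_META_TILE


# Hypothetical per-run rates in meta-tiles/second, for illustration only.
sample_rates = [2.0, 2.1, 1.9, 2.0]
mean, stdev = summarize(sample_rates)
print(f"{mean:.3f} +/- {stdev:.3f} meta-tiles/s")

# Converting a rate from this message: 3.734 meta-tiles/s in tiles/s.
print(f"{to_tiles_per_second(3.734):.1f} tiles/s")
```

With this conversion, a rate of 3.734 meta-tiles/second corresponds to about
239 tiles/second.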

My first tests were with osm.xml for ext4 vs xfs. The ext4 database was
created by copying the postgresql data dir from an xfs vol to an ext4 vol.
ext4 came out 5-7% faster on the 9k set, but there were other factors, like
the presence of munin and the method used to create the pg data dir, that
may make this result misleading. Additionally, updates and import time
should be considered.

All results below are for ext4 unless otherwise noted.

The render rate for the 1k benchmark set was 3.734 +/- 0.014, but the run
only used 24GB of RAM. For reference, the cache-flushing set rendered at
2.640 +/- 0.06, indicating that the choice of tiles has a significant effect.

The very first run of 1k tiles to warm up the cache (xfs) rendered at 0.58,
22% of the rate of the other runs of that set. Clearly the RAM cache is very
significant. Reads peaked during this run at 1.6k IOPS, with the disk at
100% util. CPU usage was ~100% on the first run but went to 400% (all 4
cores) on the later runs.

I then proceeded to run the scripts with the 9k sets, getting 2.280 +/-
0.017 on the benchmark set and 2.026 +/- 0.017 on the cache-flushing set.
(xfs) CPU usage varied from 200% to 280% with 60%-80% iowait on top of that.
Disk utilization was 60%-90%, 70% avg.

To see how effective the cache-flushing set is I ran the benchmark set twice
in a row and got 3.12 on the second run (xfs), higher than the 2.280 I got
with the cache-flushing set in between. This indicates that 9k meta-tiles
are enough to get some of the data out of cache.
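
For a rough sense of scale, the sketch below compares the 9k benchmark rate
with the cache-flushing set in between (2.280) against the back-to-back rate
(3.12). The function name is hypothetical, and the message does not make
clear whether both figures come from the same filesystem, so this is an
approximation only.

```python
def relative_slowdown(interleaved_rate, back_to_back_rate):
    """Fraction of throughput lost when the other set runs in between."""
    return 1.0 - interleaved_rate / back_to_back_rate


# Rates in meta-tiles/second, taken from this message.
slowdown = relative_slowdown(2.280, 3.12)
print(f"{slowdown:.1%}")  # prints 26.9%
```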

Conclusions:

- Cache matters. It's likely that I needed less cache as my DB did not have
slim tables or updates going on.

- ext4 vs xfs for your database requires more research into updates, but I
will be using ext4 for the rest of these benchmarks. 

- For database disk performance to show up in your benchmarks on a server
with 32GB of RAM you need a really big benchmark set of tiles that covers
the entire world.