Compression followup

My previous post discussed compression in the 7000 series. I presented some Analytics data showing the effects of compression on a simple workload, but I observed something unexpected: the system never used more than 50% CPU while running the workloads, even when the workload was CPU-bound. As a result, the CPU-intensive runs took a fair bit longer than expected.

This happened because ZFS uses at most 8 threads for processing writes through the ZIO pipeline. With a 16-core system, only half the cores could ever be used for compression, hence the 50% CPU usage we observed. When I asked the ZFS team about this, they suggested that nthreads = 3/4 of the number of cores might be a more reasonable value, leaving some headroom available for miscellaneous processing. So I reran my experiment with 12 ZIO threads. Here are the results of the same workload (the details of which are described in my previous post):

Summary: text data set

Compression   Ratio   Total   Write   Read
off           1.00x   3:29    2:06    1:23
lzjb          1.47x   3:36    2:13    1:23
gzip-2        2.35x   5:16    3:54    1:22
gzip          2.52x   8:39    7:17    1:22
gzip-9        2.52x   9:13    7:49    1:24

Summary: media data set

Compression   Ratio   Total   Write   Read
off           1.00x   3:39    2:17    1:22
lzjb          1.00x   3:38    2:16    1:22
gzip-2        1.01x   5:46    4:24    1:22
gzip          1.01x   5:57    4:34    1:23
gzip-9        1.01x   6:06    4:43    1:23

We see that read times are unaffected by the change (not surprisingly), but write times for the CPU-intensive workloads (gzip) improved by more than 20%.

From the Analytics, we can see that CPU utilization is now up to 75% (exactly what we’d expect):

CPU usage with 12 ZIO threads
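
The 50% and 75% figures fall straight out of the thread-to-core ratio. Here is a rough sketch of that arithmetic in Python. The function names are made up for illustration (they are not ZFS tunables), and the model assumes each ZIO thread can saturate one core while nothing else uses significant CPU:

```python
def cpu_ceiling(zio_threads: int, cores: int) -> float:
    """Rough upper bound on CPU utilization when only ZIO write threads are busy.

    Simplifying assumption: one busy thread keeps exactly one core busy.
    """
    return min(zio_threads, cores) / cores


def suggested_threads(cores: int) -> int:
    """The 3/4-of-cores rule of thumb mentioned above (hypothetical helper)."""
    return (3 * cores) // 4


cores = 16
print(f"8 threads:  {cpu_ceiling(8, cores):.0%}")
print(f"{suggested_threads(cores)} threads: {cpu_ceiling(suggested_threads(cores), cores):.0%}")
```

On the 16-core system this gives a ceiling of 50% with 8 threads and 75% with 12, matching what Analytics showed in both runs.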

Note that in order to run this experiment, I had to modify the system in a very unsupported (and unsupportable) way. Thus, the above results do not represent current performance of the 7410, but only suggest what’s possible with future software updates. For these kinds of ZFS tunables (as well as those in other components of Solaris, like the networking stack), we’ll continue to work with the Solaris teams to find optimal values, exposing configurables to the administrator through our web interface when necessary. Expect future software updates for the 7000 series to include tunable changes to improve performance.

Finally, it’s also important to realize that if you run into this limit, you’ve got 8 cores (or 12, in this case) running compression full-tilt and your workload is CPU-bound. Frankly, you’re using more CPU for compression than many enterprise storage servers even have today, and it may very well be the right tradeoff if your environment values disk space over absolute performance.
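
To put numbers on that tradeoff, here is a small sketch that takes the text data set results from the table above and works out, for each setting, the fraction of disk space saved and the write-time slowdown relative to no compression. It assumes the times are minutes:seconds and that the ratio means uncompressed size divided by compressed size. The helper names are purely illustrative:

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string (assumed format) to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (compression, ratio, write time) copied from the text data set table.
TEXT_RESULTS = [
    ("off", 1.00, "2:06"),
    ("lzjb", 1.47, "2:13"),
    ("gzip-2", 2.35, "3:54"),
    ("gzip", 2.52, "7:17"),
    ("gzip-9", 2.52, "7:49"),
]


def tradeoff(results):
    """Return (name, space_saved, write_slowdown) relative to the first row."""
    baseline = to_seconds(results[0][2])
    rows = []
    for name, ratio, write in results:
        space_saved = 1 - 1 / ratio          # fraction of disk space not used
        slowdown = to_seconds(write) / baseline
        rows.append((name, space_saved, slowdown))
    return rows


for name, saved, slowdown in tradeoff(TEXT_RESULTS):
    print(f"{name:7s} space saved {saved:5.1%}   write time {slowdown:.2f}x")
```

Whether the extra write time is worth the space depends entirely on your environment, but laying the two columns side by side makes the choice easier to see.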

Update Mar 27, 2009: Updated charts to start at zero.