HTTP/WebDAV Analytics

Mike calls Analytics the killer app of the 7000 series NAS appliances. Indeed, this feature enables administrators to quickly understand what’s happening on their systems in unprecedented depth. Most of the interesting Analytics data comes from DTrace providers built into Solaris. For example, the iSCSI data are gathered by the existing iSCSI provider, which allows users to drill down on iSCSI operations by client. We’ve got analogous providers for NFS and CIFS, too, which incorporate the richer information we have for those file-level protocols (including file name, user name, etc.).

We created a corresponding provider for HTTP in the form of a pluggable Apache module called mod_dtrace. mod_dtrace hooks into the beginning and end of each request and gathers typical log information, including local and remote IP addresses, the HTTP request method, URI, user, user agent, bytes read and written, and the HTTP response code. Because one probe fires when a request begins and the other when it ends, we also get latency information for each request. We could, of course, collect other data as long as it’s readily available when we fire the probes.
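To make the latency point concrete, here's a minimal Python sketch of pairing a start event with a done event. This is not mod_dtrace code: the RequestTracker class, its method names, and the request ID used as a key are all made up for illustration, and the real provider's probes and arguments may look quite different.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One completed request, roughly what a log line would carry."""
    method: str
    uri: str
    status: int
    bytes_written: int
    latency_ns: int


class RequestTracker:
    """Hypothetical consumer that pairs start and done events by request ID."""

    def __init__(self):
        # request ID -> (start timestamp, method, URI) for in-flight requests
        self._in_flight = {}

    def request_start(self, request_id, timestamp_ns, method, uri):
        # Remember when the request began; nothing is reported yet.
        self._in_flight[request_id] = (timestamp_ns, method, uri)

    def request_done(self, request_id, timestamp_ns, status, bytes_written):
        # Latency is only known once the second event arrives.
        started = self._in_flight.pop(request_id, None)
        if started is None:
            return None  # done event without a matching start
        start_ns, method, uri = started
        return RequestRecord(method, uri, status, bytes_written,
                             timestamp_ns - start_ns)
```

A caller would feed it a start event and later a done event for the same ID, and only the second call returns a record with a latency. A request that is still in flight produces nothing.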

The upshot of all this is that you can observe HTTP traffic in our Analytics screen, and drill down in all the ways you might hope.

Caveat user

One thing to keep in mind when analyzing HTTP data is that we’re tracking individual requests, not lower-level I/O operations. With NFS, for example, each operation might be a read of some part of the file. If you read a whole file, you’ll see a bunch of operations, each one reading a chunk of the file. With HTTP, there’s just one request, so you’ll only see a data point when that request starts or finishes, no matter how big the file is. If one client is downloading a 2 GB file, you won’t see it until the download is done (and the latency might be very high, but that’s not necessarily indicative of poor performance).

This is a result of the way the protocol works (or, more precisely, the way it’s used). While NFS is defined in terms of small filesystem operations, HTTP is defined in terms of requests, which may be arbitrarily large (depending on the limits of the hardware). One could imagine a world in which an HTTP client that’s implementing a filesystem (like the Windows mini-redirector) makes smaller requests using HTTP Range headers. This would look more like the NFS case: there would be requests for ranges of files corresponding to the sections that were being read. (This could have serious consequences for performance, of course.) But as things are now, users must understand the nature of protocol-level instrumentation when drawing conclusions based on HTTP Analytics graphs.
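As a rough illustration of that hypothetical world, here's a Python sketch that splits one whole-file read into the Range header values such a client might send. The range_headers helper and the chunk sizes are assumptions for the example, not anything the mini-redirector or the appliance actually does.

```python
def range_headers(file_size, chunk_size):
    """Return the Range header values needed to read a file in chunks.

    HTTP byte ranges are inclusive on both ends, so the first 4 bytes
    of a file are "bytes=0-3".
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    headers = []
    for start in range(0, file_size, chunk_size):
        # The last chunk may be short, so clamp it to the end of the file.
        end = min(start + chunk_size, file_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


# A 10-byte file read 4 bytes at a time becomes three requests.
print(range_headers(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

With a 1 MB chunk size (again, an arbitrary choice), a 2 GB download would show up as 2048 separate requests rather than a single data point, which is much closer to what you'd see for NFS.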

Implementation

For the morbidly curious, mod_dtrace is actually a fairly straightforward USDT provider, consisting of the following components:

We hope this will shed some light on performance problems in actual customer environments. If you’re interested in using HTTP/WebDAV on the NAS appliance, check out my recent post on our support for system users.