HTTP/WebDAV Analytics
Mike calls Analytics the killer app of the 7000 series NAS appliances. Indeed, this feature enables administrators to quickly understand what’s happening on their systems in unprecedented depth. Most of the interesting Analytics data comes from DTrace providers built into Solaris. For example, the iSCSI data are gathered by the existing iSCSI provider, which allows users to drill down on iSCSI operations by client. We’ve got analogous providers for NFS and CIFS, too, which incorporate the richer information we have for those file-level protocols (including file name, user name, etc.).
We created a corresponding provider for HTTP in the form of a pluggable Apache module called mod_dtrace. mod_dtrace hooks into the beginning and end of each request and gathers typical log information, including local and remote IP addresses, the HTTP request method, URI, user, user agent, bytes read and written, and the HTTP response code. Since we have two probes, we also have latency information for each request. We could, of course, collect other data as long as it’s readily available when we fire the probes.
The upshot of all this is that you can observe HTTP traffic in our Analytics screen, and drill down in all the ways you might hope.
Caveat user
One thing to keep in mind when analyzing HTTP data is that we’re tracking individual requests, not lower-level I/O operations. With NFS, for example, each operation might be a read of some part of the file. If you read a whole file, you’ll see a bunch of operations, each one reading a chunk of the file. With HTTP, there’s just one request, so you’ll only see a data point when that request starts or finishes, no matter how big the file is. If one client is downloading a 2 GB file, you won’t see the transfer until it’s done (and the latency might be very high, but that’s not necessarily indicative of poor performance).
This is a result of the way the protocol works (or, more precisely, the way it’s used). While NFS is defined in terms of small filesystem operations, HTTP is defined in terms of requests, which may be arbitrarily large (depending on the limits of the hardware). One could imagine a world in which an HTTP client that’s implementing a filesystem (like the Windows mini-redirector) makes smaller requests using HTTP Range headers. This would look more like the NFS case: there would be requests for ranges of files corresponding to the sections of files that were being read. (This could have serious consequences for performance, of course.) But as things are now, users must understand the nature of protocol-level instrumentation when drawing conclusions based on HTTP Analytics graphs.
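To make the difference in granularity concrete, here is a small C sketch (not part of mod_dtrace) that counts how many operations an observer would see if a file were read in fixed-size chunks, and builds the Range header a chunking HTTP client might send for one of those chunks. Both function names are made up for illustration, and the 32 KB chunk size used in the note below is just an assumed value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Number of operations needed to read file_size bytes in chunks of
 * chunk_size bytes (rounding up for a final partial chunk).
 */
static uint64_t
ops_for_chunked_read(uint64_t file_size, uint64_t chunk_size)
{
    assert(chunk_size > 0);
    return ((file_size + chunk_size - 1) / chunk_size);
}

/*
 * Format the Range header for the chunk at position `index` (0-based).
 * Byte positions in a Range header are inclusive. Returns the number of
 * characters written, or -1 if the chunk starts at or past end of file.
 * (Hypothetical helper; overflow checks are omitted for brevity.)
 */
static int
range_header(char *buf, size_t len, uint64_t file_size,
    uint64_t chunk_size, uint64_t index)
{
    uint64_t first, last;

    if (chunk_size == 0)
        return (-1);
    first = index * chunk_size;
    if (first >= file_size)
        return (-1);
    last = first + chunk_size - 1;
    if (last >= file_size)
        last = file_size - 1;   /* final chunk may be short */
    return (snprintf(buf, len, "Range: bytes=%llu-%llu",
        (unsigned long long)first, (unsigned long long)last));
}
```

With a 2 GB file (2147483648 bytes) and 32 KB chunks, ops_for_chunked_read() returns 65536, compared with the single request (one start, one done) that a plain HTTP GET of the same file produces.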
Implementation
For the morbidly curious, mod_dtrace is actually a fairly straightforward USDT provider, consisting of the following components:
- http.d defines http_reqinfo_t, the stable structure used as an argument to probes (in D scripts). This file also defines translators to map between httpproto_t, the structure passed to the DTrace probe macro (by the actual code that fires probes in mod_dtrace.c), and the pseudo-standard conninfo_t and aforementioned http_reqinfo_t. This file is analogous to any of the files shipped in /usr/lib/dtrace on a stock OpenSolaris system.
- http_provider_impl.h defines httpproto_t, the structure that mod_dtrace passes into the probes. This structure contains enough information for the aforementioned translators to fill in both the conninfo_t and http_reqinfo_t.
- http_provider.d defines the provider’s probes:

    provider http {
        probe request__start(httpproto_t *p) :
            (conninfo_t *p, http_reqinfo_t *p);
        probe request__done(httpproto_t *p) :
            (conninfo_t *p, http_reqinfo_t *p);
    };

- mod_dtrace.c implements the provider itself. We hook into Apache’s existing post_read_request and log_transaction hooks to fire the probes (if they are enabled). The only tricky bit here is counting bytes, since Apache doesn’t normally keep that information around. We use an input filter to count bytes read, and we override mod_logio’s optional function to count bytes written. This is basically the same approach that mod_logio uses, though it is admittedly pretty nasty.
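The overall shape of that last component can be sketched in plain C without any Apache or DTrace headers. This is a simulation, not the real module: the struct names and fields, the function names, and the probes_enabled flag are all hypothetical stand-ins (the actual httpproto_t layout isn’t shown here), and timestamps are passed in explicitly. It only illustrates the pattern of recording a start time, accumulating byte counts as data passes through, and assembling the summary at the end, only when someone is listening.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request bookkeeping, kept for the life of a request. */
typedef struct req_state {
    uint64_t start_ns;       /* time the start hook ran */
    uint64_t bytes_read;     /* accumulated by the input-side counter */
    uint64_t bytes_written;  /* accumulated by the output-side counter */
    int probes_enabled;      /* stand-in for an "is probe enabled" check */
} req_state_t;

/* Hypothetical summary handed to the "done" probe. */
typedef struct req_summary {
    uint64_t latency_ns;
    uint64_t bytes_read;
    uint64_t bytes_written;
    int status;              /* HTTP response code */
} req_summary_t;

/* Stand-in for the post_read_request hook: reset counters, note the time. */
static void
on_request_start(req_state_t *rs, uint64_t now_ns, int probes_enabled)
{
    rs->start_ns = now_ns;
    rs->bytes_read = 0;
    rs->bytes_written = 0;
    rs->probes_enabled = probes_enabled;
}

/* Stand-in for the input filter: called once per chunk of request data. */
static void
on_input_chunk(req_state_t *rs, size_t len)
{
    rs->bytes_read += len;
}

/* Stand-in for the output-side byte counter. */
static void
on_output_chunk(req_state_t *rs, size_t len)
{
    rs->bytes_written += len;
}

/*
 * Stand-in for the log_transaction hook. Fills in *out and returns 1 if
 * the probe would fire; returns 0 (leaving *out untouched) if disabled.
 */
static int
on_request_done(const req_state_t *rs, uint64_t now_ns, int status,
    req_summary_t *out)
{
    if (!rs->probes_enabled)
        return (0);
    out->latency_ns = now_ns - rs->start_ns;
    out->bytes_read = rs->bytes_read;
    out->bytes_written = rs->bytes_written;
    out->status = status;
    return (1);
}
```

Because latency is just the difference between the two hook timestamps, a long-running download shows up as one high-latency data point, which is the behavior described in the caveat above.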
We hope this will shed some light on performance problems in actual customer environments. If you’re interested in using HTTP/WebDAV on the NAS appliance, check out my recent post on our support for system users.