illumos tools for observing processes

illumos, with Solaris before it, has a history of delivering rich tools for understanding the system, but discovering these tools can be difficult for new users. Sometimes, tools just have different names than people are used to. In many cases, users don’t even know such tools might exist.

In this post I’ll describe some tools I find most useful, both as a developer and an administrator. This is not intended to be a comprehensive reference, but rather an orientation for users new to illumos (and SmartOS in particular) who are already familiar with other Unix systems. This post will likely be review for veteran illumos and Solaris users.

The proc tools (ptools)

The ptools are a family of tools that observe processes running on the system. The most useful of these are pgrep, pstack, pfiles, and ptree.

pgrep searches for processes, returning a list of process ids. Here are some common example invocations:

$ pgrep mysql         # print all processes with "mysql" in the name
                      # (e.g., "mysql" and "mysqld")
$ pgrep -x mysql      # print all processes whose name is exactly "mysql"
                      # (i.e., not "mysqld")
$ pgrep -ox mysql     # print the oldest mysql process
$ pgrep -nx mysql     # print the newest mysql process
$ pgrep -f mysql      # print processes matching "mysql" anywhere in the name
                      # or arguments (e.g., "vim mysql.conf")
$ pgrep -u dap        # print all of user dap's processes

These options let you match processes very precisely, and they make scripts much more robust than the usual “ps -A | grep foo” approach.
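
Because pgrep exits with status 0 only when at least one process matches, it also works well in shell conditionals. For example (the service name here is just a placeholder):

$ pgrep -x mysqld > /dev/null && echo "mysqld is running"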

I often combine pgrep with ps. For example, to see the memory usage of all of my node processes, I use:

$ ps -opid,rss,vsz,args -p "$(pgrep -x node)" 
  PID  RSS  VSZ COMMAND
 4914 94380 98036 /usr/local/bin/node demo.js -p 8080
32113 92616 95964 /usr/local/bin/node demo.js -p 80

pkill is just like pgrep, but sends a signal to the matching processes.
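
For example (the daemon name "myd" here is just a placeholder):

$ pkill -HUP -x myd       # send SIGHUP to processes named exactly "myd"
$ pkill -u dap node       # send SIGTERM (the default) to dap's node processes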

pstack shows you thread stack traces for the processes you give it:

$ pstack 51862
51862:      find /
 fedd6955 getdents64 (fecb0200, 808ef87, 804728c, fedabd84, 808ef88, 804728c) + 15
 0805ee9c xsavedir (808ef87, 0, 8089a90, 1000000, 0, fee30000) + 7c
 080582dc process_path (808e818, 0, 8089a90, 1000000, 0, fee30000) + 33c
 080583ee process_path (808e410, 0, 8089a90, 1000000, 0, fee30000) + 44e
 080583ee process_path (808e008, 0, 8089a90, 0, 0, fecb2a40) + 44e
 080583ee process_path (8047cbd, 0, 8089a90, 0, fef40c20, fedc78b6) + 44e
 080583ee process_path (8075cd0, 0, 2f, fed59274, 8047b48, 8047cbd) + 44e
 08058931 do_process_top_dir (8047cbd, 8047cbd, 0, 0, 0, 0) + 21
 08057c5e at_top   (8058910, 2f, 8047bb0, 8089a90, 28, 80571f0) + 9e
 08072eda main     (2, 8047bcc, 8047bd8, 80729d0, 0, 0) + 4ea
 08057093 _start   (2, 8047cb8, 8047cbd, 0, 8047cbf, 8047cd3) + 83

This is incredibly useful as a first step for figuring out what a program is doing when it’s slow or not responsive.
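
Because each invocation is just a snapshot, a simple trick is to run pstack a few times in a row and look for frames that keep showing up (the pid here is the find process from the example above):

$ for i in 1 2 3; do pstack 51862; sleep 1; done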

pfiles shows you what file descriptors a process has open, similar to “lsof” on Linux systems, but for a specific process:

$ pfiles 32113
32113:      /usr/local/bin/node /home/snpp/current/js/snpp.js -l 80 -d
  Current rlimit: 1024 file descriptors
   0: S_IFCHR mode:0666 dev:527,6 ino:2848424755 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,65565 ino:38817 uid:0 gid:0 size:793928
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/application-snpp:default.log
      offset:793928
   2: S_IFREG mode:0644 dev:90,65565 ino:38817 uid:0 gid:0 size:793928
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/application-snpp:default.log
      offset:793928
   3: S_IFPORT mode:0000 dev:537,0 uid:0 gid:0 size:0
   4: S_IFIFO mode:0000 dev:524,0 ino:6257976 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   5: S_IFIFO mode:0000 dev:524,0 ino:6257976 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   6: S_IFSOCK mode:0666 dev:534,0 ino:23280 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
    SOCK_STREAM
    SO_REUSEADDR,SO_SNDBUF(49152),SO_RCVBUF(128000)
    sockname: AF_INET 0.0.0.0  port: 80
   7: S_IFREG mode:0644 dev:90,65565 ino:91494 uid:0 gid:0 size:6999682
      O_RDONLY|O_LARGEFILE
      /home/snpp/data/0f0f2418d7967332caf0425cc5f31867.webm
      offset:2334720

This includes details on files (including offset, which is great for checking on programs that scan through large files) and sockets.
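
Like most of the ptools, pfiles accepts multiple process ids, so it combines nicely with pgrep:

$ pfiles $(pgrep -x node)     # open file descriptors for every node process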

ptree shows you a process tree for the whole system or for a given process or user. This is great for programs that use lots of processes (like a build):

$ ptree $(pgrep -ox make)
4599  zsched
  6720  /usr/lib/ssh/sshd
    45902 /usr/lib/ssh/sshd
      45903 /usr/lib/ssh/sshd
        45906 -bash
          54464 make -j4 
            54528 make -C out BUILDTYPE=Release
              55718 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55719 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55757 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55758 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55769 sed -e s|^bf_null.o|/home/dap/node/out/Release/obj.target/openss
              55771 /bin/sh -c sed -e "s|^bf_nbio.o|/home/dap/node/out/Release/obj.t

Here’s a summary of these and several other useful ptools:

pgrep     find processes by name and other attributes
pkill     like pgrep, but send a signal to the matching processes
pstack    print thread stack traces
pfiles    report open file descriptors
ptree     print process trees
pargs     print a process’s arguments (and, with -e, its environment)
pwdx      print a process’s current working directory
pldd      list the dynamic libraries loaded by a process
pcred     print (or set) a process’s credentials
psig      list a process’s signal dispositions
pflags    print tracing flags and pending signals
pstop     stop a process (prun resumes it)
pwait     wait for processes to terminate
preap     reap zombie processes
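
For example, pargs and pwdx answer two common questions about a running process (the pid is the node process from the pfiles example above):

$ pargs 32113        # how was this process invoked?
$ pargs -e 32113     # what's in its environment?
$ pwdx 32113         # what is its current working directory?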

Some of these tools (including pfiles and pstack) will briefly pause the process to gather their data. For example, “pfiles” can take several seconds if there are many file descriptors open.

For details on these and a few others, check their man pages, most of which are in proc(1).

Core files

Many of the proc tools operate on core files just as well as live processes. Core files are created when a process exits abnormally, as via abort(3C) or a SIGSEGV. But you can also create one on-demand with gcore:

$ gcore 45906
gcore: core.45906 dumped

$ pstack core.45906
core 'core.45906' of 45906:     -bash
 fee647f5 waitid   (7, 0, 8047760, f)
 fee00045 waitpid  (ffffffff, 8047838, c, 108a7, 3, 8047850) + 65
 0808f4c3 waitchld (0, 0, 0, 0, 20000, 0) + 87
 0808ffc6 wait_for (108a7, 0, 813c128, 3e, 330000, 78) + 2ce
 08082ee8 execute_command_internal (813b348, 0, ffffffff, ffffffff, 813c128) + 1758
 08083d3d execute_command (813b348, 1, 8047b58, 8071a7d, 0, 0) + 45
 08071c18 reader_loop (fed90b2c, 80663dd, 8047c34, fed90dc8, 8069380, 0) + 240
 080708e3 main     (1, 8047dfc, 8047e04, 80eb9f0, 0, 0) + aff
 0806f32b _start   (1, 8047ea4, 0, 8047eaa, 8047eb3, 8047ebf) + 83

Lazy tracing of system calls

DTrace can trace system calls across the system with minimal impact, but for cases where the overhead is not important and you only care about one process, truss can be a convenient tool because it decodes arguments and return values for you:

$ truss -p 3135
sysconfig(_CONFIG_PAGESIZE)                     = 4096
ioctl(1, TCGETA, 0x080479F0)                    = 0
ioctl(1, TIOCGWINSZ, 0x08047B88)                = 0
brk(0x08086CA8)                                 = 0
brk(0x0808ACA8)                                 = 0
open(".", O_RDONLY|O_NDELAY|O_LARGEFILE)        = 3    
fcntl(3, F_SETFD, 0x00000001)                   = 0
fstat64(3, 0x08047940)                          = 0
getdents64(3, 0xFEC84000, 8192)                 = 720
getdents64(3, 0xFEC84000, 8192)                 = 0

When debugging path-related issues (like why Node.js can’t find the module you’re requiring), it’s often useful to trace just calls to “open” and “stat” with “truss -topen,stat”. This is also good for watching commands that traverse a directory tree, like “tar” or “find”.
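
For example (the script name and pid here are placeholders), either of these works:

$ truss -topen,stat node app.js       # trace a new invocation
$ truss -topen,stat -p 32113          # or attach to a running process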

DTrace and MDB

I mention DTrace and MDB last, but they’re the most comprehensive, most powerful tools in the system for understanding program behavior. The tools described above are simpler and present the most commonly useful information (e.g., process arguments or open file descriptors), but when you need to get arbitrary information about the system, these two are the tools to use.

DTrace is a comprehensive tracing framework for both the kernel and userland applications. It’s safe by design, has zero overhead when not enabled, and minimizes overhead when enabled. DTrace provides tens of thousands of probes at the kernel level, including system calls (system-wide), the scheduler, the I/O subsystem, ZFS, process execution, signals, and most function entry and exit points in the kernel. In userland, DTrace can instrument function entry and exit points, individual instructions, and arbitrary probes added by application developers. At each of these instrumentation points, you can gather information like the currently running process, a kernel or userland stack backtrace, function arguments, or anything else in memory. To get started, I’d recommend Adam Leventhal’s DTrace boot camp slides. (The context and setup instructions are a little dated, but the bulk of the content is still accurate.)
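
As a small taste, here’s a classic one-liner (run with appropriate privileges) that counts system calls by process name across the whole system until you interrupt it with Ctrl-C:

# dtrace -n 'syscall:::entry { @[execname] = count(); }'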

MDB is the modular debugger. Like GDB on other platforms, it’s most useful for deep inspection of a snapshot of program state. The target can be a userland program or the kernel itself, and in both cases you can either open a core dump (a crash dump, in the kernel case) or attach to the live program (or running kernel). As you’d expect, MDB lets you examine the stack, global variables, threads, and so on. The syntax is a little arcane, but the model is Unixy: debugger commands can be strung together much like a shell pipeline. Eric Schrock has two excellent posts for people moving from GDB to MDB.
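
For example, you can open the core file generated earlier and poke around; ::status prints basic information about the target, and $C prints a stack trace with frame pointers:

$ mdb core.45906
> ::status
> $C
> ::quit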

Let me know if I’ve missed any of the big ones. I’ll be writing a few more posts on tools in other areas of the system.