Node.js/V8 postmortem debugging

2016 Update: The commands and output in this post are pretty dated now, but you can find an up-to-date docs (including a tutorial) in the mdb_v8 user guide.

Photo by Auntie P (flickr)

I recently wrote an article for ACM Queue discussing postmortem debugging for dynamic environments. I argued that while native environments for years have provided rich tools for understanding software failures postmortem, most popular dynamic environments have not developed similar facilities, and this gap is becoming more important as these environments grow popular for building complex core components of large distributed systems.

In building Cloud Analytics, we knew we needed a robust facility for postmortem debugging. The CA service comprises several components: a few global services that manage state, an agent on each server in the datacenter that collects performance data, and a cluster of data aggregators. Debugging distributed systems is always hard, and it’s even harder when the software is running on other people’s production systems.

We built a module called node-panic that dumps predefined program state in the event of a crash. It also has a tool for manually triggering a crash in the event of a hard-hang (e.g., infinite loop). These tools have been extremely valuable in helping us root-cause issues in both production and development environments. But they’re really just a first step, and as I explain in the article, this approach has several important limitations, mostly owing to the fact that node-panic is pure JavaScript and so isn’t baked into the VM itself and doesn’t have access to a lot of important program state.

A next step

More recently I’ve been playing around with building an MDB module for inspecting V8 state. (Recall that V8 is the blazingly fast JavaScript virtual machine in both Node.js and Chrome, and MDB is the illumos modular debugger, an extensible tool for analyzing both live programs and process core dumps.) With this approach, you can attach MDB to either a running Node program, a core file manually created using gcore(1), or a core file generated as a result of a Node crash (e.g., segfault).  Importantly, you don’t need a debug build of node for this so you can use it in production.

Here’s a very simple example program:

dap@devel ~ $ cat -n node-stack/examples/loop-anon.js
     1	/*
     2	 * This script demonstrates a simple stack trace involving an anonymous
     3	 * function, then goes into an infinite loop so you can catch it with the
     4	 * debugger.  One caveat: we grab a stack trace at the start to force V8 to
     5	 * compute line number information for this script.  If we didn't have this, the
     6	 * debugger would only print out position-in-file information.
     7	 */
     8	new Error().stack;
     9
    10	function main() { func1(); }
    11	function func1() { func2(); }
    12
    13	function func2()
    14	{
    15		(function () {
    16			for (;;)
    17				;
    18		})();
    19	}
    20
    21	main();

Note the comment at the top of the script. V8 computes line number information for a given script lazily. If it has never done this for a particular script, the debugger module won’t be able to print out the line number.  This example grabs a stacktrace at the start of the script to make sure V8 has computed line numbers for it.  (There’s an example of the other case below.)

Now I’ll run this script with a stock node build and attach MDB:

dap@devel ~ $ node node-stack/examples/loop-anon.js &
[1] 400524

dap@devel ~ $ mdb -p 400524
Loading modules: [ ld.so.1 libc.so.1 ]
> ::status
debugging PID 400524 (32-bit)
file: /usr/bin/node
threading model: native threads
status: stopped by debugger

Now I’ll grab a JavaScript stack trace:

> ::load mdb_v8.so
> ::jsstack
804738c <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 15
80473a4 func2 at /home/dap/node-stack/examples/loop-anon.js line 13
80473bc func1 at /home/dap/node-stack/examples/loop-anon.js line 1180473d4 main at /home/dap/node-stack/examples/loop-anon.js line 10
80473f0 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 1
8047424 <InternalFrame>
804745c <anonymous> (as Module._compile) at module.js line 340
804747c <anonymous> (as Module._extensions) at module.js line 406
80474a0 <anonymous> (as Module.load) at module.js line 324
80474cc <anonymous> (as Module._load) at module.js line 257
80474ec <anonymous> (as Module.runMain) at module.js line 419
8047518 <anonymous> (as startup.processNextTick.process._tickCallback) at node.js line 120
8047534 <InternalFrame>
8047568 <EntryFrame>
fe84e240 <None>

If V8 hadn’t computed the line number information for our script, the stack trace would look more like this:

> ::jsstack
804738c <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon-noinfo.js
position 814
80473a4 func2 at /home/dap/node-stack/examples/loop-anon-noinfo.js position 772
80473bc func1 at /home/dap/node-stack/examples/loop-anon-noinfo.js position 710
80473d4 main at /home/dap/node-stack/examples/loop-anon-noinfo.js position 652
80473f0 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon-noinfo.js
position 2
8047424 <InternalFrame>
804745c <anonymous> (as Module._compile) at module.js position 19476
804747c <anonymous> (as Module._extensions) at module.js position 23260
80474a0 <anonymous> (as Module.load) at module.js position 18526
80474cc <anonymous> (as Module._load) at module.js position 15200
80474ec <anonymous> (as Module.runMain) at module.js position 23916
8047518 <anonymous> (as startup.processNextTick.process._tickCallback) at node.js
position 7224
8047534 <InternalFrame>
8047568 <EntryFrame>
fe84e240 <None>

Besides getting a stack trace, you can examine individual frames in more detail using the frame pointer:

> 80473a4::jsframe -v
frame pointer: 80473a4
    function: fd862b8d (JSFunction)
        shared function info: fc615229 (SharedFunctionInfo)
            function name: fc6941a9 (ascii SeqString: "func2")
            function token position: SMI: value = 515
            script: fc614f4d (Script)
                script name: fd85ec05 (ascii ConsString: "
                /home/dap/node-stack/examples/loop-anon.js")
                line endings: fd863605 (FixedArray)
            function token at: line 13

and you can inspect individual heap objects as well. Given a pointer to any JavaScript heap object, “::jsprint” will figure out what type it is and dump a representation of the corresponding C++ class inside the V8 VM:

> fd862b8d::jstype
0xfd862b8d: JSFunction

> fd862b8d::jsprint
fd862b8d JSFunction {
    fd862b8d JSObject {
        fd862b8d HeapObject < Object < MaybeObject  {
            fd862b8c map = fe82df41 (Map)
        }
        fd862b90 properties = fe86e021 (FixedArray)
        fd862b94 elements = fe86e021 (FixedArray)
    }
    fd862b9c prototype_or_initial_map = fe86e0cd (Oddball)
    fd862ba0 shared = fc615229 (SharedFunctionInfo)
    fd862ba8 literals = fe86e021 (FixedArray)
    fd862bac next_function_link = fe86e05d (Oddball "undefined")
}

This is still very much a prototype, but we think this could be very powerful. In the meantime, there are some issues to be solved with the existing implementation and several more problems to address.

Maintainability

The heap inspector requires pretty deep knowledge of the internal memory layout of the V8 heap. How can this be maintained in a reasonable way as V8 itself evolves?

For C programs shipped with illumos, we use CTF at build-time to encode type definitions in the binary itself. CTF is compact enough to be shipped in production binaries, so MDB understands all in-memory structures for these programs on all illumos systems. Application-specific debugger commands can be implemented in terms of CTF so that they don’t have to be built specifically for a particular application binary.

This approach is tougher for C++ programs, which have additional information like class hierarchies that can’t be expressed with CTF. But while V8 uses C++ classes (and an inheritance hierarchy) to represent different heap object types, it doesn’t use standard C++ member fields. Instead, each class defines static constants describing the offsets of its member fields, and macros are used to define accessors in terms of these constants. This actually makes it easier to build and maintain a C debugger module because it’s much easier to export these constants as part of the build process than to reverse-engineer how a C++ structure gets laid out in memory. If this proves to be the right approach, then we could save these constants into the binary using CTF and then build the debugger module independent of the V8 binary.

Completeness

Today, this module can dump a simple stack trace and provide basic inspection of heap objects. But there’s a lot more that would be useful:

I’ve also only tested it on pretty simple examples so far. My examples have included JavaScript frames for named and anonymous functions, constructor frames, argument adaptor frames, and simple entry frames. I haven’t yet tried optimized frames, exit frames, or an exit/entry sequence.

This module has already helped us understand enough to write a basic DTrace ustack helper for Node programs, which allows similar stack traces to be gathered via DTrace while the program is running. In future blog posts I’ll talk more about the DTrace helper as well as what I’ve learned about V8, which I hope will be useful for others hacking around with it.