Debugging RangeError from a core dump

Last week, I tweeted:

I had just run into this nasty Node.js error:

$ node foo.js 

timers.js:96
            if (!process.listeners('uncaughtException').length) throw e;
                                                                      ^
RangeError: Maximum call stack size exceeded

What went wrong? It was reasonably obvious from the error message that the program blew its stack, which I assumed was likely the result of some errant recursive function, which was surprising, because I didn’t know I was using any recursive functions. But given that the problem is too many function invocations on the stack, the obvious question is: what’s on the stack? But the RangeError exception doesn’t include a stack trace. Now what do I do?

I’d previously been playing around with the idea of having Node dump core via abort(3C) when an exception is thrown that is not caught. The idea is that with a core dump, we could use mdb_v8 to examine the stack at the time that the program crashed, including function arguments and any other heap objects we can find. This would be much richer than just the stack trace that most fatal failures leave behind.

I ran my program with this modified Node, it dumped core as expected, and I opened the core file. As expected, the stack was huge, but it was pretty clear what the pattern was:

> ::jsstack
...
80469c4 0x2bf66683 <anonymous> (as ReadStream._emitData) (3a84f56d)
80469dc 0x2bf66b7b <anonymous> (as ReadStream.resume) (3a84f5d9)
8046a14 0x2bf73766 emitNextRecord (3a889829)
8046a44 0x2bfaa133 gotRecord (3a8897f5)
8046a74 0x2bf74c18 handleLogLine (3a8898f9)
8046aa8 0x2bf7703f <anonymous> (as stream.on.leftover) (3a897151)
8046ae0 0x2bf2d2bc <anonymous> (as EventEmitter.emit) (3a83a579)
8046af8 0x2bf0db41 <ArgumentsAdaptorFrame>

8046b18 0x2bf66683 <anonymous> (as ReadStream._emitData) (3a84f56d)
8046b30 0x2bf66b7b <anonymous> (as ReadStream.resume) (3a84f5d9)
8046b68 0x2bf73766 emitNextRecord (3a889829)
8046b98 0x2bfaa133 gotRecord (3a8897f5)
8046920 0x2bf74c18 handleLogLine (3a8898f9)
8046bfc 0x2bf7703f <anonymous> (as stream.on.leftover) (3a897151)
8046c34 0x2bf2d2bc <anonymous> (as EventEmitter.emit) (3a83a579)
8046c4c 0x2bf0db41 <ArgumentsAdaptorFrame>

8046c6c 0x2bf66683 <anonymous> (as ReadStream._emitData) (3a84f56d)
...

Starting from the bottom, we see that Node is emitting a “data” event on a ReadStream object, which invokes my “leftover” callback, which calls a couple of my program’s internal functions, one of which calls resume() on the same read stream. Then we do it all over again.

We can inspect the arguments to see what data is being emitted. The program in question was the bunyan log reader, so the data turned out to be random log contents, but I was able to verify that multiple calls to my “data” callback were getting the exact same data. This took me to the resume() function in Node.js, where I found the root cause:

ReadStream.prototype.resume = function() {
  this.paused = false;

  if (this.buffer) {
    this._emitData(this.buffer);
    this.buffer = null;
  }

On resume, we emit any buffered data, and then remove it. But if emitting the data causes us to call resume() again (as it did here), then we emit the same data again and end up in this infinite loop until we run out of stack space and crash. I reported the issue on Friday, and koichik immediately fixed the bug. (Thanks!)

This turned out to be a very minor bug in Node core, but the consequences for my program were pretty serious: fatal failure, with almost no information left behind to debug it. If I hadn’t had this experimental V8 around, I probably would have resorted to commenting out half my code at a time, binary searching until I found the bad code. I could also have tried the debugger, but I’d still have to have some idea where to set breakpoints, which boils down to the same search problem. (Stepping through doesn’t work, since the problem is triggered by an asynchronous event.) This makes me wonder: how do people debug RangeErrors today?

While the V8 change to dump core on an uncaught exception is actually quite simple, I’ve been putting it off while I consider the best way to expose it without breaking other V8 users. After this experience, I’m thinking it’s worth trying to do sooner rather than later!