on
Tracing Node.js add-on latency
Node.js has some built-in DTrace probes for looking at HTTP request start/done and GC start/done, and I wrote nhttpsnoop to report latency based on these probes. There’s also a future project that will make it easy to trace any asynchronous operations. But in the meantime, there’s one category of latency that’s missing, which is latency of asynchronous operations kicked off by add-ons.
As an example, I’ll use node-sqlite3. Below, I have sqlite-demo.js, which is a version of the demo program that’s been modified to run in a loop where it opens the database, inserts some records, selects some records, and closes the database. As this program runs, the add-on kicks off async operations to open the database, run queries, and close the database. We’re going to trace those operations to see how long each type of operation takes.
Below that, I have trace-addon-latency.d, a D script that traces the libuv uv_queue_work
function as well as all functions in all add-ons in the target Node process. This is a little heavy-handed for what we’re trying to do, and can have a noticeable impact on program performance, depending on the add-on. For many of the small add-ons I use, the impact is negligible.
By running the program and the D script for a few minutes, then stopping the D script, you get output that looks like this:
node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
value ------------- Distribution ------------- count
2048 | 0
4096 |@@@@@@@@@@@@@@@@@ 6018
8192 |@@@@@ 1590
16384 |@ 302
32768 |@@@@@@@@@@@@@@@@@ 5722
65536 | 156
131072 | 0
There’s a histogram like this for each “kind” of asynchronous operation that each add-on kicks off. By “kind”, I’m referring to the C++ function that kicks off the operation. In this case, Work_BeginExec kicked off 6018 operations that took between 4us and 8us.
This is a quick way to get a look at what async operations your Node program is doing (from add-ons) and how long they’re taking. This works on OS X and SmartOS, and should work on BSD and any other system with DTrace.
Full steps to reproduce
$ mkdir demo
$ cd demo
$ npm install sqlite3
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/bbcd859c7469ed20b7126c56f3a5c45dc43986e1/sqlite-demo.js
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/45c7e0da1b5aeb0cced2163e525e7caffb1a6ebe/trace-addon-latency.d
$ node sqlite-demo.js &
$ dtrace -s trace-addon-latency.d -p $! # you may need "sudo" here, particularly on OS X
Let that run for a little while, then kill either one of the “node” or “dtrace” processes. (Killing the “node” one causes “dtrace” to exit, since it was attached to it.) That will cause “dtrace” to exit and print the results.
sqlite-demo.js
var sqlite3 = require('sqlite3').verbose();
go();
function go()
{
var db = new sqlite3.Database(':memory:');
db.serialize(function () {
db.run("CREATE TABLE lorem (info TEXT)");
var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
for (var i = 0; i < 10; i++) {
stmt.run("Ipsum " + i);
}
stmt.finalize();
db.exec("SELECT rowid AS id, info FROM lorem", function () {
db.close();
go();
});
});
}
trace-addon-latency.d
#!/usr/sbin/dtrace -s
/*
* trace-addon-latency.d: Run this script as:
*
* dtrace -s trace-addon-latency.d -p PID
*
* to trace the latency of asynchronous operations kicked off by all add-ons in
* the node process PID. NOTE: this will instrument all native functions in all
* add-ons, which can have a noticeable performance impact, depending on the
* add-on.
*/
pid$target::uv_queue_work:entry
{
callers[arg1] = ucaller;
donefunc[arg1] = arg3;
starttime[arg1] = timestamp;
}
pid$target:*.node::entry
/donefunc[arg0]/
{
@[ufunc(callers[arg0])] = quantize(timestamp - starttime[arg0]);
callers[arg0] = 0;
starttime[arg0] = 0;
donefunc[arg0] = 0;
}
trace-addon-latency.out
node_sqlite3.node`node_sqlite3::Database::Work_BeginClose(node_sqlite3::Database::Baton*)
value ------------- Distribution ------------- count
2048 | 0
4096 | 167
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@ 8529
16384 |@@@@@@@ 2326
32768 |@@@@@@@@ 2740
65536 | 26
131072 | 0
node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
value ------------- Distribution ------------- count
2048 | 0
4096 |@@@@@@@@@@@@@@@@@ 6018
8192 |@@@@@ 1590
16384 |@ 302
32768 |@@@@@@@@@@@@@@@@@ 5722
65536 | 156
131072 | 0
node_sqlite3.node`node_sqlite3::Database::Work_BeginOpen(node_sqlite3::Database::Baton*)
value ------------- Distribution ------------- count
2048 | 0
4096 |@ 286
8192 |@@@@@@@@@@@@@@@@@@ 6064
16384 |@@ 642
32768 |@@@@@@@@@@@@@@@@@@@ 6701
65536 | 94
131072 | 1
262144 | 0
node_sqlite3.node`node_sqlite3::Statement::Work_BeginPrepare(node_sqlite3::Database::Baton*)
value ------------- Distribution ------------- count
2048 | 0
4096 |@@@@@ 3604
8192 |@@@@@@@@@@@@ 8376
16384 |@@ 1341
32768 |@@@@@@@@@@@@@@@@ 10831
65536 |@@@@@ 3423
131072 | 2
262144 | 0
node_sqlite3.node`node_sqlite3::Statement::Work_BeginRun(node_sqlite3::Statement::Baton*)
value ------------- Distribution ------------- count
2048 | 0
4096 |@@@@@@@@@@@@@@@@@@@@@@@ 87884
8192 |@@@@@ 19478
16384 |@@ 8692
32768 |@@@@@@@@ 32160
65536 |@ 3455
131072 | 0