Surge 2011

We had a great time last week attending Surge in Baltimore. Highlights for me included Baron Schwartz’s visualizations of MySQL execution time (not entirely unlike Brendan’s, but with the addition of modeling), Geir Magnusson’s discussion of scalability at Gilt, and Raymond Blum’s discussion of backup/restore at Google (including recovery after the GMail outage from several months ago).

In his keynote, Ben Fried introduced what turned out to be an important theme in several talks: the importance of hiring generalists. Generalists are people who dive into problems and chase them through all levels of the stack. Ben’s point resonated strongly with me, as it was a very important principle at Fishworks and continues to be important here at Joyent. Bryan posed an interesting follow-up question: are generalists born or made? In my (limited) experience, being a generalist is more an attitude and set of traits (like tenacity and persistence) than any particular set of skills. Most people either have that attitude or they don’t, but even those who do can often go either way depending on their environment.

I remember the turning point for me early in my internship with the Solaris kernel group. I was editing source files on the build machine over NFS, typed “:w”, and my vim process hung. Being pretty new to Solaris and this dev environment, I bugged my mentor Dan Price about it. He introduced me to the kernel debugger, showed me how to find the kernel stack for the vim process’s main thread, and then had me pull up the NFS client source where the thread was hung. This seems obvious now, but at the time it hadn’t even occurred to me that I could figure out why my desktop was hung just by pulling up the kernel source and looking at what it was doing. That experience taught me not to see component boundaries as barriers to debugging. Since then, while I spend the vast majority of my time in familiar source bases, I’ve also found myself in code ranging from Apache (to figure out why httpd was hung) to RabbitMQ (to understand the problem discussed in our talk). (Not that I know Erlang, but I could make sense of enough of it to follow the RabbitMQ source.) Besides helping to solve the immediate problem, diving into a new codebase both broadens your experience and deepens your understanding of the whole system. This investment usually pays off enormously.

So one important takeaway from Ben’s talk and several others was that it’s critical to build teams with generalists, because trying to understand failures in complex systems by passing the problem around among different specialized groups is often simply untenable.

On Thursday, Bryan and I spoke about the technical choices we made in designing Cloud Analytics and how those have turned out. We also shared some of our recent challenges in deploying Cloud Analytics in production. The slides are available, the talk was recorded, and the videos should be available early next year.

Thanks to OmniTI for putting on a great conference and to everyone who attended. I hope to see you all next year!