Kartlytics: Applying Big Data Analytics to Mario Kart 64

This post also appears on the Joyeur blog.

If you missed it, Joyent recently launched Manta, a web-facing object store with compute as a first-class operation. Manta makes it easy to crunch on Big Data in the cloud, and we’ve seen it used by both ourselves and others and others to solve real business problems involving Big Data. But it’s not just for user behavior and crash dump analysis: Manta has profoundly changed the way we operate here at Joyent’s SF office.

Mario Kart 64

On a typical Friday afternoon at Joyent’s San Francisco office, it’s been a long week and the engineers are getting restless. I glance over at Bill, who mimes working a video game controller with both hands and nods towards the projector. The war dance has begun, and the trash talking will soon follow. Factions form, and soon everyone’s agreed on only one thing: it’s time to play Kart.

Playing Kart

(photo: Joshua Clulow)

Sound familiar? I know we’re not the only office that brings the same intensity to office video games that we bring to our work. Kart is the perfect game for serious office competition because it’s dead simple to learn but takes a long time to master. It’s a standard arcade-style racing game with cartoonish weapons, encouraging the best of friends to abuse and taunt each other mercilessly.

Kart clip

But it’s not just about the competition. In our years playing Kart, we’ve seen a lot of quirky game behavior, and we’ve always wondered: is it true that the game handicaps successful players by giving them less powerful weapons? Given that, how much does the first lap really matter? And which characters are more likely to win: heavyweights or lightweights?

We’ve also had lots of time to discuss strategy: how much does power sliding matter? How much time do you lose for each banana peel slip, shell hit, bomb impact, or fall off the track?

As serious intellectuals often do, we spent hours discussing these questions, what data we would want to collect to answer them, and even how we might go about collecting it. It sounded like a fun project, so I wrote a program that takes video captures of our Mario Kart 64 sessions and picks out when each race starts, which character is in each box on the screen, the rank of each player as the race progresses, and finally when the race finishes. Then I built a web client that lets us upload videos, record who played which character in each race, and browse the aggregated stats. The result is called Kartlytics, and now contains videos of over 230 races from over the last year and change.

This worked pretty well, but as we collected more videos, the time to process them grew prohibitive. Often I’d want to tweak the program to fix detection of, say, Yoshi Valley, but I couldn’t be sure that the change wouldn’t cause it to do the wrong thing on some other track. To test that, I’d have to rerun the process on all the videos, but that took nearly a full day on my laptop.

Enter Manta

This changed significantly about a month ago when Joyent launched Manta. We like to say that we built Manta to solve all kinds of problems from log analysis to video transcoding, but the reality is now clear: Manta exists to compute analytics on Mario Kart 64 sessions. Immediately after we stood up our production Manta service, I loaded all of the videos we’ve recorded to date. If you’ve set up the toolkit, you can list these videos with:

$ mls -l /dap/public/kartlytics/videos | grep -v json
-rwxr-xr-x 1 dap      71246499 Jun 3 18:35 2012-06-19-00.mov
-rwxr-xr-x 1 dap      66418215 Jun 3 17:25 2012-06-19-02.mov
-rwxr-xr-x 1 dap      67641347 Jun 3 18:37 2012-06-19-03.mov
...

In recent weeks, I’ve converted kartlytics to use Manta for all of the video processing. Here’s the job to process each video, producing a JSON file for each one describing what happened in all races in the video:

$ export KARTLYTICS_OUTDIR=/dap/public/kartlytics/generated
$ mfind -t o -n '.*.mov$' /dap/public/kartlytics/videos |
    mjob create -w
    -s /dap/public/kartlytics/kartlytics.tgz
    -s /dap/public/kartlytics/bin/video-transcribe
    --init "cd /var/tmp && tar xzf /assets/dap/public/kartlytics/kartlytics.tgz"
    -m "/assets/dap/public/kartlytics/bin/video-transcribe /var/tmp/kartlytics $KARTLYTICS_OUTDIR "'$MANTA_INPUT_FILE'

This job uses a common pattern for Manta jobs, which is to throw an existing program (kartlytics) into a tarball in Manta, specify it as an asset for a job, and run the bundled program inside the job. In this case, the kartlytics.tgz tarball contains a built copy of the kartlytics github repo, which contains the C program and support files for doing the video processing. There’s a separate job for creating this tarball, which is just:

$ export KARTLYTICS_TARBALL=/dap/public/kartlytics/kartlytics.tgz
$ mjob create -w
    -s /dap/public/kartlytics/bin/make-tarball
    -r "/assets/dap/public/kartlytics/bin/make-tarball $KARTLYTICS_TARBALL" < /dev/null

The “make-tarball” script just clones the repo, builds it, and saves the tarball back to Manta.

Processing all these videos used to take nearly a day on my laptop. It takes just 5 minutes on Manta, and I don’t have to think about spinning up compute instances and then spinning them down when it’s done. This makes a serious difference because I can get feedback on algorithm tweaks in minutes rather than overnight.

There are two other Manta jobs involved: one that takes the race transcripts produced by the first job and aggregates them with the human-entered data about who played which characters in each race, producing a single aggregated JSON file; and one that takes the same race transcripts and produces smaller web-quality videos for each race that you can play directly on the kartlytics.com website. For details, see the the run_all script in the repo.

Try it: Since these videos are public, you can run kartlytics yourself on our collection of videos. You can run these two jobs exactly as written, replacing KARTLYTICS_OUTDIR and KARTLYTICS_TARBALL with paths in your own storage areas (e.g., /$MANTA_USER/public/kartlytics). The above “run_all” script is also parametrized so that anyone can run it on our videos.

Highlights

The front page of kartlytics.com shows some summary information, including the tracks we play a lot:

Summary

We start every session with Luigi Raceway, which is why that track has the most plays. After the first race, the winner chooses the next track, so the distribution diverges after that.

Next on the page we show races from the latest session, then the “wildest races”. This is a proxy for “slugfests”, which are races with just the right circumstances (wide open track, a tight pack, lucky weapons, etc.) for a weapons-based free-for-all. The “wildest” ones listed here are those with the most number of player position changes per minute.

Wildest races

Below that, we have an example of a hotly-debated question: how often does a player go from 1st to 4th in less than 5 seconds? We call that a “Keithing”, and the front page shows all of them. You can sort by person to see how many times it’s happened to each player (but keep in mind that not all players have played the same number of races).

You can click on the date/time of any race to get to the race details page, which summarizes the results, shows a playable video of just that race, and then the all important race transcript, showing not only everything that happened in the race, but with screenshots to prove it:

Event summary

That’s really important, both for debugging and to silence criticism about kartlytics “making stuff up”.

You can also click any player to see that player’s races, how many times they’ve played each track, and which characters (and character classes – lightweight, middleweight, or heavyweight) they tend to play:

Player details

Future

We’ve really just scratched the surface. We’ve only answered a few of the questions I mentioned above. We’ve discussed tracking lots more data, like power slides (to see how much they matter), weapons (to see how the distribution changes with rank), and various mistakes (to prove [or disprove] claims like “I keep running into banana peels today”).

We’d also love to add videos from you! It’s surprisingly easy. To record video, I purchased an iGrabber and a small power amplifier (so we could split the video signal without losing too much quality – you may not need this if you’re playing on a TV instead of a projector). The iGrabber comes with software to record the video, and kartlytics does the rest. As I showed above, you can run the software on your own jobs.

The algorithm behind Kartlytics is really pretty simplistic (see the README in the repo for details), but it can probably be applied to other cult classics that have relatively simple on-screen game status. I know our Seattle office is big on Street Fighter II; surely Street Fighter-lytics can’t be far away?