The Game Producer's Quest Log

Bug Triage 101

[Image: Animal Crossing: New Horizons (2020)]

A while ago, I had an interview for a producer role where they asked me a question along these lines:

You are currently 4 weeks away from having to send a final build of the game, and have roughly 100 bugs in your issue tracking tool. Based on previous experience, you know the team can close roughly 10 to 15 issues a week. How do you handle getting through this bug list?

It was a great question! Most of the hard calls you will have to make on a project will be related to having more work than time available, and how you handle prioritization under constraints like this reveals a lot about how you think as a producer.
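To make the constraint concrete, here is a quick back-of-the-envelope check of the numbers in the question. It is only a sketch: the function name `fixable_range` is made up for illustration, and it assumes the weekly close rate stays flat for all four weeks and that no new bugs come in.

```python
# Back-of-the-envelope capacity check for the interview scenario.
WEEKS_LEFT = 4
OPEN_BUGS = 100
FIXES_PER_WEEK = (10, 15)  # (pessimistic, optimistic) weekly close rate


def fixable_range(weeks, weekly_rate):
    """Return (pessimistic, optimistic) number of bugs closed in `weeks`."""
    low, high = weekly_rate
    return weeks * low, weeks * high


low, high = fixable_range(WEEKS_LEFT, FIXES_PER_WEEK)
print(f"Fixable before the final build: {low} to {high} of {OPEN_BUGS}")
print(f"Still open at ship: {OPEN_BUGS - high} to {OPEN_BUGS - low}")
```

Under those assumptions, somewhere between 40 and 60 bugs get closed, which means 40 to 60 of them will still be open when the build goes out. The real question is which ones.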

This particular question is a classic example of bug triage, the act of assigning categories and priorities to the bugs in your database. It’s a collaborative process between teams like QA, engineering and production, but as a producer you should know what kind of questions to ask when approaching this process.

The first question to answer is where a bug falls in terms of severity and impact. “Severity” describes how harmful the bug is to a normal playthrough of the game: a crash or a progress blocker is much more severe than a visual issue or an out-of-place voice line. “Impact”, on the other hand, refers to how often the bug will actually affect the average player’s experience: a bug that only happens on a certain graphics card and CPU combination will have less overall impact than a bug that affects all players doing a certain main quest in the story.

Understanding severity and impact allows you to map an initial priority matrix in which to categorize bugs.

As you can guess, this gives a great starting point in defining priorities for our triage, but of course, there are still more factors we should consider as we build out our plan. These will be especially important in helping us decide in which order to address bugs that are of the same priority level.
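One possible way to write such a matrix down is as a simple lookup table. This is a minimal sketch and not a standard: the two-level scale, the `Level` and `PRIORITY_MATRIX` names, and the choice to treat both mixed cases as "medium" are all assumptions, and your team may well weigh severity and impact differently.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    HIGH = 2


# Hypothetical 2x2 matrix: (severity, impact) -> priority.
# Treating both mixed cases as "medium" is an assumption, not a rule.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): "high",
    (Level.HIGH, Level.LOW): "medium",
    (Level.LOW, Level.HIGH): "medium",
    (Level.LOW, Level.LOW): "low",
}


def priority(severity, impact):
    """Look up the initial priority for a bug."""
    return PRIORITY_MATRIX[(severity, impact)]


# A crash on the main quest path: severe, and it hits everyone.
print(priority(Level.HIGH, Level.HIGH))
# A crash on one rare graphics card and CPU combination.
print(priority(Level.HIGH, Level.LOW))
# An out-of-place voice line on that same rare hardware.
print(priority(Level.LOW, Level.LOW))
```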

In particular, as someone with an engineering background, I like to think of the complexity of fixing an issue. While the initial scenario gave us the assumption that the team could fix 10 to 15 bugs a week, I’d be wary of taking that assumption at face value without consulting with the team responsible for fixing each issue. A high severity and high impact crash could be something as complex as a memory leak or optimization issue that could require multiple days of debugging, or it could be something as simple as an incorrectly named file that can be fixed in less than an hour.

I also think about complexity in how fixes can have potential side-effects. It may seem like a good idea to fix a low severity but high impact issue with how collisions are detected in your game’s physics system, but this fix could potentially break another part of the game that relied on this particular quirk of the physics system.
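To show how complexity and side-effect risk can break ties between bugs of the same priority, here is a small sketch. The `Bug` fields, the day estimates, and the ordering policy (safe fixes before risky ones, then cheapest first) are assumptions for illustration. Some teams prefer to tackle risky fixes early so there is more time to test them, and the estimates should come from whoever will actually do the fix.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Bug:
    title: str
    priority: str         # "high", "medium" or "low"
    estimate_days: float  # estimate from the person who would fix it
    risky: bool           # could the fix break another part of the game?


def triage_order(bugs):
    """Sort by priority, then safe fixes before risky ones, then cheapest first."""
    return sorted(
        bugs,
        key=lambda b: (PRIORITY_ORDER[b.priority], b.risky, b.estimate_days),
    )


# Made-up entries based on the examples in this post.
bugs = [
    Bug("Crash: memory leak", "high", 3.0, False),
    Bug("Collision detection quirk", "medium", 1.0, True),
    Bug("Crash: incorrectly named file", "high", 0.1, False),
]

for bug in triage_order(bugs):
    print(f"{bug.priority:<6} {bug.estimate_days:>4} days  {bug.title}")
```

With this policy, the incorrectly named file gets fixed first, the memory leak second, and the collision fix last.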

It’s also worth asking if a fix can be simplified. One of my favorite examples of this is a story a friend shared about a glitch where players could consistently get stuck if they jumped repeatedly while in a door frame. While the “correct” way to fix this would have been to find the issue in the physics engine, the team opted to simply block the player from jumping whenever they were inside a door frame.

Once you have triaged all (or at least, most of) your issues, you can get an overall feel of the “health” of the game. It’s a very different position to be in when the majority of your 100 bugs are high priority with high complexity to fix versus the majority of those 100 bugs being low priority issues. Even if you have many high priority bugs, the situation may not be as dire as it seems if these bugs are relatively low complexity to fix (and I’ve been on a couple of projects that seemed to “magically come together at the end of development” because of this).
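As a rough illustration of that difference, the sketch below compares two made-up 100-bug backlogs. Every number in it is hypothetical, including the average fix cost per complexity level. The point is only that the same bug count can hide very different amounts of work.

```python
from collections import Counter

# Hypothetical average fix cost per complexity level, in engineer-days.
DAYS_BY_COMPLEXITY = {"low": 0.5, "high": 3.0}


def health_summary(bugs):
    """Count bugs per (priority, complexity) bucket."""
    return Counter((b["priority"], b["complexity"]) for b in bugs)


def total_effort_days(bugs):
    """Rough total effort, using the assumed per-complexity averages."""
    return sum(DAYS_BY_COMPLEXITY[b["complexity"]] for b in bugs)


def make_backlog(counts):
    """Build a backlog from {(priority, complexity): count}."""
    return [
        {"priority": p, "complexity": c}
        for (p, c), n in counts.items()
        for _ in range(n)
    ]


# Two made-up backlogs with the same number of high priority bugs.
rough = make_backlog({("high", "high"): 70, ("low", "low"): 30})
manageable = make_backlog({("high", "low"): 70, ("low", "low"): 30})

for name, backlog in [("rough", rough), ("manageable", manageable)]:
    print(name, dict(health_summary(backlog)), total_effort_days(backlog))
```

With these assumed costs, the first backlog adds up to 225 engineer-days of work and the second to 50, even though both contain 100 bugs and 70 of them are high priority.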

To return to the original question, all of this is a great start for planning your bug fixing over the four weeks before sending a build… but you should also be thinking about what happens after! What can be fixed in a patch? How soon can you release this patch? How often can you patch the game?

Of course, once your game releases, you will have to be ready to adapt to whatever players report. For example, there were a couple of medium priority bugs in Bubblegum Galaxy we did not address before release as they were related to the endgame content… but within 24 hours of release, we already had multiple players complaining about these issues, and we decided to fast track a patch that fixed all of them.

A whole topic I glossed over in this discussion is working with your QA team during the whole process. Besides following up to confirm that bugs were properly fixed and can be closed, you also have to talk with them about the new bugs that will inevitably pop up during this period. A good QA team will know the ins and outs of your game, and their opinions on what constitutes high or low priority bugs are usually spot on. This topic deserves its own blog post, but as a starting point, I recommend checking out this great chat with members of the QA team at CD PROJEKT RED and this GDC talk about QA on the original Horizon game.

At the end of the day, I don’t think it’s possible to truly release a game without bugs, but learning how to make deliberate and informed decisions about what to prioritize, while understanding the tradeoffs, is a key skill for any producer! And I hope this framework helps you do just that.