
Code Reviews Trump Unit Testing, But They Are Better Together

Last week I was participating in a formal code review (a.k.a. code inspection) with one of our clients.  We have been working with this client, helping them strengthen their development practices, and holding formal code reviews is a key component of that work.  Part of the formal process we introduced includes reviewing the unit testing results, both the (successful) output report and the code coverage metrics.

At one point we were reviewing some code that had several error handling blocks that were not being covered in the unit tests.  These blocks were, arguably, unlikely or impossible to reach (such as a Java StringReader throwing an IOException).  There was some discussion by the team about the necessity of mocking enough functionality to cover these blocks.
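To illustrate the kind of block we were looking at, here is a minimal sketch of my own (not the client's code): a checked IOException that the compiler forces you to handle, even though an in-memory StringReader will not realistically throw it, leaving the catch branch uncovered unless you mock a reader that fails.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class FirstLineExample {

    // Hypothetical helper: read the first line of an in-memory string.
    public static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            // Effectively unreachable for a StringReader, but coverage tools
            // flag it; exercising it would require mocking a reader that throws.
            throw new IllegalStateException("Unexpected I/O failure", e);
        }
    }
}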

Although we agreed that some of the more esoteric error conditions weren’t worth the programmer’s time to mock up, it occurred to me later that we were missing an important point.  What mattered was that we were holding a formal code review and looking at those blocks of code.

Let me take a step back.  In 1986, Capers Jones published a book entitled Programming Productivity.  Although dated, the book contains many excellent points that cause you to think about how to create software in an efficient way.  Here, efficiency is not about lines of code per unit of time but, more importantly, lines of correct code per unit of time.  This means taking into account rework due to errors and omissions.

One of the studies presented in the book relates to identifying defects in code.  It is a study whose results seem obvious when we think about them.  However, we don’t always align our software development practices to leverage the study’s lessons and maximize our development efficiency.  Perhaps we believe that the statistics have changed due to language constructs, experience, tooling and so forth.  We’d need studies similar to the ones presented by Capers Jones in order to prove that, though.

Below are a few of the findings from the book’s study of defect detection approaches.  I’ve skipped the low-end and high-end numbers that Capers includes, simply giving the modes (averages), which are a good basis for comparison:

[Table: Defect Identification Rates]
[Graph: Defect Identification Rates]

Based on this data, we see that formal reviews of the design and code are much more effective at finding defects than unit testing.  In this case, unit testing is focused on branch coverage.  Obviously, changing the testing expectations, such as applying domain testing concepts, might change the effectiveness.  My guess is that such a change would not bring unit testing on par with formal code reviews, but it would improve unit testing’s performance at a cost in productivity, since domain testing takes more rigor to apply.
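To make the distinction concrete, here is a hypothetical JUnit sketch (the shipping rule and its numbers are my own, not from the study).  A branch-coverage test is satisfied by touching each branch once; a domain test also probes the boundary value itself, where off-by-one mistakes (>= versus >) tend to hide.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class ShippingCalculator {
    // Hypothetical rule: orders of $100 or more ship free.
    static double shippingCost(double orderTotal) {
        return orderTotal >= 100.0 ? 0.0 : 7.95;
    }
}

class ShippingCalculatorTest {

    // Branch coverage: each branch is exercised once.
    @Test
    void coversBothBranches() {
        assertEquals(0.0, ShippingCalculator.shippingCost(150.0));
        assertEquals(7.95, ShippingCalculator.shippingCost(20.0));
    }

    // Domain testing: also probe the boundary and the value just below it.
    @Test
    void probesTheBoundary() {
        assertEquals(0.0, ShippingCalculator.shippingCost(100.0));
        assertEquals(7.95, ShippingCalculator.shippingCost(99.99));
    }
}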

As Capers points out, these percentages can’t simply be added together.  In other words, there is an intersection of the defects found by these methods: in his studies they found some of the same defects and some unique ones.  Therefore, this data does not provide a justification for skipping unit testing.  Instead, each defect detection method should be used in a complementary fashion.  The statistical data does, however, inform level-of-effort decisions, and we can use it to help decide where to cut corners, if necessary.

If we have to reduce the time we spend in the testing phase, we would do better to give up some unit testing in favor of keeping more formal code reviews.  This isn’t always the way we approach such situations.  It is easy to skip the formal reviews, believing that the time would be better spent by developers creating more tests for their code.  Apparently this is not the case.

I think it is valuable to periodically remind ourselves about studies like these as a way to make sure we aren’t falling into bad habits or being misled by incorrect assumptions.  Also, as teams bring on new members, it is good to review our practices, understand the relevant literature and actively discuss why we do what we do (and whether it needs to change).

A parting thought about statistics like the ones quoted here: they are averages from a very broad array of problem types and development shops.  As Boris Beizer has pointed out, each shop has a “bug fingerprint”.  The bug fingerprint is shaped by things like the technologies being used, the types of applications being developed and the experience of the team members.

Further, different types of bugs are better identified through different detection methods (branch tests, domain tests, code inspections, etc.).  Therefore, the optimal set of testing approaches for a given group of developers will differ somewhat from the average.  I’ll discuss this in a future post.

What does your team do with regard to testing and inspections?  Are there other techniques you find useful for identifying software defects?  I welcome your comments and feedback.
