Cyclomatic Complexity: measure of source code complexity

Cyclomatic Complexity (or Conditional Complexity) (CC) is a means of measuring the complexity of a given program. Complexity is measured by counting ‘decisions’ – the more decisions a program goes through in a single run, the more logical branches it has, and the more complex it becomes.

The number itself is calculated in a more complex way, but as a general rule of thumb, it is the number of total decisions that can be made (logical if, try catches, for loops) plus one.

* CC is ‘total number of decisions’ plus one.

Example 1. Consider the following chunk of code:

To measure cyclomatic complexity, we count the number of logical groups in the code. In this case, the code branches at the <if> statement. If we are to represent this as a flow chart,

 

which can be translated into a graph that looks like this:

 

You can look at it this way. The topmost dot is the start state. It goes to the second dot, which branches or makes a decision(+1) that goes either to the left or the right, but eventually ends up in the same point. Note that branching here is depicted as a dot with two outgoing arrows. Recall that complexity is calculated as decisions + 1, so the function above has complexity 2.

Example 2. On the following example we have 2 if statements.

 

Since we have 2 decisions to make, the function represented by the graph above has complexity 3.

You might be wondering at this point what complexity actually measures. If we were to really count all the paths that the code could potentially go through, that number would be 4. To illustrate, we name the possible ‘sub-paths’ that the code can possible take. Then, we try moving through the graph from start to finish:

Path 1: ABDFGIK

Path 2: ACEFGIK

Path 3: ACEFHJK

Path 4: ABDFHJK

And that number isn’t equal to the number we got from CC. For reference, that number is used in determining ‘path coverage’. We make that distinction between ‘path coverage’ and ‘branch coverage’ when we go into testing.

Testing

So I mentioned ‘path coverage’ and ‘branch coverage’.

‘Path Coverage’ aims to cover all complete paths in a given program, i.e. to test the code given all combinations of true/false for all the if statements – which would result in 2^(n) number of tests, where n is the number of <cascaded, not nested> if statements. That means 32 test cases needed if there are 5 if statements in a given function.

‘Branch Coverage’ aims to ensure that all branches have been tested. As opposed to Path Coverage, Branch Coverage covers that the branches be all tested. Path Coverage covers that all the combinations of those branches be tested.

The biggest difference is that path coverage, due to the sheer number of tests that need to be done, results in prohibitively large amounts of testing that needs to be done.

Branch coverage, on the other hand, is much less prohibitive, as it scales linearly to the number of logical choices that the function needs to make.

This is where CC comes in: As a general rule, we can use CC to determine the number of tests we need to make to ensure complete branch coverage.

tests needed <= CC

This way we directly apply CC as a reference to how many tests we need to make to ensure coverage. Keep in mind that while the number is important, the actual content of each test is more important.

Significance

Functions with high CC are more likely to contain bugs, and given that they are generally more complex, also more difficult to fix.

From the perspective of a developer, it is more difficult to analyze a program with more branches than one with fewer branches – this makes maintaining code and adding features more difficult. As difficulty increases, of course, the more costly it is for everyone involved.

Before going further, however, keep in mind that CC is by no means a one-all determinant to how complex a code base truly is. While it presents a good reference point from which to start refactoring code, not all functions with high cyclomatic complexity are complex, and not all complex functions are faulty. Here’s a good way to understand the significance of cyclomatic complexity:

* Use CC to Compare Code Within – Cyclomatic Complexity is a good determinant of which parts of your code are more likely to contain faults.

* Don’t Use CC to Compare Code Across Projects – It is not a good determinant of which among two different code bases (projects) are more faulty.

* Not All High CC Code is Faulty or Inelegant – This is most likely a pitfall with using this kind of metric, so care must be taken: CC must not be used to measure actual code ‘faultiness’, rather it must be used to measure ‘how likely it is that the code is faulty.’

 

Summary

  • Good Benchmark To Measure How Dangerous A Code Base Is
  • Plays a Major Role Test Driven Development

– Does Not Differentiate ‘Complexity’ Between Switches and Nesting

> code is in reality more complex if logic is nested.

References: