What code coverage IS NOT
To truly understand what code coverage is, it is very important to understand what it is not.
A couple of answers/comments here and on related questions have alluded to this:
Franci Penov
BTW, while code coverage is a good metric of how much testing you are doing, it is not necessarily a good metric of how well you are testing your product.
steve
Just because every line of your code is run at some point in your tests, it doesn't mean you have tested every possible scenario that the code could be run under. If you just had a function that took x and returned x/x and you ran the test using my_func(2) you would have 100% coverage (as the function's code will have been run) but you've missed a huge issue when 0 is the parameter. I.e. you haven't tested all necessary scenarios even with 100% coverage.
KeithS:
However, the flip side of coverage is actually twofold: first, a test that adds coverage for coverage's sake is useless; every test must prove that code works as expected in some novel situation. Also, "coverage" is not "exercise"; your test suites may execute every line of code in the SUT, but they may not prove that a line of logic works in every situation.
No one says it more succinctly and to the point than Mark Simpson:
Code coverage tells you what you definitely haven't tested, not what you have.
An Illustrative Example
I spent some time writing a reply to a feature request that Istanbul (a Javascript test coverage tool) "Change definition of coverage to require more than 1 hit" per line. No one will ever see it there 🤣, so I thought it might be useful to reuse the gist of it here:
A coverage tool CANNOT prove that your code is tested adequately. All it can do is tell you that you provided some kind of coverage for every line of code in your codebase, but even then it doesn't prove the coverage means anything, because a test might execute a line of code without making any assertions on its results. Only you as a developer can decide the actual semantically unique input variations and boundary conditions that need to be covered by tests and ensure that the test logic does in fact make the right assertions.
For example, say you have the following Javascript function. A single test that asserts an input of (1, 1)
returns 1
would give you 100% line coverage. What does that prove?
function max(a, b) {
return a > b ? a : b
}
Putting aside for a moment the semantically poor coverage of this test, the 100% line coverage is rather misleading too, as it doesn't provide 100% branch coverage. That's easily seen by splitting the branches onto different lines and rerunning the line coverage report:
function max(a, b) {
if (a > b) {
return a
} else {
return b
}
}
or even
function max(a, b) {
return a > b ?
a :
b
}
What this tells us is that the "coverage" metric depends too much on the implementation, whereas ideally testing should be black box. And even then it's a judgement call.
For example, would the following three input cases constitute complete testing of the max
function?
You'd get 100% line and 100% branch coverage for the above implementations. But what about non-number inputs? Ok, so you add two more input cases:
which forces you to update the implementation:
function max(a, b) {
if (typeof a !== 'number' || typeof b !== 'number') {
return undefined
}
return a > b ? a : b
}
Looking good. You have 100% line and branch coverage, and you've covered invalid inputs.
But is that enough? What about negative numbers?
The ideal of 100% blackbox coverage is a fantasy
In my opinion, in this situation, for the simple nature of this function, testing negative number cases is anal overkill. If the situation were different, say the function only existed because we need to implemented some tricky algorithm or optimization, that may or may not work as expected for negative numbers, then I'd add more input cases including negative numbers.
Often times, you only discover corner cases because you have hundreds or thousands of users and only through their using your software in unexpected ways or in conditions and software environments you could not foresee or reproduce even if you could are such rare cases exposed. And often those rare cases are artifacts of the nature of your implementation, not something you'd arrive at from analysis of an idealized abstraction of the buggy code's interfaces.
I think what that shows is the ideal of 100% blackbox coverage is a bit of a fantasy. You would waste a lot of time writing unnecessary tests if you treated everything as an idealized black box. In the example above, I know the implementation uses a simple and reliable non-number check and then uses the native Javascript logic to compare values (a > b
), and that it would be silly to do anything more complex. Knowing that, I'm not going to test passing in negative numbers, floats, strings, objects, etc.
At the end of the day, you have to be practical and use good judgement, and that judgement usually cannot ignore knowing something about the nature of what's in the black box, or at least the assumptions made inside the black box.
All this said, I don't have a CS degree 😂. What's the equivalent of IANAL for programmer advice?