What is a reasonable code coverage % for unit tests (and why)? [closed]

30

716

If you were to mandate a minimum percentage code-coverage for unit tests, perhaps even as a requirement for committing to a repository, what would it be?

Please explain how you arrived at your answer (since if all you did was pick a number, then I could have done that all by myself ;)

Proffitt answered 18/9, 2008 at 4:25 Comment(5)
Nowadays many IDEs come with coverage highlighting. Make sure you cover at least the most important parts of the code rather than thinking about attaining a given percentage.Faience
Unit tests by definition can be individual methods, whole classes, or whole modules. Even if you test all the methods, you might not test all the paths or all the combinations a user will hit. The situation gets more complex with statement coverage, branch coverage, and MC/DC.Gerard
Why is this question not deleted or properly edited? It gathered so much interest but it's totally misleading.Gerard
100% coverage is the minimum. I want to know if some punk introduced an unexpected process.exit(1) or throw somewhere just for fun or out of ignorance. If you don't execute each line of code in a build, I simply won't know until maybe at some point in production that code is used.Axiom
I think this might be better thought of inverted. Code coverage tells you very little, except that code was executed. LACK of code coverage on the other hand means that code was NOT EVER executed. So, rather than trying to have lots of code coverage, it's maybe more that we should try to have as little as possible untested code. (The reason for the distinction being that executed code is not necessarily tested code, but unexecuted code is definitely untested code. IE: covered code should not be valued so much as uncovered code avoided.)Okeechobee
H
1619

This prose by Alberto Savoia answers precisely that question (in a nicely entertaining manner at that!):

http://www.artima.com/forums/flat.jsp?forum=106&thread=204677

Testivus On Test Coverage

Early one morning, a programmer asked the great master:

“I am ready to write some unit tests. What code coverage should I aim for?”

The great master replied:

“Don’t worry about coverage, just write some good tests.”

The programmer smiled, bowed, and left.

...

Later that day, a second programmer asked the same question.

The great master pointed at a pot of boiling water and said:

“How many grains of rice should I put in that pot?”

The programmer, looking puzzled, replied:

“How can I possibly tell you? It depends on how many people you need to feed, how hungry they are, what other food you are serving, how much rice you have available, and so on.”

“Exactly,” said the great master.

The second programmer smiled, bowed, and left.

...

Toward the end of the day, a third programmer came and asked the same question about code coverage.

“Eighty percent and no less!” replied the master in a stern voice, pounding his fist on the table.

The third programmer smiled, bowed, and left.

...

After this last reply, a young apprentice approached the great master:

“Great master, today I overheard you answer the same question about code coverage with three different answers. Why?”

The great master stood up from his chair:

“Come get some fresh tea with me and let’s talk about it.”

After they filled their cups with smoking hot green tea, the great master began to answer:

“The first programmer is new and just getting started with testing. Right now he has a lot of code and no tests. He has a long way to go; focusing on code coverage at this time would be depressing and quite useless. He’s better off just getting used to writing and running some tests. He can worry about coverage later.”

“The second programmer, on the other hand, is quite experienced both at programming and testing. When I replied by asking her how many grains of rice I should put in a pot, I helped her realize that the amount of testing necessary depends on a number of factors, and she knows those factors better than I do – it’s her code after all. There is no single, simple, answer, and she’s smart enough to handle the truth and work with that.”

“I see,” said the young apprentice, “but if there is no single simple answer, then why did you answer the third programmer ‘Eighty percent and no less’?”

The great master laughed so hard and loud that his belly, evidence that he drank more than just green tea, flopped up and down.

“The third programmer wants only simple answers – even when there are no simple answers … and then does not follow them anyway.”

The young apprentice and the grizzled great master finished drinking their tea in contemplative silence.

Heins answered 18/9, 2008 at 4:30 Comment(35)
Sounds like an argument against the general concept of code coverage, as a metric for evaluating the usefulness of unit tests. I'm sure everyone agrees it isn't a perfect metric, but personal experience should hopefully show some correlation between CC % and unit test effectiveness...Proffitt
sanity -- your statement is mirrored precisely by the response to the "second developer". Personal experience should dictate it.Heins
I see, so you think that there is no valuable information that can be shared other than "make a decision based on your personal experience"?Proffitt
No, but assigning an arbitrary percentage to a problem that is quite, should I say, malleable, would not be prudent. The "reasonable" part of the question appears to be subjective.Heins
Perfect answer. Metrics do not make good code. You can write crappy code with 100% coverage and it doesn't make the code work good. +1 from me, shame I can't up more :)Schmitz
4 years later, and still useful. Just pulled this on two of my colleagues this morning.Sachikosachs
Savoia later reposted this at googletesting.blogspot.nl/2010/07/…Warhead
To me this anecdote represents an idealistic view. In the real world of project teams with competing priorities, code coverage races to 0%. We need a required number in order to build the unit testing habit within the team. I came to this question looking for some guidance on determining that number for an area I am not very familiar with, and this is really no help at all. I'm glad people in other scenarios are finding it useful though.Congius
it's 2015 and metrics do make good code. they're indicators of good code and while you can write some junk that works it will turn software development into a cost center at some point and you'll get bogged down in maintenance. So yes, metrics don't make good code but they do indicate. Even the infamous lines of code indicates how much your team can output maximum per month.Inspissate
code coverage is only one metric and it depends a lot on your project. I would look more closely on the coverage of your business logic and do additional mutation testing (for Java with PITest) as a second metric. Try to test as much as you can and use Code Quality tools (for Java Findbugs, PMD, ErrorProne) so you can find existing potential problems. This should lead you in the right direction.Chrisy
I ran a test coverage on my code for the "first" time in my life and found out that my tests do not cover all. I was so ready to test "set" functionality of my property. I now know that I don't need to.Scuff
Instead of following an absolute code coverage percentage, many track the change over time. Suppose you have 20% coverage from tests. While that number may or may not be satisfactory at the time, if the number changes downward, it can provoke questions about code quality.Silverstein
A young student came to Pythagoras. "I have a right triangle," he asked, "but how may I know the length of its longest side?" Pythagoras, who was very wise and cool, merely laughed. "This I cannot tell you," he proclaimed, "for it depends on the triangle!"Rowan
@Rowan You do know there's a difference between mathematical axioms and things that are answerable by "it depends", right? :pHeins
@JonLimjap - There are differences and there are commonalities. One commonality is that both problems have dependencies. It is correct in both cases to answer "it depends." It is as correct and more complete, in both cases, to answer with an explanation of what those dependencies are and how they influence the result.Rowan
@Proffitt it's an argument that relative code coverage (it went up this week, it went down... why???) ... makes sense as a metric of where you are now vs where you were before. i'm working on a project that contains a LOT of awful, untestable auto-generated boiler plate code.Keyte
CC% works pretty well with code reviews. Don't let junk tests in. As a reviewer you should probably look at the tests first to see what the code is supposed to be doing. From there you can judge the code. If I had to say, I'd say this about CC% (numbers directly out of my bum): 30%, probably not good. 50% means you are probably catching some stuff, you're making an effort. 75% is much better; this might be a good target. 90% is very cool, probably some waste. 100%: you may be cheating or your coverage plugin is broken.Eld
I really challenge the person who wrote this that why it is valid to write some code that cannot be tested, whether it is DTO or anything else.Bleier
'Smoking hot green tea' would be (arguably) undrinkable and taste very bitter. Temperature is crucial. thefragrantleaf.com/green-tea-brewing-tipsGyn
It may be the case that good coverage doesn't guarantee good code or tests but some people use this fact as an excuse to get away with low coverage which indicates lack of tests, and that is clearly worse than a few tests. So why not aim for both, good coverage and good tests? E.g. based on expected features: ronjeffries.com/xprog/articles/jatrtsmetricKhalilahkhalin
Code coverage means nothing. Without proper assertions after test run, or proper setup of test environment, it is useless. Besides, you don't test lines of code -- you test functionality. As such, tests should not be tied to the implementation but check the desired functionality.Eighteenth
moral of the story: 80% !Vivisect
@Rowan I think the answer to your doubts is there in the text. If you're just getting started with testing, then focusing on code coverage would do you more harm than good. If you are more experienced, then measuring code coverage will give you some useful information about the general health of your system and how it changes over time, but it will not give you strict recommendations (it's your code after all and you should know better how many tests are enough). Personally, I like it this way; it shows that programming is also a craft and a sort of art, not only following recipes.Whole
In my experience, code coverage works once to check that what I write does what it should, but many times to ensure that what I wrote still does what it should after someone (me included) changes the code due to new requirements. And to those who think this "proves" they can have low coverage: the lower the coverage, the fewer your tests; the fewer your tests, the fewer your good tests. You can still have crappy code with 100% coverage, but you can have much crappier code when allowed less coverage.Darrelldarrelle
Just the concept of having huge chunks of code that are not covered by the specs is pretty crazy! (e.g.: DTO)Sanitary
@grasevski That's because person 3 makes him feel inadequate, so he overcompensates.Lissa
Your short story helped me aligning JUnit thoughts in my team. Thanks a lot!Pinion
The coverage is just that - indication that a given line of code was executed by the runtime. It is not an indication that that particular line of code was de-facto tested. For this one needs solid understanding and application of the behavioural assertions which will prove that the code does what it was meant to do.Insphere
100% coverage is easily achieved, but with poor quality assertions the tests are utterly useless as a proofing tool. Religious worship of "high code coverage" can in fact be dangerous as it leads to false sense of security when there may not be any at all. "Write some good tests" is probably the best philosophy one can adhere to - be it a novice or 30-year code veteran.Insphere
I find it hilarious that people react to this post with a discussion about how many tests one should write - that perfectly breaks the fourth wall 😂Very
Metrics won't make your app solid; great tests will. Metrics in software development are just a guide for you to know where you can improve and, more importantly, you should not rely on them to determine whether it's a quality application or not.Charitacharitable
Second thought, though all truth here, isn't this the answer to any question? We can't approach questions in this fashion :)Laresa
10 years later, the advice of the old master still holds :)Michael
Still good advice in 2020!Forrestforrester
It's an argument against metrics in general. The metrics influence behaviour and so do not deliver what is really needed: not high-quality code, just well-covered code.Pacheco
C
103

Code Coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all features).

  • You could get a 100% by hitting all the lines once. However you could still miss out testing a particular sequence (logical path) in which those lines are hit.
  • You could fall short of 100% and still have tested all of your frequently used (the 80%) code paths. Having tests that exercise every 'throw ExceptionTypeX' or similar defensive-programming guard you've put in is a 'nice to have', not a 'must have'.

So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code you should get 90%+ coverage as a bonus. Use code coverage to highlight chunks of code you have missed (which shouldn't happen if you TDD, though, since you write code only to make a test pass; no code can exist without its partner test).
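
To make the first bullet concrete, here is a minimal Python sketch. The function `scale` and its parameters are hypothetical, invented only for illustration. The two tests execute every line and take both outcomes of each `if`, yet the one path that combines `reset` and `invert` is never exercised.

```python
def scale(x, reset, invert):
    """Hypothetical function, invented for this illustration."""
    factor = 2
    if reset:
        factor = 0
    if invert:
        return x / factor
    return x * factor


def test_reset():
    # Covers the `factor = 0` line and the multiplying return.
    assert scale(4, reset=True, invert=False) == 0


def test_invert():
    # Covers the dividing return with the default factor.
    assert scale(4, reset=False, invert=True) == 2.0


# Together the two tests execute every line of `scale`, but the path
# reset=True, invert=True is never run; it raises ZeroDivisionError.
```

A coverage report for these two tests would show nothing missing, which is why the report alone cannot tell you that a sequence of lines has gone untested.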

Coraliecoraline answered 18/9, 2008 at 4:33 Comment(9)
Exceptions: if you don't test your exception handling, how do you know your code doesn't blow up when that happens? Setters/getters: context sensitive I suppose, but surely your tests should execute them as part of the test suite, and if they don't, are they actually being used?Beautiful
Exceptions should be exceptional - not supposed to happen. If they do, you log the point of failure and bail. You can't test every exception that could happen. If the app is supposed to handle a non-happy path/event, you should have a test for it. Accessors may be added for future clients.. dependsCoraliecoraline
I take back my words: Exceptions need not be exceptional - a method may throw an exception to indicate that it could not carry out its work to completion. What I meant was you don't need a test for every method where you've put a defensive just-in-case try-catch. All known failure modes of a method must be tested with automated UTsCoraliecoraline
I'm not sure what you mean by your second point "but still have tested all your code-paths". If you in fact mean full-path coverage, then no you cannot have full-path coverage without 100% line/branch/decision coverage. In fact, full-path coverage is usually unobtainable in any non-trivial program because of the combinatoric nature of branches in generating paths. en.wikipedia.org/wiki/Code_coverage#Other_coverage_criteriaVicariate
@Zach - my mistake. I should have said "all code paths that you want to work" or "non-fringe paths". If the path failure is a very rare possibility I would be okay with it not being covered.Coraliecoraline
You don't test every possible exception; of course you can't do that. You SHOULD aim to test every block of code that handles exceptions. For example, if you have a requirement that when block X throws an exception, the exception is logged in the database, the green stripe at the bottom of the screen turns red, and an email is sent to the Pope; then that is what you should test. But you don't have to test every possible exception that might trigger these events.Accipiter
+1 for "Use code-coverage to highlight chunks of code you have missed". That's basically what that metric is good for.Vanguard
Coming back to this question after 7 years, what my comment left in the gaps was this. If you throw a custom exception, of course you should test that; it is part of your contract with your client. But you don't write tests for an out-of-the-blue Null Reference Exception, OutOfMemoryException, etc. They could happen, and you may put in generic catch handlers for these (to log the failure for troubleshooting), but I don't generally write tests for these. I'm okay with it being uncovered as long as it's a simple/small generic catch-log-rethrow handler.Coraliecoraline
It needs to be a loose goal: 10% coverage shows you've got a very high chance of problems, but trying to get to 100% is going to be a lot of work with diminishing gains. The magical figure depends on the team, the software and numerous other factors.Pacheco
R
89

Jon Limjap makes a good point - there is not a single number that is going to make sense as a standard for every project. There are projects that just don't need such a standard. Where the accepted answer falls short, in my opinion, is in describing how one might make that decision for a given project.

I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a more informed answer.

When to set code coverage requirements

First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence in your process. What do I mean by "empirical confidence"? Well, the real goal is correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but is still a subjective standard: It will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.

Code coverage is an objective measurement: Once you see your coverage report, there is no ambiguity about whether standards have been met. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of immeasurable qualities we care about.

Some specific cases where having an empirical standard could add value:

  • To satisfy stakeholders. For many projects, there are various actors who have an interest in software quality who may not be involved in the day-to-day development of the software (managers, technical leads, etc.) Saying "we're going to write all the tests we really need" is not convincing: They either need to trust entirely, or verify with ongoing close oversight (assuming they even have the technical understanding to do so.) Providing measurable standards and explaining how they reasonably approximate actual goals is better.
  • To normalize team behavior. Stakeholders aside, if you are working on a team where multiple people are writing code and tests, there is room for ambiguity for what qualifies as "well-tested." Do all of your colleagues have the same idea of what level of testing is good enough? Probably not. How do you reconcile this? Find a metric you can all agree on and accept it as a reasonable approximation. This is especially (but not exclusively) useful in large teams, where leads may not have direct oversight over junior developers, for instance. Networks of trust matter as well, but without objective measurements, it is easy for group behavior to become inconsistent, even if everyone is acting in good faith.
  • To keep yourself honest. Even if you're the only developer and only stakeholder for your project, you might have certain qualities in mind for the software. Instead of making ongoing subjective assessments about how well-tested the software is (which takes work), you can use code coverage as a reasonable approximation, and let machines measure it for you.

Which metrics to use

Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you might set a standard upon depends on what you're using that standard to satisfy.

I'll use two common metrics as examples of when you might use them to set standards:

  • Statement coverage: What percentage of statements have been executed during testing? Useful to get a sense of the physical coverage of your code: How much of the code that I have written have I actually tested?
    • This kind of coverage supports a weaker correctness argument, but is also easier to achieve. If you're just using code coverage to ensure that things get tested (and not as an indicator of test quality beyond that) then statement coverage is probably sufficient.
  • Branch coverage: When there is branching logic (e.g. an if), have both branches been evaluated? This gives a better sense of the logical coverage of your code: How many of the possible paths my code may take have I tested?
    • This kind of coverage is a much better indicator that a program has been tested across a comprehensive set of inputs. If you're using code coverage as your best empirical approximation for confidence in correctness, you should set standards based on branch coverage or similar.

There are many other metrics (line coverage is similar to statement coverage, but yields different numeric results for multi-line statements, for instance; conditional coverage and path coverage are similar to branch coverage, but reflect a more detailed view of the possible permutations of program execution you might encounter.)
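
As a small sketch of the difference between the two metrics, consider the hypothetical function below (the name `apply_discount` and the 10% rule are assumptions made up for this example). A single test with `is_member=True` executes every statement, so statement coverage is 100%, but the outcome where the `if` body is skipped is never taken, so a branch-aware tool would report the `if` as only partially covered.

```python
def apply_discount(price_cents, is_member):
    """Hypothetical pricing rule: members get 10% off (integer cents)."""
    if is_member:
        price_cents = price_cents - price_cents // 10
    return price_cents


def test_member_price():
    # Executes all three statements: 100% statement coverage.
    assert apply_discount(1000, is_member=True) == 900


def test_non_member_price():
    # Needed only for branch coverage: the `if` condition is False here,
    # so execution jumps straight to the return.
    assert apply_discount(1000, is_member=False) == 1000
```

With coverage.py, for example, branch measurement is opt-in via `coverage run --branch`; without it, the first test alone would already report full coverage for this function.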

What percentage to require

Finally, back to the original question: If you set code coverage standards, what should that number be?

Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.

Some numbers that one might choose:

  • 100%. You might choose this because you want to be sure everything is tested. This doesn't give you any insight into test quality, but does tell you that some test of some quality has touched every statement (or branch, etc.) Again, this comes back to degree of confidence: If your coverage is below 100%, you know some subset of your code is untested.
    • Some might argue that this is silly, and you should only test the parts of your code that are really important. I would argue that you should also only maintain the parts of your code that are really important. Code coverage can be improved by removing untested code, too.
  • 99% (or 95%, other numbers in the high nineties.) Appropriate in cases where you want to convey a level of confidence similar to 100%, but leave yourself some margin to not worry about the occasional hard-to-test corner of code.
  • 80%. I've seen this number in use a few times, and don't entirely know where it originates. I think it might be a weird misappropriation of the 80-20 rule; generally, the intent here is to show that most of your code is tested. (Yes, 51% would also be "most", but 80% is more reflective of what most people mean by most.) This is appropriate for middle-ground cases where "well-tested" is not a high priority (you don't want to waste effort on low-value tests), but is enough of a priority that you'd still like to have some standard in place.
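
The caveat in the 100% item ("some test of some quality has touched every statement") can be shown with a short sketch. Everything here is hypothetical: `is_adult` is an invented function with a deliberate off-by-one bug, under the assumption that adulthood starts at 18. The first test reaches 100% statement and branch coverage without checking anything; only the second one notices the bug.

```python
def is_adult(age):
    """Hypothetical check with a deliberate bug: should be `age >= 18`."""
    if age > 18:
        return True
    return False


def weak_test():
    # Touches every statement and both branches, but asserts nothing,
    # so it passes no matter what is_adult returns.
    is_adult(30)
    is_adult(5)


def stronger_test():
    assert is_adult(30) is True
    assert is_adult(5) is False
    assert is_adult(18) is True  # boundary case; fails because of the bug
```

Both tests produce the same coverage number, which is why a coverage floor is best read as a check that tests exist, not that they are good.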

I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once when you set the standard, and then use an objective measurement going forward.)

Other notes

The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain fashions) by code coverage. So your code coverage standard provides an empirical basis for approximating the quality of "maintainability" as well.

Rowan answered 9/1, 2016 at 20:44 Comment(5)
Good answer. Can you help me in finding functionality coverage via unit tests? Any tool(s) that can help me achieve this?Collazo
Great answer. It's the only one that focuses on testing as a team problem in an industrial setting. I don't get to review everything and my team is very bright, but green. I set a percentage floor of 90% on a new project as a sanity check for junior devs, not because I believe it is "enough". "90%" and "positive, negative, and null" are easy mantras for bright, young developers who I know will do a good job, but don't have the experience to go ahead and write that extra test case that's nagging at the back of your mind.Quirinal
i think this is the best answer available.Ohare
I believe the 80% number comes from Martin Fowlers article on the subject martinfowler.com/bliki/TestCoverage.htmlMimicry
To reach ±50% coverage you just need to write any test that calls stuff without any insight. 100% coverage I see as marketing for open source, and ±80% is a number I personally came to while trying to estimate the minimum requirement. To reach 80% you actually need to test some details. If the details are picked well, they will cover most of the stuff. Every % past that will require more effort. That's how I picked 80-85%, and now I'm googling what others pick. I wasn't aware of Martin, thanks for sharing.Maturation
W
73

Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features I came up with myself and which were not discussed during the meetings).

I don't care if I would have code which is not covered in tests, but I would care if I would refactor my code and end up having a different behaviour. Therefore, 100% functionality coverage is my only target.

Worthen answered 27/4, 2009 at 22:56 Comment(4)
This is a fantastic answer. Code that meets its requirements is a far more worthwhile goal than code that meets some arbitrary LoC coverage metric.Accipiter
If you can provide all functionality without hitting all the lines of code, then what are those extra lines of code doing there?Invertase
@JensTimmerman theoretically you're right. However, 100% code coverage is too expensive time-wise, and forcing my team to do that not only demotivates them, but also makes my project run over the deadline. I like to be somewhere in the middle, and testing functionality (call it: integration testing) is what I feel comfortable with. What code I don't test? Technical exception handling, (range/parameter) checks that could be needed. In short, all technical plumbing that I learned to apply from own experience or best practices I read about.Worthen
I took this a step further by making a list of common situations that should be either included or excluded from testing. That way, we were never driving towards a percent, but rather functional coverage of all parts of the working codebase.Breeding
S
34

My favorite code coverage is 100% with an asterisk. The asterisk comes because I prefer to use tools that allow me to mark certain lines as lines that "don't count". If I have covered 100% of the lines which "count", I am done.

The underlying process is:

  1. I write my tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).
  2. I run the code coverage tools
  3. I examine any lines or paths not covered and any that I consider not important or unreachable (due to defensive programming) I mark as not counting
  4. I write new tests to cover the missing lines and improve the documentation if those edge cases are not mentioned.

This way if I and my collaborators add new code or change the tests in the future, there is a bright line to tell us if we missed something important - the coverage dropped below 100%. However, it also provides the flexibility to deal with different testing priorities.
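
As one possible illustration of marking lines that "don't count", here is a sketch that assumes Python and coverage.py, whose default configuration excludes lines carrying a `# pragma: no cover` comment. The `Color` enum and `label` function are hypothetical; other languages and tools use different markers.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2


def label(color):
    """Hypothetical function with a defensive guard at the end."""
    if color is Color.RED:
        return "stop"
    if color is Color.GREEN:
        return "go"
    # Unreachable while Color has exactly two members, so it is marked
    # as not counting toward the coverage total.
    raise AssertionError(f"unhandled color: {color!r}")  # pragma: no cover


def test_label():
    assert label(Color.RED) == "stop"
    assert label(Color.GREEN) == "go"
```

With the guard excluded, the single test brings the function to 100% of the lines that count, and any later drop below 100% points at code that was added without a test or an explicit exclusion.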

Shocking answered 7/10, 2014 at 15:58 Comment(2)
Would you care to include the "tools that allow [you] to mark certain lines as lines that don't count"?Ursas
@Ursas As an example in PHP, if using Bergmann's code coverage library, annotate a line with // @codeCoverageIgnore and it'll be excluded from coverage.Ashti
H
23

I have another anecdote on test coverage I'd like to share.

We have a huge project wherein, over Twitter, I noted that, with 700 unit tests, we only have 20% code coverage.

Scott Hanselman replied with words of wisdom:

Is it the RIGHT 20%? Is it the 20% that represents the code your users hit the most? You might add 50 more tests and only add 2%.

Again, it goes back to my Testivus on Code Coverage Answer. How much rice should you put in the pot? It depends.

Heins answered 18/9, 2008 at 4:42 Comment(3)
Obviously there has to be common sense in there. It's not much use if the 50% of the code you are testing are comments.Proffitt
It's more in the lines of... is your coverage spent on your application's core functionality, or is it uselessly testing trivial features/nice-to-haves?Heins
sounds like a large % of your code is either boilerplate, or exception handling, or conditional "debug mode" stuffKeyte
T
12

Many shops don't value tests, so if you are above zero at least there is some appreciation of worth - so arguably non-zero isn't bad as many are still zero.

In the .NET world people often quote 80% as reasonable. But they say this at solution level. I prefer to measure at project level: 30% might be fine for a UI project if you've got Selenium, etc. or manual tests, 20% for the data layer project might be fine, but 95%+ might be quite achievable for the business rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, but the critical business logic may be much higher.
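
A per-project floor like this can be checked with a few lines of code. The sketch below is only an illustration: the project names and the helper `failing_projects` are hypothetical, the floors are the example numbers from the paragraph above, and the measured percentages are assumed to come from whatever coverage tool you already run.

```python
# Hypothetical per-project floors, using the example numbers above.
THRESHOLDS = {"ui": 30.0, "data": 20.0, "business": 95.0}


def failing_projects(measured, thresholds=THRESHOLDS):
    """Return the names of projects whose coverage is below their floor.

    `measured` maps project name to coverage percent; a project with no
    measurement is treated as 0% so that it cannot pass silently.
    """
    return sorted(
        name
        for name, floor in thresholds.items()
        if measured.get(name, 0.0) < floor
    )


# Made-up measurements for illustration only.
report = {"ui": 35.0, "data": 22.5, "business": 91.0}
print(failing_projects(report))  # ['business']
```

The point of the design is that a single solution-wide number would hide the fact that the business rules layer is the one falling short.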

I've also heard this: aspire to 100% and you'll hit 80%; but aspire to 80% and you'll hit 40%.

Bottom line: Apply the 80:20 rule, and let your app's bug count guide you.

Trier answered 29/7, 2016 at 23:50 Comment(1)
From a DDD perspective the high(est) aim for business logic is very reasonable. Detecting the slightest change in business logic behavior is crucial.Noisy
N
9

For a well-designed system, where unit tests have driven the development from the start, I would say 85% is quite a low number. Small classes designed to be testable should not be hard to cover better than that.

It's easy to dismiss this question with something like:

  • Covered lines do not equal tested logic and one should not read too much into the percentage.

True, but there are some important points to be made about code coverage. In my experience this metric is actually quite useful, when used correctly. Having said that, I have not seen all systems and I'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look so different and the scope of the available test framework can vary.

Also, my reasoning mainly concerns quite short test feedback loops. For the product that I'm developing the shortest feedback loop is quite flexible, covering everything from class tests to inter process signalling. Testing a deliverable sub-product typically takes 5 minutes and for such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric that we are looking at here) to reject or accept commits in the repository.

When using the code coverage metric you should not just have a fixed (arbitrary) percentage which must be fulfilled. Doing this does not give you the real benefits of code coverage analysis in my opinion. Instead, define the following metrics:

  • Low Water Mark (LWM), the lowest number of uncovered lines ever seen in the system under test
  • High Water Mark (HWM), the highest code coverage percentage ever seen for the system under test

New code can only be added if we don't go above the LWM and we don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice how I say should and not must (explained below).
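A minimal Python sketch of such a gate, assuming the build can report total and covered line counts for a candidate commit. The names `Watermarks` and `check_commit` are hypothetical and not taken from any tool.

```python
from dataclasses import dataclass


@dataclass
class Watermarks:
    lwm_uncovered_lines: int  # lowest number of uncovered lines seen so far
    hwm_coverage_pct: float   # highest coverage percentage seen so far


def check_commit(marks, total_lines, covered_lines):
    """Return (accepted, marks_to_store) for a candidate commit."""
    uncovered = total_lines - covered_lines
    pct = 100.0 * covered_lines / total_lines if total_lines else 100.0
    accepted = (uncovered <= marks.lwm_uncovered_lines
                and pct >= marks.hwm_coverage_pct)
    if not accepted:
        return False, marks  # rejected: watermarks stay as they were
    # Accepted: the new figures are at least as good, so they become the marks.
    return True, Watermarks(uncovered, pct)


marks = Watermarks(lwm_uncovered_lines=100, hwm_coverage_pct=90.0)
print(check_commit(marks, total_lines=1050, covered_lines=950)[0])  # 50 new covered lines
print(check_commit(marks, total_lines=1050, covered_lines=900)[0])  # 50 new uncovered lines
```

The first call is accepted and the second rejected. Note that under this sketch a commit that only deletes well-covered code (say 900 total, 800 covered) is also rejected, because the percentage drops below the HWM.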

But doesn't this mean that it will be impossible to clean away old well-tested rubbish that you have no use for anymore? Yes, and that's why you have to be pragmatic about these things. There are situations when the rules have to be broken, but for your typical day-to-day integration my experience is that these metrics are quite useful. They have the following two implications.

  • Testable code is promoted. When adding new code you really have to make an effort to make the code testable, because you will have to try and cover all of it with your test cases. Testable code is usually a good thing.

  • Test coverage for legacy code is increasing over time. When adding new code and not being able to cover it with a test case, one can try to cover some legacy code instead to get around the LWM rule. This sometimes necessary cheating at least gives the positive side effect that the coverage of legacy code will increase over time, making the seemingly strict enforcement of these rules quite pragmatic in practice.

And again, if the feedback loop is too long it might be completely impractical to set up something like this in the integration process.

I would also like to mention two more general benefits of the code coverage metric.

  • Code coverage analysis is part of dynamic code analysis (as opposed to static analysis, e.g. Lint). Problems found during dynamic code analysis (by tools such as the Purify family, http://www-03.ibm.com/software/products/en/rational-purify-family) are things like uninitialized memory reads (UMR), memory leaks, etc. These problems can only be found if the code is covered by an executed test case. The code that is hardest to cover in a test case is usually the abnormal cases in the system, but if you want the system to fail gracefully (i.e. error trace instead of crash) you might want to put some effort into covering the abnormal cases in the dynamic code analysis as well. With just a little bit of bad luck, a UMR can lead to a segfault or worse.

  • People take pride in keeping 100% for new code, and people discuss testing problems with the same passion as other implementation problems. How can this function be written in a more testable manner? How would you go about trying to cover this abnormal case? And so on.

And a negative, for completeness.

  • In a large project with many developers involved, not everyone is going to be a test genius. Some people tend to use the code coverage metric as proof that the code is tested, and this is very far from the truth, as mentioned in many of the other answers to this question. It is ONE metric that can give you some nice benefits if used properly, but if it is misused it can in fact lead to bad testing. Aside from the very valuable side effects mentioned above, a covered line only shows that the system under test can reach that line for some input data and that it can execute without hanging or crashing.
Nutritionist answered 25/7, 2014 at 7:45 Comment(0)
U
7

If this were a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it's a matter of what you have time for. As a result, I recommend focusing less on a specific percentage, and focusing more on the critical areas. If your code is well-written (or at least a reasonable facsimile thereof) there should be several key points where APIs are exposed to other code.

Focus your testing efforts on these APIs. Make sure that the APIs 1) are well documented and 2) have test cases written that match the documentation. If the expected results don't match up with the docs, then you have a bug in your code, your documentation, or your test cases, any of which is worth rooting out.

Good luck!

Unmarked answered 18/9, 2008 at 4:30 Comment(0)
C
6

Code coverage is just another metric. In and of itself, it can be very misleading (see www.thoughtworks.com/insights/blog/are-test-coverage-metrics-overrated). Your goal should therefore not be to achieve 100% code coverage but rather to ensure that you test all relevant scenarios of your application.

Cheeseparing answered 25/9, 2014 at 19:0 Comment(0)
M
5

I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.

That aside, the answer depends on your methodology, your language, and your testing and coverage tools. When doing TDD in Ruby or Python it's not hard to maintain 100% coverage, and it's well worth doing so. It's much easier to manage 100% coverage than 90-something percent coverage. That is, it's much easier to fill coverage gaps as they appear (and when doing TDD well, coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps that you haven't gotten around to, and to miss coverage regressions because of your constant background of uncovered code.

The answer also depends on the history of your project. I've only found the above to be practical in projects managed that way from the start. I've greatly improved the coverage of large legacy projects, and it's been worth doing so, but I've never found it practical to go back and fill every coverage gap, because old untested code is not well understood enough to do so correctly and quickly.

Merideth answered 12/5, 2016 at 14:5 Comment(0)
I
4

85% would be a good starting place for check-in criteria.

I'd probably choose a variety of higher bars for shipping criteria, depending on the criticality of the subsystems/components being tested.

Intoxicated answered 18/9, 2008 at 4:27 Comment(4)
How did you arrive at that percentage?Proffitt
As a footnote - this can be messy for projects where automation is difficult - as always be pragmatic about what is achievable vs. desirable.Intoxicated
Mainly through experimentation. It is pretty easy to get code coverage to 80-90% for Dev-related unit tests - going higher normally needs divine test intervention - or really simple code paths.Intoxicated
I usually start with 1) major runtime code paths, 2) obvious exception cases that I explicitly throw, and 3) conditional cases that terminate with "failure". This usually gets you into the 70-80 range. Then it's whack-a-mole: bugs and regressions for corner cases, parameter fuzzing, etc., and refactoring to enable injection of methods, etc. I generally allow at least as much time for writing/refactoring dev-related tests as for the main code itself.Intoxicated
H
4

Code coverage is great but only as long as the benefits that you get from it outweigh the cost/effort of achieving it.

We have been working to a standard of 80% for some time; however, we have just made the decision to abandon this and instead be more focused in our testing, concentrating on the complex business logic etc.

This decision was taken due to the increasing amount of time we spent chasing code coverage and maintaining existing unit tests. We felt we had got to the point where the benefit we were getting from our code coverage was deemed to be less than the effort that we had to put in to achieve it.

Headquarters answered 19/9, 2008 at 15:23 Comment(0)
M
4

I use cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up-to-date. At the minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie in the Ant build failure property to this task. If the build fails because of lack of coverage, you know someone's added code but hasn't tested it. Example:

<cobertura-check linerate="0"
                 branchrate="0"
                 totallinerate="70"
                 totalbranchrate="90"
                 failureproperty="build.failed" />
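The "raise it, never lower it" rule for those thresholds can be sketched in a few lines of Python. The `ratchet` helper and its `margin` parameter are hypothetical; they only illustrate picking a whole-number threshold just below the measured coverage.

```python
import math


def ratchet(current_threshold, measured_pct, margin=1):
    """Next threshold: just below measured coverage, never below the old one.

    `margin` (an assumption) is how many whole points of slack to leave.
    """
    candidate = math.floor(measured_pct) - margin
    return max(current_threshold, candidate)


print(ratchet(70, 78.4))  # prints 77: coverage rose, so the bar rises
print(ratchet(90, 85.0))  # prints 90: coverage fell, the bar stays put
```

The resulting numbers would then be written back into totallinerate and totalbranchrate by hand or by a build step.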
Matron answered 27/4, 2009 at 23:29 Comment(2)
what if someone put in a PR that just removes dead code, but that dead code had a better rate of coverage than the project as a whole?Marinate
If removing that one piece of dead code drops you below the line, I'd say that's a huge hint that you need to beef up your coverage elsewhere and should take the time to do that right then and there.Matron
C
4

When I think my code isn't unit tested enough, and I'm not sure what to test next, I use coverage to help me decide what to test next.

If a unit test increases coverage, I know this unit test is worth something.

This goes for code that is not covered, 50% covered or 97% covered.

Cooley answered 19/5, 2010 at 15:34 Comment(5)
I disagree completely. A unit test is only worth something if there's a chance that it will uncover a bug, (either a bug that exists now or a regression bug in the future); or if it helps to document the behaviour of your class. If a method is so simple that it can't really fail, such as a one-line getter, then there is zero value in providing a unit test for it.Accipiter
I had bugs in one line getters. From my experience, there's no bug free code. There's no method that can't really fail.Cooley
Assuming your one-line getter is used by other code that you do cover, and the tests of that code pass, then you've also indirectly covered the one-line getter. If you aren't using the getter, what's it doing in your code? I agree with David Wallace… there is no need to directly test simple helper functions that are used elsewhere if the code and tests which depend on the helper don't show there might be a problem with it.Discordancy
@LowellMontgomery and what if the test for your other code fails because of that one-line getter (that was not tested)? If there was a test in place for the one-liner, it would be much easier to get to the cause of the fail. It gets really bad when you have hundreds of not tested one-liners being used in several different places.Highmuckamuck
The assumption was the tests using the one-line getter passed. If it failed (e.g. where you try to use the return value of your one-line getter), then you can sort it out. But unless there is a really pressing reason for being so paranoid, you have to draw the line somewhere. My experience has been that I need to prioritize what sucks my time and attention and really simple "getters" (that work) don't need separate tests. That time can be spent on making other tests better or more full coverage of code that is more likely to fail. (i.e. I stand by my original position, with David Wallace).Discordancy
C
3

Short answer: 60-80%

Long answer: I think it totally depends on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project you should have a pretty good base percentage based on the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.

Comose answered 18/9, 2008 at 4:31 Comment(0)
A
3

If you've been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I've always worked with 80%, even when new to testing.

This number should only include code written in the project (excluding frameworks, plugins, etc.) and maybe even exclude certain classes composed entirely of calls to outside code. This sort of call should be mocked/stubbed.

Arlina answered 18/9, 2008 at 4:35 Comment(0)
C
3

Generally speaking, from the several engineering excellence best practices papers that I have read, 80% for new code in unit tests is the point that yields the best return. Going above that coverage percentage finds fewer defects for the amount of effort exerted. This is a best practice that is used by many major corporations.

Unfortunately, most of these results are internal to companies, so there is no public literature that I can point you to.

Chromic answered 18/9, 2008 at 4:53 Comment(0)
N
3

My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.

My current practice in Python is to divide my .py modules into two folders: app1/ and app2/ and when running unit tests calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.
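The visual check could be automated along these lines. This is only a sketch: it assumes you can get, from whatever coverage tool you use, a mapping of relative file path to (covered statements, total statements); the `file_stats` shape and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def folder_totals(file_stats, folder):
    """Sum (covered, total) statements for files under the top-level `folder`."""
    covered = total = 0
    for path, (file_covered, file_total) in file_stats.items():
        if PurePosixPath(path).parts[:1] == (folder,):
            covered += file_covered
            total += file_total
    return covered, total


def check_layout(file_stats):
    """Return a list of complaints; empty means app1 is 100% and app2 is 0%."""
    problems = []
    covered, total = folder_totals(file_stats, "app1")
    if covered != total:
        problems.append(f"app1: {total - covered} statements uncovered")
    covered, _ = folder_totals(file_stats, "app2")
    if covered:
        problems.append(f"app2: {covered} statements covered, could move to app1")
    return problems


# Made-up numbers for illustration.
stats = {"app1/rules.py": (38, 40), "app2/gui.py": (3, 25)}
for problem in check_layout(stats):
    print(problem)
```

Comparing statement counts rather than percentages avoids any rounding surprises when testing for exactly 100% or exactly 0%.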

When/if I find that these numbers differ from the standard, I investigate and alter the design of the code so that coverage conforms to the standard.

This does mean that I can recommend achieving 100% line coverage of library code.

I also occasionally review app2/ to see if I could possibly test any code there, and if I can, I move it into app1/.

Now I'm not too worried about the aggregate coverage because that can vary wildly depending on the size of the project, but generally I've seen 70% to over 90%.

With Python, I should be able to devise a smoke test which could automatically run my app while measuring coverage and hopefully reach an aggregate of 100% when combining the smoke test with the unittest figures.

Natividad answered 19/9, 2008 at 10:11 Comment(0)
P
2

Check out Crap4j. It's a slightly more sophisticated approach than straight code coverage. It combines code coverage measurements with complexity measurements, and then shows you what complex code isn't currently tested.
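For a feel of how the two measurements combine, here is a Python sketch of the CRAP score as I understand the published formula: complexity squared times the cube of the uncovered fraction, plus complexity. Treat the function as an illustration, not as Crap4j's own code.

```python
def crap_score(complexity, coverage_pct):
    """CRAP score for one method.

    complexity:   cyclomatic complexity of the method
    coverage_pct: percentage of the method covered by tests (0-100)
    """
    uncovered = 1.0 - coverage_pct / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


print(crap_score(10, 0))    # prints 110.0: complex and untested
print(crap_score(10, 100))  # prints 10.0: complex but fully covered
print(crap_score(1, 0))     # prints 2.0: trivial, so missing tests cost little
```

The score grows with the square of complexity, so an untested complex method stands out far more than an untested getter.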

Prizefight answered 18/9, 2008 at 20:0 Comment(0)
W
2

Viewing coverage from another perspective: Well-written code with a clear flow of control is the easiest to cover, the easiest to read, and usually the least buggy code. By writing code with clearness and coverability in mind, and by writing the unit tests in parallel with the code, you get the best results IMHO.

Whensoever answered 31/1, 2009 at 11:16 Comment(0)
H
2

In my opinion, the answer is "It depends on how much time you have". I try to achieve 100% but I don't make a fuss if I don't get it with the time I have.

When I write unit tests, I wear a different hat from the one I wear when developing production code. I think about what the tested code claims to do and which situations could possibly break it.

I usually follow the following criteria or rules:

  1. That the unit test should be a form of documentation of the expected behavior of my code, i.e. the expected output given a certain input and the exceptions it may throw that clients may want to catch. (What should the users of my code know?)

  2. That the unit test should help me discover the what-if conditions that I may not yet have thought of. (How do I make my code stable and robust?)

If these two rules don't produce 100% coverage then so be it. But once I have the time, I analyze the uncovered blocks and lines and determine whether there are still cases without unit tests or whether the code needs to be refactored to eliminate unnecessary code.

Hydrophone answered 14/8, 2011 at 15:13 Comment(0)
S
1

It depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.

Stairway answered 18/9, 2008 at 4:29 Comment(1)
You should probably use Model View Presenter for your UI if you are in a TDD environment.Jezabelle
A
1

I don't think there can be such a B/W rule.
Code should be reviewed, with particular attention to the critical details.
However, if it hasn't been tested, it has a bug!

Ainslee answered 18/9, 2008 at 4:30 Comment(1)
Don't want a rule, just feedback on any personal experience on the correlation between code-coverage percentage and unit test effectiveness.Proffitt
B
1

Depending on the criticality of the code, anywhere from 75%-85% is a good rule of thumb. Shipping code should definitely be tested more thoroughly than in house utilities, etc.

Bluepoint answered 18/9, 2008 at 4:31 Comment(0)
S
1

This has to be dependent on what phase of your application development lifecycle you are in.

If you've been at development for a while and have a lot of implemented code already and are just now realizing that you need to think about code coverage then you have to check your current coverage (if it exists) and then use that baseline to set milestones each sprint (or an average rise over a period of sprints), which means taking on code debt while continuing to deliver end user value (at least in my experience the end user doesn't care one bit if you've increased test coverage if they don't see new features).

Depending on your domain it's not unreasonable to shoot for 95%, but I'd have to say that on average you're going to be looking at 85% to 90%.

Swaraj answered 18/9, 2008 at 4:33 Comment(0)
V
1

I think the best sign of correct code coverage is that the number of concrete problems the unit tests help to fix reasonably corresponds to the size of the unit test code you created.

Valorievalorization answered 18/9, 2008 at 4:34 Comment(0)
G
1

I think that what may matter most is knowing what the coverage trend is over time and understanding the reasons for changes in the trend. Whether you view the changes in the trend as good or bad will depend upon your analysis of the reason.

Geisler answered 27/4, 2009 at 22:49 Comment(0)
P
0

We were targeting >80% until a few days ago, but after we started using a lot of generated code we no longer care about the percentage and instead let the reviewer make the call on the coverage required.

Precess answered 18/9, 2008 at 4:32 Comment(0)
C
0

From the Testivus posting I think the answer context should be the second programmer.

Having said this, from a practical point of view we need parameters/goals to strive for.

I consider that this can be "tested" in an Agile process by analyzing the code we have, the architecture, and the functionality (user stories), and then coming up with a number. Based on my experience in the telecom area I would say that 60% is a good value to check.

Clockwork answered 13/3, 2013 at 17:28 Comment(0)
