Is mutation testing useful in practice?

Asked 28/10, 2008 at 9:29 Answered 27/4, 2023 at 23:40

unit-testing testing code-coverage mutation-testing

Do you have any examples of real life applications of mutation testing? Does it work better than simple test coverage tools? Or is it useless?

What are the advantages/disadvantages of mutation testing in the real world?

Routinize answered 28/10, 2008 at 9:29 Comment(9)

I do not understand how this deviates from traditional test driven development. There's simply no way to cover all mathematical eventualities, and I don't think it's even worth it. – Mann 28/10, 2008 at 9:31

Yeah, that's my question: is it worth the effort in the real world? I know there is some theoretical work about it. But does it work in reality? – Routinize 28/10, 2008 at 9:35

Is the point not that mutation testing actually tests the tests? I mean, if you can alter the source code's logic and still pass the tests then surely the tests aren't quite right? Forgive me if I'm missing something... – Walkout 28/10, 2008 at 10:20

Yes, mutation testing, like code coverage, checks whether your tests are sufficient. – Routinize 28/10, 2008 at 10:29

The difference is that code/branch coverage might be complete while your oracles are not: they may not check all conditions even if every line of the program has been executed. – Tobe 25/11, 2012 at 20:38

A similar question has been asked at sqa.stackexchange.com/questions/5255/…, and the answers there discuss not only the cost of setting up mutation testing, but also its effectiveness. – Tobe 26/11, 2012 at 6:42

@Walkout: There's no such thing as "a set of tests is right/wrong". A test simply TRIES to detect a deviation of your program from the specified WISH of the programmer. If a set of tests still lets the modified program pass, it gives you more information, so that you can TRY to decrease the deviation by adding more tests that don't let the modified program pass. Note that you CANNOT have a "totally right test set", in the sense of one that guarantees zero deviation. – Sturrock 13/1, 2014 at 9:3

@Jon Limjap: 1) About the difference: traditional test driven development simply tries to write tests before each small iteration in writing the software. Mutation testing tries to check if test cases are "good", by modifying the source code. They are two different concepts. 2) You are right that there's no way to cover all eventualities, but adding another different way of testing can help to increase test coverage. – Sturrock 13/1, 2014 at 9:7

I wrote an article explaining why mutation testing is an improvement over code coverage: pedrorijo.com/blog/intro-mutation. Hope it helps. – Grillo 15/2, 2019 at 9:25

The usefulness of unit tests is no longer in question. They are essential to building a quality application. But how can we assess their relevance? A code coverage indicator of 100% doesn't mean the code is 100% tested. It only shows which code was executed while the unit tests ran. Mutation testing will allow you to have more confidence in your tests.

This is a two step process:

Generate mutants.
Check that the mutations are found by the tests.
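To make the two steps concrete, here is a minimal hand-rolled sketch in Python. It is only an illustration, not how any particular tool works internally: the function `is_adult`, the threshold of 18 and both suites are made up for the example, and real tools generate mutants automatically from the source or bytecode.

```python
import operator

def make_is_adult(compare, threshold=18):
    """Build an age check; the comparison operator is the part we mutate."""
    def is_adult(age):
        return compare(age, threshold)
    return is_adult

def weak_suite(is_adult):
    """Executes every line of is_adult but never probes the boundary."""
    return is_adult(30) is True and is_adult(5) is False

def strong_suite(is_adult):
    """Same checks plus the boundary case age == 18."""
    return weak_suite(is_adult) and is_adult(18) is True

# The original (hypothetical) implementation uses >=.
original = make_is_adult(operator.ge)

# Step 1: generate mutants by swapping the comparison operator.
mutants = {
    ">": make_is_adult(operator.gt),
    "<": make_is_adult(operator.lt),
    "==": make_is_adult(operator.eq),
}

def survivors(suite):
    """Step 2: a mutant survives if the suite still passes against it."""
    return [name for name, mutant in mutants.items() if suite(mutant)]

# Both suites pass on the original code.
assert weak_suite(original) and strong_suite(original)

print("weak suite survivors:", survivors(weak_suite))
print("strong suite survivors:", survivors(strong_suite))
```

Both suites execute all of `is_adult`, so a coverage tool would rate them the same, yet only the second one kills the `>` mutant.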

I wrote an entire article about this process, including some concrete cases.

Penrose answered 9/3, 2012 at 13:29 Comment(0)

I looked at mutation testing some time ago as a method for checking the efficacy of my automated regression testing scripts. Basically, a number of these scripts had missing checkpoints, so while they were exercising the application being tested correctly, they weren't verifying the results against the baseline data. I found that a far simpler method than changing the code was to write another application to introduce modifications to a copy of the baseline, and re-run the tests against the modified baseline. In this scenario, any test that passed was either faulty or incomplete.

This is not genuine mutation testing, but a method that uses a similar paradigm to test the efficacy of test scripts. It is simple enough to implement, and IMO does a good job.
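Roughly, the idea could be sketched like this in Python. Everything here is hypothetical (the baseline fields, `app_output` and the two check scripts are stand-ins); the actual tool was specific to the application being tested.

```python
import copy

# Hypothetical baseline data that the regression scripts compare against.
baseline = {"order_total": 120.0, "status": "shipped"}

def app_output():
    """Stand-in for the output of the application under test."""
    return {"order_total": 120.0, "status": "shipped"}

def check_total(expected):
    """Healthy script: verifies the result against the baseline."""
    return app_output()["order_total"] == expected["order_total"]

def check_status(expected):
    """Faulty script: exercises the application but has no checkpoint."""
    app_output()
    return True

def mutate(data, key):
    """Return a copy of the baseline with one field deliberately changed."""
    mutated = copy.deepcopy(data)
    value = mutated[key]
    if isinstance(value, (int, float)):
        mutated[key] = value + 1
    else:
        mutated[key] = value + "_mutated"
    return mutated

scripts = {"order_total": check_total, "status": check_status}

# A script that passes on the real baseline AND on the mutated baseline
# is either faulty or incomplete.
suspect = [
    key
    for key, script in scripts.items()
    if script(baseline) and script(mutate(baseline, key))
]
print("suspect scripts:", suspect)
```

Only the script without a checkpoint keeps passing after the baseline has been changed, so it is the one flagged.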

Simulation answered 28/10, 2008 at 12:11 Comment(2)

How expensive was writing a separate application to verify your tests? Isn't tool-supported mutation testing cheaper? – Tobe 25/11, 2012 at 20:40

Not particularly expensive, about 2 days all in writing tools, and I couldn't find anything off the shelf to do the job. The idea was simply that for all tests that were passing, changing the baseline data should lead to a failure. Where it didn't, it indicated a faulty test case. The actual coding for this was specific to the app being tested, but very simple in what it did. – Simulation 26/11, 2012 at 13:57

I know that this is an old question, but Uncle Bob recently wrote a very interesting blog post about mutation testing that can help you understand the usefulness of this type of testing:

Uncle Bob's mutation testing blog post

Smooth answered 22/6, 2016 at 13:37 Comment(1)

I'm a simple man: I see a link to "uncle Bob", I click :) – Fervency 14/9, 2022 at 12:38

I've played around with pitest for a small, contrived application:

http://pitest.org/

It's a Java tool that automates mutant generation. You can run it against your test suite and it'll generate HTML reports for you indicating how many mutants were killed. It seemed quite effective and didn't require much effort to set up. There are actually quite a few nice tools in the Java world for this sort of thing. See also:

http://www.eclemma.org/

For coverage.

I think the concepts behind mutation testing are sound. It's just a matter of tool support and awareness. You're fighting a tradeoff between the simplicity of traditional code coverage metrics and additional complexity of this technique - it really just comes down to tools. If you can generate the mutants, then it will help expose weaknesses in your test cases. Is it worth the marginal increase in effort over the testing you already do? With pitest, I did find it turning up test cases that seemed non-obvious.

Mutation testing is an angle of attack that's quite different from the unit/functional/integration testing methodologies.

You test your test suite - it's a meta-test of your whole testing program.
It inspires additional test cases you might not have otherwise considered.

Doyen answered 10/12, 2012 at 9:4 Comment(0)

Mutation testing has helped me identify problems with test case assertions.

For example, when you get a report that says "no mutant has been killed by test case x", you take a look, and it turns out the assertion had been commented out.
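A tiny Python sketch of that situation, with made-up names (`apply_discount` and the two test cases are hypothetical, and the mutant is written by hand rather than generated by a tool):

```python
def apply_discount(price, percent):
    """Original code under test."""
    return price - price * percent / 100

def mutant_apply_discount(price, percent):
    """Hand-written mutant: '-' replaced with '+'."""
    return price + price * percent / 100

def case_without_assertion(func):
    """Runs the code, but the assertion has been commented out."""
    result = func(200, 10)
    # assert result == 180

def case_with_assertion(func):
    """Same test case with the assertion restored."""
    result = func(200, 10)
    assert result == 180

def kills(case, mutant):
    """A test case kills a mutant if it fails when run against it."""
    try:
        case(mutant)
    except AssertionError:
        return True
    return False

print(kills(case_without_assertion, mutant_apply_discount))
print(kills(case_with_assertion, mutant_apply_discount))
```

The case without the assertion still executes every line of `apply_discount`, so coverage looks fine, but it can never kill a mutant. That is what the report is pointing at.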

According to this paper, developers at Google use mutation testing as a complement to code review and pull-request inspections. They seem happy about the results:

Developers have decided to redesign large chunks of code to make them testable just so a mutant could be killed, they have found bugs in complex logical expressions looking at mutants, they have decided to remove code with an equivalent mutant because they deemed it a premature optimization, they’ve claimed the mutant saved them hours of debugging and even production outages because no test cases were covering the logic under mutation properly. Mutation testing has been called one of the best improvements in the code review verification in years. While this feedback is hardly quantifiable, combined with the sheer number of thousands of developers willing to inspect surfaced mutants on their code changes makes a statement.

Getup answered 1/12, 2018 at 15:57 Comment(0)

I recently did some investigations on mutation testing. Results are here:

http://abeletsky.blogspot.com/2010/07/using-of-mutation-testing-in-real.html

In short: mutation testing can give some information about the quality of source code and tests, but it is not straightforward to use.

Vortumnus answered 11/7, 2010 at 17:24 Comment(2)

Did you mean it "is" or "is NOT" straightforward to use? – Tobe 25/11, 2012 at 21:24

The referred blog does not exist anymore. – Arlynearlynne 31/8, 2015 at 13:2

Mutation testing has helped me in two particular kinds of projects:

Small library developed by myself: I used mutation testing to test the quality of my tests. I discovered that even doing "strict TDD", I had surviving mutants. It helped me understand some anti-patterns in my testing style. I even included mutation testing analysis as part of CI (only when merging to the main branch). But I could do that because the library is tiny and had zero dependencies. The code was simple and fast and all tests were unit tests (about 300 in total).
Microservice written by a junior team: I was the tech lead on that project, and I suspected the quality of the solution was not good, and the mutation analysis confirmed that hypothesis. The team had little experience writing tests and they missed a lot of cases. I was able to convince managers and developers about the quality of our work by showing the reports and where exactly the mutations were in our project.

In those projects, I've used Stryker (for JS and TS) and I was happy with the results. It helped me show how mutation testing works to people that didn't know about it.

As generating tons of mutations is a pretty CPU-intensive task, it's not something you can do all the time (like running the tests to get immediate feedback), but you can do it after finishing your feature/bugfix/change as a last-minute check before submitting the code. Or if you are in a refactoring sprint/phase, it's a good time to run the tool as well.

It was not helpful in an extensive Rails application that had slow and coupled tests. Basically, every attempt to run a mutation testing tool ended up crashing or returning a huge amount of data that was hard to process. In that case, I did mutation testing manually (generating the mutations by hand) on the critical parts of the code. But this approach is heavily influenced by your own biases (how do you choose a "good" mutant?).

Compared to test coverage, I tend to say that test coverage is a quantity metric (it says how much code is hit by a test), and mutation score is a quality metric (it says how likely your code is to have undetected bugs caused by changes).
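As a rough illustration of the two numbers, here is how they are commonly computed. The counts are invented for the example, and tools differ in details such as how equivalent or timed-out mutants are counted.

```python
def line_coverage(executed_lines, total_lines):
    """Percentage of lines executed by at least one test."""
    if total_lines == 0:
        return 0.0
    return 100 * executed_lines / total_lines

def mutation_score(killed_mutants, total_mutants):
    """Percentage of generated mutants that made at least one test fail."""
    if total_mutants == 0:
        return 0.0
    return 100 * killed_mutants / total_mutants

# Hypothetical project: every line is executed, but 8 of 20 mutants survive.
print(f"coverage: {line_coverage(50, 50):.0f}%")
print(f"mutation score: {mutation_score(12, 20):.0f}%")
```

With these made-up numbers the coverage report says 100% while the mutation score is 60%, which is the gap between "executed" and "verified".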

Staggers answered 9/1, 2023 at 15:43 Comment(0)

Coverage vs mutation testing. An old question, but I recently came across a blog post on the topic. It is pretty opinionated, but the differences between coverage and mutation testing are clearly articulated.

https://pedrorijo.com/blog/intro-mutation/

My own experience shows that Pitest is pretty useful, but since the runtime explodes, it works only on very fast test sets. In practice this limits where I apply mutation testing.

Heredes answered 17/7, 2019 at 22:7 Comment(2)

Another fun comparison is made in se.ewi.tudelft.nl/ti3115tu-2018/resources/… comparing code to the city, crime to bugs, police to unit testing and fake crime (to test the police) to mutation testing. – Heredes 17/7, 2019 at 22:17

The above comment link no longer works. Also, Pedro had already commented about his blog post under the main question. – Philhellene 11/5, 2021 at 2:12

The first test case behaves differently under the above mutation: an exception is now raised, so it no longer returns the expected array {6,3}. However, our second test case behaves the same way, because it also includes a positive number, and the mutant raises an exception on positive numbers as well. Now, if we have to write a test case that still succeeds, it would be Input = {-6,-6,-7,-3,-4}, Expected = {-6,-3}.

Liveryman answered 17/12, 2020 at 9:46 Comment(1)

Please edit your answer to improve code formatting. See How to Answer. – Tool 17/12, 2020 at 10:14

I set up mutation testing on Angular using https://stryker-mutator.io/docs/stryker-js/guides/angular/ simply to experiment, and it took 2 hours to get a report for a single code file. That said, I was very happy with the experience of using Stryker with .NET. I must admit I am fairly new to mutation testing and there might be better tools that work with Angular/Karma, but performance is something to keep in mind, especially if you plan to use it in conjunction with TDD.

Banas answered 22/9, 2021 at 5:26 Comment(0)

If you accept that

a) Unit tests are necessary

b) Measuring efficacy of unit tests is needed. (This is what code coverage tries to do.)

c) That code coverage alone is a limited metric

d) That a unit test should itself be tested, i.e. be shown to fail in appropriate scenarios (note, this is what TDD Red-Green-Refactor tries to achieve)

Then you must accept that mutation testing is necessary.

Wangle answered 27/4, 2023 at 23:40 Comment(0)
