Detecting 'dead' tests and hardcoded data vs constrained non-determinism

For those who are not sure what is meant by 'constrained non-determinism', I recommend Mark Seemann's post.

The essence of the idea is that a test uses deterministic values only for the data that affects the SUT's behavior. Data that is not relevant can be, to some extent, 'random'.

I like this approach. The more abstract the data is, the clearer and more expressive the expectations become, and it indeed becomes harder to unconsciously fit the data to the test.

I'm trying to 'sell' this approach (along with AutoFixture) to my colleagues, and yesterday we had a long debate about it.
They raised an interesting argument: random data leads to unstable, hard-to-debug tests.
At first that seemed a bit strange, because we all agreed that flow-affecting data must not be random, so such behavior should not be possible. Nonetheless, I took a break to think the concern over thoroughly, and I finally arrived at the following problem.

But first, some of my assumptions:

  1. Test code MUST be treated as production code.
  2. Test code MUST express correct expectations and specifications of system behavior.
  3. Nothing warns you about inconsistencies better than a broken build (either a compilation failure or failed tests in a gated check-in).

Consider these two variants of the same test:

[TestMethod]
public void DoSomething_ReturnsValueIncreasedByTen()
{
    // Arrange
    var input = 1;
    var expectedOutput = input + 10;

    var sut = new MyClass();

    // Act
    var actualOutput = sut.DoSomething(input);

    // Assert
    Assert.AreEqual(expectedOutput, actualOutput, "Unexpected return value.");
}

// Here nothing is changed except that the input is now random.
[TestMethod]
public void DoSomething_ReturnsValueIncreasedByTen()
{
    // Arrange
    var fixture = new Fixture();
    var input = fixture.Create<int>();
    var expectedOutput = input + 10;

    var sut = new MyClass();

    // Act
    var actualOutput = sut.DoSomething(input);

    // Assert
    Assert.AreEqual(expectedOutput, actualOutput, "Unexpected return value.");
}

So far so good, everything works and life is beautiful, but then requirements change and DoSomething changes its behavior: now it increases the input only if it is lower than 10, and multiplies it by 10 otherwise. What happens here? The test with hardcoded data still passes (purely by accident), whereas the second test fails only sometimes. And both are now wrong, deceiving tests: they check nonexistent behavior.
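
For illustration, here is a minimal sketch of what the changed method might look like (MyClass and DoSomething are the names used in the tests above; the exact increment in the 'lower than 10' branch is my assumption):

public class MyClass
{
    public int DoSomething(int input)
    {
        // Assumed new behavior: increase the input only when it is lower than 10,
        // multiply it by 10 otherwise.
        return input < 10 ? input + 10 : input * 10;
    }
}

With the hardcoded input of 1, the old expectation (11) still matches the new behavior, which is exactly why the first test keeps passing; the AutoFixture variant fails only on the runs where the generated value happens to be 10 or greater.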

It looks like it doesn't matter whether the data is hardcoded or random: that is simply irrelevant. And yet we have no robust way to detect such 'dead' tests.

So the question is:

Does anyone have good advice on how to write tests so that such situations do not arise?

Nun answered 25/10, 2014 at 12:52 Comment(0)

"then requirements change and DoSomething changes its behavior"

Does it, really? If DoSomething changes behaviour, it violates the Open/Closed Principle (OCP). You may decide not to care about this, but it closely relates to why we trust tests.

Every time you change existing tests, you decrease their trustworthiness. Every time you change existing production behaviour, you'll need to review all tests that touch that production code. Ideally, you need to visit each such test, and briefly change the implementation to see that it still fails if the implementation is wrong.

For small changes, this may still be practical, but for even moderate changes, it'll be wiser to adhere to the OCP: don't change the existing behaviour; add the new behaviour side-by-side, and let the old behaviour atrophy.

In the above example, it may be clear that the AutoFixture test may be non-deterministically wrong, but on a more conceptual level, it's entirely possible that, if you change production behaviour without reviewing tests, some tests may silently turn into false negatives. This is a general problem related to unit testing, and not specific to AutoFixture.

Dunt answered 26/10, 2014 at 7:24 Comment(3)
Partially agree. Still, it's hard to imagine how many compositions (or levels of inheritance) a system will have if requirements are naturally unstable (e.g. legislation) but do not change radically each time and thus allow reuse. - Nun
@voroninp The thing about inherently unstable domains is that these are exactly the domains where the OCP is valuable, because the OCP enables you to keep the old logic in place as well. This, in turn, enables you to recalculate how things were in the past. It also enables you to reinstate an old rule, in case anyone changes their mind. - Dunt
Now I wonder whether it is possible to fully abstract away from a dependency's behavior, because I've seen many tests where mocking was done with some behavior/service contract in mind. If the latter changes without signature modifications... should it be a completely new contract (interface type)? - Nun

The answer is actually hidden in this sentence:

[..] then requirements change and DoSomething changes its behavior [..]

Wouldn't it be easier if you did it this way:

  • First change the expectedOutput, to meet the new requirements.
  • Observe the failing test—it's important to see it failing.
  • Only then modify DoSomething according to the new requirements, making the test pass again.

This methodology is not related to a particular tool like AutoFixture; it is simply Test-Driven Development.


So where does AutoFixture become really useful? Using AutoFixture, you can minimize the Arrange part of the test.

Here are the tests, written idiomatically using AutoFixture.Xunit:

[Theory, InlineAutoData]
public void DoSomethingWhenInputIsLowerThan10ReturnsCorrectResult(
    MyClass sut,
    [Range(int.MinValue, 9)]int input)
{
    Assert.True(input < 10);
    var expected = input + 1;

    var actual = sut.DoSomething(input);

    Assert.Equal(expected, actual);
}

[Theory, InlineAutoData]
public void DoSomethingWhenInputIsEqualsOrGreaterThan10ReturnsCorrectResult(
    MyClass sut,
    [Range(10, int.MaxValue)]int input)
{
    Assert.True(input >= 10);
    var expected = input * 10;

    var actual = sut.DoSomething(input);

    Assert.Equal(expected, actual);
}

Besides xUnit.net, there is also support for NUnit 2.
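
For NUnit 2 the same test might look roughly like the sketch below. Treat it as an illustration only: AutoFixture.NUnit2 has no InlineAutoData, so AutoData is used instead, the Range attribute must come from System.ComponentModel.DataAnnotations, and the Ploeh.AutoFixture.NUnit2 namespace is my assumption about the package layout.

// Rough sketch: mirrors the first xUnit theory above, using AutoFixture.NUnit2.
// Namespace and attribute wiring are assumptions; adjust to the package version you use.
using System.ComponentModel.DataAnnotations;
using NUnit.Framework;
using Ploeh.AutoFixture.NUnit2;

public class MyClassTests
{
    [Test, AutoData]
    public void DoSomethingWhenInputIsLowerThan10ReturnsCorrectResult(
        MyClass sut,
        [Range(int.MinValue, 9)] int input)
    {
        Assert.IsTrue(input < 10);
        var expected = input + 1;

        var actual = sut.DoSomething(input);

        Assert.AreEqual(expected, actual);
    }
}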

HTH

Infield answered 25/10, 2014 at 14:20 Comment(6)
Well, there is one problem here. When the branching condition becomes quite complex, you usually create several tests, one per condition branch. You can write tests for the new branches but forget about the old ones. - Nun
Mostly I agree with your point, but I think it's better to express workflow expectations within a test data generator. Initially it returns data via a method 'GetSomeInput'. Later, when the behavior changes, we remove that method and add two others, 'GetLessThan10' and 'GetEqualOrGreaterThan10'. Removing the first method will break the initial test: it won't compile. - Nun
@NikosBaxevanis That's what I'm talking about. You've updated your answer and added two tests. In an ideal world we must delete the very first test, but this 'must' is not enforced by the system. By default there's no safety net here. - Nun
It depends on whether your code is SOLID or not. If you're working on a SOLID code base (or at least a code base that tends toward SOLID), see this answer by @MarkSeemann. If not, chances are that you'll end up modifying existing code, and thus existing unit tests. - Infield
@NikosBaxevanis BTW, has the AutoFixture.NUnit2 package already implemented InlineAutoDataAttribute support? Does it work together with method parameter attributes (Range, etc.)? - Maine
Data Annotations also work with AutoFixture.NUnit2. However, there's no InlineAutoData in AutoFixture.NUnit2, but you may use AutoData instead. Just make sure not to confuse System.ComponentModel.DataAnnotations.RangeAttribute with NUnit.Framework.RangeAttribute (use the one from the System.ComponentModel.DataAnnotations namespace). - Infield