For those who are not sure what is meant by 'constrained non-determinism', I recommend Mark Seemann's post.
The essence of the idea is that a test uses deterministic values only for data that affects the SUT's behavior; data that is not 'relevant' can be, to some extent, 'random'.
I like this approach. The more abstract the data is, the clearer and more expressive the expectations become, and it genuinely becomes harder to unconsciously fit the data to the test.
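To make the idea concrete, here is a minimal sketch of such a test (it assumes MSTest and AutoFixture, like the examples below; the DiscountPolicy class and its IsEligibleForDiscount method are made up purely for illustration). Only the value that actually drives the behavior is fixed; everything else stays anonymous.

    [TestMethod]
    public void IsEligibleForDiscount_ReturnsTrue_WhenTotalExceedsThreshold()
    {
        // Arrange
        var fixture = new Fixture();
        // The customer name does not affect the decision, so it can be anonymous.
        var customerName = fixture.Create<string>();
        // The order total DOES drive the behavior, so it stays deterministic.
        var total = 101m;
        var sut = new DiscountPolicy();

        // Act
        var result = sut.IsEligibleForDiscount(customerName, total);

        // Assert
        Assert.IsTrue(result, "Orders above 100 should be eligible for a discount.");
    }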
I'm trying to 'sell' this approach (along with AutoFixture) to my colleagues and yesterday we had a long debate about it.
They raised an interesting argument: tests driven by random data can be unstable and hard to debug.
At first that seemed a bit strange, because we all agreed that flow-affecting data must not be random, so such behavior should not be possible. Nonetheless, I took a break to think that concern over thoroughly.
And I finally came to the following problem.
But first, some of my assumptions:
- Test code MUST be treated as production code.
- Test code MUST express correct expectations and specifications of system behavior.
- Nothing warns you about inconsistencies better than a broken build (whether it fails to compile or just has failing tests - gated check-in).
Consider these two variants of the same test:

    [TestMethod]
    public void DoSomething_ReturnsValueIncreasedByTen()
    {
        // Arrange
        var input = 1;
        var expectedOutput = input + 10;
        var sut = new MyClass();

        // Act
        var actualOutput = sut.DoSomething(input);

        // Assert
        Assert.AreEqual(expectedOutput, actualOutput, "Unexpected return value.");
    }
// The second variant is identical, except that the input is now random:
    [TestMethod]
    public void DoSomething_ReturnsValueIncreasedByTen()
    {
        // Arrange
        var fixture = new Fixture();
        var input = fixture.Create<int>();
        var expectedOutput = input + 10;
        var sut = new MyClass();

        // Act
        var actualOutput = sut.DoSomething(input);

        // Assert
        Assert.AreEqual(expectedOutput, actualOutput, "Unexpected return value.");
    }
So far so good: everything works and life is beautiful. But then the requirements change and DoSomething changes its behavior: now it increases the input by ten only if the input is lower than 10, and multiplies it by 10 otherwise.
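To make the change concrete, the new implementation might look roughly like this (just a sketch of the new requirement; MyClass's real code is not shown here):

    public int DoSomething(int input)
    {
        // New requirement: small inputs are increased, everything else is multiplied.
        return input < 10 ? input + 10 : input * 10;
    }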
What happens here? The test with hardcoded data still passes (purely by accident, because 1 is below the new threshold), whereas the second test fails only sometimes (whenever the generated input happens to be 10 or more). Both are wrong, deceiving tests: they specify behavior that no longer exists.
It looks like it doesn't matter whether the data is hardcoded or random: it is simply irrelevant. And yet we have no robust way to detect such 'dead' tests.
So the question is:
Does anyone have good advice on how to write tests so that such situations do not arise?