Detecting incorrect assertion methods

During a recent code review, I stumbled upon a problem that was not immediately easy to spot: assertTrue() was used instead of assertEqual(), which effectively resulted in a test that tested nothing. Here is a simplified example:

from unittest import TestCase


class MyTestCase(TestCase):
    def test_two_things_equal(self):
        self.assertTrue("a", "b")

The problem is that the test passes, and the code is technically valid, since assertTrue has an optional msg argument (which receives the "b" value in this case).

Can we do better than relying on the reviewer to spot this kind of problem? Is there a way to auto-detect it using static code analysis with flake8 or pylint?
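
For reference, here is a rough sketch of the kind of check I have in mind: a small ast-based script that flags calls like self.assertTrue(x, y) with more than one positional argument. It is only a sketch, not an existing flake8/pylint rule - the set of flagged methods and the message wording are my own placeholders - but it could presumably be wrapped as a flake8 plugin.

import ast
import sys

# Assertion methods that take a single value plus an optional msg, so a
# second positional argument is almost certainly a mistake.
SUSPICIOUS = {"assertTrue", "assertFalse", "assertIsNone", "assertIsNotNone"}


def check_file(path):
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in SUSPICIOUS
                and len(node.args) > 1):
            yield node.lineno, node.func.attr


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for lineno, name in check_file(path):
            print("{}:{}: {} called with more than one positional "
                  "argument (is msg being passed by position?)".format(path, lineno, name))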

Intervention answered 11/4, 2017 at 16:47 Comment(3)
But flagging technically valid code might create a whole new problem: information overload.Euplastic
@Ev.Kounis right, there might be a lot of false positives if we approach it naively. I was thinking of enforcing msg as a keyword-only argument. In that case, we might catch this particular problem by warning that the message was not properly passed to the assertion method. I don't like the idea very much, but I hope to see other ideas.. thanks.Intervention
I would highly recommend the answer provided by @Leon. By adding tests that you know should fail, you would identify a misused test case.Ander

Several years ago I came up with a general approach/methodology for assuring the quality of tests. The specification of a test can be reduced to two clauses:

  1. It must pass for a correct implementation of the feature being tested, and
  2. It must fail for an incorrect/broken implementation of the feature being tested

To the best of my knowledge, while requirement 1 is routinely exercised, little attention is paid to requirement 2.

Typically

  • a test-suite is created,
  • the code is run against it,
  • any failures (because of bugs either in the code or in the tests) are fixed
  • and we arrive at a situation where we believe that our code and tests are good.

The actual situation may be that (some of) the tests contain bugs that (would) prevent them from catching bugs in the code. Therefore, seeing the tests pass shouldn't give much comfort to anyone who cares about the quality of the system until they are confident that the tests are indeed able to detect the problems they were designed against1. And a simple way to gain that confidence is to actually introduce such problems and check that they don't go unnoticed by the tests!

In TDD (test-driven development), this idea is followed only partially: the recommendation is to add the test before the code, see it fail (it should, since there is no code yet) and then make it pass by writing the code. But the failure of a test due to missing code doesn't automatically mean that it will also fail for buggy code (which seems to be exactly your case)!
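
A crude way to apply requirement 2 to the test from the question is to do the bug injection by hand: deliberately break the code under test and check whether the test notices. The normalize functions below are invented purely for illustration:

from unittest import TestCase


def normalize(value):
    return value.lower()      # the implementation under test


def normalize_broken(value):
    return value              # a deliberately injected bug


class NormalizeTest(TestCase):
    def test_with_misused_asserttrue(self):
        # Violates requirement 2: passes for the correct implementation
        # *and* for the broken one, because "a" is swallowed as msg.
        self.assertTrue(normalize("A"), "a")
        self.assertTrue(normalize_broken("A"), "a")

    def test_with_assertequal(self):
        # Satisfies requirement 2: the second assertion fails against
        # the injected bug, proving the test can actually catch it.
        self.assertEqual(normalize("A"), "a")
        self.assertEqual(normalize_broken("A"), "a")  # fails, as it should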

So the quality of a test suite can be measured as the percentage of bugs it is capable of detecting. Any reasonable2 bug that escapes the test suite suggests a new test case covering that scenario (or, if the test suite should have caught that bug, it uncovers a bug in the test suite). This also means that every test in the suite must be able to catch at least one bug (otherwise, that test is completely pointless).

I was thinking about implementing a software system that facilitates adopting this methodology (i.e. one that allows injecting and maintaining artificial bugs in the code base and checks how the tests respond to them). This question acted as a trigger, so I am going to start working on it right away. Hoping to put something together within a week. Stay tuned!

EDIT

A prototype version of the tool is now available at https://bitbucket.org/leon_manukyan/trit. I recommend cloning the repository and running the demo flow.


1 A more generalized version of this statement is true for a wider range of systems/situations (all typically having to do with security/safety):

A system designed against certain events must be routinely tested against such events, otherwise it is prone to degradation down to complete inability to react against the events of interest.

Just an example: do you have a fire alarm system at home? When did you last witness it work? What if it stays silent during a fire too? Go make some smoke in the room right now!

2 Within the scope of this methodology, a back-door-like bug (e.g. when the feature misbehaves only if the passed-in URL is equal to https://www.formatmyharddrive.com/?confirm=yesofcourse) is not a reasonable one

Texas answered 20/5, 2017 at 9:53 Comment(4)
You are absolutely right, great points. Strictly speaking, we should always see a test fail "intentionally" - making sure we are testing what we've intended. Thanks!Intervention
@Intervention The prototype version of the tool is ready. Find it at bitbucket.org/leon_manukyan/trit. I have provided a demo flow with a problem similar to the one in your question.Texas
trit is about something I've been thinking about for a long time, this is awesome! I'll definitely review and see if I can apply it to my daily workflows. Thanks so much!Intervention
@Intervention Thanks! I am committed to making trit a robust and convenient tool. I am going to actively improve/enhance it and will appreciate if you report any issues via bitbucket.org/leon_manukyan/trit/…Texas

Python now has a type hinting system that enables static code analysis (for example with mypy). Using this system you can require that the first argument of a function like assertTrue is always boolean. The problem is that assertTrue is not defined by you but by the unittest package, and unfortunately unittest didn't add type hints. There's a reasonably simple way around that, though: just define your own wrapper.

from unittest import TestCase

class TestCaseWrapper(TestCase):
    def assertTrue(self, expr: bool, msg=None):  # ": bool" requires expr to be a boolean
        TestCase.assertTrue(self, expr, msg)

class MyTestCase(TestCaseWrapper):
    def test_two_things_equal(self):
        self.assertTrue("a", "b")  # would give a warning about the type of "a"

You can then run the type checker like so:

python -m mypy my_test_case.py

This should then give you a warning that "a" is a string, not a boolean. The nice thing is that the check can run automatically as part of your test pipeline. Also, PyCharm will check the types in your code if you provide them and highlight anything that's wrong.

Coati answered 18/5, 2017 at 18:16 Comment(2)
When making wrappers like this, it's also useful to use functools.wraps to make the documentation and such inherit from the original function.Coati
Requiring that the first argument be boolean is a terrible idea, since you may just be checking that the value is truthy.Ander

One solution to this kind of problem is "mutation testing". The idea is to automatically generate "mutants" of your code by introducing small changes into it. Your test suite is then run against these mutants; if the suite is good, most of them should be killed, meaning that the suite detects the mutation and the corresponding tests fail.

Mutation testing actually evaluates the quality of your tests. In your example, no mutants would be killed and you would easily detect that there's something wrong with the test.
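
As a minimal hand-rolled sketch of the mechanism (the add function, the mutants and the tiny harness are all invented here; a real mutation testing tool generates and manages the mutants for you by rewriting the source):

import unittest


def add(a, b):
    return a + b                      # the code under test


# Hand-written stand-ins for the mutants a real tool would generate.
MUTANTS = {
    "plus_to_minus": lambda a, b: a - b,
    "plus_to_times": lambda a, b: a * b,
    "drop_first_arg": lambda a, b: b,
}


class AddTest(unittest.TestCase):
    add_impl = staticmethod(add)      # swapped out for each mutant below

    def test_add(self):
        # Replace assertEqual with assertTrue here and every mutant
        # survives, exposing the weakness of the test.
        self.assertEqual(self.add_impl(2, 3), 5)


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


if __name__ == "__main__":
    killed = 0
    for name, mutant in MUTANTS.items():
        AddTest.add_impl = staticmethod(mutant)
        if not run_suite():           # a failing suite means the mutant was killed
            killed += 1
    print("mutation score: {}/{} mutants killed".format(killed, len(MUTANTS)))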

In Python, there are several mutation testing frameworks available, for example MutPy and Cosmic Ray.

Extinction answered 23/5, 2017 at 8:49 Comment(0)

A quick solution would be to provide a mixin that checks for correct usage:

import unittest


class Mixin(object):
    def assertTrue(self, *args, **kwargs):
        if len(args) > 1:
            # TypeError is just an example, it could also do some warning/logging
            # stuff in here.
            raise TypeError('msg should be given as keyword parameter.')
        super().assertTrue(*args, **kwargs)


class TestMixin(Mixin, unittest.TestCase):  # Mixin before other parent classes
    def test_two_things_equal(self):
        self.assertTrue("a", "b")

The Mixin could also check if the passed expression is a boolean:

class Mixin(object):
    def assertTrue(self, *args, **kwargs):
        if args and not isinstance(args[0], bool):
            raise TypeError('expression should be a boolean')
        if len(args) > 1:
            raise TypeError('msg should be given as keyword parameter.')
        super().assertTrue(*args, **kwargs)

However, this isn't static, and it requires manually altering your test classes (adding the Mixin) and running the tests. It will also produce a lot of false positives, because passing the message as a keyword argument isn't really common (at least not where I've seen it), and in a lot of cases you want to check the implicit truthiness of the expression rather than an explicit bool, e.g. checking for non-emptiness when a is a list, dict, etc.

You could also use setUp/tearDown code that replaces the assertTrue method for the particular class:

import unittest


def decorator(func):
    def wrapper(*args, **kwargs):
        if len(args) > 1:
            raise TypeError()
        return func(*args, **kwargs)
    return wrapper


class TestMixin(unittest.TestCase):
    def setUp(self):
        self._old = self.assertTrue
        self.assertTrue = decorator(self.assertTrue)

    def tearDown(self):
        self.assertTrue = self._old

    def test_two_things_equal(self):
        self.assertTrue("a", "b")

But a word of caution before you apply any of these approaches: always be careful when altering existing tests. Unfortunately, tests are sometimes poorly documented, so it's not always obvious what they test for and how. Sometimes a test makes no sense and it's safe to alter it, but sometimes it tests a particular feature in a weird way, and by changing it you change what is being tested. So at the very least make sure there's no coverage change when you change a test case. If necessary, clarify the purpose of the test by updating the method name, the method documentation or inline comments.

Homes answered 20/5, 2017 at 12:33 Comment(0)
