Testing a too general program

Asked 1/7, 2021 at 19:11 Answered 26/7, 2021 at 18:6

Suppose the correct definition of a predicate would be

len([],0).
len([_|T],N)  :-  len(T,X), N is  X+1.

However, we end up with the following erroneous definition in stead.

len2([],0).
len2([_|T],N)  :-  len(T,X),  ( N  is  X+1 ; N is X + 2, N = 10000 ).

All the standard testing doesn’t reveal the mistake because it works just like len/2 except when it stumbles upon a list of length exactly 9999 elements where there are two possible answers.

as user mjano314 observes. How is it possible to detect such an error?

Note that len2/2 above uses len/2. In this manner there is precisely one single case where the definition is too general. Would len2/2 be directly recursive, we would have infinitely many cases that are too general. Obviously, in such a situation it would be easier to locate errors.

Febrifacient answered 1/7, 2021 at 19:11 Comment(20)

This is a like a hacked program which does nothing unexpected except when when it receives a certain special input, upon which it does the unexpected, like letting the user access all areas. Standard testing is not going to reveal anything (well, in this case it still might, but suppose you have to find the soft bomb in a large program), but program analysis might (in this case, we would demand just one solution in all cases). – Servo 1/7, 2021 at 19:57

@DavidTonhofer: In Prolog you can do better. – Febrifacient 1/7, 2021 at 19:59

Suspicion might also be raised by finding program code that is never exercised in coverage analysis while we would expect 100% of the code of such a function to be used in all cases (i.e. dead code detection alerts us) – Servo 1/7, 2021 at 19:59

@GuyCoder: Cheaper! – Febrifacient 1/7, 2021 at 20:58

Cheaper than Symbolic Execution, cheaper than constraints and CSP, or even Abstract Interpretation (not sure if this is nowadays subsumed by Symbolic Execution). – Febrifacient 1/7, 2021 at 21:2

Is the problem "whitebox" i.e. can we inspect the source code? Do we have a known correct implementation (one following the requrement that it always yields the correct length of the list, that is) to compare to? – Servo 2/7, 2021 at 6:21

@DavidTonhofer: Any whitebox approach would be quite expensive. And no reference implementation - we are just implementing this relation for the first time! – Febrifacient 2/7, 2021 at 6:46

In this particular case, len2(L,N), \+ len(L,N) will find the extra answer but the program would run forever if len2/2 was correct. – Tadashi 2/7, 2021 at 6:48

@jnmonette: You are using len/2 as a reference implementation. That does not count. We are seeing len2/2 alone. – Febrifacient 2/7, 2021 at 7:33

@Febrifacient You wrote in a comment above: "We are seeing len2/2 alone." You wrote in the question: "However, we end up with the following erroneous definition in stead." Try to be more clear in what you're asking. – Fascination 2/7, 2021 at 15:13

@PauloMoura: in case of doubt, please refer to the original text cited above. – Febrifacient 2/7, 2021 at 15:30

Assuming that we want to show that the predicate does not define a function (from its first to second argument), we could use len2(L,N1), len2(L,N2), N1\=N2. But again this only works because we can generate a faulty input in finite time... – Tadashi 2/7, 2021 at 17:33

If this procedure is intended to be using mode(+,-) then, at least with SWI, using directive det may be useful – Phytology 2/7, 2021 at 18:0

@jnmonette: this only works because we can generate a faulty input in finite time right! Isn't any computation only working because we can generate a result in finite time? – Febrifacient 2/7, 2021 at 19:16

What I meant by this is that we cannot write this as part of some automated testing/validation of our program and expect it to simply fail (which would mean that the test passes) when the program is correct, because it would never finish. So we can only use it when we already know that there is that type of fault. (And in this case, it is simply easier to look at the program code to find the fault.) – Tadashi 4/7, 2021 at 8:30

@Febrifacient "Isn't any computation only working because we can generate a result in finite time?" Sure but sometimes we have to go meta, as in proving in finite time that there is no result in the first place. – Servo 4/7, 2021 at 18:15

as asked this question is not clear ("too general" is not defined) and self-contradictory: you said "instead": calling len2([1,2,3],X) produces the run-time error Undefined procedure: len/2, not the "two possible answers" semantic "error". – Frasch 10/7, 2021 at 14:34

@WillNess: Please do read the added paragraph! len/2 is defined above. And sure you know what a relation that is too general is. – Febrifacient 10/7, 2021 at 14:38

@Febrifacient but you said instead. and actually, I don't. even if I knew, some other readers might not. goodling does not help with this. some general hint would go a long way. :) – Frasch 10/7, 2021 at 14:40

@Will, right, we have now len2/2 as the defined predicate instead of the intended len/2. Nevertheless, the implementation of len2/2 uses internally len/2. – Febrifacient 10/7, 2021 at 14:43

If we already suspect that the predicate len2(X,Y) is not functional while we expect it to be, meaning in this case that there are no two answers with the same value for the first argument and different values for the second argument, then we can verify our suspicion by searching for such two answers with the following snippet:

len2(X,Y1), len2(X,Y2), Y1\=Y2

In this case, the program will give us an answer with Y1=9999, Y2=10000 and X a list of 9999 variables.

However, if the fault is not present or if the code of the predicate was such that the input triggering the fault was not generated in finite time (imagine that it generates all even-length lists before any odd-length list), then the code above will not finish. This means, in my opinion, that this approach is only useful as a debugging tool but not really suitable as part of some automated testing/validation of the predicate.

Tadashi answered 4/7, 2021 at 8:50 Comment(2)

For your point about its effectiveness it suffices to exchange the order of the clauses in len/2. – Febrifacient 4/7, 2021 at 15:26

There is at least one other similarly meaningful and effective query. – Febrifacient 4/7, 2021 at 15:27

As has been noted by @jnmonette, there is a functional dependency from the first to the second argument. With a query like

?- len2(L, N), dif(N, M), len2(L, M).
   L = [_A,_B,_C,_D,_E,_F,_G,_H,_I,_J|...], N = 9999, M = 10000
;  L = [_A,_B,_C,_D,_E,_F,_G,_H,_I,_J|...], N = 10000, M = 9999
;  loops.

the error in len2/2 can be detected. After all, L cannot have two different lengths. Additionally,

?- len2([_|L], N), len2(L, N).
   N = 10000, L = [_A,_B,_C,_D,_E,_F,_G,_H,_I,_J|...]
;  loops.

identifies the error going into the other direction. The lengths of L and [_|L] cannot be the same. This can be generalized to cover all such errors:

?- len2(L, N), phrase(([_],...), L,K), len2(K, N).
   L = [_A,_B,_C,_D,_E,_F,_G,_H,_I,_J|...], N = 10000, K = [_B,_C,_D,_E,_F,_G,_H,_I,_J,_K|...]
;  loops.

So far we have used Prolog directly. We were able to state general properties by posing this queries. However, we will only find counterexamples should there be some, but otherwise we are left in the uncertain. And, we were pretty lucky that the actual definition enumerates answers in a fair manner letting each and every answer appear in finite time. In case of a more demanding definition (think of exchanging the order of clauses in len/2) simply add length(L, _) in front of all queries so far.

However, in case of a running query we inevitably ask: Should we continue to wait for an answer, or can we already abort the query? After all, for a correct implementation the query will not produce any answer and thus loop indefinitely.

There is no way (in current implementations) to delegate such a query at least into the background running there with lower priority. And therefore such queries are not used for testing at all.

On the other hand, such queries are a very powerful way to express many testable properties. For example many of the bugs in clpfd-systems of SICStus, SWI, and Scryer have been identified in this manner using condor. The cruder support however does not lead to very elegant solutions.

To start to address this problem, the following annotation may help:

:/-& len2(L, N), dif(N, M), len2(L, M).
:/-& len2(L, N), phrase(([_],...), L,K), len2(K, N).

The :/- means there is no solution - similarly to :- \+ Q_0. and the additional &, an asciified ∞, meaning Prolog's execution will be infinite. This annotation therefore leaves room to try out better strategies that disprove that there is no solution.

In GUPU this annotation is executed as a Prolog goal with a (relatively short) timeout. Also alternate strategies are tried in particular iterative deepening which in this case do time out as well. So effectively, the error remains undetected. But with more resources or a better strategy the error might be discovered.

Febrifacient answered 26/7, 2021 at 18:6 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags