How to test scientific software?

I'm convinced that software testing is indeed very important, especially in science. However, over the last six years I have never come across a scientific software project that was under regular testing (and most of them were not even version controlled).

Now I'm wondering how you deal with software tests for scientific codes (numerical computations).

From my point of view, standard unit tests often miss the point, since there is rarely an exact expected result, so using assert(a == b) can prove difficult due to "normal" numerical rounding errors.
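To make this concrete, here is a minimal illustration (my own, not part of the original question) of why exact equality breaks down for floating-point results and why a tolerance is needed:

```cpp
#include <cassert>
#include <cmath>

int main() {
    double a = 0.1 + 0.2;   // 0.30000000000000004 in IEEE 754 double precision
    double b = 0.3;

    // assert(a == b);                 // fails: exact equality is too strict
    assert(std::fabs(a - b) < 1e-12);  // tolerance-based check passes
    return 0;
}
```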

So I'm looking forward to reading your thoughts about this.

Bezanson answered 6/8, 2010 at 6:30 Comment(2)
I asked a similar question over on the Computational Science beta.Laylalayman
See also scicomp.stackexchange.com/questions/206/…Ommatidium

I am also in academia, and I have written quantum-mechanical simulation programs to be executed on our cluster. I made the same observation regarding testing and even version control. It was even worse in my case: I use a C++ library for my simulations, and the code I got from others was pure spaghetti code, with no inheritance and not even functions.

I rewrote it and also implemented some unit testing. You are correct that you have to deal with numerical precision, which can differ depending on the architecture you are running on. Nevertheless, unit testing is possible, as long as you take these numerical rounding errors into account. Your result should not depend on the exact rounding of the numerical values; otherwise you would have a different problem, namely with the robustness of your algorithm.
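As a minimal sketch of what such a rounding-tolerant check can look like (my own illustration, not part of the original answer; the function names are made up), one usually compares against a reference within a combined absolute/relative tolerance rather than with ==:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compare two floating-point values within a combined absolute/relative tolerance.
bool almost_equal(double a, double b, double rel_tol = 1e-9, double abs_tol = 1e-12) {
    return std::fabs(a - b) <=
           std::max(abs_tol, rel_tol * std::max(std::fabs(a), std::fabs(b)));
}

// Example test: compare a computed result vector against stored reference values.
void check_against_reference(const std::vector<double>& computed,
                             const std::vector<double>& reference) {
    assert(computed.size() == reference.size());
    for (std::size_t i = 0; i < computed.size(); ++i) {
        assert(almost_equal(computed[i], reference[i]));
    }
}
```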

So, to conclude, I use unit testing for my scientific programs, and it really makes one more confident about the results, especially with regard to publishing the data in the end.

Insnare answered 6/8, 2010 at 6:53 Comment(2)
Can you recommend any test frameworks for this? I'm also using C++.Bezanson
Take a look at CppTest (cpptest.sourceforge.net), in particular the TEST_ASSERT_DELTA(a, b, delta) assertion, with which you can compare two values a and b within a precision delta.Insnare
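To illustrate the comment above, here is a rough sketch of a CppTest suite using TEST_ASSERT_DELTA; the frequency_response function is a made-up stand-in, and the surrounding boilerplate follows the CppTest tutorial as I remember it, so check the cpptest docs for the exact setup:

```cpp
#include <cpptest.h>

// Stand-in for the real simulation code under test.
double frequency_response(double omega) {
    return 1.0 / (1.0 + omega * omega);
}

class ResponseTestSuite : public Test::Suite {
public:
    ResponseTestSuite() {
        TEST_ADD(ResponseTestSuite::test_dc_response)
    }

private:
    void test_dc_response() {
        // Compare the computed value against a known reference within a tolerance.
        TEST_ASSERT_DELTA(frequency_response(0.0), 1.0, 1e-9)
    }
};

int main() {
    ResponseTestSuite suite;
    Test::TextOutput output(Test::TextOutput::Verbose);
    return suite.run(output) ? 0 : 1;
}
```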

I've just been looking at a similar issue (google: "testing scientific software") and came up with a few papers that may be of interest. These cover both mundane coding errors and the bigger issue of knowing whether the result is even right (e.g. the depth of the Earth's mantle).

http://http.icsi.berkeley.edu/ftp/pub/speech/papers/wikipapers/cox_harris_testing_numerical_software.pdf

http://www.cs.ua.edu/~SECSE09/Presentations/09_Hook.pdf (broken link; new link is http://www.se4science.org/workshops/secse09/Presentations/09_Hook.pdf)

http://www.associationforsoftwaretesting.org/?dl_name=DianeKellyRebeccaSanders_TheChallengeOfTestingScientificSoftware_paper.pdf

I thought the idea of mutation testing described in 09_Hook.pdf (see also matmute.sourceforge.net) was particularly interesting, as it mimics the simple mistakes we all make. The hardest part is learning to use statistical analysis for confidence levels rather than single-pass code reviews (human or machine).

The problem is not new. I'm sure I have an original copy of "How accurate is scientific software?" by Hatton et al., Oct 1994, which even then showed how different implementations of the same theories (as algorithms) diverged rather rapidly. (It's also ref. 8 in the Kelly & Sanders paper.)

--- (Oct 2019) More recently: Testing Scientific Software: A Systematic Literature Review.

Oren answered 23/4, 2012 at 10:32 Comment(3)
@Dmitry Kabanov, thanks for updating the link. As a side point, there are now also a number of automated testing tools for detecting security issues, such as AFL (American Fuzzy Lop) and other fuzzers, e.g. github.com/google/syzkaller and lwn.net/Articles/677764 (coverage-guided kernel fuzzing with syzkaller), which help drive out simple unchecked-data mistakes. It's still very hard to catch logical problems, though, e.g. all data placed onto a predefined grid resulting in answers neatly fixed to that grid.Oren
The general problem of finding wrong results (instead of just crashes) is addressed by "property-based testing" - see hypothesis.works/articles/what-is-property-based-testing and, to apply it to scientific code, conference.scipy.org/proceedings/scipy2020/pdfs/… (a minimal hand-rolled sketch follows these comments).Exteriorize
@ZacHatfield-Dodds Thanks. Those are useful references. I've yet to read them but I suspect the problem (or distinction) is one of scale, like science itself, where theories are not invalidated by a few bad results, nor always accepted when inconveniently true. Often we don't have a realistic reference test (see the Oil Well seismic survey example).Oren
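To make the property-based testing idea from the comments concrete without pulling in a framework (hypothesis itself is a Python library), here is a hand-rolled sketch in C++: instead of checking one hand-picked case, a property of the result is checked over many random inputs. The mean function is a made-up example.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Made-up function under test: arithmetic mean of a non-empty vector.
double mean(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> value(-1.0, 1.0);
    std::uniform_int_distribution<std::size_t> length(1, 1000);

    // Property: for any non-empty input, min(v) <= mean(v) <= max(v).
    for (int trial = 0; trial < 10000; ++trial) {
        std::vector<double> v(length(rng));
        for (double& x : v) x = value(rng);

        double m  = mean(v);
        double lo = *std::min_element(v.begin(), v.end());
        double hi = *std::max_element(v.begin(), v.end());
        assert(lo - 1e-12 <= m && m <= hi + 1e-12);  // small slack for rounding
    }
    return 0;
}
```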

I'm also using cpptest for its TEST_ASSERT_DELTA. I'm writing high-performance numerical programs in computational electromagnetics and I've been happily using it in my C++ programs.

I typically go about testing scientific code the same way as I do any other kind of code, with only a few adjustments, namely:

  • I always test my numerical codes for cases that make no physical sense and make sure the computation actually stops before producing a result. I learned this the hard way: I had a function that computed some frequency responses and then supplied a matrix built from them to another function, which eventually gave its answer as a single vector. The matrix could have been any size depending on how many terminals the signal was applied to, but my function was not checking whether the matrix size was consistent with the number of terminals (2 terminals should have meant a 2 x 2 x n matrix); the code itself was wrapped so as not to depend on that, and it didn't care what size the matrices were since it just had to do some basic matrix operations on them. The results were perfectly plausible, well within the expected range and, in fact, partially correct -- only half of the solution vector was garbled. It took me a while to figure out. If your data looks correct, is assembled in a valid data structure and the numerical values are good (e.g. no NaNs or negative particle counts), but it doesn't make physical sense, the function has to fail gracefully (see the sketch after this list).

  • I always test the I/O routines even if they are just reading a bunch of comma-separated numbers from a test file. When you're writing code that does twisted math, it's always tempting to jump into debugging the part of the code that is so math-heavy that you need a caffeine jolt just to understand the symbols. Days later, you realize you are also adding the ASCII value of \n to your list of points.

  • When testing for a mathematical relation, I always test it "by the book", and I learned this by example, too. I've seen code that was supposed to compare two vectors but only checked for equality of the elements and did not check that the lengths were equal.
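As a concrete illustration of the first and third points above, here is a minimal sketch (my own; the ResponseMatrix layout and n_terminals parameter are hypothetical names modelled on the anecdote): validate the physical consistency of the input before computing anything, and compare vectors by length before comparing elements.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical flat container for an n_terminals x n_terminals x n_freq response matrix.
struct ResponseMatrix {
    std::size_t rows, cols, n_freq;
    std::vector<double> data;  // rows * cols * n_freq entries
};

// Fail fast when the input makes no physical sense, instead of silently
// producing a plausible-looking but partially garbled result.
void check_consistency(const ResponseMatrix& m, std::size_t n_terminals) {
    if (m.rows != n_terminals || m.cols != n_terminals)
        throw std::invalid_argument("response matrix size does not match the number of terminals");
    if (m.data.size() != m.rows * m.cols * m.n_freq)
        throw std::invalid_argument("response matrix storage is inconsistent with its dimensions");
}

// Compare two vectors "by the book": lengths first, then elements within a tolerance.
bool vectors_equal(const std::vector<double>& a,
                   const std::vector<double>& b,
                   double tol = 1e-12) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol) return false;
    return true;
}
```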

Hailstone answered 8/8, 2010 at 1:9 Comment(1)
The I/O part is very true. Often I would write a Perl script that was supposed to parse some text file, and it did not work because I missed a detail in the source file.Insnare

Please take a look at the answers to the SO question How to use TDD correctly to implement a numerical method?

Constitution answered 6/8, 2010 at 6:37 Comment(0)
