How to deal with underflow in scientific computing?

I am working on probabilistic models, and when doing inference on those models, the estimated probabilities can become very small. In order to avoid underflow, I am currently working in the log domain (I store the log of the probabilities). Multiplying probabilities then becomes an addition, and summing is done using the formula:

log(exp(a) + exp(b)) = log(exp(a - m) + exp(b - m)) + m

where m = max(a, b).
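
For reference, a minimal Java sketch of this log-domain addition, using only primitive doubles (the method name logSum is just illustrative):

// Log-domain addition: computes log(exp(a) + exp(b)) without underflow,
// using only primitive doubles.
static double logSum(double a, double b) {
    if (a == Double.NEGATIVE_INFINITY) return b; // exp(a) == 0
    if (b == Double.NEGATIVE_INFINITY) return a; // exp(b) == 0
    double m = Math.max(a, b);
    return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
}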

I use some very large matrices, and I have to take the element-wise exponential of those matrices to compute matrix-vector multiplications. This step is quite expensive, and I was wondering whether there exist other methods for dealing with underflow when working with probabilities.

Edit: for efficiency reasons, I am looking for a solution using primitive types, not objects storing arbitrary-precision representations of real numbers.

Edit 2: I am looking for a faster solution than the log domain trick, not a more accurate one. I am happy with the accuracy I currently get, but I need a faster method. In particular, the summations happen during matrix-vector multiplications, and I would like to be able to use efficient BLAS routines.

Solution: after a discussion with Jonathan Dursi, I decided to factor each matrix and vector by its largest element, and to store that factor in the log domain. Multiplications are straightforward. Before an addition, I have to rescale one of the operands by the ratio of the two factors. I update the factor every ten operations.
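
A rough sketch of what that representation might look like (the class and field names below are illustrative, not my actual code):

// Each vector carries an implicit scale factor kept in the log domain:
// the true values are exp(logFactor) * scaled[i].
class ScaledVector {
    double[] scaled;   // plain doubles, so BLAS routines can operate on them
    double logFactor;  // log of the factor that was divided out

    // Divide out the largest entry and fold it into logFactor
    // (called every ten operations or so).
    void renormalize() {
        double max = 0.0;
        for (double x : scaled) max = Math.max(max, x);
        if (max > 0.0) {
            for (int i = 0; i < scaled.length; i++) scaled[i] /= max;
            logFactor += Math.log(max);
        }
    }

    // Add another vector: bring both operands to the larger factor first,
    // so the rescaling ratio is <= 1 and cannot overflow.
    void addInPlace(ScaledVector other) {
        if (other.logFactor > this.logFactor) {
            double r = Math.exp(this.logFactor - other.logFactor);
            for (int i = 0; i < scaled.length; i++) scaled[i] *= r;
            this.logFactor = other.logFactor;
        }
        double ratio = Math.exp(other.logFactor - this.logFactor); // <= 1
        for (int i = 0; i < scaled.length; i++) scaled[i] += ratio * other.scaled[i];
    }
}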

Scutch answered 17/2, 2012 at 23:6 Comment(8)
Must you use Java? Or can you use other languages?Limburg
@Peter - this is not at all uncommon. In maximum likelihood estimation, for example, it is quite common to see numbers like this. Your optimizer must still be able to converge even when the starting point is not as good as you would like. And if you get underflows there, then convergence is not an option.Gloxinia
It does sound like the problem is pretty abstract. If you measure the age of the universe in Planck units, you get about 2e58, the number of units of time in which anything could have happened. If something has a probability of less than 1e-300, it's hard to imagine that it's not pretty close to impossible, or at least theoretically unmeasurable and unknowable. Just think about how many measurements you would need to take to know something has a probability of 1e-58.Jaella
@Peter - Suppose you are modeling a particle moving along a line with the following behavior: at each time step, it can either move one step forward, with probability 0.5, or one step backward, with probability 0.5. One sequence of positions of length 1000 has a probability of 0.5^1000. With one measurement, I have an observed sequence, which has a very low probability.Scutch
You have 2^1000 outcomes, each equally likely, and no computer can store or consider that many combinations. Or, if you only care about the result, you have a plain probability distribution in which all the likely events (likely to ever occur in the life of the universe) can be represented. Perhaps there is another way to model your situation in which you are considering events with more modest scales.Jaella
@Peter - Ok, now suppose I do not know the probabilities of moving backward and forward, and want to infer them. I observe the path of one particle, and I can estimate those probabilities. But to do that, I have to compute the probability of the corresponding path. And of course, I am not considering all the outcomes, but the one I am observing still has a very low probability, while my model is very simple and tractable (only one parameter!).Scutch
@PeterLawrey: I think the issue here is not final answers (I think we all agree that if the final answer were 1e-300, then printing out 0 would be a perfectly good approximation to it) but intermediate results. If you have to take the ratio of the probabilities of two very rare events, for example, those numbers have to be calculated correctly or the result will be completely wrong, even though the absolute error in either probability is extremely small.Dysthymia
Just a note: either a-m or b-m is going to be 0, so you can just use 1 in place of the exponential of the larger term.Seepage

This issue has come up recently on the Computational Science Stack Exchange site as well, and although the immediate worry there was overflow, the issues are more or less the same.

Transforming into log space is certainly one reasonable approach. Whatever space you're in, there are a couple of methods you can use to improve the accuracy of a large number of summations. Compensated summation approaches, most famously Kahan summation, keep both a running sum and what's effectively a "remainder"; this gives you some of the advantages of higher-precision arithmetic without all of the cost (and while only using primitive types). The remainder term also gives you some indication of how well you're doing.
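
A short Java sketch of Kahan summation, using only primitive doubles:

// Compensated (Kahan) summation: c accumulates the low-order bits that are
// lost when a small term is added to a large running sum.
static double kahanSum(double[] terms) {
    double sum = 0.0;
    double c = 0.0;           // the running "remainder"
    for (double x : terms) {
        double y = x - c;     // apply the correction to the incoming term
        double t = sum + y;   // low-order bits of y may be lost here...
        c = (t - sum) - y;    // ...and are recovered into c
        sum = t;
    }
    return sum;
}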

In addition to improving the actual mechanics of your addition, changing the order in which you add your terms can make a big difference. Sorting your terms so that you're summing from smallest to largest can help, as then you're less often adding terms of very different magnitudes (which can cause significant roundoff problems); in some cases, doing log2(N) levels of repeated pairwise sums can also be an improvement over a straight linear sum, depending on what your terms look like.
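
And a minimal sketch of the pairwise scheme (the base-case size of 8 is arbitrary):

// Pairwise summation: split the range in half and add the two partial sums,
// so roundoff error grows roughly like log2(N) rather than N.
static double pairwiseSum(double[] a, int lo, int hi) {
    int n = hi - lo;
    if (n <= 8) {             // small base case: plain loop
        double s = 0.0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + n / 2;
    return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
}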

The usefulness of all these approaches depends a lot on the properties of your data. Arbitrary-precision math libraries, while enormously expensive in compute time (and possibly memory), have the advantage of being a fairly general solution.

Dysthymia answered 18/2, 2012 at 15:38 Comment(7)
Thanks for that very interesting answer. However, I am looking for a more efficient method, not a more accurate one (I am happy with the accuracy I get with the log domain trick). And using compensated summation without working in the log space only fixes the accuracy problem, not the risk of underflow.Scutch
You're not interested in accuracy, but you are worried about underflowing? Underflowing isn't an accuracy consideration? I don't think I understand what you're looking for.Dysthymia
What I meant by 'accuracy' is the accuracy of summations. Using compensated summation, I can still obtain numbers which are too small to be represented by a double when multiplying two small numbers. When doing inference on long HMMs, you can obtain intermediate quantities which are smaller than 10^-324 but which have the same order of magnitude as each other. Factoring out the max allows you to compute an accurate sum. This is what my current solution is doing. Basically, I am looking for a representation of small numbers with efficient addition and multiplication. Right now I only have efficient multiplication.Scutch
What is the desired range of exponents, and digits precision of results?Dysthymia
Range of exponents is [-700; 0] and I would be happy with a precision of 5 or 6 digits.Scutch
Why don't you just scale everything by ~1e300 (or so)? That gives you an exponent range of ~616 and significantly more than 6 digits of precision. Otherwise, you can just use quad precision; in Fortran that would be trivial, and in Java it's straightforward to implement in software with two double-precision numbers per value; just take a look at ddfun90 (crd-legacy.lbl.gov/~dhbailey/mpdist) or dsmath.c in this forum (forums.nvidia.com/index.php?showtopic=73067). You wouldn't need to change your matrix or vector, just do the matrix-vector products in this software-extended precision.Dysthymia
I'll try to do a mix of this and my old solution. Each vector or matrix will be factored by its largest element, with that factor stored in the log domain. Multiplication and addition are then straightforward. I just need to update the factor every ten or twenty operations. Thanks for the help.Scutch

Option 1: Commons Math - The Apache Commons Mathematics Library

Commons Math is a library of lightweight, self-contained mathematics and statistics components addressing the most common problems not available in the Java programming language or Commons Lang.

Note: The API protects the constructors to force a factory pattern while naming the factory DfpField (rather than the somewhat more intuitive DfpFac or DfpFactory). So you have to use

new DfpField(numberOfDigits).newDfp(myNormalNumber)

to instantiate a Dfp, then you can call .multiply or whatever on this. I thought I'd mention this because it's a bit confusing.
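
For example, something along these lines (the package path is the one from Commons Math 3; adjust it for your version):

import org.apache.commons.math3.dfp.Dfp;
import org.apache.commons.math3.dfp.DfpField;

public class DfpExample {
    public static void main(String[] args) {
        DfpField field = new DfpField(30);      // roughly 30 decimal digits
        Dfp a = field.newDfp("1e-320");
        Dfp b = field.newDfp("2e-321");
        Dfp product = a.multiply(b);            // well below a double's range
        System.out.println(product);            // still a nonzero value
    }
}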

Option 2: GNU Scientific Library or Boost C++ Libraries. In that case you would use JNI to call these native libraries.

Option 3: If you are free to use other programs and/or languages, you could consider using programs/languages for numerical computations such as Octave, Scilab, and similar.

Option 4: Java's BigDecimal.

Limburg answered 17/2, 2012 at 23:35 Comment(9)
At least Matlab and Octave both have some Java Bindings as well.Soyuz
Octave is a lot cheaper (free!) than Matlab.Freon
Thanks for the references, but I do not think they will work for me. Options 1 & 4: using arbitrary-precision decimal numbers is too expensive, because they use objects rather than primitive types, and because computing additions and multiplications with such representations is more expensive. Option 2: same problems as 1 & 4 (AFAIK), and I prefer to stick with Java. Option 3: I have been using NumPy and Matlab for some time, and the same problem happens, because they also use floats and doubles.Scutch
@Edouard: But from this point of view Java is the least appropriate language for the simulations, because it is a "semi-compiled" (that is, "semi-interpreted") language, so you would have performance issues. Instead, Octave, Scilab and similar tools have their own routines optimized for operations involving matrices and vectors; in fact they are often used for such simulations. However, I remember that in Matlab you can set the precision: look at this link.Limburg
@Limburg - I have been using those for three years (mostly scilab and numpy), and when doing inference on small Hidden Markov Models, I used the log domain trick, which is well known and used in the field I am working in. But even for those languages, the exponential step is the bottleneck.Scutch
@Limburg - I am using bindings to BLAS to do matrix computations, so performance on matrix/vector operations is not an issue here. And Java is not that bad compared to C++ or even Fortran: see the language shootout.Scutch
Obviously, the performance also depends on the algorithm used: a Java library which uses algorithms optimized for certain types of calculations can certainly be better than an unoptimized one in C/C++.Limburg
This is exactly the point made by the link I sent. The algorithm is the same, implemented in various languages, so this benchmark is really measuring the efficiency of the languages. The idea that Java is slow compared to C/C++ is a myth: other benchmarks.Scutch
Good, so I learned something new: I did not know this was a myth. Thanks for the link!Limburg

I ran into a similar problem years ago. The solution was to develop an approximation of log(1+exp(-x)). The range of the approximation does not need to be all that large (x from 0 to 40 will more than suffice), and at least in my case the accuracy didn't need to be particularly high, either.

In your case, it looks like you need to compute log(1+exp(-x1)+exp(-x2)+...). Throw out those large negative values. For example, suppose a, b, and c are three log probabilities, with 0>a>b>c. You can ignore c if a-c>38. It's not going to contribute to your joint log probability at all, at least not if you are working with doubles.
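
Here is a sketch of that pruning applied to a log-domain sum of several log probabilities (the cutoff of 38 follows the reasoning above; the method name is illustrative):

// Sum of several log-probabilities in the log domain, skipping terms that
// cannot affect the result in double precision.
static double logSumAll(double[] logProbs) {
    double m = Double.NEGATIVE_INFINITY;
    for (double lp : logProbs) m = Math.max(m, lp);
    if (m == Double.NEGATIVE_INFINITY) return m;  // all probabilities are zero
    double sum = 0.0;
    for (double lp : logProbs) {
        double d = m - lp;
        if (d < 38.0) sum += Math.exp(-d);        // ignore negligible terms
    }
    return m + Math.log(sum);
}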

Magnusson answered 18/2, 2012 at 16:52 Comment(2)
Clever trick. But I think that developing an approximation of log(1 + exp(x1) + exp(x2) + ...) which is faster than taking the exp function of n doubles is quite challenging.Scutch
You can still use the trick of excluding those extremely low probability events. If you are working with IEEE doubles, 1+exp(-37) is exactly equal to 1. This will immediately get rid of your underflow problem.Magnusson

Rather than storing values in logarithmic form, I think you'd probably be better off using the same concept as doubles, namely, floating-point representation. For example, you might store each value as two longs, one for sign-and-mantissa and one for the exponent. (Real floating-point has a carefully tuned design to support lots of edge cases and avoid wasting a single bit; but you probably don't need to worry so much about any of those, and can focus on designing it in a way that's simple to implement.)
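
One simple variant of this idea keeps the mantissa as an ordinary double with magnitude in [0.5, 1) and widens only the exponent. A rough, illustrative sketch (the names are made up, and only multiplication is shown):

// A value is mantissa * 2^exponent, with the exponent stored separately as a
// long, so the representable range is far larger than a double's.
final class BigExpDouble {
    double mantissa;   // magnitude kept in [0.5, 1) by normalize()
    long exponent;

    BigExpDouble(double m, long e) { mantissa = m; exponent = e; normalize(); }

    private void normalize() {
        if (mantissa == 0.0) { exponent = 0; return; }
        int shift = Math.getExponent(mantissa) + 1; // power of 2 to move over
        mantissa = Math.scalb(mantissa, -shift);    // magnitude now in [0.5, 1)
        exponent += shift;
    }

    BigExpDouble multiply(BigExpDouble o) {
        return new BigExpDouble(mantissa * o.mantissa, exponent + o.exponent);
    }
}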

Pantomimist answered 18/2, 2012 at 0:39 Comment(4)
The OP is working on probabilistic models. Log probabilities are very common in such problems.Magnusson
I thought about that. But as I said in my edited question, I prefer to stick to primitive types (doubles) for efficiency reasons, instead of developing a new type more suited to my needs, but leading to performance issues.Scutch
@Edouard: I don't know. It seems odd to me that using two longs and performing ordinary integer arithmetic would perform worse than using one double and performing logarithms and exponentiation, but I'll take your word for it.Pantomimist
I was not clear. What I meant is that using a custom type would force me to write linear algebra functions for matrix/vector computations using this type. But competing with BLAS or other optimized linear algebra packages working on doubles is kind of hard.Scutch

I don't understand why this works, but this formula seems to work and is simpler:

c = a + log(1 + exp(b - a))

Where c = log(exp(a)+exp(b))
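
This is the identity from the question with m = a; factoring exp(a) out of the sum shows why it works:

log(exp(a) + exp(b)) = log(exp(a) * (1 + exp(b - a))) = a + log(1 + exp(b - a))

Choosing a to be the larger of the two values keeps the argument of exp non-positive, which avoids overflow.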

Littoral answered 19/9, 2014 at 21:55 Comment(0)
