Avoiding denormal values in C++

After searching a long time for a performance bug, I read about denormal floating point values.

Apparently, denormalized floating-point values can be a major performance concern, as illustrated in this question: Why does changing 0.1f to 0 slow down performance by 10x?

I have an Intel Core 2 Duo and I am compiling with gcc, using -O2.

So what do I do? Can I somehow instruct g++ to avoid denormal values? If not, can I somehow test if a float is denormal?

Leatherneck answered 21/3, 2010 at 15:51 Comment(2)
Do you actually have any denormal values in your program? Anyway, you can disable them at the CPU level as @nobugz said. If you set the right flag, the CPU will truncate all denormals to zero, eliminating the problem. – Seraphim
See this question for an in-depth discussion of denormals and dealing with them: #9315034 – Dogmatic

In C++11 (or C99, via the fpclassify macro), you can test whether a float is denormal using

#include <cmath>

if ( std::fpclassify( flt ) == FP_SUBNORMAL ) {
    // it's denormalized
}

(Caveat: I'm not sure that this will execute at full speed in practice.)

In C++03, this code has worked for me in practice:

#include <cmath>
#include <limits>

// numeric_limits<float>::min() is the smallest positive *normal* float.
if ( flt != 0 && std::fabs( flt ) < std::numeric_limits<float>::min() ) {
    // it's denormalized
}
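For reference, here is a small sketch that wraps both checks from this answer in functions so you can confirm they agree on a few sample values. The function names is_subnormal_c99 and is_subnormal_cxx03 are hypothetical, and it assumes float is IEEE-754 single precision.

```cpp
#include <cmath>
#include <limits>

// C++11 / C99 style: ask the library to classify the value.
bool is_subnormal_c99(float flt) {
    return std::fpclassify(flt) == FP_SUBNORMAL;
}

// C++03 style: nonzero, but smaller in magnitude than the smallest
// positive normal float. NaN and infinity both fail the '<' test.
bool is_subnormal_cxx03(float flt) {
    return flt != 0 && std::fabs(flt) < std::numeric_limits<float>::min();
}
```

Note that numeric_limits<float>::min() is the smallest positive normal value, not the most negative float (that is numeric_limits<float>::lowest() in C++11), which is exactly why it works as the threshold here.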

To decide where to apply this, you can use a sampling profiler such as Shark, VTune, or Zoom to highlight the instructions slowed down by denormal values. Micro-optimization, even more than other optimization, is totally hopeless without analysis both before and after.
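As a starting point for the "before and after" measurement, a minimal timing sketch in the spirit of the test program described in the comments on this answer might look like the code below. The function name seconds_for_loop and the loop body are assumptions for illustration. A profiler will tell you far more about where the stalls actually are.

```cpp
#include <chrono>

// Times a dependent multiply-add chain. The chain settles near 2 * addend,
// so a subnormal addend keeps the intermediate values subnormal too.
double seconds_for_loop(float addend, int iterations) {
    volatile float sink = 0.0f;  // keeps the result observable
    float x = addend;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        x = x * 0.5f + addend;
    }
    sink = x;  // publish the result before stopping the clock
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Run it once with a normal addend such as 0.1f and once with a subnormal one such as 1e-40f. Without FTZ/DAZ, the second call is typically much slower, but the ratio depends on the CPU. Treat it as something to measure rather than a number to expect.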

Delagarza answered 21/3, 2010 at 16:11 Comment(5)
What do you mean, no performance penalty? Are you sure? I wrote a little test program showing that adding floats with a value of exp(-100) is 10 times slower than when the value is 0.1. Am I completely wrong here? – Leatherneck
@Nathan: only in that the penalty is actually substantially more than 10x =) – Reformer
Using std::isnormal doesn't do what you want in most cases, because isnormal returns false when its argument is zero, infinity, or NaN (when looking for denormal stalls, you are usually looking only for denormals, not those other values, which are handled at full speed on modern hardware). – Reformer
@StephenCanon Oops, zero is the real deal-breaker. Thanks. – Delagarza
This answer will detect denormals, but it is not a good solution for avoiding the performance penalty. It is better to set the floating-point unit's control flags to flush denormals, as mentioned by others. – Teasley

Wait. Before you do anything, do you actually know that your code is encountering denormal values, and that they're having a measurable performance impact?

Assuming you know that, do you know whether the algorithm(s) you're using are stable if denormal support is turned off? Getting the wrong answer 10x faster is not usually a good performance optimization.
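Here is a concrete example of the stability concern. With gradual underflow, IEEE-754 guarantees that x != y implies x - y != 0. Flushing denormal results to zero breaks that guarantee.

The sketch below makes some assumptions:
- It targets x86/x86-64, where float math uses SSE (the default on x86-64).
- It uses the _mm_getcsr/_mm_setcsr intrinsics and the _MM_FLUSH_ZERO_ON bit from <xmmintrin.h>.
- The function name difference_with_ftz is hypothetical.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON (x86 SSE only)

// Computes x - y with the MXCSR flush-to-zero bit set, then restores MXCSR.
// The volatile temporaries stop the compiler from folding the subtraction
// or moving it outside the window where FTZ is enabled.
float difference_with_ftz(float x, float y) {
    const unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | _MM_FLUSH_ZERO_ON);
    volatile float vx = x, vy = y;
    volatile float result = vx - vy;
    _mm_setcsr(saved);
    return result;
}
```

With x = 1.2e-38f and y = 1.18e-38f, both inputs are normal but their difference is subnormal. Under FTZ the difference comes out as 0 even though x != y. Code that guards a division by x - y with a test for x != y relies on exactly this kind of identity.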

Those issues aside:

  • If you want to detect denormal values to confirm their presence, you have a few options. If you have a C99 standard library or Boost, you can use the fpclassify macro. Alternatively, you can compare the absolute values of your data to the smallest positive normal number.

  • You can set the hardware to flush denormal values to zero (FTZ), or treat denormal inputs as zero (DAZ). The easiest way, if it is properly supported on your platform, is probably to use the fesetenv( ) function in the C header fenv.h. However, this is one of the least widely supported features of the C standard, and it is inherently platform-specific anyway. You may want to just use some inline assembly to set the FPU state (DAZ/FTZ) directly.
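A minimal GCC/Clang inline-assembly sketch of that approach follows. It assumes an x86/x86-64 target with SSE and a CPU that supports DAZ (some of the earliest SSE processors did not). In MXCSR, FTZ is bit 15 (0x8000) and DAZ is bit 6 (0x0040). The helper names are hypothetical. These bits only affect SSE arithmetic, not x87 code.

```cpp
// GCC/Clang extended inline assembly; x86/x86-64 with SSE only.
constexpr unsigned int kMxcsrDaz = 1u << 6;   // denormals-are-zero
constexpr unsigned int kMxcsrFtz = 1u << 15;  // flush-to-zero

inline unsigned int read_mxcsr() {
    unsigned int csr;
    __asm__ __volatile__("stmxcsr %0" : "=m"(csr) : : "memory");
    return csr;
}

inline void write_mxcsr(unsigned int csr) {
    // "memory" clobber keeps surrounding memory accesses from being
    // reordered across the control-register write.
    __asm__ __volatile__("ldmxcsr %0" : : "m"(csr) : "memory");
}

// Sets DAZ and FTZ; returns the previous MXCSR so the caller can restore it.
inline unsigned int enable_daz_ftz() {
    const unsigned int old = read_mxcsr();
    write_mxcsr(old | kMxcsrDaz | kMxcsrFtz);
    return old;
}
```

Returning the old value rather than setting the bits permanently lets you confine the mode change to the hot loop and restore IEEE behaviour afterwards with write_mxcsr(old).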

Reformer answered 21/3, 2010 at 16:10 Comment(0)

Most math coprocessors have an option to truncate denormal values to zero. On x86 it is the FZ (Flush to Zero) flag in the SSE unit's MXCSR control register. Check your CRT implementation for a support function to set the control register. It ought to be in <float.h>, something resembling _controlfp(). The option bit usually has "FLUSH" in the #defined symbol.

Double-check your math results after you set this, which is something you ought to do anyway: getting denormals is a sign of numerical health problems.
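On MSVC, the CRT function described here is _controlfp_s in <float.h>, called with _DN_FLUSH under the _DN_FLUSH/_MCW_DN mask. The sketch below assumes that API on MSVC. On other x86 compilers it falls back to the _MM_SET_FLUSH_ZERO_MODE macro from <xmmintrin.h>. The wrapper name flush_denormals_to_zero is hypothetical. Which hardware bits _DN_FLUSH actually sets (FTZ, DAZ, or both) is worth checking in your CRT's documentation.

```cpp
#include <float.h>

#if defined(_MSC_VER)
// MSVC CRT: select "flush" handling for denormals.
inline bool flush_denormals_to_zero() {
    unsigned int current = 0;
    return _controlfp_s(&current, _DN_FLUSH, _MCW_DN) == 0;
}
#elif defined(__SSE__)
#include <xmmintrin.h>
// GCC/Clang on x86: set the MXCSR FTZ bit via the SSE intrinsic macro.
inline bool flush_denormals_to_zero() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return true;
}
#else
// Other architectures need their own platform-specific code.
inline bool flush_denormals_to_zero() { return false; }
#endif
```

A product such as 1e-30f * 1e-10f, whose exact value (1e-40) is subnormal, should then come out as 0. That makes a quick sanity check before you compare your real results.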

Kaon answered 21/3, 2010 at 16:23 Comment(0)

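To enable flush-to-zero (FTZ) in gcc (assuming underflow is masked, which is the default):

// Bit 15 of MXCSR is the FZ (flush-to-zero) flag.
#define CSR_FLUSH_TO_ZERO         (1 << 15)
unsigned csr = __builtin_ia32_stmxcsr();  // read the current MXCSR
csr |= CSR_FLUSH_TO_ZERO;                 // set FZ
__builtin_ia32_ldmxcsr(csr);              // write it back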

In case it's not obvious from the names, __builtin_ia32_stmxcsr and __builtin_ia32_ldmxcsr are available only if you're targeting an x86 processor. ARM, SPARC, MIPS, etc. will each need separate platform-specific code with this approach.
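The same builtins can be wrapped in a scoped guard that restores the previous MXCSR value on exit. That keeps FTZ from leaking into code (or libraries) that expect IEEE behaviour. This is a sketch assuming GCC or Clang targeting x86/x86-64 with SSE. The class name ScopedFlushToZero is hypothetical.

```cpp
// GCC/Clang builtins; x86/x86-64 with SSE only.
class ScopedFlushToZero {
public:
    static constexpr unsigned int kFlushToZero = 1u << 15;  // MXCSR FZ bit

    ScopedFlushToZero() : saved_(__builtin_ia32_stmxcsr()) {
        __builtin_ia32_ldmxcsr(saved_ | kFlushToZero);  // enable FTZ
    }
    ~ScopedFlushToZero() { __builtin_ia32_ldmxcsr(saved_); }  // restore

    ScopedFlushToZero(const ScopedFlushToZero&) = delete;
    ScopedFlushToZero& operator=(const ScopedFlushToZero&) = delete;

private:
    unsigned int saved_;
};
```

The x87 unit has no FTZ mode, so this only helps if your float math is compiled to SSE instructions. That is the default on x86-64; on 32-bit x86 it needs something like -msse2 -mfpmath=sse.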

Cataclysm answered 21/11, 2011 at 19:22 Comment(1)
Yes, it should have the MXCSR register, which is 32-bit too. – Cataclysm

You apparently want the CPU modes called FTZ (Flush To Zero) and DAZ (Denormals Are Zero), which are flag bits in the SSE control register (MXCSR) rather than separate instructions.

I found the information on an audio web site, but their link to the Intel documentation was missing. They are apparently SSE/SSE2 features, so they should also work on AMD CPUs that support SSE2.

I don't know what you can do in GCC to turn them on in a portable way. You can always write inline assembly code to set them, though. You may have to force GCC to use only SSE2 for floating-point math (for example with -msse2 -mfpmath=sse), because these modes do not affect x87 instructions.
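One way to confirm that GCC really is doing float and double arithmetic in SSE, where the MXCSR FTZ/DAZ bits apply, is to check the macros GCC predefines when -mfpmath=sse is in effect. This is a sketch assuming GCC or Clang. On 32-bit x86 these macros are typically defined only when you build with flags like -msse2 -mfpmath=sse.

```cpp
// GCC/Clang define __SSE_MATH__ (float) and __SSE2_MATH__ (double) when
// scalar floating-point arithmetic is compiled to SSE instructions.
#if defined(__SSE_MATH__)
constexpr bool kFloatUsesSse = true;
#else
constexpr bool kFloatUsesSse = false;
#endif

#if defined(__SSE2_MATH__)
constexpr bool kDoubleUsesSse = true;
#else
constexpr bool kDoubleUsesSse = false;
#endif
```

If you prefer a hard failure, wrap these in static_assert so that a build which would silently fall back to x87 math does not compile.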

Homopolar answered 21/3, 2010 at 16:5 Comment(1)
Do all IEEE-754 FPUs have such instructions? Or is this specific to one or more instruction set architectures? – Orangeade

Just as an addition to the other answers: if you actually have a problem with denormal floating-point values, you probably have a precision problem in addition to your performance issue.

It may be a good idea to check if you can restructure your computations to keep the numbers larger to avoid losing precision and performance.
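As one illustration of keeping the numbers larger (this is a swapped-in example, not something the answer spells out), take a long product of small factors. The product drifts into the subnormal range and then to zero. Summing logarithms instead keeps every intermediate value comfortably normal. The function names are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Direct product: intermediate values can become subnormal, then 0.
float product_direct(const std::vector<float>& factors) {
    float p = 1.0f;
    for (float f : factors) p *= f;
    return p;
}

// Same quantity in the log domain: returns log(product), which stays
// well inside the normal range even when the product itself would not.
float log_product(const std::vector<float>& factors) {
    float sum = 0.0f;
    for (float f : factors) sum += std::log(f);
    return sum;
}
```

With 200 factors of 0.5f, the direct product underflows to 0. The log-domain result stays close to 200 * log(0.5), about -138.6. Further products become additions of logs, and you convert back with std::exp only once the result is known to be representable.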

Judicatory answered 21/3, 2010 at 16:30 Comment(0)
