Why does the compiler generate an additional sqrt call in the compiled assembly code?

    //----------------start of for loop----------------
    call readTSC
    movq %rax, -32(%rbp)
    movl $0, -4(%rbp)
    jmp .L4
.L6:
    cvtsi2sd -4(%rbp), %xmm1
    // 1. use sqrtsd instruction
    sqrtsd %xmm1, %xmm0
    ucomisd %xmm0, %xmm0
    jp .L8
    je .L5
.L8:
    movapd %xmm1, %xmm0
    // 2. use C function call
    call sqrt
.L5:
    movsd -16(%rbp), %xmm1
    addsd %xmm1, %xmm0
    movsd %xmm0, -16(%rbp)
    addl $1, -4(%rbp)
.L4:
    movl -4(%rbp), %eax
    cmpl -36(%rbp), %eax
    jl .L6
    //----------------end of for loop----------------
    call readTSC

It's calling the library sqrt function only for error handling, so that errno gets set. See glibc's documentation, 20.5.4 Error Reporting by Mathematical Functions: math functions set errno for compatibility with systems that don't have IEEE754 exception flags. Related: the math_error(7) man page.

As an optimization, the compiler first tries to perform the square root with the inlined sqrtsd instruction, then checks the result against itself using the ucomisd instruction, which sets the flags as follows:

CASE (RESULT) OF
   UNORDERED:    ZF,PF,CF  111;
   GREATER_THAN: ZF,PF,CF  000;
   LESS_THAN:    ZF,PF,CF  001;
   EQUAL:        ZF,PF,CF  100;
ESAC;

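In particular, comparing a QNaN to itself returns UNORDERED, and a QNaN is exactly what sqrtsd produces if you try to take the square root of a negative number. This case is covered by the jp branch, which jumps to the library call at .L8. The je check is just paranoia, checking for exact equality: once the result is known to be ordered, it always compares equal to itself, so je is always taken and the library call is skipped.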

Also note that gcc has a -fno-math-errno option which will sacrifice this error handling for speed. This option is part of -ffast-math, but can be used on its own without enabling any result-changing optimizations.

sqrtsd on its own correctly produces NaN for negative and NaN inputs, and sets the IEEE754 Invalid flag for negative inputs. The check and branch are there only to preserve the errno-setting semantics, which most code doesn't rely on.

-fno-math-errno is the default on Darwin (OS X), where the math library never sets errno, so functions can be inlined without this check.
