These days it shouldn't be a problem to use a C++11 compiler which includes a C99/C++11 math library. But then the question becomes: which rounding function do you pick?
C99/C++11 round() is often not actually the rounding function you want. It uses a funky rounding mode that rounds away from 0 as a tie-break on half-way cases (+-xxx.5000). If you do specifically want that rounding mode, or you're targeting a C++ implementation where round() is faster than rint(), then use it (or emulate its behaviour with one of the other answers on this question which took it at face value and carefully reproduced that specific rounding behaviour).
round()'s rounding is different from the IEEE754 default round to nearest mode with even as a tie-break. Nearest-even avoids statistical bias in the average magnitude of numbers, but does bias towards even numbers.
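To make the tie-break difference concrete, here is a minimal sketch that runs std::round() and std::rint() on the same input. The names TieResult, round_both and print_tie_breaks are hypothetical helpers made up for this illustration, and the sketch assumes the default rounding mode (FE_TONEAREST) is in effect.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper type: holds both roundings of one input.
struct TieResult {
    double away;  // std::round: ties go away from zero
    double even;  // std::rint: ties go to even under the default rounding mode
};

TieResult round_both(double x) {
    return { std::round(x), std::rint(x) };
}

// Prints both results for a few exact half-way cases.
void print_tie_breaks() {
    const double ties[] = {0.5, 1.5, 2.5, -0.5, -2.5};
    for (double x : ties) {
        TieResult r = round_both(x);
        std::printf("x=%5.1f  round=%5.1f  rint=%5.1f\n", x, r.away, r.even);
    }
}
```

For 2.5, round() gives 3 while rint() gives 2; for 1.5 both give 2. The two functions only disagree on exact half-way cases.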
There are two math library rounding functions that use the current rounding mode (round to nearest-even by default): std::nearbyint() and std::rint(), both added in C99/C++11, so they're available any time std::round() is. The only difference is that nearbyint never raises FE_INEXACT.
Prefer rint() for performance reasons: gcc and clang both inline it more easily, but gcc never inlines nearbyint() (even with -ffast-math).
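The FE_INEXACT difference can be observed with <cfenv>. This is only a sketch: nearbyint_sets_inexact and rint_sets_inexact are hypothetical names, the volatile variables are there to discourage compile-time folding, and compilers differ in how strictly they honour floating-point exception flags, so treat any rint() result as implementation-dependent.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: true if std::nearbyint(x) left FE_INEXACT set.
bool nearbyint_sets_inexact(double x) {
    volatile double in = x;               // volatile: keep the call at run time
    std::feclearexcept(FE_ALL_EXCEPT);    // start with all flags cleared
    volatile double out = std::nearbyint(in);
    (void)out;
    return std::fetestexcept(FE_INEXACT) != 0;
}

// Hypothetical helper: same check for std::rint(x).
bool rint_sets_inexact(double x) {
    volatile double in = x;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double out = std::rint(in);
    (void)out;
    return std::fetestexcept(FE_INEXACT) != 0;
}
```

nearbyint_sets_inexact(1.5) should report false. rint_sets_inexact(1.5) is allowed to report true, because the result differs from the argument, and neither function has a reason to set the flag for an input that is already an integer such as 2.0.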
gcc/clang for x86-64 and AArch64
I put some test functions on Matt Godbolt's Compiler Explorer, where you can see source + asm output (for multiple compilers). For more about reading compiler output, see this Q&A, and Matt's CppCon2017 talk: “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”.
In FP code, it's usually a big win to inline small functions. Especially on non-Windows, where the standard calling convention has no call-preserved XMM registers, so the compiler can't keep any FP values in XMM registers across a call. So even if you don't really know asm, you can still easily see whether it's just a tail-call to the library function or whether it inlined to one or two math instructions. Anything that inlines to one or two instructions is better than a function call (for this particular task on x86 or ARM).
On x86, anything that inlines to SSE4.1 roundsd can auto-vectorize with SSE4.1 roundpd (or AVX vroundpd). (FP->integer conversions are also available in packed SIMD form, except for FP->64-bit integer which requires AVX512.)
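A simple loop like the following is the kind of code where this matters. It is a minimal sketch (round_array is a hypothetical name); whether a given compiler actually emits roundpd for it depends on the compiler version and on flags such as -O3 -msse4.1, so check the asm output rather than assuming.

```cpp
#include <cmath>
#include <cstddef>

// Rounds each element of src to the nearest integer value (ties to even
// under the default rounding mode) and stores the result in dst.
// dst and src must each point to at least n doubles.
void round_array(double *dst, const double *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = std::rint(src[i]);
}
```

If the asm for the loop body still contains a call to rint, the rounding was not inlined and the loop cannot have been vectorized.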
Rounding to int / long / long long:
You have two options here: use lrint (like rint but returns long, or long long for llrint), or use an FP->FP rounding function and then convert to an integer type the normal way (with truncation). Some compilers optimize one way better than the other.
long l = lrint(x);
int i = (int)rint(x);
Note that int i = lrint(x) converts float or double -> long first, and then truncates the integer to int. This makes a difference for out-of-range integers: Undefined Behaviour in C++, but well-defined for the x86 FP -> int instructions (which the compiler will emit unless it sees the UB at compile time while doing constant propagation, in which case it's allowed to make code that breaks if it's ever executed).
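If out-of-range inputs are possible, one way to stay clear of the undefined behaviour is to range-check the rounded value before converting. This is a minimal sketch, not something from the original answer: rint_to_int_checked is a hypothetical name, and it assumes a 32-bit int so that INT_MIN and INT_MAX are exactly representable as double.

```cpp
#include <climits>
#include <cmath>

// Rounds x with the current rounding mode and stores it in out.
// Returns false (leaving out untouched) if the rounded value does not
// fit in int, or if x is NaN.
bool rint_to_int_checked(double x, int &out) {
    double r = std::rint(x);
    // Assumes 32-bit int, so both limits convert to double exactly.
    // NaN fails both comparisons, so the negated test rejects it too.
    if (!(r >= INT_MIN && r <= INT_MAX))
        return false;
    out = static_cast<int>(r);  // in range, so the conversion is well-defined
    return true;
}
```

The check costs a couple of compares, so it is only worth adding where the input range is not already known.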
On x86, an FP->integer conversion that overflows the integer produces INT_MIN or LLONG_MIN (a bit-pattern of 0x80000000 or the 64-bit equivalent, with just the sign-bit set). Intel calls this the "integer indefinite" value. (See the cvttsd2si manual entry, the SSE2 instruction that converts (with truncation) scalar double to signed integer. It's available with a 32-bit or 64-bit integer destination (the latter in 64-bit mode only).) There's also a cvtsd2si (convert with current rounding mode), which is what we'd like the compiler to emit, but unfortunately gcc and clang won't do that without -ffast-math.
Also beware that FP to/from unsigned int / long is less efficient on x86 (without AVX512). Conversion to 32-bit unsigned on a 64-bit machine is pretty cheap; just convert to 64-bit signed and truncate. But otherwise it's significantly slower.
- x86 clang with/without -ffast-math -msse4.1: (int/long)rint inlines to roundsd / cvttsd2si. (missed optimization to cvtsd2si). lrint doesn't inline at all.
- x86 gcc6.x and earlier without -ffast-math: neither way inlines.
- x86 gcc7 without -ffast-math: (int/long)rint rounds and converts separately (with 2 total instructions if SSE4.1 is enabled, otherwise with a bunch of code inlined for rint without roundsd). lrint doesn't inline.
- x86 gcc with -ffast-math: all ways inline to cvtsd2si (optimal), no need for SSE4.1.
- AArch64 gcc6.3 without -ffast-math: (int/long)rint inlines to 2 instructions. lrint doesn't inline.
- AArch64 gcc6.3 with -ffast-math: (int/long)rint compiles to a call to lrint. lrint doesn't inline. This may be a missed optimization unless the two instructions we get without -ffast-math are very slow.
std::cout << std::fixed << std::setprecision(0) << -0.9, for example. – Ruffo
round is available since C++11 in <cmath>. Unfortunately if you are in Microsoft Visual Studio it is still missing: connect.microsoft.com/VisualStudio/feedback/details/775474/… – Racy
round has a lot of caveats. Before C++11, the standard relied on C90 which did not include round. C++11 relies on C99 which does have round but also as I noted includes trunc which has different properties and may be more appropriate depending on the application. Most answers also seem to ignore that a user may wish to return an integral type which has even more issues. – Logic
static_cast<int>(std::round(0.1)), for more details there are the answers – Lancers