I'm reading a C++ function that handles timestamps, and there's an expression like this:
t % (60 * 60) % 60
I thought this expression was strictly equivalent to:
t % 60
However, when I entered this code into https://godbolt.org, I found that gcc did not think so.
Information:
- Source code and compilation results: https://godbolt.org/z/fnsxjanja
- gcc version: x86-64 gcc 13.2 (with
-O3
) - clang version: x86-64 clang 17.0.1 (with
-O3
) - msvc version: x64 msvc v19.38 (with
-O2
because there is no-O3
) - icx version: x86-64 icx 2024.0.0 (with
-O3
)
The following C++ code:
#include <cstdint>
uint64_t mod3600mod60(uint64_t n) { return n % 3600 % 60; }
uint64_t mod60(uint64_t n) { return n % 60; }
will be compiled into:
mod3600mod60(unsigned long):
movabs rax, 655884233731895169
mov rdx, rdi
shr rdx, 4
mul rdx
movabs rax, -8608480567731124087
shr rdx, 3
imul rdx, rdx, 3600
sub rdi, rdx
mul rdi
mov rax, rdx
shr rax, 5
mov rdx, rax
sal rdx, 4
sub rdx, rax
mov rax, rdi
sal rdx, 2
sub rax, rdx
ret
mod60(unsigned long):
movabs rax, -8608480567731124087
mul rdi
mov rax, rdx
shr rax, 5
mov rdx, rax
sal rdx, 4
sub rdx, rax
mov rax, rdi
sal rdx, 2
sub rax, rdx
ret
I know that for dividing by a constant or modulo a constant, the compiler will optimize, but for two consecutive modulo, the compiler does not seem to optimize to the same as modulo once.
First, I tried to compile the above two functions with gcc, clang, msvc and icx respectively, and found that they would not optimize these two functions to the same result.
Next, I tried changing both moduli to powers of 2 and found that for the following two functions:
#include <cstdint>
uint64_t mod2048mod64(uint64_t n) { return n % 2048 % 64; }
uint64_t mod64(uint64_t n) { return n % 64; }
All four compilers will optimize the two to be exactly the same. Here is an example of the compilation result of gcc:
mod2048mod64(unsigned long):
mov rax, rdi
and eax, 63
ret
mod64(unsigned long):
mov rax, rdi
and eax, 63
ret
Next, I tried changing the first modulus to a non-power of 2, for the following function:
#include <cstdint>
uint64_t mod3200mod64(uint64_t n) { return n % 3200 % 64; }
All four compilers will still optimize exactly the same as a single % 64
.
Next, I tried to change both modulus to a smaller non-power of 2 to confirm whether the reason for the lack of optimization was that the modulus was too large. For the following two functions:
#include <cstdint>
uint64_t mod6mod3(uint64_t n) { return n % 6 % 3; }
uint64_t mod3(uint64_t n) { return n % 3; }
As with the 3600 and 60, none of the four compilers can optimize these two functions to the same level. Here is the optimization result of gcc as an example:
mod6mod3(unsigned long):
movabs rcx, -6148914691236517205
mov rax, rdi
mul rcx
shr rdx, 2
lea rax, [rdx+rdx*2]
add rax, rax
sub rdi, rax
mov rax, rdi
mul rcx
mov rax, rdx
and rdx, -2
shr rax
add rdx, rax
mov rax, rdi
sub rax, rdx
ret
mod3(unsigned long):
movabs rax, -6148914691236517205
mul rdi
mov rax, rdx
and rdx, -2
shr rax
add rdx, rax
mov rax, rdi
sub rax, rdx
ret
Finally, I changed the parameters and return values of all the above functions to uint32_t
and uint16_t
respectively for compilation, and the results obtained were similar to uint64_t
.
Summarized as follows:
- For two operations on unsigned integers modulo a constant
n % A % B
, whereA
is a multiple ofB
:- For the case where
B
is a power of 2, gcc, clang, msvc and icx can all optimize it to be the same as a singlen % B
; - For the case where
B
is not a power of 2, gcc, clang, msvc and icx cannot optimize it to be the same as a singlen % B
.
- For the case where
Shouldn't n % A % B
and n % B
be strictly mathematically equivalent if A
is a multiple of B
?
-O0
? I think an obviously established formula can be used to optimize. – Ragsdale