Suppose you want to convert a uint64_t to a uint8_t[8] (little endian). On a little-endian architecture you can just do an ugly reinterpret_cast<> or memcpy(), e.g.:
void from_memcpy(const std::uint64_t &x, uint8_t* bytes) {
    std::memcpy(bytes, &x, sizeof(x));
}
This generates efficient assembly:
mov rax, qword ptr [rdi]
mov qword ptr [rsi], rax
ret
However, it is not portable: it will behave differently on a big-endian machine.
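A common way to keep the fast path while staying portable (a minimal sketch, assuming C++20's std::endian is available; from_guarded is just an illustrative name, not code from the question) is to branch on the host byte order at compile time:

#include <bit>      // std::endian (C++20)
#include <cstdint>
#include <cstring>

void from_guarded(std::uint64_t x, std::uint8_t* bytes) {
    if constexpr (std::endian::native == std::endian::little) {
        // Host layout already matches the little-endian target format, so a raw copy is fine.
        std::memcpy(bytes, &x, sizeof(x));
    } else {
        // Portable fallback: emit the bytes explicitly, least significant first.
        for (int i = 0; i < 8; ++i)
            bytes[i] = static_cast<std::uint8_t>(x >> (8 * i));
    }
}

This keeps the behaviour well defined everywhere, but the fallback branch is still the hand-written shift version whose codegen is in question below.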
For converting a uint8_t[8] to a uint64_t there is a great solution: just do this:
void to(const std::uint8_t* bytes, std::uint64_t &x) {
    x = (std::uint64_t(bytes[0]) << 8*0) |
        (std::uint64_t(bytes[1]) << 8*1) |
        (std::uint64_t(bytes[2]) << 8*2) |
        (std::uint64_t(bytes[3]) << 8*3) |
        (std::uint64_t(bytes[4]) << 8*4) |
        (std::uint64_t(bytes[5]) << 8*5) |
        (std::uint64_t(bytes[6]) << 8*6) |
        (std::uint64_t(bytes[7]) << 8*7);
}
This looks inefficient, but with Clang at -O2 it actually generates exactly the same assembly as before, and whenever the byte order doesn't match the host (for example, the code above compiled for a big-endian machine, or the reversed version below on a little-endian one) it is smart enough to use a native byte-swap instruction. E.g. this code:
void to(const std::uint8_t* bytes, std::uint64_t &x) {
    x = (std::uint64_t(bytes[7]) << 8*0) |
        (std::uint64_t(bytes[6]) << 8*1) |
        (std::uint64_t(bytes[5]) << 8*2) |
        (std::uint64_t(bytes[4]) << 8*3) |
        (std::uint64_t(bytes[3]) << 8*4) |
        (std::uint64_t(bytes[2]) << 8*5) |
        (std::uint64_t(bytes[1]) << 8*6) |
        (std::uint64_t(bytes[0]) << 8*7);
}
Compiles to:
mov rax, qword ptr [rdi]
bswap rax
mov qword ptr [rsi], rax
ret
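As a quick sanity check of the big-endian variant just shown (a usage sketch only; the buffer contents are made up, and the declaration assumes the to() above is in scope):

#include <cassert>
#include <cstdint>

// Defined above: the big-endian to().
void to(const std::uint8_t* bytes, std::uint64_t &x);

int main() {
    // 0x1234 encoded big endian (most significant byte first).
    const std::uint8_t buf[8] = {0, 0, 0, 0, 0, 0, 0x12, 0x34};
    std::uint64_t value = 0;
    to(buf, value);
    assert(value == 0x1234);  // holds regardless of host endianness
}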
My question is: is there an equivalent reliably-optimised construct for converting in the opposite direction? I've tried this, but it gets compiled naively:
void from(const std::uint64_t &x, uint8_t* bytes) {
    bytes[0] = x >> 8*0;
    bytes[1] = x >> 8*1;
    bytes[2] = x >> 8*2;
    bytes[3] = x >> 8*3;
    bytes[4] = x >> 8*4;
    bytes[5] = x >> 8*5;
    bytes[6] = x >> 8*6;
    bytes[7] = x >> 8*7;
}
Edit: After some experimentation, this code does get compiled optimally by GCC 8.1 and later, as long as you use uint8_t* __restrict__ bytes. However, I still haven't managed to find a form that Clang will optimise.
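For reference, here is the form that edit describes, written out as a sketch (from_restrict is just an illustrative name; __restrict__ is the GCC/Clang spelling of C's restrict and tells the compiler that bytes cannot alias x):

#include <cstdint>

void from_restrict(const std::uint64_t &x, std::uint8_t* __restrict__ bytes) {
    // Per the edit above, GCC 8.1+ at -O2 collapses this to a single 8-byte store,
    // because the stores through bytes can no longer be assumed to modify x.
    bytes[0] = static_cast<std::uint8_t>(x >> 8*0);
    bytes[1] = static_cast<std::uint8_t>(x >> 8*1);
    bytes[2] = static_cast<std::uint8_t>(x >> 8*2);
    bytes[3] = static_cast<std::uint8_t>(x >> 8*3);
    bytes[4] = static_cast<std::uint8_t>(x >> 8*4);
    bytes[5] = static_cast<std::uint8_t>(x >> 8*5);
    bytes[6] = static_cast<std::uint8_t>(x >> 8*6);
    bytes[7] = static_cast<std::uint8_t>(x >> 8*7);
}

Passing x by value instead of by const reference would remove the aliasing concern in a standard-conforming way, though whether a given compiler then emits the single store is a separate question.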