Unsigned int to unsigned long long well defined?

I wanted to see what happens behind the scenes when an unsigned long long is assigned the value of an unsigned int. I wrote a simple C++ program to try it out and moved all the I/O out of main():

#include <iostream>
#include <stdlib.h>

void usage() {
        std::cout << "Usage: ./u_to_ull <unsigned int>\n";
        exit(0);
}

void atoiWarning(int foo) {
        std::cout << "WARNING: atoi() returned " << foo << " and (unsigned int)foo is "
                  << ((unsigned int)foo) << "\n";
}

void result(unsigned long long baz) {
        std::cout << "Result as unsigned long long is " << baz << "\n";
}

int main(int argc, char** argv) {
        if (argc != 2) usage();

        int foo = atoi(argv[1]);
        if (foo < 0) atoiWarning(foo);

        // Signed to unsigned
        unsigned int bar = foo;

        // Conversion
        unsigned long long baz = -1;
        baz = bar;

        result(baz);

        return 0;
}

Disassembling the resulting binary gives this for main:

0000000000400950 <main>:
  400950:       55                      push   %rbp
  400951:       48 89 e5                mov    %rsp,%rbp
  400954:       48 83 ec 20             sub    $0x20,%rsp
  400958:       89 7d ec                mov    %edi,-0x14(%rbp)
  40095b:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
  40095f:       83 7d ec 02             cmpl   $0x2,-0x14(%rbp)
  400963:       74 05                   je     40096a <main+0x1a>
  400965:       e8 3a ff ff ff          callq  4008a4 <_Z5usagev>
  40096a:       48 8b 45 e0             mov    -0x20(%rbp),%rax
  40096e:       48 83 c0 08             add    $0x8,%rax
  400972:       48 8b 00                mov    (%rax),%rax
  400975:       48 89 c7                mov    %rax,%rdi
  400978:       e8 0b fe ff ff          callq  400788 <atoi@plt>
  40097d:       89 45 f0                mov    %eax,-0x10(%rbp)
  400980:       83 7d f0 00             cmpl   $0x0,-0x10(%rbp)
  400984:       79 0a                   jns    400990 <main+0x40>
  400986:       8b 45 f0                mov    -0x10(%rbp),%eax
  400989:       89 c7                   mov    %eax,%edi
  40098b:       e8 31 ff ff ff          callq  4008c1 <_Z11atoiWarningi>
  400990:       8b 45 f0                mov    -0x10(%rbp),%eax
  400993:       89 45 f4                mov    %eax,-0xc(%rbp)
  400996:       48 c7 45 f8 ff ff ff    movq   $0xffffffffffffffff,-0x8(%rbp)
  40099d:       ff
  40099e:       8b 45 f4                mov    -0xc(%rbp),%eax
  4009a1:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  4009a5:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4009a9:       48 89 c7                mov    %rax,%rdi
  4009ac:       e8 66 ff ff ff          callq  400917 <_Z6resulty>
  4009b1:       b8 00 00 00 00          mov    $0x0,%eax
  4009b6:       c9                      leaveq
  4009b7:       c3                      retq

The -1 in the C++ source makes it clear that -0x8(%rbp) corresponds to baz (because of the $0xffffffffffffffff store). -0x8(%rbp) is then written from %rax, but the top four bytes of %rax do not appear to have been assigned; only %eax was.

Does this suggest that the top 4 bytes of -0x8(%rbp) are undefined?

Redman answered 28/1, 2015 at 15:48 Comment(8)
The top half of a 64-bit register is zeroed out when you write to only the lower half. - Foreland
@harold, I thought an explicit cltq was needed? - Redman
What was the output of the result function? - Compressibility
cltq is for sign extending. A plain write (including any 32-bit modification, for example or eax, 0 and even mov eax, eax) uses zero extension. - Foreland
@abcthomas, two samples, although what really matters is the top half of %rax. ./u_to_ull 10 prints "Result as unsigned long long is 10"; ./u_to_ull -10 prints "WARNING: atoi() returned -10 and (unsigned int)foo is 4294967286" followed by "Result as unsigned long long is 4294967286". - Redman
I can't quite understand your question. Are you asking what the C++ language says about assigning a smaller unsigned type to a larger one? Or are you asking about this specific compiler's generated assembly code? In that case, which compiler and which options did you use? - Mixon
@MarkB, I think harold's comment would answer this question well, although I would like to see that documented somewhere. I guess a C++ guarantee would be good as well. Compiled as g++ u_to_ull.c -o u_to_ull - Redman
Related question: "Why do most x64 instructions zero the upper part of a 32 bit register" - Marigolde

In the Intel® 64 and IA-32 Architectures Software Developer's Manual, volume 1, section 3.4.1.1 (General-Purpose Registers in 64-Bit Mode), it says:

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.

So after mov -0xc(%rbp),%eax, the upper half of rax is defined, and it's zero.

This also applies to the 87 C0 encoding of xchg eax, eax, but not to its 90 encoding (which is defined as nop, overruling the rule quoted above).

Foreland answered 28/1, 2015 at 16:16 Comment(1)
Great, thank you. I figured something must be going on that I had missed, or crazy values would be bound to show up eventually. - Redman

From C++98 (and C++11 seems to be unchanged) 4.7/2 (integral conversions - no promotions are relevant) we learn:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).

This clearly shows that as long as the source and destination are unsigned and the destination is at least as large as the source, the value will be unchanged. If the compiler generated code that failed to make the larger value equal, the compiler is buggy.

Mixon answered 28/1, 2015 at 16:16 Comment(0)
