How should the [u]int_fastN_t types be defined for x86_64, with or without the x32 ABI?

Asked 30/4, 2016 at 22:51 Answered 1/5, 2016 at 0:6

The x32 ABI specifies, among other things, 32-bit pointers for code generated for the x86_64 architecture. It combines the advantages of the x86_64 architecture (including 64-bit CPU registers) with the reduced overhead of 32-bit pointers.

The <stdint.h> header defines typedefs int_fast8_t, int_fast16_t, int_fast32_t, and int_fast64_t (and corresponding unsigned types uint_fast8_t et al), each of which is:

an integer type that is usually fastest to operate with among all integer types that have at least the specified width

with a footnote:

The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.

(Quoted from the N1570 C11 draft.)

The question is, how should [u]int_fast16_t and [u]int_fast32_t types be defined for the x86_64 architecture, with or without the x32 ABI? Is there an x32 document that specifies these types? Should they be compatible with the 32-bit x86 definitions (both 32 bits) or, since x32 has access to 64-bit CPU registers, should they be the same size with or without the x32 ABI? (Note that the x86_64 has 64-bit registers regardless of whether the x32 ABI is in use or not.)

Here's a test program (which depends on the gcc-specific __x86_64__ macro):

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void) {
#if defined __x86_64__ && SIZE_MAX == 0xFFFFFFFF
    puts("This is x86_64 with the x32 ABI");
#elif defined __x86_64__ && SIZE_MAX > 0xFFFFFFFF
    puts("This is x86_64 without the x32 ABI");
#else
    puts("This is not x86_64");
#endif
    printf("uint_fast8_t  is %2zu bits\n", CHAR_BIT * sizeof (uint_fast8_t));
    printf("uint_fast16_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast16_t));
    printf("uint_fast32_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast32_t));
    printf("uint_fast64_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast64_t));
}

When I compile it with gcc -m64, the output is:

This is x86_64 without the x32 ABI
uint_fast8_t  is  8 bits
uint_fast16_t is 64 bits
uint_fast32_t is 64 bits
uint_fast64_t is 64 bits

When I compile it with gcc -mx32, the output is:

This is x86_64 with the x32 ABI
uint_fast8_t  is  8 bits
uint_fast16_t is 32 bits
uint_fast32_t is 32 bits
uint_fast64_t is 64 bits

(which, apart from the first line, matches the output with gcc -m32, which generates 32-bit x86 code).
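
The same widths can also be checked at compile time, which is handy when code needs to branch on them in the preprocessor. The following is only a sketch: the macro names FAST16_BITS and FAST32_BITS and the two helper functions are made up for illustration, and it assumes the fast types are 16, 32, or 64 bits wide with no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Classify the widths with the preprocessor, using the limit macros that
   <stdint.h> provides for the fast types. Only 16, 32 and 64 bits are
   handled; anything else stops the build. */
#if UINT_FAST16_MAX == 0xFFFF
#  define FAST16_BITS 16
#elif UINT_FAST16_MAX == 0xFFFFFFFF
#  define FAST16_BITS 32
#elif UINT_FAST16_MAX == 0xFFFFFFFFFFFFFFFF
#  define FAST16_BITS 64
#else
#  error "unexpected width for uint_fast16_t"
#endif

#if UINT_FAST32_MAX == 0xFFFFFFFF
#  define FAST32_BITS 32
#elif UINT_FAST32_MAX == 0xFFFFFFFFFFFFFFFF
#  define FAST32_BITS 64
#else
#  error "unexpected width for uint_fast32_t"
#endif

/* Cross-check against sizeof; this assumes the types have no padding bits. */
static_assert(FAST16_BITS == CHAR_BIT * sizeof(uint_fast16_t),
              "fast16 width mismatch");
static_assert(FAST32_BITS == CHAR_BIT * sizeof(uint_fast32_t),
              "fast32 width mismatch");

/* Hypothetical helpers that expose the detected widths at run time. */
int fast16_bits(void) { return FAST16_BITS; }
int fast32_bits(void) { return FAST32_BITS; }
```

Given the outputs above, FAST16_BITS and FAST32_BITS would be expected to come out as 64 with gcc -m64 and as 32 with gcc -mx32 or gcc -m32.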

Is this a bug in glibc (which defines the <stdint.h> header), or is it following some x32 ABI requirement? There are no references to the [u]int_fastN_t types in either the x32 ABI document or the x86_64 ABI document, but there could be something else that specifies it.

One could argue that the fast16 and fast32 types should be 64 bits with or without x32, since 64-bit registers are available; would that make more sense than the current behavior?

(I've substantially edited the original question, which asked only about the x32 ABI. The question now asks about x86_64 with or without x32.)

Kehr answered 30/4, 2016 at 22:51 Comment(9)

Why would it be a bug in glibc? – Calico 30/4, 2016 at 23:12

@RossRidge: If your point is that <stdint.h> is provided by glibc, not by gcc, you're right; I've updated the question. If you're saying it's not a bug, I'd be interested in your rationale. Since the system has 64-bit registers, int64_t should be faster than int32_t, so int_fast32_t should be 64 bits, just as it is in x86_64. – Kehr 30/4, 2016 at 23:16

I'm interested in hearing your rationale first. Why would having 64-bit registers make int64_t faster than int32_t when working with values that only need 32 bits? – Calico 30/4, 2016 at 23:18

@RossRidge: (1) For example, copying a 64-bit register to or from a 64-bit memory object might be faster than copying a 64-bit register to or from a 32-bit memory object (no need for masking or sign extension). (2) For non-x32 x86_64, gcc already makes [u]int_fast16_t and [u]int_fast32_t 64 bits. Whatever rationale led to that decision should also apply to x32, unless I'm missing something. – Kehr 30/4, 2016 at 23:24

Copying 64-bit values needs a REX prefix, which makes the instruction longer. Copying 32-bit values always zeroes the top 32 bits, so it's equally fast, or faster because the code takes less space in the I-cache – Indican 30/4, 2016 at 23:30

@LưuVĩnhPhúc: Ok, so should gcc make [u]int_fast32_t 32 bits on x86_64? They currently have different sizes on x32 vs. x86_64; is there any good reason for them to differ? – Kehr 30/4, 2016 at 23:31

@KeithThompson: I can only think that the reason x86_64 uses 64 bits for everything is alignment: making all types 64 bits wide gives a perfect all-8-byte-aligned world. In x32, since pointers are 32 bits wide, the perfect alignment is already broken, so a 64-bit int_fast16_t is not so attractive any more. – Clouse 1/5, 2016 at 1:45

@rodrigo: But 4-byte integers (including int in most implementations) are 4-byte aligned, not 8-byte aligned. – Kehr 1/5, 2016 at 1:57

I think in long mode (so both amd64 and x32 but not i386), they should all be 32 bits wide (no prefix needed, upper half automatically cleared, no register dependency issue when using partial registers, and unlike nōn-long mode, 16-bit registers are utterly more expensive), so I’d say 8/16/32/64 for i386, 32/32/32/64 for both amd64 and x32. – Euratom 28/1, 2022 at 2:24

Generally speaking, you would expect 32-bit integer types to be marginally faster than 64-bit integer types on x86-64 CPUs, partly because they use less memory, but also because 64-bit instructions require an extra prefix byte over their 32-bit counterparts. The 32-bit division instruction is significantly faster than the 64-bit one, but otherwise instruction execution latencies are the same.

It isn't normally necessary to extend 32-bit values when loading them into 64-bit registers. While the CPU automatically zero-extends the values in this case, this is usually only a benefit because it avoids partial register stalls. What gets loaded into the upper part of the register is less important than the fact that the entire register is modified. The contents of the upper part of the register don't matter because, when registers hold 32-bit types, they are normally used only with 32-bit instructions, which work only with the lower 32 bits of the register.
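
The point that the upper half of the register doesn't matter can be illustrated in plain C. This is only a sketch with made-up function names: the 64-bit parameters stand in for registers whose upper 32 bits hold arbitrary data, and the cast keeps just the low 32 bits, as a 32-bit instruction would.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 32-bit values that sit in 64-bit containers whose upper halves
   may contain anything, then keep only the low 32 bits of the sum. */
uint32_t add_low32(uint64_t a, uint64_t b)
{
    return (uint32_t)(a + b);
}

/* Self-check: garbage in the upper halves does not change the low result. */
void check_add_low32(void)
{
    assert(add_low32(1, 2) == 3);
    assert(add_low32(0xDEADBEEF00000001ULL, 0x1234567800000002ULL) == 3);
}
```

Because unsigned addition is modular, carries only travel upward, so the low 32 bits of the sum depend only on the low 32 bits of the operands.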

The inconsistency between the sizes of the int_fast32_t types under the x32 and x86-64 ABIs is probably best justified by the fact that pointers are 64 bits wide under the x86-64 ABI. Whenever a 32-bit integer is added to a pointer it needs to be extended to the pointer's width, which makes extension a much more likely occurrence when using the x86-64 ABI.
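
A small illustration of where that extension comes from, as a sketch only (the function names are hypothetical). In both functions the index is added to a pointer; with 64-bit pointers the 32-bit index has to be widened first, although a compiler can often hoist or eliminate the extension in a simple loop like this one.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit index: with 64-bit pointers, i must be sign-extended to 64 bits
   before it can be added to data. With 32-bit pointers (x32) the index and
   the pointer already have the same width. */
int64_t sum_index32(const int32_t *data, int32_t n)
{
    assert(n >= 0);
    int64_t total = 0;
    for (int32_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* 64-bit index: already pointer-sized under the x86-64 ABI, so no
   extension is needed there. */
int64_t sum_index64(const int32_t *data, int64_t n)
{
    assert(n >= 0);
    int64_t total = 0;
    for (int64_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Both functions compute the same result; any difference is in the instructions the compiler has to emit for the address calculation.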

Another factor to consider is that the whole point of the x32 ABI is to get better performance by using smaller types. Any application that benefits from pointers and related types being smaller should also benefit from int_fast32_t being smaller as well.
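
To make the size argument concrete, here is a hedged sketch of a list node that mixes a pointer with an int_fast32_t. The struct, the helper names, and the 64-byte cache line are assumptions for illustration; the actual sizes depend on the ABI and the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node: one pointer plus one "fast" 32-bit integer. */
struct node {
    struct node *next;
    int_fast32_t value;
};

/* The node can never be smaller than its two members combined. */
static_assert(sizeof(struct node) >=
                  sizeof(struct node *) + sizeof(int_fast32_t),
              "node smaller than its members");

/* Size of one node in bytes under the current ABI. */
size_t node_size(void) { return sizeof(struct node); }

/* How many nodes fit in one cache line, assuming 64-byte lines. */
size_t nodes_per_cache_line(void) { return 64 / sizeof(struct node); }
```

With the type sizes reported in the question, node_size() should be 8 under gcc -mx32 and 16 under gcc -m64, so twice as many nodes would fit in a cache line under x32.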

Calico answered 1/5, 2016 at 0:6 Comment(3)

The question is specific to the x32 ABI, but I'm now thinking that asking about the proper sizes of the [u]int_fastN_t types for x86_64 with and without x32 would make for a better question. Since you've posted an answer to the question as it stands now, it might be unfair to change it. Would you object if I made such a change? – Kehr 1/5, 2016 at 1:17

@KeithThompson Sure, it doesn't sound like it would change my answer. – Calico 1/5, 2016 at 1:20

@KeithThompson: Related: int_fast8_t size vs int_fast16_t size on x86-64 platform takes a look at the badness of glibc's design decision, and the fact that it basically poisoned the well for anyone to ever use int_fastN_t types for most use-cases in portable code, if they actually care about optimal performance. – Winchester 15/11, 2023 at 21:20

I have compiled the following sample code to check the generated code for a simple sum with different integer types:

#include <stdint.h>

typedef int16_t INT;
//typedef int32_t INT;
//typedef int64_t INT;

INT foo()
{
    volatile INT a = 1, b = 2;
    return a + b;
}

And then I disassembled the code generated with each of the integer types. The compilation command is gcc -Ofast -mx32 -c test.c. Note that in full 64-bit mode the generated code will be almost the same because there are no pointers in my code (only %rsp instead of %esp).

With int16_t it emits:

00000000 <foo>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   ba 02 00 00 00          mov    $0x2,%edx
   a:   67 66 89 44 24 fc       mov    %ax,-0x4(%esp)
  10:   67 66 89 54 24 fe       mov    %dx,-0x2(%esp)
  16:   67 0f b7 54 24 fc       movzwl -0x4(%esp),%edx
  1c:   67 0f b7 44 24 fe       movzwl -0x2(%esp),%eax
  22:   01 d0                   add    %edx,%eax
  24:   c3                      retq

With int32_t:

00000000 <foo>:
   0:   67 c7 44 24 f8 01 00 00 00  movl   $0x1,-0x8(%esp)
   9:   67 c7 44 24 fc 02 00 00 00  movl   $0x2,-0x4(%esp)
  12:   67 8b 54 24 f8              mov    -0x8(%esp),%edx
  17:   67 8b 44 24 fc              mov    -0x4(%esp),%eax
  1c:   01 d0                       add    %edx,%eax
  1e:   c3                          retq

And with int64_t:

00000000 <foo>:
   0:   67 48 c7 44 24 f0 01 00 00 00   movq   $0x1,-0x10(%esp)
   a:   67 48 c7 44 24 f8 02 00 00 00   movq   $0x2,-0x8(%esp)
  14:   67 48 8b 54 24 f0               mov    -0x10(%esp),%rdx
  1a:   67 48 8b 44 24 f8               mov    -0x8(%esp),%rax
  20:   48 01 d0                        add    %rdx,%rax
  23:   c3                              retq

Now, I don't claim to know exactly why the compiler generated exactly this code (maybe the volatile keyword combined with a non-register-size integer type is not the best choice?). But from that generated code we can draw the following conclusions:

The slowest type is int16_t. It needs additional instructions to move the values around.
The fastest type is int32_t. Although the 32-bit and the 64-bit versions have the same number of instructions, the 32-bit code is shorter in bytes, so it is more cache-friendly and therefore likely faster.

So the natural choices for the fast types would be:

For int_fast16_t, choose int32_t.
For int_fast32_t, choose int32_t.
For int_fast64_t, choose int64_t (what else).
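
Written out as typedefs, those choices would look roughly like this. The sketch_ names are hypothetical and are not what any real <stdint.h> uses; the assertions only confirm that each choice meets the minimum width that the corresponding fast type requires.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical typedefs mirroring the choices listed above. */
typedef int32_t sketch_int_fast16_t;
typedef int32_t sketch_int_fast32_t;
typedef int64_t sketch_int_fast64_t;

/* Each choice must be at least as wide as the N in int_fastN_t. */
static_assert(CHAR_BIT * sizeof(sketch_int_fast16_t) >= 16, "fast16 too narrow");
static_assert(CHAR_BIT * sizeof(sketch_int_fast32_t) >= 32, "fast32 too narrow");
static_assert(CHAR_BIT * sizeof(sketch_int_fast64_t) >= 64, "fast64 too narrow");
```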

Clouse answered 30/4, 2016 at 23:37 Comment(8)

@KeithThompson: I don't understand what you mean with x86_64 with and without x32. x86_64 and x32 are different architectures, although both run in 64-bit processors. But anyway, I'd expect the integer types to be the same, as only the pointer size is different. – Clouse 1/5, 2016 at 1:22

@KeithThompson: Well, I've just checked, and I was wrong. These three fast integer types are all 64 bits wide in x86_64, which is different from x32. Maybe the difference is related to default memory alignment? But I'm afraid that my answer, as it is, does not apply to the x86_64 question. – Clouse 1/5, 2016 at 1:30

So, do you mind if I update the question? I hate it when other questioners substantially change their questions after answers have been posted, and I don't want to do it without your approval, but I think that covering x86_64 with and without x32 would improve the question. – Kehr 1/5, 2016 at 1:34

@KeithThompson: Oh, I don't mind at all! I'm actually thinking that I may be totally wrong about this answer anyway. What I'm thinking now is that few people care about the *int_fast* types, and that few people care about the x32 architecture, so nobody cared about both things at the same time. Maybe your improved question will get a real answer. – Clouse 1/5, 2016 at 1:38

x32 is an ILP32 ABI for x86-64. It is the same architecture as the normal -m64 LP64 x86-64 System V ABI for x86-64. x32 is even defined in a chapter of the same x86-64 System V ABI document. Your answer shows how int_fastN_t currently are defined on -mx32, but doesn't say anything about whether that's a good choice or not. (Or the fact that int_fast32_t on -m64 is 64-bit, which actually makes division on it slower, and even multiplication on some CPUs.) – Winchester 6/1, 2020 at 12:18

@PeterCordes even int_fast16_t is 64-bit on -m64, which surprised me the most – Indican 7/1, 2020 at 13:37

@phuclv: if you're going to make int_fast32_t a 64-bit type, it would be inconsistent to make int_fast16_t narrower. You could argue it's less likely to get used as an array index, removing that justification for widening to 64-bit. Clearly don't use them for large arrays (or in structs where they might need extra padding) because of implementations like that. – Winchester 7/1, 2020 at 13:45

-3

Tough. Let's just take int_fast8_t. If a developer uses a large array to store lots of 8 bit signed integers, then int8_t will be fastest because of caching. I'd declare that using large arrays of int_fast8_t is likely a bad idea.

You'd need to take a large codebase, and systematically replace int8_t and signed chars and plain char if it is signed with int_fast8_t. Then benchmark the code using different typedefs for int_fast8_t, and measure what's fastest.

Note that the observable behaviour is going to change. For example, assigning 255 will typically give a result of -1 if the type is int8_t (the out-of-range conversion is implementation-defined, not undefined) and 255 if the type is wider.
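
A minimal sketch of that difference, with made-up function names. Here int32_t stands in for an int_fast8_t that happens to be wider than 8 bits. Converting 255 to int8_t is out of range, so the result is implementation-defined; gcc documents that it reduces the value modulo 2^8, which gives -1.

```c
#include <assert.h>
#include <stdint.h>

/* Store v in an 8-bit signed object and read it back. Values outside
   -128..127 are converted in an implementation-defined way. */
int store_in_int8(int v)
{
    int8_t x = (int8_t)v;
    return x;
}

/* Store v in a 32-bit signed object, standing in for a wider fast type. */
int store_in_int32(int v)
{
    int32_t x = (int32_t)v;
    return x;
}

/* Self-check for values that fit in both types. */
void check_in_range(void)
{
    assert(store_in_int8(100) == store_in_int32(100));
    assert(store_in_int8(-128) == store_in_int32(-128));
}
```

So code that silently relies on 8-bit wraparound can change behaviour when int8_t is replaced by a wider fast type.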

Karyolysis answered 30/4, 2016 at 23:18 Comment(3)

I don't see how this answers the question. – Kehr 30/4, 2016 at 23:20

int_fast8_t and uint_fast8_t are 8 bits under gcc; I was asking about [u]int_fast16_t and [u]int_fast32_t. – Kehr 30/4, 2016 at 23:27

I'm also curious what you meant by "Tough". – Kehr 30/4, 2016 at 23:28
