32-bit pointers with the x86-64 ISA: why not?

Asked 10/2, 2012 at 19:4 Answered 23/10, 2020 at 16:38

Tags: performance, pointers, x86-64, 32bit-64bit, abi

The x86-64 instruction set adds more registers and other improvements to help streamline executable code. However, in many applications the increased pointer size is a burden. The extra, unused bytes in every pointer clog up the cache and might even overflow RAM. GCC itself, for example, is built with the -m32 flag, and I assume this is the reason.

It's possible to load a 32-bit value and treat it as a pointer. This doesn't require extra instructions: just load or compute the 32 bits, then load from the resulting address. The trick won't be portable, though, as platforms have different memory maps. On Mac OS X, the entire low 4 GiB of address space is reserved. Still, for one program I wrote, hackishly adding 0x100000000L to 32-bit "addresses" before use improved performance greatly over both true 64-bit addresses and compiling with -m32.
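A minimal C++ sketch of the same idea, under assumptions that are mine rather than the asker's: instead of a fixed constant such as 0x100000000, the base is the start of an arena (a std::vector), and every "pointer" is a 32-bit byte offset from that base. FatNode, CompactNode, compress, decompress and build_and_sum are hypothetical names. On a typical LP64 x86-64 target the compressed node is half the size of the conventional one, which is where the cache savings come from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Conventional node: 4-byte payload plus an 8-byte pointer,
// padded to 16 bytes on a typical LP64 x86-64 target.
struct FatNode {
    std::uint32_t value;
    FatNode* next;
};

// Compressed node: the link is a 32-bit byte offset from an arena base.
struct CompactNode {
    std::uint32_t value;
    std::uint32_t next;  // byte offset from the arena base; kNull ends the list
};

constexpr std::uint32_t kNull = UINT32_MAX;

// Compress: keep only the distance from the base (the arena must stay < 4 GiB).
inline std::uint32_t compress(char* base, CompactNode* p) {
    if (p == nullptr) return kNull;
    std::ptrdiff_t off = reinterpret_cast<char*>(p) - base;
    assert(off >= 0 && off < static_cast<std::ptrdiff_t>(kNull));
    return static_cast<std::uint32_t>(off);
}

// Decompress: one 64-bit add of the base to the zero-extended offset,
// the same shape as adding 0x100000000 to a 32-bit "address".
inline CompactNode* decompress(char* base, std::uint32_t off) {
    return off == kNull ? nullptr : reinterpret_cast<CompactNode*>(base + off);
}

// Builds the list 1 -> 2 -> ... -> n inside one arena and sums it
// by following the compressed links.
inline std::uint64_t build_and_sum(std::size_t n) {
    std::vector<CompactNode> arena(n);
    if (n == 0) return 0;
    char* base = reinterpret_cast<char*>(arena.data());
    for (std::size_t i = 0; i < n; ++i) {
        arena[i].value = static_cast<std::uint32_t>(i + 1);
        arena[i].next = (i + 1 < n) ? compress(base, &arena[i + 1]) : kNull;
    }
    std::uint64_t sum = 0;
    for (CompactNode* node = &arena[0]; node != nullptr;
         node = decompress(base, node->next)) {
        sum += node->value;
    }
    return sum;
}
```

Because the base is a run-time value rather than a platform-specific constant, this variant does not depend on the OS memory map, at the cost of keeping the base in a register.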

Is there any fundamental impediment to having a 32-bit, x86-64 platform? I suppose that supporting such a chimera would add complexity to any operating system, and anyone wanting that last 20% should just Make it Work™, but it still seems that this would be the best fit for a variety of computationally intensive programs.

Borroff asked 10/2, 2012 at 19:4

Profiling data in the majority of applications suggests no significant loss due to increased pointer size. – Uncircumcised 10/2, 2012 at 19:9

The Intel Compiler has the option /Qauto-ilp32 that "tries" to use 32-bit pointers, even in x64 mode. – Dowdell 10/2, 2012 at 19:10

@Dowdell but that's more like the old-fashioned near and far pointers, right? That solution is OK, I suppose, but it's not quite as clean as the one I'm referring to. – Borroff 10/2, 2012 at 19:12

@Borroff No it isn't. If I'm reading it correctly, it actually tries to compress all pointers down to 32-bits. – Dowdell 10/2, 2012 at 19:18

@Mysticial: "… determine if there are 64-bit pointers which can be safely shrunk into 32-bit pointers." The last paragraph states that you are required not to use more than 32 bits, but all the preceding text refers to detection of overflow and selective optimization. It seems contradictory. Anyway, pointers passed to external library functions still need to be "far"… so the compiler needs some intelligence there. – Borroff 10/2, 2012 at 19:26

Oh, I thought you were referring to the near and far pointers from the old 16-bit days. – Dowdell 10/2, 2012 at 19:28

I think that when you pass a pointer into an external library, it will be zero-padded back to 64 bits, so it's not a problem here. A pointer returned from an external library can't be proven to fit in 32 bits, so any pointer derived from one cannot be compressed. Basically, all pointers you generate yourself can be guaranteed to fit in 32 bits using page-mapping tricks via the compiler's memory allocator. – Dowdell 10/2, 2012 at 19:37

@Dowdell The trickiness is in struct definitions, where external libraries expect 64 bits, but internal interfaces still need to be optimized to 32 bits or you aren't saving any memory at all. So the compiler has to analyze and mark each declaration as pseudo-near or pseudo-far. Ideally it works transparently, which would be nicer than 16-bit style, but there seems to be black magic afoot. – Borroff 10/2, 2012 at 19:53

It seems like normal 32-bit code could also be used in such a process, right? – Mandymandych 11/2, 2012 at 2:16

You miss the point. Every x86_64 platform out there has a 32-bit subsystem for executing legacy 32-bit code. But using the 64-bit instruction set, which doubles the number of available registers from 8 to 16, can bring significant performance improvements, and in some rare cases (e.g. crafty in the link I gave) the doubled integer width gives an additional performance boost. – Giavani 11/2, 2012 at 9:22

It's worth noting that some Java virtual machines take a somewhat different tack; object references are 32 bits, but rather than representing byte addresses within a 4GB space, they represent scaled addresses in a larger space (I think 8x scaling is typical, allowing 32GB of address space rather than 4GB). – Rotifer 9/3, 2014 at 19:14
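A sketch of that scaled scheme, assuming 8-byte object alignment and a hypothetical heap base (HotSpot calls its version "compressed oops", but this is not its implementation): because the low 3 bits of every aligned offset are zero, they need not be stored, so a 32-bit reference can address 2^32 * 8 bytes = 32 GiB. encode, decode and kShift are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed object alignment of 8 bytes: the low 3 bits of every offset are zero.
constexpr unsigned kShift = 3;

// Largest heap reachable with a 32-bit scaled reference: 2^32 * 8 = 32 GiB.
constexpr std::uint64_t kMaxHeap = (std::uint64_t{1} << 32) << kShift;

// Encode a 64-bit address in [heap_base, heap_base + kMaxHeap) as 32 bits.
inline std::uint32_t encode(std::uint64_t heap_base, std::uint64_t addr) {
    std::uint64_t off = addr - heap_base;
    assert(addr >= heap_base && off < kMaxHeap);
    assert((off & ((1u << kShift) - 1)) == 0);  // must be 8-byte aligned
    return static_cast<std::uint32_t>(off >> kShift);
}

// Decode: zero-extend, scale by 8, add the base. On x86-64 this fits a single
// [base + index*8] addressing mode, so dereferencing costs no extra instruction.
inline std::uint64_t decode(std::uint64_t heap_base, std::uint32_t ref) {
    return heap_base + (static_cast<std::uint64_t>(ref) << kShift);
}
```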

Does this answer your question? How to use 32-bit pointers in 64-bit application? – Presumption 15/10, 2021 at 6:19

As it happens, that’s what I ended up doing. Quite a bit of fancy footwork but it doubled memory efficiency and cache locality. – Borroff 16/10, 2021 at 19:59

There is an ABI called "x32" for Linux in development. It's a mix between x86_64 and ia32, similar to what you describe: a 32-bit address space while using the full 64-bit register set. It needs a custom kernel, binutils and gcc.

Some SPEC runs indicate a performance improvement of about 30% in some benchmarks. See further information at https://sites.google.com/site/x32abi/
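A small compile-time check, assuming GCC or Clang, which define both __x86_64__ and __ILP32__ when building with -mx32: in the x32 ABI, pointers and long are 32 bits even though the code runs in 64-bit mode with all 16 general-purpose registers. kIsX32 and kPointerBits are hypothetical names; checking __ILP32__ alone would not be enough, since it is also defined for ordinary 32-bit x86 builds.

```cpp
#include <climits>
#include <cstddef>

// x32 = x86-64 instruction set (__x86_64__) with an ILP32 data model (__ILP32__).
constexpr bool kIsX32 =
#if defined(__x86_64__) && defined(__ILP32__)
    true;
#else
    false;
#endif

// Pointer width in bits for the current build.
constexpr std::size_t kPointerBits = sizeof(void*) * CHAR_BIT;

// Under x32, int, long and pointers should all be 32 bits wide.
static_assert(!kIsX32 || (kPointerBits == 32 && sizeof(long) == 4),
              "x32 is expected to be an ILP32 data model");
```

Building the same file with -m64, -m32 and -mx32 should give 64, 32 and 32 pointer bits respectively, with only the -mx32 build also targeting the 64-bit instruction set.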

Giavani answered 11/2, 2012 at 0:19

This is really good info. I felt a little unsure about asking the question, because its phrasing possibly precluded any good answer. But having even just one platform to support the chimera, just in case the difference is significant enough, does change the playing field. – Borroff 11/2, 2012 at 9:59

Unfortunately, Linux kernel developers have discussed dropping x32 support. – Statistician 24/10, 2020 at 0:45

As Mysticial commented above, ICC has the -auto-ilp32 / /Qauto-ilp32 option to use 32-bit pointers in 64-bit mode:

Instructs the compiler to analyze the program to determine if there are 64-bit pointers that can be safely shrunk into 32-bit pointers and if there are 64-bit longs (on Linux* systems) that can be safely shrunk into 32-bit longs.

On Windows there's no x32 ABI like on Linux, but you can still use 32-bit pointers by disabling the /LARGEADDRESSAWARE flag (linking with /LARGEADDRESSAWARE:NO), which is enabled by default for 64-bit binaries:

By default, 64-bit Microsoft Windows-based applications have a user-mode address space of several terabytes. For precise values, see Memory Limits for Windows and Windows Server Releases. However, applications can specify that the system should allocate all memory for the application below 2 gigabytes. This feature is beneficial for 64-bit applications if the following conditions are true:

A 2 GB address space is sufficient.

The code has many pointer truncation warnings.

Pointers and integers are freely mixed.

The code has polymorphism using 32-bit data types.

All pointers are still 64-bit pointers, but the system ensures that every memory allocation occurs below the 2 GB limit, so that if the application truncates a pointer, no significant data is lost. Pointers can be truncated to 32-bit values, then extended to 64-bit values by either sign extension or zero extension.

(Source: Microsoft documentation, "Virtual Address Space")
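Why 2 GB rather than 4 GB? A short sketch of the two widenings the quote mentions, using hypothetical function names: below 2^31, bit 31 of the truncated value is zero, so sign extension and zero extension reconstruct the same address. Above it, sign extension copies a 1 into the upper 32 bits and produces a completely different 64-bit value, and above 4 GB truncation loses bits outright.

```cpp
#include <cstdint>

// Truncate a 64-bit address to 32 bits, as code with pointer-truncation warnings does.
constexpr std::uint32_t truncate(std::uint64_t addr) {
    return static_cast<std::uint32_t>(addr);
}

// Zero extension: the upper 32 bits become 0.
constexpr std::uint64_t zero_extend(std::uint32_t v) {
    return v;
}

// Sign extension: reinterpret as signed, then widen, so bit 31 is copied into bits 32..63.
constexpr std::uint64_t sign_extend(std::uint32_t v) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(v)));
}
```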

Of course there's no direct compiler support like GCC's -mx32 option, so you may need to handle pointers manually every time you store a pointer to memory or dereference it. The simplest solution is to write a class that wraps a 32-bit pointer. Luckily, MS also has experience with mixing 32-bit and 64-bit pointers on the same architecture, so there are plenty of supporting keywords/macros:

POINTER_32/__ptr32
POINTER_64/__ptr64
POINTER_SIGNED/__sptr
POINTER_UNSIGNED/__uptr
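A minimal sketch of such a wrapper, assuming (as /LARGEADDRESSAWARE:NO guarantees) that every pointee lives below 2 GB. Ptr32 is a hypothetical name, not a Windows type; it stores a plain uint32_t instead of using the __ptr32 keyword, so it compiles with any C++11 compiler, and it zero-extends when converting back to a full pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical 4-byte pointer wrapper; assumes all pointees are below 2 GB.
template <typename T>
class Ptr32 {
public:
    Ptr32() = default;
    Ptr32(std::nullptr_t) {}
    explicit Ptr32(T* p) : bits_(narrow(p)) {}

    // Zero-extend the stored 32 bits back to a full 64-bit pointer.
    T* get() const {
        return bits_ == 0 ? nullptr
                          : reinterpret_cast<T*>(static_cast<std::uintptr_t>(bits_));
    }
    T& operator*() const {
        assert(bits_ != 0 && "dereferencing a null Ptr32");
        return *get();
    }
    T* operator->() const { return get(); }
    explicit operator bool() const { return bits_ != 0; }

private:
    // Refuse pointers that would not survive truncation (the 2 GB assumption failed).
    static std::uint32_t narrow(T* p) {
        auto v = reinterpret_cast<std::uintptr_t>(p);
        if (v > 0x7FFFFFFFu) throw std::out_of_range("pointer above the 2 GB limit");
        return static_cast<std::uint32_t>(v);
    }
    std::uint32_t bits_ = 0;
};
```

Storing Ptr32<T> instead of T* in data structures halves every pointer field; the range check in the constructor turns a violated address-space assumption into an exception instead of silent corruption.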

Google's V8 engine takes a different approach: it compresses pointers to 32 bits, both to save memory and to improve performance. See V8's pointer-compression write-up for a comparison of the memory and performance improvements.

Statistician answered 23/10, 2020 at 16:38

I don't expect such a model to be very hard to support in the OS. About the only thing that needs to change for processes in this model is page management: pages must be allocated below the 4 GB point. The kernel, too, should allocate its buffers from the first 4 GB of the virtual address space if it passes them to the application. The same applies to the loader that loads and starts applications. Other than that, a 64-bit kernel should be able to handle such apps without major modifications.
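Part of this already exists on Linux: on x86-64, mmap accepts the MAP_32BIT flag, which places the mapping in the first 2 GB of the process address space. A minimal sketch, Linux x86-64 only; alloc_low and free_low are hypothetical helper names, and on systems without MAP_32BIT the helper simply returns nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Allocate anonymous memory in the low 2 GB with MAP_32BIT (x86-64 Linux only).
// Returns nullptr where the flag is unavailable or the mapping fails.
inline void* alloc_low(std::size_t size) {
#ifdef MAP_32BIT
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    // The kernel placed the mapping below 2 GB, so its address fits in 32 bits.
    assert(reinterpret_cast<std::uintptr_t>(p) < (std::uintptr_t{1} << 31));
    return p;
#else
    (void)size;
    return nullptr;
#endif
}

// Release memory obtained from alloc_low.
inline void free_low(void* p, std::size_t size) {
    if (p != nullptr) munmap(p, size);
}
```

An allocator built on top of such low mappings is what lets a 64-bit process keep 32-bit pointers without any kernel changes, at least for memory it allocates itself.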

Compiler support shouldn't be a big issue either. It's mostly a matter of generating code that can use the extra CPU registers and their full 64 bits and adding proper REX prefixes whenever needed.

Did answered 11/2, 2012 at 2:9

It's called "x86-32 emulation", or WOW64 on Windows (presumably something else on other OSes) and it's a hardware flag in the processor. No need for any user-mode tricks here.

Uncircumcised answered 10/2, 2012 at 19:10

Is that a user-accessible flag? So the OS is required to save/restore and support it? – Borroff 10/2, 2012 at 19:11

Ah, looked it up — en.wikipedia.org/wiki/WOW64 . No, that's just running standard x86 code, the old ISA with just 8 registers, on a 64-bit OS. Just the same as -m32. I don't think other OSes bother giving that "feature" a name. – Borroff 10/2, 2012 at 19:15

@Potatoswatter: And that's meaningfully different to what you describe how, exactly? Instead of doing user-mode hacks to achieve the objective, the processor achieves it in hardware. That's always gonna be faster. There's no compiler flag which achieves this, it's a hardware processor feature. – Uncircumcised 10/2, 2012 at 19:17

The OP asked for a real 64-bit program with 32-bit pointers, not a 32-bit program on 64-bit Windows. – Statistician 15/8, 2013 at 2:11

There is a compiler flag; on GNU/Linux it's gcc -mx32 (use with the usual -O3 -march=native etc.) en.wikipedia.org/wiki/X32_ABI - an ILP32 ABI that can use 32-bit pointers in 64-bit mode, so you still have 16 registers instead of 8, and can efficiently do 64-bit integer operations. And a non-terrible calling convention that passes args in registers. – Desai 24/10, 2020 at 1:21
