C: Avoiding Alignment Issues

Could someone please explain what is really wrong with the following example, especially the part about "which might result in the 32-bit unsigned long being loaded from an address that is not a multiple of four":

"The compiler generally prevents alignment issues by naturally aligning all data types. In fact, alignment issues are normally not major concerns of the kernel developers; the gcc folks have to worry about them. Issues arise, however, when the programmer plays too closely with pointers and accesses data outside the environment anticipated by the compiler.

Accessing an aligned address with a recast pointer of a larger-aligned address causes an alignment issue (whatever that might mean for a particular architecture). That is, this is bad news:

char dog[10];
char *p = &dog[1];
unsigned long l = *(unsigned long *)p;

This example treats the pointer to a char as a pointer to an unsigned long, which might result in the 32-bit unsigned long being loaded from an address that is not a multiple of four.

If you are thinking, "When in the world would I do this?" you are probably right. Nevertheless, it has come up, and it will again, so be careful. The real-world examples might not be so obvious."

Though I don't really understand the problem, can it be solved by using the following code and if so, why?

char * dog = (char *)malloc(10 * sizeof(char));
char *p = dog +1;
unsigned long l = *(unsigned long*)p;
Trichroism answered 17/6, 2018 at 18:3 Comment(5)
Your proposed solution is guaranteed to be broken (instead of merely potentially broken like the original code). - Osteoblast
Some CPUs require that 2-byte quantities be aligned on a 2-byte (even) boundary, 4-byte quantities on a 4-byte boundary, and 8-byte quantities on an 8-byte boundary. When the address is misaligned (an odd address where an even address is required), that can cause all sorts of trouble. At best, the system detects the misaligned access with a trap to the OS, which redoes the access using two aligned reads and pieces the original value together from them. At worst, your program crashes. On second thought, the 'worst' and 'best' labels should be reversed. - Oysterman
The point is that almost all processors are optimized for memory accesses at a predefined, natural alignment. On some processors, an access with the wrong alignment causes an exception. The snippet you show forces a memory access to a multibyte object, an unsigned integer, at the wrong alignment. - Pivoting
Please take a look at this question about the strict aliasing rule. - Squabble
Starting from x86 Sandy Bridge, alignment is moot. In fact, tightly packed, unaligned data often leads to better performance due to better data locality. But yes, on other platforms it may be vital to align data properly. - Furbish

Your proposed solution is pretty much the same as the quoted one, so it suffers from the same problem.

Misalignment problem

When you reserve memory, it is given the alignment its type requires, whether it is an automatic variable (char dog[10]) or memory obtained from malloc.

When you fool the compiler with pointer arithmetic and casts, as you are doing, it can no longer guarantee that the access will be correctly aligned.

Why is this problematic?

Because, depending on the hardware architecture you are using, the compiler may emit instructions that require 2- or 4-byte alignment. For instance, ARM has several instructions that require data to be 2-byte aligned (that is, the address has to be even). Thus, your code built for an ARM processor could trigger an alignment fault (an access violation).

How would you solve your problem then?

Usually, with a memcpy:

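#include <stdlib.h>
#include <string.h>

char *dog = malloc(10);   /* sizeof(char) is 1 by definition */
char *p = dog;
unsigned long l;

/* memcpy places no alignment requirement on its arguments */
memcpy(&l, p + 1, sizeof l);
//You can use l safely now.

//Copy l back to the array:
memcpy(p + 1, &l, sizeof l);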
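One way to package that memcpy idiom is a pair of small functions that work at any byte offset. This is a sketch: load_ulong and store_ulong are hypothetical names, not part of any standard library.

```c
#include <string.h>

/* Hypothetical helper: read an unsigned long from any address, aligned or not.
   memcpy copies byte by byte as far as the language is concerned, and the
   destination v is a properly aligned local variable. */
unsigned long load_ulong(const void *src)
{
    unsigned long v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Hypothetical helper: write an unsigned long to any address, aligned or not. */
void store_ulong(void *dst, unsigned long v)
{
    memcpy(dst, &v, sizeof v);
}
```

With a buffer of at least 1 + sizeof(unsigned long) bytes, store_ulong(buf + 1, x) followed by load_ulong(buf + 1) gives back x. Compilers typically turn such a fixed-size memcpy into a single load or store on targets that allow unaligned access, so the idiom usually costs nothing there.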
Leaseholder answered 17/6, 2018 at 18:25 Comment(0)

The passage you quoted is exactly right.

Most of the time, you don't have to worry about alignment, because the compiler takes care of it for you, and that works well unless you're doing something so squirrelly that you succeed in foiling the compiler's attempts to protect you.

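When you call malloc, there's no problem, because malloc is special (in several ways). Among other things, it's "guaranteed to return a pointer to storage suitably aligned for any type of object." That guarantee covers only the pointer malloc returns, though: in your version, dog + 1 is still misaligned for an unsigned long.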
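A small alignment check makes the difference between the pointer malloc returns and dog + 1 visible. This is a sketch under stated assumptions: is_aligned is a hypothetical helper, converting a pointer to uintptr_t and testing the remainder is implementation-defined (though it behaves as expected on mainstream platforms), and the usage relies on C11's _Alignof.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if p is a multiple of `alignment` bytes.
   Relies on the (implementation-defined) pointer-to-uintptr_t conversion
   preserving the address, as it does on common platforms. */
int is_aligned(const void *p, size_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}
```

For char *dog = malloc(10);, is_aligned(dog, _Alignof(unsigned long)) holds by malloc's guarantee, while is_aligned(dog + 1, _Alignof(unsigned long)) fails on any platform where _Alignof(unsigned long) is greater than 1 (typically 4 or 8).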

But yes, if you work at it, you can get yourself into trouble. Going back to something like the original example, suppose we had

char dog[] = "My dog Spot";
char *p = &dog[0];
unsigned long l = *(unsigned long *)p;

And suppose the array happened to be laid out in memory like this:

      +---+---+---+---+
100:  |   |   | M | y |
      +---+---+---+---+
104:  |   | d | o | g |
      +---+---+---+---+
108:  |   | S | p | o |
      +---+---+---+---+
112:  | t |\0 |   |   |
      +---+---+---+---+

That is, suppose array dog ends up at memory address 102, which is not a multiple of 4. So pointer p also points at address 102, and we try to access a long int at address 102. (You'll notice I have changed it to &dog[0], as opposed to &dog[1] in the original example, in an attempt to make things a little clearer.)

So we might expect variable l to end up containing either 1299783780 or 1679849805 (that is, either 0x4d792064 or 0x6420794d), since those are the representations of the first four bytes "My d" interpreted in either big-endian or little-endian representation.
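The two candidate values can be reproduced without any unaligned access by assembling the bytes one at a time with shifts. The sketch below assumes a 32-bit result, matching the 4-byte boxes in the diagram; the function names are hypothetical.

```c
#include <stdint.h>

/* Combine four bytes into a 32-bit value, most significant byte first. */
uint32_t from_big_endian(const unsigned char b[4])
{
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8  | (uint32_t)b[3];
}

/* Combine four bytes into a 32-bit value, least significant byte first. */
uint32_t from_little_endian(const unsigned char b[4])
{
    return (uint32_t)b[3] << 24 | (uint32_t)b[2] << 16 |
           (uint32_t)b[1] << 8  | (uint32_t)b[0];
}
```

For the bytes "My d", from_big_endian returns 0x4d792064 (1299783780) and from_little_endian returns 0x6420794d (1679849805), on any host and at any address, because each byte is read individually.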

But since it's an unaligned access, we might get neither number; the program might crash with something like a "bus error" instead.

If we were bound and determined to do this sort of thing, we could contrive to do the alignment ourselves, with something like this:

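#include <stddef.h>
#include <stdint.h>

char dog[] = "My dog Spot";
char *p = dog;
size_t al = (uintptr_t)p % sizeof(unsigned long);  /* bytes past the last boundary */
al = sizeof(unsigned long) - al;                    /* bytes to the next boundary */
if (al == sizeof(unsigned long)) al = 0;            /* already aligned */
p += al;
/* p is now aligned, but with an 8-byte unsigned long the read can run past the
   12-byte array, and the cast still violates the strict aliasing rule. */
unsigned long l = *(unsigned long *)p;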

Of course, having shifted the pointer p until it points at a proper multiple of 4, it doesn't point at "My d" any more; now it points at " dog".

I've done this sort of thing once or twice, but I can't really recommend it.
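The remainder arithmetic in that snippet is the usual "round up to the next multiple" operation. Below it is sketched on plain integer addresses; align_up is a hypothetical name, and the bit trick assumes the alignment is a power of two, which C requires of every valid alignment value.

```c
#include <stdint.h>

/* Hypothetical helper: round addr up to the next multiple of align.
   align must be a power of two, so align - 1 is a mask of the low bits. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

align_up(102, 4) gives 104, the " dog" slot in the diagram, while an address that is already aligned, such as 104, stays where it is.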

Duer answered 17/6, 2018 at 19:37 Comment(1)
Your explanation was very helpful and exactly what I needed to fully understand the problem. Thank you very much! - Trichroism