64-bit architectures like x86-64 have a word size of 64 bits. In this case, if a memory access crosses a word boundary, it will take twice as long to access the data, so alignment is required. This is what I know; correct me if I am wrong.
Now, GCC uses 16-byte alignment (MSVC at least uses 8-byte alignment) for long double, whose non-padding size is 10 bytes. But with 8-byte alignment it requires 2 read cycles anyway, and the same is true with 16-byte alignment. So why the stricter 16-byte alignment? What is the purpose of alignment other than what I mentioned above?
Also, since the non-padding part of long double (the 80-bit x87 extended FP format) is 10 bytes, 4-byte alignment should actually be sufficient. In that case too, the data can be read within 2 read cycles (split as either 4+6 or 8+2 bytes). So please also explain where this assumption goes wrong.
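The arithmetic behind that reasoning can be checked with a small sketch. It assumes the simplified model used in the question, where memory is read in aligned 8-byte words; real x86-64 hardware is better described in terms of cache lines, so this only tests the model, not the hardware. The helper names `words_touched`, `worst_case_words`, and `check_word_model` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of aligned `word`-byte units overlapped by an access of
   `size` bytes starting at byte offset `offset` (size > 0, word > 0). */
size_t words_touched(size_t offset, size_t size, size_t word)
{
    size_t first = offset / word;
    size_t last = (offset + size - 1) / word;
    return last - first + 1;
}

/* Worst case over every offset in [0, range) that is a multiple of `align`. */
size_t worst_case_words(size_t align, size_t size, size_t word, size_t range)
{
    size_t worst = 0;
    for (size_t off = 0; off < range; off += align) {
        size_t n = words_touched(off, size, word);
        if (n > worst)
            worst = n;
    }
    return worst;
}

/* Under the 8-byte-word model, a 10-byte object needs two words at
   4-, 8-, or 16-byte alignment, and three only when it may start at
   an arbitrary byte offset. */
void check_word_model(void)
{
    assert(worst_case_words(4, 10, 8, 64) == 2);
    assert(worst_case_words(8, 10, 8, 64) == 2);
    assert(worst_case_words(16, 10, 8, 64) == 2);
    assert(worst_case_words(1, 10, 8, 64) == 3);
}
```

So within this model the counting in the question holds: 4-, 8-, and 16-byte alignment all give two word reads. Any reason to prefer 16 bytes has to come from something the model leaves out.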
(The actual sizeof(long double) is 12 in the i386 System V ABI and 16 in x86-64 System V: multiples of their respective alignof() values of 4 and 16.)
fld m80 does decode into a 2-byte and an 8-byte load, so 8-byte and 16-byte alignment are both sufficient to avoid cache-line splits in either of the halves. But only if the size is 16 bytes, so you might as well make the alignment match the size for SSE copying. – Mucky
... long double, but it does not explain why or what the impact is. My quick experiments showed no impact of (mis)alignment, only of crossing cache line boundaries, as expected. – Wicklow
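The point in the comments about fld m80 and cache-line splits can be illustrated with a rough sketch. It assumes a 64-byte cache line and models the load as an 8-byte access followed by a 2-byte access at offset 8, as the comment describes; the helper names (`crosses_line`, `fld_half_split`, `count_split_elements`, `check_layouts`) are hypothetical, and this is address arithmetic only, not a benchmark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CACHE_LINE = 64 }; /* assumed cache-line size in bytes */

/* True if an access of `size` bytes at `addr` spans two cache lines. */
bool crosses_line(size_t addr, size_t size)
{
    return addr / CACHE_LINE != (addr + size - 1) / CACHE_LINE;
}

/* True if either half of an 80-bit load is split across lines:
   8 bytes at addr, then 2 bytes at addr + 8. */
bool fld_half_split(size_t addr)
{
    return crosses_line(addr, 8) || crosses_line(addr + 8, 2);
}

/* Count the elements of an n-element array, laid out `stride` bytes
   apart starting at address `base`, whose load has a split half. */
size_t count_split_elements(size_t base, size_t stride, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (fld_half_split(base + i * stride))
            count++;
    return count;
}

void check_layouts(void)
{
    /* 16-byte elements, 16-aligned base: no split halves. */
    assert(count_split_elements(0, 16, 64) == 0);
    /* 16-byte elements, base only 8-aligned: still no split halves. */
    assert(count_split_elements(8, 16, 64) == 0);
    /* 12-byte elements with 4-byte alignment (the i386 layout):
       the element at offset 60 has its 8-byte half split. */
    assert(count_split_elements(0, 12, 16) == 1);
}
```

Under these assumptions, 16-byte elements never split a half at either 8- or 16-byte alignment, while the 12-byte, 4-aligned layout occasionally does.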