Why is double in C 8-byte aligned?

Asked 6/6, 2012 at 11:16 Answered 28/10, 2015 at 22:42

I was reading an article about data type alignment in memory (here) and I am unable to understand one point, i.e.:

Note that a double variable will be allocated on 8 byte boundary on 32 bit machine and requires two memory read cycles. On a 64 bit machine, based on number of banks, double variable will be allocated on 8 byte boundary and requires only one memory read cycle.

My question is: why do double variables need to be allocated on an 8-byte boundary and not on a 4-byte boundary? If a double is allocated on a 4-byte boundary, we still need only 2 memory read cycles (on a 32-bit machine). Correct me if I am wrong.

Also, if someone has a good tutorial on member/memory alignment, kindly share.

Adventurism answered 6/6, 2012 at 11:16 Comment(5)

See this answer: https://mcmap.net/q/667488/-why-does-__sync_add_and_fetch-work-for-a-64-bit-variable-on-a-32-bit-system – Hadwin 6/6, 2012 at 11:22

It matches cache alignment, and also SSE instruction requirements. – Armitage 6/6, 2012 at 11:40

All this depends on the hardware architecture and not on C. – Ratchford 6/6, 2012 at 13:47

@m0skit0: if everything is arch dependent, then why is it different for different compilers ...

A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option).

... source en.wikipedia.org/wiki/Data_structure_alignment – Adventurism 7/6, 2012 at 5:0

@OliverCharlesworth: SSE has no 8-byte-alignment-required loads/stores. It's either 16-byte alignment required for 16-byte loads/stores, or no alignment required for any narrower operands. But yes it's good for performance to make doubles 8-byte aligned so they can't split across cache lines. (Or across any other boundaries wider than 8 bytes, for CPUs that care about alignment within a cache line). – Majesty 15/10, 2019 at 5:2

The reason to align a data value of size 2^N on a boundary of 2^N is to avoid the possibility that the value will be split across a cache line boundary.

The x86-32 processor can fetch a double from any word boundary (8-byte aligned or not) in at most two 32-bit memory reads. But if the value is split across a cache line boundary, then the time to fetch the 2nd word may be quite long because of the need to fetch a 2nd cache line from memory. This produces poor processor performance unnecessarily. (As a practical matter, current processors don't fetch 32 bits from memory at a time; they tend to fetch much bigger values on much wider buses to enable really high data bandwidths. The actual time to fetch both words, if they are in the same cache line and already cached, may be just 1 clock.)

A free consequence of this alignment scheme is that such values also do not cross page boundaries. This avoids the possibility of a page fault in the middle of a data fetch.

So, you should align doubles on 8 byte boundaries for performance reasons. And the compilers know this and just do it for you.

Masterwork answered 6/6, 2012 at 13:35 Comment(3)

What is the problem with alignment at a 4-byte boundary then? It would still need 2 cycles on a 32-bit system – Occasional 23/11, 2018 at 9:6

@Raman: you aren't considering the cost of reading the 2nd 32 bits from a location that causes a cache line to be fetched from main memory. Such fetches take tens of nanoseconds, in contrast to "1 cycle" taking 0.2 ns, so it's a lot more than just 1 cycle. This may be rare, but it's pretty expensive if it happens. – Masterwork 23/11, 2018 at 9:33

The x87 FPU in CPUs as old as P5 Pentium can load 64 bits at once from cache. That's why gcc chooses to give double 8-byte alignment even with -m32, except in structs where the i386 System V ABI may force it to be misaligned. See "Double stack alignment question using gcc compiler for x86 architecture". All this talk of 32-bit CPUs not being able to fetch a whole double is nonsense in 2012; that's just the integer register width. That was true historically and the reason for the ABI design, though. – Majesty 15/10, 2019 at 5:22

Aligning a value on a boundary smaller than its size makes it prone to being split across two cache lines. Splitting the value across two cache lines means extra work when evicting them to the backing store (two cache lines will be evicted instead of one), which puts a useless load on the memory buses.

Sudderth answered 6/6, 2012 at 13:51 Comment(0)

8-byte alignment for double on a 32-bit architecture doesn't reduce memory reads, but it still improves system performance by reducing the number of cache accesses. Please read the following: https://mcmap.net/q/667490/-is-8-byte-alignment-for-quot-double-quot-type-necessary

Pretext answered 28/10, 2015 at 22:42 Comment(0)

Refer to this wiki article about the double-precision floating-point format.

The number of memory cycles depends on your hardware architecture, which determines how many RAM banks you have. If you have a 32-bit architecture and 4 RAM banks, you need only 2 memory cycles to read a double (each RAM bank contributing 1 byte per cycle).

Zoology answered 6/6, 2012 at 11:47 Comment(2)

Don't understand the comment about needing only one memory cycle. First of all, "double precision" usually means 8-byte floating point numbers; secondly, a 32-bit architecture normally implies a 32-bit data bus. It's impossible to get 64 bits down a 32-bit pipe in one operation no matter how you organise the RAM. – Cutting 6/6, 2012 at 13:3

There was a typo. Rephrasing: a 32-bit machine with 4 RAM banks would access 8 bytes in 2 memory cycles. – Zoology 7/6, 2012 at 7:55
