How does malloc understand alignment?
Y

7

76

The following is excerpted from here:

pw = (widget *)malloc(sizeof(widget));

allocates raw storage. Indeed, the malloc call allocates storage that's big enough and suitably aligned to hold an object of type widget

Also see the Fast Pimpl idiom from Herb Sutter, who said:

Alignment. Any memory that's allocated dynamically via new or malloc is guaranteed to be properly aligned for objects of any type, but buffers that are not allocated dynamically have no such guarantee

I am curious about this: how does malloc know the alignment of a custom type?

Yasukoyataghan answered 6/1, 2012 at 2:5 Comment(1)
new and malloc, by default, align addresses to 8 bytes (x86) or 16 bytes (x64), which is optimal for most complex data. Also, it is the job of sizeof() to return the correct struct size, including any internal padding needed for alignment.Gilbertson
M
66

Alignment requirements are recursive: The alignment of any struct is simply the largest alignment of any of its members, and this is understood recursively.

For example, and assuming that each fundamental type's alignment equals its size (this is not always true in general), the struct X { int; char; double; } has the alignment of double, and it will be padded to be a multiple of the size of double (e.g. 4 (int), 1 (char), 3 (padding), 8 (double)). The struct Y { int; X; float; } has the alignment of X, which is the largest and equal to the alignment of double, and Y is laid out accordingly: 4 (int), 4 (padding), 16 (X), 4 (float), 4 (padding).

(All numbers are just examples and could differ on your machine.)

Therefore, by breaking it down to the fundamental types, we only need to know a handful of fundamental alignments, and among those there is a well-known largest. C++11 even defines a type, std::max_align_t, whose alignment is that largest alignment.

All malloc() needs to do is to pick an address that's a multiple of that value.

Misjudge answered 6/1, 2012 at 2:34 Comment(11)
The key thing to point out is that this doesn't include custom align directives to the compiler that might over-align data.Hedge
Although if you use these you are already outside the scope of the standard, please note that memory allocated in this way probably won't meet the alignment requirements for built-in types such as __m256 that are available as extensions on some platforms.Libelee
What happens when you specify a custom alignment via alignas that is larger than the largest alignment of a primitive datatype?Haugh
@Curious: Support for extended alignment is implementation-defined.Misjudge
std::max_align_t is largest alignment of scalar types, so a struct or a class can potentially have stricter alignment requirement than std::max_align_t.Tildatilde
@MikhailVasilyev: Yes, but only if given an alignas, right? Otherwise a UDT's alignment is just made up recursively of the alignment of the members.Misjudge
@KerrekSB Yes, but some widely-used classes from the standard library might turn out to be over-aligned on some platforms. F.ex. see this issue. Also virtual classes include a pointer to a virtual table so their alignment is determined not only by alignment of the members.Tildatilde
malloc has no information on the type it is allocating for; the only parameter is the size of the allocated memory. The man page states it correctly: the allocated memory is aligned such that it can be used for any data types, i.e. the alignment is the same for all types.Constantina
@Massimo: I'm not sure there's a contradiction here. I first explained how alignment of a type is defined, and then how malloc can return fundamentally aligned memory. I never said that malloc knows about alignment of any one type. As long as your types don't have extended alignment, malloc gives you suitably aligned memory.Misjudge
@KerrekSB You talk about struct alignment and you conclude with All malloc() needs to do is to pick an address that's a multiple of that value. Show us how you pass the struct alignment to malloc()...P
@Massimo: I think the "that" refers to the "largest alignment" from the previous paragraph, which is exactly the alignment guarantee malloc provides, isn't it?Misjudge
L
32

I think the most relevant part of the Herb Sutter quote is the phrase "properly aligned for objects of any type":

Alignment. Any memory that's allocated dynamically via new or malloc is guaranteed to be properly aligned for objects of any type, but buffers that are not allocated dynamically have no such guarantee

It doesn't have to know what type you have in mind, because it's aligning for any type. On any given system, there's a maximum alignment size that's ever necessary or meaningful; for example, a system with four-byte words will likely have a maximum of four-byte alignment.

This is also made clear by the malloc(3) man-page, which says in part:

The malloc() and calloc() functions return a pointer to the allocated memory that is suitably aligned for any kind of variable.

Lettuce answered 6/1, 2012 at 2:13 Comment(10)
What is the meaning of "any kind of variable"? It doesn't answer my question. Does it mean malloc will always use the maximum alignment size on any given system?Yasukoyataghan
@Chang: effectively, yes. Also note, the quote is wrong. new is only guaranteed to have "any" alignment when allocating char or unsigned char. For others, it may have a smaller alignment.Squamosal
@Chang: Right, the maximum alignment size. "Suitably aligned for any kind of variable" means "suitably aligned for an int and suitably aligned for a pointer and suitably aligned for any struct and . . .".Lettuce
@MooingDuck: new char[16] does not guarantee any alignment at all. (In general, new T[n] returns a pointer aligned for any type X where sizeof(X)<=sizeof(T).)Irbm
@aschepler: That's not true. See the C++11 spec, section 5.3.4, clause 10; new char[16] is specified in a way that's assumed to guarantee that it's suitably aligned for any type X where sizeof(X)<=16.Lettuce
@aschepler: No, new T[n] is only aligned for type T, unless T is (possibly signed/unsigned) char, then it's aligned for any type X where sizeof (X) <= n.Drye
Um, yes. I somehow got several very wrong ideas from looking at the very same Standard paragraph. Reading comprehension fail.Irbm
@BenVoigt: I think the "magic alignment" is only for char and unsigned char, but NOT for signed char. The C++ spec treats char and unsigned char as "byte" types, but does not consider signed char a "byte" type. (Implicitly, the spec doesn't actually say "byte types" as such.)Squamosal
@MooingDuck: Looks like you're right, but I think that may be a defect in the Standard, since the accompanying note talks about generic character arrays which include all three: "this constraint on array allocation overhead permits the common idiom of allocating character arrays into which objects of other types will later be placed" (And that note in turn should probably say character sequences, since character arrays include wide character types as well)Drye
@BenVoigt: I would posit that was the defect, since unsigned char is used in byte-like ways in §3.8/5-6 §3.9/2-4, §3.10/10, and §5.3.4/10, and in none of those is signed char mentioned or implied. Also, §3.9/1 calls out unsigned char specifically with "all possible bit patterns of the value representation represent numbers".Squamosal
C
6

The only information that malloc() can use is the size of the request passed to it. In general, it might do something like round up the passed size to the nearest greater (or equal) power of two, and align the memory based on that value. There would likely also be an upper bound on the alignment value, such as 8 bytes.

The above is a hypothetical discussion, and the actual implementation depends on the machine architecture and runtime library that you're using. Maybe your malloc() always returns blocks aligned on 8 bytes and it never has to do anything different.

Charmainecharmane answered 6/1, 2012 at 2:10 Comment(5)
In summary then, malloc uses the 'worst case' alignment because it doesn't know any better. Does that mean that calloc can be smarter because it takes two args, the number of objects and the size of a single object?Larrisa
Maybe. Maybe not. You'd have to look at your runtime library source to find out.Charmainecharmane
-1, sorry. Your answer includes the truth, but it also includes disinformation. It's not a "maybe, maybe not" thing; it's specifically documented to work in a way that doesn't depend on the size. (Dunno why not, though. It seems like it would make perfect sense for it to do so.)Lettuce
The answer to my own question is No. I found this: "The malloc() and calloc() functions return a pointer to the allocated memory that is suitably aligned for any kind of variable." Seems like the memalign function is potentially useful though: wwwcgi.rdg.ac.uk:8081/cgi-bin/cgiwrap/wsi14/poplog/man/3C/…Larrisa
see ruakh's reply, so malloc will always use maximum alignment size in any given system, right?Yasukoyataghan
W
3

1) Align to the least common multiple of all alignments. e.g. if ints require 4 byte alignment, but pointers require 8, then allocate everything to 8 byte alignment. This causes everything to be aligned.

2) Use the size argument to determine the correct alignment. For small sizes you can infer the type: malloc(1), for example, is always a char (assuming no other type has size 1). C++ new has the benefit of being type safe and so can always make alignment decisions this way.

Whacking answered 6/1, 2012 at 2:18 Comment(2)
Can you expand the acronym LCM? I can guess, but I shouldn't have to.Squamosal
Also, there are other types in C++ that can be 1 byte. However, your implication is correct, it can still align based of the size of the type.Squamosal
B
2

Before C++11, alignment was handled fairly simply: wherever the exact value was unknown, the largest alignment was used, and malloc/calloc still work this way. This means a malloc allocation is correctly aligned for any type.

Wrong alignment may result in undefined behavior according to the standard but I have seen x86 compilers being generous and only punishing with lower performance.

Note that you can also tweak alignment via compiler options or directives (#pragma pack in Visual Studio, for example).

But when it comes to placement new, C++11 brings us the new keywords alignof and alignas. Here is some code that shows the effect if the compiler's maximum alignment is greater than 1. The first placement new below is automatically well aligned, but the second is not.

#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <new>
using namespace std;
int main()
{
        struct A { char c; };
        struct B { int i; char c; };

        unsigned char * buffer = (unsigned char *)malloc(1000000);
        if (!buffer)
            return 1;
        // uintptr_t can always hold a pointer value; long cannot on 64-bit Windows
        uintptr_t mp = (uintptr_t)buffer;

        // First placement new
        uintptr_t alignofA = alignof(A) - 1; // mask of the low bits that must be zero
        cout << "alignment of A: " << std::hex << (alignofA + 1) << endl;
        cout << "placement address before alignment: " << std::hex << mp << endl;
        if (mp&alignofA)
        {
            // set all low bits, then add one: rounds up to the next multiple
            mp |= alignofA;
            ++mp;
        }
        cout << "placement address after alignment : " << std::hex << mp << endl;
        A * a = new((unsigned char *)mp)A;
        mp += sizeof(A);

        // Second placement new
        uintptr_t alignofB = alignof(B) - 1;
        cout << "alignment of B: " << std::hex << (alignofB + 1) << endl;
        cout << "placement address before alignment: " << std::hex << mp << endl;
        if (mp&alignofB)
        {
            mp |= alignofB;
            ++mp;
        }
        cout << "placement address after alignment : " << std::hex << mp << endl;
        B * b = new((unsigned char *)mp)B;
        mp += sizeof(B);

        (void)a; (void)b; // A and B are trivially destructible
        free(buffer);
}

I guess performance of this code can be improved with some bitwise operations.

EDIT: Replaced expensive modulo computation with bitwise operations. Still hoping that somebody finds something even faster.

Bistre answered 28/8, 2013 at 4:42 Comment(4)
It's not actually the compiler, it's the hardware itself. On x86 a misaligned memory access simply forces the processor to fetch the two sides of the memory boundary and piece the result together, so it's always "correct" if slower. On e.g. some ARM processors, you would get a bus error and a program crash. This is a bit of a problem because many programmers are never exposed to anything else than x86, and so may not know that the behaviour is actually undefined instead of merely decreasing performance.Cheeks
You are correct, it's the hardware or CPU microcode, not the actual compiler, that saves you on the x86 architecture. I really wonder why there is no more convenient API to handle this. As if C/C++ designers wanted developers to step into the trap. Reminds me of std::numeric_limits<double>::min() trap. Anyone got that one right the first time?Bistre
Well, once you know what is going on, it's not too hard to change your programming style from all sorts of crazy type-punning to well-typed code, fortunately. The C type system makes it fairly easy to preserve type alignment as long as you don't go doing insane bit manipulation stuff without paying attention. Now pointer-aliasing-free code on the other hand has some much tougher semantics...Cheeks
I do not understand. You have the problem whenever you have your own little heap that you manage yourself. What use of placement new are you thinking about in your comment?Bistre
B
1

malloc has no knowledge of what it is allocating for because its parameter is just total size. It just aligns to an alignment that is safe for any object.

Basenji answered 8/9, 2018 at 20:52 Comment(0)
B
1

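You can find out how many low-order address bits are zero in every sampled result of your malloc() implementation (that is, the base-2 logarithm of its observed alignment) with this small C program: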

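#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    uintptr_t find = 0; /* bitwise OR of every address malloc() returned */
    size_t size;
    for( unsigned i = 1000000; i--; )
        if( (size = rand() & 127) != 0 ) /* skip zero-sized requests */
            find |= (uintptr_t)malloc( size ); /* blocks are deliberately never freed */
    if( !find ) /* every allocation failed, nothing to measure */
        return 1;
    int bits = 0;
    /* count the low address bits that were zero in every result */
    for( ; !(find & 1); find >>= 1, ++bits );
    printf( "%d\n", bits );
    return 0;
}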
Brant answered 25/5, 2022 at 14:38 Comment(0)
