How to determine if memory is aligned?
W

8

52

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:

void sse_func(const float* const ptr, int len){
    if( ptr is aligned )
    {
        for( ... ){
            // unroll loop by 4 or 2 elements
        }
        for( ....){
            // handle the rest
            // (non-optimized code)
        }
    } else {
        for( ....){
            // regular C code to handle non-aligned memory
        }
    }
}

However, how do I correctly determine whether the memory ptr points to is aligned to, e.g., 16 bytes? I think I have to include the regular C code path for non-aligned memory, as I cannot be sure that every buffer passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horribly slow (even slower than regular C code).

Thank you in advance...

Waistline answered 13/12, 2009 at 23:15 Comment(4)
random-name, not sure but I think it might be more efficient to simply handle the first few 'unaligned' elements separately like you do with the last few. Then you can still use SSE for the 'middle' ones...Ancelin
Hm, this is a good point. I'll try it. Thanks!Waistline
Better: use a scalar prologue to handle the misaligned elements up to the first alignment boundary. (gcc does this when auto-vectorizing with a pointer of unknown alignment.) Or if your algorithm is idempotent (like a[i] = foo(b[i])), do a potentially-unaligned first vector, then the main loop starting at the first alignment boundary after the first vector, then a final vector that ends at the last element. If the array was in fact misaligned and/or the count wasn't a multiple of the vector width, then some of those vectors will overlap, but that still beats scalar.Orsa
Best: supply an allocator that provides 16-byte aligned memory. Then operate on the 16-byte aligned buffer without the need to fixup leading or tail elements. This is what libraries like Botan and Crypto++ do for algorithms which use SSE, Altivec and friends.Papert
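The scalar-prologue idea from the comments above can be sketched in C roughly as follows. This is only a minimal sketch, assuming 4-byte floats and 16-byte vectors; `struct sse_split`, `split_for_sse` and `sum_floats` are hypothetical names introduced here, and the plain C loop in the middle merely stands in for the place where aligned SSE loads would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how many floats fall into each phase. */
struct sse_split {
    size_t head; /* scalar elements before the first 16-byte boundary */
    size_t body; /* elements for the aligned vector loop (multiple of 4) */
    size_t tail; /* scalar leftovers after the vector loop */
};

static struct sse_split split_for_sse(const float *ptr, size_t len)
{
    struct sse_split s = {0, 0, 0};
    assert(ptr != NULL || len == 0);
    size_t misalign = (size_t)((uintptr_t)(const void *)ptr & 15u);

    if (misalign % sizeof(float) != 0) {
        /* Not even float-aligned: stepping by whole floats never reaches
           a 16-byte boundary, so everything stays scalar. */
        s.head = len;
        return s;
    }
    if (misalign != 0)
        s.head = (16u - misalign) / sizeof(float);
    if (s.head > len)
        s.head = len;
    s.body = (len - s.head) / 4 * 4;
    s.tail = len - s.head - s.body;
    return s;
}

/* Example consumer: sum an array in three phases. */
static float sum_floats(const float *ptr, size_t len)
{
    struct sse_split s = split_for_sse(ptr, len);
    float total = 0.0f;
    size_t i = 0;

    for (; i < s.head; ++i)              /* scalar prologue */
        total += ptr[i];
    for (; i < s.head + s.body; i += 4) {
        /* ptr + i is 16-byte aligned here, so an aligned load such as
           _mm_load_ps(ptr + i) would be valid at this point. */
        total += ptr[i] + ptr[i + 1] + ptr[i + 2] + ptr[i + 3];
    }
    for (; i < len; ++i)                 /* scalar tail */
        total += ptr[i];
    return total;
}
```

With this split there is no separate "unaligned" code path: a misaligned buffer just gets a slightly longer scalar head.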
C
36

EDIT: casting to long is a cheap way to protect oneself against the most likely possibility of int and pointers being different sizes nowadays.

As pointed out in the comments below, there are better solutions if you are willing to include a header...

A pointer p is aligned on a 16-byte boundary iff ((unsigned long)p & 15) == 0.

Caracaraballo answered 13/12, 2009 at 23:16 Comment(16)
I think casting a pointer to int is a bad idea? My code will be compiled on both x86 and x64 systems. I hoped there would be some secret system macro is_aligned_mem() or so.Waistline
You could instead use uintptr_t - it is guaranteed the correct size to hold a pointer. Provided that your compiler defines it, of course.Iddo
No, a pointer is an int. It just isn't used as a numeric generally.Smothers
It doesn't really matter if the pointer and integer sizes don't match. You only care about the bottom few bits.Myrtismyrtle
Well if there was a secret system macro you can be sure that it will work by casting the pointer to int. There is nothing magic going on with this cast, you are just asking the compiler to let you look at how the pointer is represented in bits. If you don't do that, how can you ever know if it is aligned ?Housebreaking
I would usually use p % 16 == 0, as compilers usually know the powers of 2 just as well as I do, and I find this more readableSplice
int traditionally was the size of the system word, aka a pointer. Is that changing in the 32-bit to 64-bit transition? (curious)Smothers
@Splice Division/modulo over signed integers are not compiled into bitwise tricks in C99 (some stupid round-towards-zero stuff), and it's a smart compiler indeed that will recognize that the result of the modulo is being compared to zero (in which case the bitwise stuff works again). Not impossible, but not trivial. Generally speaking, better cast to unsigned integer if you want to use % and let the compiler compile &.Caracaraballo
Thanks for all the answers. @Richard Pennington: That's a good point. @Bill Forster: I know someone has eventually to compare the actual bits but I wanted a safe and cross-platform (x86, x64) way. It scares me a bit that there are so many self-made solutions. And I have not found the recommended one on MSDN or at Intel's website.Waistline
@Paus Nathan: It depends on the data model of your x64 system (ILP64, LP64 or LLP64). E.g. Windows on x64 is LLP64, which means int and long are both still 32 bits while pointers (and long long) are 64 bits. I am not sure about Linux on x64 though.Waistline
@Pascal Cuoq, gcc notices this and emits the exact same code for (p & 15) == 0 and (p % 16) == 0 with the -O flag set. I have seen a number of other compilers that recognize integer division/modulus/multiplication by a power of 2 and do the smart thing about it. (I do agree about casting to unsigned though)Splice
of course, the compiler can only recognize these when dealing with a compile time constant. if you find yourself using multiple possible values, fall back to using &Splice
@Splice I just compiled int d(int x) { return x / 8; } with gcc -S. It is both beautiful and sad... Mostly sad...Caracaraballo
@Pascal Cuoq: I do agree about that, but it still handles the modulus and compare to 0 correctly (so long as the optimizer is being used, otherwise may emit the modulus (which it doesn't in my case, but does this far less efficiently).Splice
But we can't infer the original alignment of the pointer, only the maximum alignment. i.e. ((unsigned long)p & 15) == 0 could hold true for pointers that were originally requested to be 4 or 8-byte aligned.Entoblast
@Anon.: You only need to check the low bits of the pointer anyway, so it's ok to lose the high bits when casting to a narrow unsigned type. It's important to use uintptr_t if you want to cast back to a pointer after rounding down or up to the next alignment boundary, though.Orsa
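The last two comments can be illustrated with a short C sketch. It assumes a flat address space in which the integer value of a pointer is its address (the conversion is implementation-defined), and the names `align_down`, `align_up` and `observed_alignment` are hypothetical helpers introduced here. Rounding goes through uintptr_t because the result is converted back to a pointer; the third function shows that a check can only report the alignment an address happens to have, not the one originally requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address down to a multiple of `align` (a power of two). */
static void *align_down(void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((uintptr_t)p & ~(align - 1));
}

/* Round an address up to a multiple of `align` (a power of two). */
static void *align_up(void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)(((uintptr_t)p + (align - 1)) & ~(align - 1));
}

/* Largest power of two that divides the address, i.e. the strongest
   alignment the pointer happens to satisfy. Returns 0 for address 0. */
static uintptr_t observed_alignment(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return addr & (~addr + 1u); /* isolate the lowest set bit */
}
```

A pointer that was only requested to be 4-byte aligned can still report 16 or more from `observed_alignment`, which is exactly the caveat raised above.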
P
59
#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)

The cast to void * (or, equivalently, char *) is necessary because the standard only guarantees an invertible conversion to uintptr_t for void *.

If you want type safety, consider using an inline function:

#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */
static inline _Bool is_aligned(const void *restrict pointer, size_t byte_count)
{ return (uintptr_t)pointer % byte_count == 0; }

and hope for compiler optimizations if byte_count is a compile-time constant.
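When byte_count is only known at run time, the compiler cannot turn the modulo into a mask on its own. A possible variant, sketched here under the assumption that the alignment is always a power of two, does the masking explicitly; `is_pow2` and `is_aligned_pow2` are hypothetical names, not part of any standard header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if n is a nonzero power of two. */
static inline bool is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Alignment test using a bit mask instead of %.
   Precondition: byte_count is a power of two (checked by assert). */
static inline bool is_aligned_pow2(const void *pointer, size_t byte_count)
{
    assert(is_pow2(byte_count));
    return ((uintptr_t)pointer & (byte_count - 1)) == 0;
}
```

For alignments that are not powers of two the mask trick gives wrong answers, so the modulo form remains the general one.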

Why do we need to convert to void * ?

The C language allows different representations for different pointer types, eg you could have a 64-bit void * type (the whole address space) and a 32-bit foo * type (a segment).

The conversion foo * -> void * might involve an actual computation, eg adding an offset. The standard also leaves it up to the implementation what happens when converting (arbitrary) pointers to integers, but I suspect that it is often implemented as a noop.

For such an implementation, foo * -> uintptr_t -> foo * would work, but foo * -> uintptr_t -> void * and void * -> uintptr_t -> foo * wouldn't. The alignment computation would also not work reliably because you only check alignment relative to the segment offset, which might or might not be what you want.

In conclusion: Always use void * to get implementation-independent behaviour.

Psychodynamics answered 14/12, 2009 at 1:26 Comment(6)
This macro looks really nasty and sophisticated at once. I will definitely test it.Waistline
Please provide any examples you know of platforms in which non-void * does not produce an integer value in the range of uintptr_t. And/or, do you know what the rationale is for the standard to be worded that way?Gig
Why restrict?, looks like it doesn't do anything when there is only one pointer?Mcdonough
@Mikhail: the combination of const * with restrict is a stronger guarantee than plain const *: without restrict, it is legal to cast away the const and modify the memory; with restrict present, it is not; sadly, I learned that this isn't useful in practice as it only comes into effect if the pointer is actually used, which the caller can't assume in general (ie the usefulness lies solely on side of the callee); in this particular case, it's superfluous anyway as we're dealing with an inline function, so the compiler can see its body and infer on its own that no memory gets modifiedPsychodynamics
If a float * can (theoretically) have a different representation from a void *, does that mean the alignment check could be happening on a different value from what was intended?Subarid
@Psychodynamics Is arithmetic on uintptr_t specified?Rutherfordium
G
25

Other answers suggest an AND operation with low bits set, and comparing to zero.

But a more straightforward test would be to do a MOD with the desired alignment value, and compare to zero.

#define ALIGNMENT_VALUE     16u

if (((uintptr_t)ptr % ALIGNMENT_VALUE) == 0)
{
    // ptr is aligned
}
Gig answered 13/12, 2009 at 23:27 Comment(6)
I upvoted you, but only because you are using unsigned integers :)Caracaraballo
I believe this fails with uint8_t types, which sometimes have alignment requirements of 1.Papert
@Papert I'm not sure I understand what you mean. An alignment requirement of 1 would mean essentially no alignment requirement. There's no need to worry about alignment of uint8_t. But please clarify if I'm misunderstanding.Gig
Does 16u provide a portability advantage that 16 does not?Laconism
The u suffix on the integer makes it unsigned. It's good to avoid mixing signed and unsigned in expressions, to avoid some possible gotchas that can happen with mixed-sign arithmetic. See GCC warning "comparison between signed and unsigned integer expressions". It probably doesn't matter in this case, but it's good to get into good habits. (I suppose the 0 should be 0u too)Gig
Take note that you shouldn't use a real MOD operation: it's quite an expensive operation and should be avoided as much as possible. You should always use the AND operation. But I believe that a sufficiently sophisticated compiler with optimization enabled will automatically convert a MOD by a constant power of two into a single AND opcode. (The Linux kernel uses the AND operation too, FYI.)Torrance
S
12

With a function template like

#include <cassert>      // assert
#include <cstdint>      // uintptr_t
#include <type_traits>  // std::alignment_of

template< typename T >
bool is_aligned(T* p){
    return !(reinterpret_cast<uintptr_t>(p) % std::alignment_of<T>::value);
}

you could check alignment at runtime by invoking something like

struct foo_type{ int bar; }foo;
assert(is_aligned(&foo)); // passes

To check that bad alignments fail, you could do

// would almost certainly fail
assert(is_aligned((foo_type*)(1 + (uintptr_t)(&foo)));
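For comparison, a similar per-type check can be sketched in C11 with `_Alignof`. This is an illustration only, and `IS_ALIGNED_FOR` and `can_hold_foo` are hypothetical names. The address is passed as a plain byte address rather than as a pointer to the target type, so no misaligned typed pointer has to be created in order to test it.

```c
#include <assert.h>   /* for the assert-based usage shown below */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro: true if address `p` is suitably aligned
   for an object of type `T`. */
#define IS_ALIGNED_FOR(p, T) \
    (((uintptr_t)(const void *)(p) % _Alignof(T)) == 0)

struct foo_type { int bar; };

/* Check a raw address before treating it as a struct foo_type. */
static bool can_hold_foo(const void *addr)
{
    return IS_ALIGNED_FOR(addr, struct foo_type);
}
```

A typical use would be `assert(can_hold_foo(buffer + offset));` before casting a position inside a byte buffer to `struct foo_type *`.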
Sheetfed answered 23/2, 2015 at 16:37 Comment(8)
It would be good here to explain how this works so the OP understands it.Hulse
C++ explicitly forbids creating unaligned pointers to given type T. Because such pointer is not allowed to exist the compiler is allowed to optimize is_aligned(p) to true for any pointer p.Vermiculate
@paweł-bylica, you're probably correct. Could you provide a reference (document, chapter, verse, etc.) so I can amend my answer?Sheetfed
Also template functions are always inline, so the inline keyword is redundant.Woll
@gnzlbg, I don't think function templates are always inline; at least not according to this: https://mcmap.net/q/16549/-does-it-make-any-sense-to-use-inline-keyword-with-templates.Sheetfed
That answer says that inline makes a difference on explicit specializations, but explicit specializations are not templates. The second answer on that page is correct: https://mcmap.net/q/16549/-does-it-make-any-sense-to-use-inline-keyword-with-templates Basically, if you were to explicitly specialize this template into a function, then, depending on where you decide to specialize it (e.g. a header file), you might need to use the inline keyword on the specialization to avoid ODR issues, but this is always the case independently of whether you use inline on the template or not. inline on the template is completely irrelevant.Woll
@gnzlbg, I concede; you are correct. I'll change my answer forthwith.Sheetfed
Why didn't you cast to void* or char*? Timur Doumler specifically talked about this in his lecture on undefined behaviour and why you can't type-pun in C++. Even so, I don't see how the compiler implements this, or why the compiler would even care to ensure that a runtime value is not properly aligned.Sericin
I
6

This is basically what I'm using. By making the alignment a template parameter, I ensure it's expanded at compile time, so I won't end up with a slow modulo operation whatever I do.

I always like checking my input, hence the compile-time assertion. If your alignment value is wrong, it won't compile...

template <unsigned int alignment>
struct IsAligned
{
    static_assert(alignment != 0 && (alignment & (alignment - 1)) == 0, "Alignment must be a power of 2");

    static inline bool Value(const void * ptr)
    {
        return (((uintptr_t)ptr) & (alignment - 1)) == 0;
    }
};

To see what's going on, you can use this:

// 1 of them is aligned...
int* ptr = new int[8];
for (int i = 0; i < 8; ++i)
    std::cout << IsAligned<32>::Value(ptr + i) << std::endl;

// Should give '1'
int* ptr2 = (int*)_aligned_malloc(32, 32);
std::cout << IsAligned<32>::Value(ptr2) << std::endl;
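Note that _aligned_malloc is specific to Microsoft's C runtime. On toolchains that provide the C11 function aligned_alloc (not every C runtime does), a similar experiment might look like the sketch below; `alloc_floats_aligned32` is a hypothetical helper, and overflow of `count * sizeof(float)` is deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `count` floats on a 32-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the
   alignment, so the byte count is padded up. Release with free(). */
static float *alloc_floats_aligned32(size_t count)
{
    size_t bytes = count * sizeof(float);
    size_t padded = (bytes + 31u) & ~(size_t)31u;
    if (padded == 0)
        padded = 32; /* avoid a zero-size request */

    float *p = aligned_alloc(32, padded);
    /* Either the allocation failed or the low five address bits are clear. */
    assert(p == NULL || ((uintptr_t)(void *)p & 31u) == 0);
    return p;
}
```

Memory obtained this way is released with the ordinary free(), unlike _aligned_malloc, which pairs with _aligned_free.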
Inconsequential answered 27/2, 2015 at 8:3 Comment(0)
L
6

Leave that to the professionals,

https://www.boost.org/doc/libs/1_65_1/doc/html/align/reference.html#align.reference.functions.is_aligned

bool is_aligned(const void* ptr, std::size_t alignment) noexcept; 

example:

        char D[1];
        assert( boost::alignment::is_aligned(&D[0], alignof(double)) ); //  might fail, sometimes
Lordinwaiting answered 8/7, 2019 at 20:10 Comment(0)
I
2

Can you just 'and' the ptr with 0x03 (aligned on 4s), 0x07 (aligned on 8s) or 0x0f (aligned on 16s) to see if any of the lowest bits are set?

Inaction answered 13/12, 2009 at 23:17 Comment(3)
No, you can't. A pointer is not a valid argument to the & operator.Attainder
@SteveJessop you could cast to uintptr_t.Housemother
@MarkYisri: yes, I expect that in practice, every implementation that supports SSE2 instructions provides an implementation-specific guarantee that'll work :-)Attainder
T
-3

How about:

void *mem = malloc(1024+15); 
void *ptr = (void *)(((uintptr_t)mem + 15) & ~(uintptr_t)15); /* round up to the next 16-byte boundary */
Towardly answered 4/9, 2012 at 8:52 Comment(5)
-1 Doesn't answer the question. (the question was "How to determine if memory is aligned?", not "how to allocate some aligned memory?")Meehan
@Meehan he does align it in the second lineHousemother
@MarkYisri It's also not "how to align a buffer?"Meehan
@Meehan doesn't matter whether it's a buffer or not. mem is a pointer.Housemother
@MarkYisri It's also not "how to align a pointer?". The answer to "is mem aligned?" is not a pointer. It's "yes" or "no".Meehan
