How to determine if memory is aligned?

8

52

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:

void sse_func(const float* const ptr, int len){
    if( ptr is aligned )
    {
        for( ... ){
            // unroll loop by 4 or 2 elements
        }
        for( ....){
            // handle the rest
            // (non-optimized code)
        }
    } else {
        for( ....){
            // regular C code to handle non-aligned memory
        }
    }
}

However, how do I correctly determine whether the memory that ptr points to is aligned to, e.g., 16 bytes? I think I have to include the regular C code path for non-aligned memory, as I cannot guarantee that every buffer passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horribly slow (even slower than regular C code).

Thank you in advance...

Waistline answered 13/12, 2009 at 23:15 Comment(4)

random-name, not sure but I think it might be more efficient to simply handle the first few 'unaligned' elements separately like you do with the last few. Then you can still use SSE for the 'middle' ones... – Ancelin 21/12, 2009 at 12:27

Hm, this is a good point. I'll try it. Thanks! – Waistline 22/12, 2009 at 16:15

Better: use a scalar prologue to handle the misaligned elements up to the first alignment boundary. (gcc does this when auto-vectorizing with a pointer of unknown alignment.) Or if your algorithm is idempotent (like a[i] = foo(b[i])), do a potentially-unaligned first vector, then the main loop starting at the first alignment boundary after the first vector, then a final vector that ends at the last element. If the array was in fact misaligned and/or the count wasn't a multiple of the vector width, then some of those vectors will overlap, but that still beats scalar. – Orsa 23/8, 2017 at 13:50

Best: supply an allocator that provides 16-byte aligned memory. Then operate on the 16-byte aligned buffer without the need to fixup leading or tail elements. This is what libraries like Botan and Crypto++ do for algorithms which use SSE, Altivec and friends. – Papert 24/8, 2018 at 14:10
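The scalar-prologue idea from the comments above could look roughly like this. It is only a sketch: the function name scale_floats and the in-place multiply are made up for illustration, and it assumes an x86/x86-64 compiler with SSE enabled so that <xmmintrin.h> is available.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE intrinsics, x86/x86-64 only */

/* Hypothetical example: multiply every element of a float array by k, in place. */
void scale_floats(float *ptr, size_t len, float k)
{
    size_t i = 0;

    /* Scalar prologue: step forward until ptr + i sits on a 16-byte boundary.
       If the boundary is never reached, this loop simply handles everything. */
    while (i < len && ((uintptr_t)(ptr + i) & 15u) != 0) {
        ptr[i] *= k;
        ++i;
    }

    /* Main loop: aligned loads and stores, 4 floats per iteration. */
    const __m128 vk = _mm_set1_ps(k);
    for (; i + 4 <= len; i += 4) {
        __m128 v = _mm_load_ps(ptr + i);
        _mm_store_ps(ptr + i, _mm_mul_ps(v, vk));
    }

    /* Scalar epilogue: the remaining 0 to 3 elements. */
    for (; i < len; ++i)
        ptr[i] *= k;
}
```

With this layout there is no separate "unaligned" code path: a misaligned buffer only costs a few extra scalar iterations at the front.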

36

EDIT: casting to long is a cheap way to protect oneself against the most likely possibility of int and pointers being different sizes nowadays.

As pointed out in the comments below, there are better solutions if you are willing to include a header...

A pointer p is aligned on a 16-byte boundary iff ((unsigned long)p & 15) == 0.
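As a small sketch of that check wrapped in a function: the name is_aligned_16 is made up here, and uintptr_t from <stdint.h> is used instead of unsigned long, as suggested in the comments. It assumes the compiler provides uintptr_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the address in p is a multiple of 16.
   Only the low four bits of the address matter for this test. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

In the question's sse_func, the condition would then read if (is_aligned_16(ptr)).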

Caracaraballo answered 13/12, 2009 at 23:16 Comment(16)

I think casting a pointer to int is a bad idea? My code will be compiled on both x86 and x64 systems. I hoped there would be some secret system macro is_aligned_mem() or so. – Waistline 13/12, 2009 at 23:22

You could instead use uintptr_t - it is guaranteed the correct size to hold a pointer. Provided that your compiler defines it, of course. – Iddo 13/12, 2009 at 23:26

No, a pointer is an int. It just isn't used as a numeric generally. – Smothers 13/12, 2009 at 23:27

It doesn't really matter if the pointer and integer sizes don't match. You only care about the bottom few bits. – Myrtismyrtle 13/12, 2009 at 23:29

Well if there was a secret system macro you can be sure that it will work by casting the pointer to int. There is nothing magic going on with this cast, you are just asking the compiler to let you look at how the pointer is represented in bits. If you don't do that, how can you ever know if it is aligned ? – Housebreaking 13/12, 2009 at 23:30

I would usually use p % 16 == 0, as compilers usually know the powers of 2 just as well as I do, and I find this more readable – Splice 13/12, 2009 at 23:30

int traditionally was the size of the system word, aka a pointer. Is that changing in the 32-bit to 64-bit transition? (curious) – Smothers 13/12, 2009 at 23:33

@Splice Division/modulo over signed integers is not compiled into bitwise tricks in C99 (some stupid round-towards-zero stuff), and it's a smart compiler indeed that will recognize that the result of the modulo is being compared to zero (in which case the bitwise stuff works again). Not impossible, but not trivial. Generally speaking, better to cast to an unsigned integer if you want to use % and let the compiler compile it to &. – Caracaraballo 13/12, 2009 at 23:34

Thanks for all the answers. @Richard Pennington: That's a good point. @Bill Forster: I know someone has eventually to compare the actual bits but I wanted a safe and cross-platform (x86, x64) way. It scares me a bit that there are so many self-made solutions. And I have not found the recommended one on MSDN or at Intel's website. – Waistline 13/12, 2009 at 23:34

@Paus Nathan: It depends on whether you have an ILP64, LP64 or LLP64 system. E.g. Windows on x64 is LLP64, which means that int and long are both still 32-bit and only long long and pointers have 64 bits. I am not sure about Linux on x64 though. – Waistline 13/12, 2009 at 23:37

@Pascal Cuoq, gcc notices this and emits the exact same code for (p & 15) == 0 and (p % 16) == 0 with the -O flag set. I have seen a number of other compilers that recognize integer division/modulus/multiplication by a power of 2 and do the smart thing about it. (I do agree about casting to unsigned though) – Splice 13/12, 2009 at 23:43

of course, the compiler can only recognize these when dealing with a compile time constant. if you find yourself using multiple possible values, fall back to using & – Splice 13/12, 2009 at 23:50

@Splice I just compiled int d(int x) { return x / 8; } with gcc -S. It is both beautiful and sad... Mostly sad... – Caracaraballo 13/12, 2009 at 23:53

@Pascal Cuoq: I do agree about that, but it still handles the modulus and compare to 0 correctly, so long as the optimizer is being used; otherwise it may emit the modulus (which it doesn't in my case, but it does this far less efficiently). – Splice 14/12, 2009 at 0:8

But we can't infer the original alignment of the pointer, only the maximum alignment. i.e. ((unsigned long)p & 15) == 0 could hold true for pointers that were originally requested to be 4 or 8-byte aligned. – Entoblast 8/8, 2017 at 21:40

@Anon.: You only need to check the low bits of the pointer anyway, so it's ok to lose the high bits when casting to a narrow unsigned type. It's important to use uintptr_t if you want to cast back to a pointer after rounding down or up to the next alignment boundary, though. – Orsa 23/8, 2017 at 13:45
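To illustrate the last comment, here is a sketch of rounding a pointer up to the next alignment boundary through uintptr_t. The helper name align_up is made up, align is assumed to be a power of two, and the code assumes the usual flat-memory mapping between pointers and integers, which the C standard does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of align.
   Assumes align is a power of two and a flat address space.
   uintptr_t is used because the result is cast back to a pointer,
   so no address bits may be lost. */
static void *align_up(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)align - 1;
    return (void *)((addr + mask) & ~mask);
}
```

The caller has to make sure the buffer has at least align - 1 spare bytes, since the result can lie that far past p.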

59

#include <stdint.h> /* uintptr_t */

#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)

The cast to void * (or, equivalently, char *) is necessary because the standard only guarantees an invertible conversion to uintptr_t for void *.

If you want type safety, consider using an inline function:

#include <stddef.h> /* size_t */
#include <stdint.h> /* uintptr_t */

static inline _Bool is_aligned(const void *restrict pointer, size_t byte_count)
{ return (uintptr_t)pointer % byte_count == 0; }

and hope for compiler optimizations if byte_count is a compile-time constant.

Why do we need to convert to void * ?

The C language allows different representations for different pointer types, e.g. you could have a 64-bit void * type (the whole address space) and a 32-bit foo * type (a segment).

The conversion foo * -> void * might involve an actual computation, e.g. adding an offset. The standard also leaves it up to the implementation what happens when converting (arbitrary) pointers to integers, but I suspect that it is often implemented as a no-op.

For such an implementation, foo * -> uintptr_t -> foo * would work, but foo * -> uintptr_t -> void * and void * -> uintptr_t -> foo * wouldn't. The alignment computation would also not work reliably because you only check alignment relative to the segment offset, which might or might not be what you want.

In conclusion: Always use void * to get implementation-independent behaviour.

Psychodynamics answered 14/12, 2009 at 1:26 Comment(6)

This macro looks really nasty and sophisticated at once. I will definitely test it. – Waistline 14/12, 2009 at 17:6

Please provide any examples you know of platforms in which non-void * does not produce an integer value in the range of uintptr_t. And/or, do you know what the rationale is for the standard to be worded that way? – Gig 25/11, 2010 at 23:7

Why restrict?, looks like it doesn't do anything when there is only one pointer? – Mcdonough 23/9, 2015 at 6:45

@Mikhail: the combination of const * with restrict is a stronger guarantee than plain const *: without restrict, it is legal to cast away the const and modify the memory; with restrict present, it is not. Sadly, I learned that this isn't useful in practice, as it only comes into effect if the pointer is actually used, which the caller can't assume in general (i.e. the usefulness lies solely on the side of the callee). In this particular case, it's superfluous anyway, as we're dealing with an inline function, so the compiler can see its body and infer on its own that no memory gets modified. – Psychodynamics 23/9, 2015 at 16:52

If a float * can (theoretically) have a different representation from a void *, does that mean the alignment check could be happening on a different value from what was intended? – Subarid 13/3, 2019 at 21:7

@Psychodynamics Is arithmetic on uintptr_t specified? – Rutherfordium 11/5 at 19:35