C - Conversion behavior between two pointers

Asked 10/12, 2020 at 18:31 Answered 10/12, 2020 at 22:9

Solved c pointers language-lawyer strict-aliasing

Update 2020-12-11: Thanks @"Some programmer dude" for the suggestion in the comment. My underlying problem is that our team is implementing a dynamic type storage engine. We allocate multiple 16-byte-aligned char array[PAGE_SIZE] buffers to store dynamically typed data (there is no fixed struct). For efficiency reasons, we cannot perform byte encoding or allocate additional space to use memcpy.

Since the alignment is already fixed (i.e., 16), all that remains is to cast the pointer to access objects of the specified type, for example:

int main() {
    // simulate our 16-aligned malloc
    _Alignas(16) char buf[4096];

    // store some dynamic data:
    *((unsigned long *) buf) = 0xff07;
    *(((double *) buf) + 2) = 1.618;
}

But our team disputes whether this operation is undefined behavior.

I have read many similar questions, such as

But these differ from my interpretation of the C standard, and I want to know whether I have misunderstood it.

The main confusion is about the section 6.3.2.3 #7 of C11:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned⁶⁸⁾ for the referenced type, the behavior is undefined.

68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

Does "the resulting pointer" here refer to the Pointer Object or the Pointer Value?

In my opinion, I think the answer is the Pointer Object, but more answers seem to indicate the Pointer Value.

Interpretation A: Pointer Object

My thoughts are as follows: A pointer itself is an object. According to 6.2.5 #28, pointers to different types may have different representation and alignment requirements. Therefore, according to 6.3.2.3 #7, as long as two pointer types have the same alignment, they can be safely converted without undefined behavior, but there is no guarantee that the result can be dereferenced. Expressed as a program:

#include <stdio.h>

int main() {
    char buf[4096];

    char *pc = buf;
    if (_Alignof(char *) == _Alignof(int *)) {
        // cast safely, because they have the same alignment requirement?
        int *pi = (int *) pc; 
        printf("pi: %p\n", (void *) pi); /* %p requires a void * argument */
    } else {
        printf("char * and int * don't have the same alignment.\n");
    }
}

Interpretation B: Pointer Value

However, if the C11 standard is talking about the Pointer Value for the referenced type rather than the Pointer Object, the alignment check in the code above is meaningless. Expressed as a program:

#include <stdio.h>

int main() {
    char buf[4096];

    char *pc = buf;
    
    /*
     * undefined behavior, because:
     * align of char is 1
     * align of int is 4
     * 
     * and we don't know whether the `value` of pc is 4-aligned.
     */
    int *pi = (int *) pc;
    printf("pi: %p\n", (void *) pi); /* %p requires a void * argument */
}

Which interpretation is correct?

Yowl answered 10/12, 2020 at 18:31 Comment(16)

All normal pointers are the same size. sizeof(int*) == sizeof(char*). Therefore all pointers will have the same alignment: _Alignof(int*) == _Alignof(char*) as alignment is dependent on size. – Olibanum 10/12, 2020 at 18:36

While this is a good and well-written question, I wonder about why you post it? Is it just plain curiosity (which is okay)? Is it about clarification about something you read in the specifications? Or is there some other underlying problem that leads to this question? If there's an underlying problem then please edit your question to also include that. – Photoemission 10/12, 2020 at 18:37

We also know that the alignment of a larger object will satisfy that of smaller objects, i.e. a pointer to something of size 8 can be cast to a pointer to something of size 4, but not the other way around. So your example above in B is wrong, as buf has size 4096, so an int pointer will be correctly aligned. If buf were buf[2] then all bets are off. – Olibanum 10/12, 2020 at 18:39

@MartinYork Not true. The alignment of an array is the alignment of the base object type, so buf would only have an alignment requirement of 1, regardless of the size of the array. – Phenanthrene 10/12, 2020 at 18:41

[6.7.6 Alignment] Paragraph 5:

Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.

– Olibanum 10/12, 2020 at 18:42

@dbush: Oops, you are correct about arrays (assuming they are local like above). I was wrong in that case. But in general the observation holds. – Olibanum 10/12, 2020 at 18:44

In fact it does look ambiguous to me. I am not convinced with the answer tbh. – Bonheur 10/12, 2020 at 18:45

@MartinYork "All normal pointers are the same size." --> Object pointers and function pointers can often have differing sizes. Various object pointers can differ in size but that is rare these days. – Ragouzis 10/12, 2020 at 18:50

@chux-ReinstateMonica: That's why I said normal rather than all (as I knew there would be some lawyer trying to point out that I was not absolutely correct). In the context above we are talking about pointers to objects. – Olibanum 10/12, 2020 at 18:52

I think the only distinction to consider is that between pointers and constant pointers. Pointers to any type will always align, while for constant pointers the alignment goes only in the weaker direction (see Martin's comments), and I guess that it also depends on the type pointed to (but I am not sure about that last statement) – Walrus 10/12, 2020 at 18:54

@EugeneSh. - if [6.7.6 Alignment] Paragraph 5 is to what you are referring, what about it seems ambiguous? – Willtrude 10/12, 2020 at 19:5

@Willtrude I am referring to the wording quoted by the OP. I used to think similarly to dbush's answer, but looking at the quoted section again I think it is not very obvious. – Bonheur 10/12, 2020 at 19:14

@Martin York: This is why I have this problem if I exclude the function pointers and make sure that the object pointers are the same size and alignment. In interpretation A, I can safely convert between any object pointers as a method of locating address. – Yowl 10/12, 2020 at 19:30

@RichardBryant: No. I was pointing out the check is meaningless (as it is always true). char* x = 0x01; Valid. int* y = (int*)x; Not valid. The pointers x/y have the same alignment and you can assign to them. But assigning an illegal value is UB. Interpretation B is correct. – Olibanum 10/12, 2020 at 19:59

Thank you all for correcting my misunderstanding, I got it! – Yowl 10/12, 2020 at 20:7

@RichardBryant See my edit. There are other issues at play. – Phenanthrene 10/12, 2020 at 21:11

Interpretation B is correct. The standard is talking about a pointer to an object, not the object itself. "Resulting pointer" is referring to the result of the cast, and a cast does not produce an lvalue, so it's referring to the pointer value after the cast.

Taking the code in your example, suppose that an int must be aligned on a 4-byte boundary, i.e. its address must be a multiple of 4. If the address of buf is 0x1001 then converting that address to int * is invalid because the pointer value is not properly aligned. If the address of buf is 0x1000 then converting it to int * is valid.
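As a minimal sketch of that idea, the check has to be made on the pointer value, not on the pointer types. This assumes C11 and an implementation that provides uintptr_t (it is optional in the standard); the helper names is_aligned_for and as_int_pointer are hypothetical, and the pointer-to-integer mapping is implementation-defined, so treat this as a practical check rather than a guarantee:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the address held in p is a multiple of
 * `alignment`. The pointer-to-integer mapping is implementation-defined, so
 * this is a practical check, not something the standard promises. */
static int is_aligned_for(const void *p, size_t alignment)
{
    return ((uintptr_t) p % alignment) == 0;
}

/* Converts buf to int * only when the pointer VALUE is suitably aligned;
 * otherwise returns NULL without performing the conversion at all. */
static int *as_int_pointer(char *buf)
{
    if (!is_aligned_for(buf, alignof(int)))
        return NULL;
    return (int *) buf;
}
```

Note that this only makes the conversion itself well-defined; whether the result may be dereferenced is a separate question.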

Update:

The code you added addresses the alignment issue, so it's fine in that regard. It however has a different issue: it violates strict aliasing.

The array you defined contains objects of type char. By casting the address to a different type and subsequently dereferencing the converted pointer, you're accessing objects of one type as objects of another type. This is not allowed by the C standard.

Though the term "strict aliasing" is not used in the standard, the concept is described in section 6.5 paragraphs 6 and 7:

6 The effective type of an object for an access to its stored value is the declared type of the object, if any.⁸⁷⁾ If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:⁸⁸⁾

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the object,

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

...

87 ) Allocated objects have no declared type.

88 ) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

In your example, you're writing an unsigned long and a double on top of char objects. Neither of these types satisfies the conditions of paragraph 7.

In addition to that, the pointer arithmetic here is not valid:

 *(((double *) buf) + 2) = 1.618;

This treats buf as an array of double, which it is not. At the very least, you would need to perform the necessary arithmetic on buf directly and cast the result at the end.
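A minimal sketch of doing the arithmetic on the char pointer first and converting only at the end. The helper name double_slot is hypothetical, and the caller is assumed to pass an offset that is a multiple of _Alignof(double) and lies inside the buffer:

```c
#include <stddef.h>

/* Hypothetical helper: computes the slot address with char arithmetic,
 * which is valid anywhere inside the buffer, and converts to double * only
 * once the final address is known. The caller must make sure byte_offset
 * keeps the result aligned for double and within the buffer. */
static double *double_slot(char *base, size_t byte_offset)
{
    return (double *) (base + byte_offset);
}
```

With this, the store from the question would be written as *double_slot(buf, 2 * sizeof(double)) = 1.618;. This only fixes the pointer arithmetic; the effective type rules quoted above still apply to the store itself.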

So why is this a problem for a char array and not a buffer returned by malloc? Because memory returned from malloc has no effective type until you store something in it, which is what paragraph 6 and footnote 87 describe.
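A minimal sketch of that difference, assuming the storage really comes from malloc (the function name reuse_allocated_storage is hypothetical). Each store through a non-character lvalue sets the effective type of the bytes it writes, so the same allocated bytes can hold an unsigned long first and a double later:

```c
#include <stdlib.h>

/* Stores an unsigned long into allocated storage, reads it back, then
 * reuses the same bytes for a double. Returns 1 if both reads see the
 * values that were stored, 0 otherwise (including allocation failure). */
static int reuse_allocated_storage(void)
{
    /* malloc returns storage suitably aligned for any fundamental type
     * and with no declared type. */
    void *page = malloc(4096);
    if (page == NULL)
        return 0;

    unsigned long *ul = page;
    *ul = 0xff07;               /* effective type is now unsigned long */
    int ok = (*ul == 0xff07);

    double expected = 1.618;
    double *d = page;
    *d = expected;              /* new store: effective type is now double */
    ok = ok && (*d == expected);

    free(page);
    return ok;
}
```

Reading the bytes as unsigned long again after the double has been stored would not be covered by this reasoning.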

So from a strict point of view of the standard, what you're doing is undefined behavior. But depending on your compiler you may be able to disable strict aliasing so this will work. If you're using gcc, you'll want to pass the -fno-strict-aliasing flag.

Phenanthrene answered 10/12, 2020 at 18:36 Comment(3)

Thanks for updating! I understand basic strict aliasing, which is why I added the comment "simulate our 16-aligned malloc" to my code, to indicate that the buffer stands in for an allocated object. But I would like to ask: once the allocated object has an effective type, is it illegal to convert it to another type (we want to reuse the buffer when we don’t need the old data)? – Yowl 10/12, 2020 at 23:50

@RichardBryant Based on paragraph 6 I believe it's acceptable. The key phrase is that once a value is stored, "the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value". So storing a new value can change the type. – Phenanthrene 11/12, 2020 at 0:2

@dbush: Storing a new value (a bit pattern that storage has never held) would change the Effective Type, but in the -fstrict-aliasing dialect processed by clang and gcc, if a region of storage has been used to hold a T1 with a certain bit pattern, writing a T2 with that bit pattern may cause the Effective Type to be set, at the compiler's leisure, to either T1 or T2. – Urbai 12/12, 2020 at 18:6

The Standard does not require that implementations consider the possibility that code will ever observe a value in a T* that is not aligned for type T. In clang, for example, when targeting platforms whose "larger" load/store instructions do not support unaligned access, converting a pointer into a type whose alignment it doesn't satisfy and then using memcpy on it may result in the compiler generating code which will fail if the pointer isn't aligned, even though memcpy itself would not otherwise impose any alignment requirements.

When targeting an ARM Cortex-M0 or Cortex-M3, for example, given:

#include <string.h>

void test1(long long *dest, long long *src)
{
    memcpy(dest, src, sizeof (long long));
}
void test2(char *dest, char *src)
{
    memcpy(dest, src, sizeof (long long));
}
void test3(long long *dest, long long *src)
{
    *dest = *src;
}

clang will generate for both test1 and test3 code which would fail if src or dest were not aligned, but for test2 it will generate code which is bigger and slower, but which will support arbitrary alignment of the source and destination operands.

To be sure, even on clang the act of converting an unaligned pointer into a long long* won't generally cause anything weird to happen by itself, but it is the fact that such a conversion would produce UB that exempts the compiler of any responsibility to handle the unaligned-pointer case in test1.
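Following the test2 pattern, a minimal sketch of reading and writing a long long at an arbitrary address without ever creating a misaligned long long *. The helper names load_ll_unaligned and store_ll_unaligned are hypothetical:

```c
#include <string.h>

/* Hypothetical helper: reads a long long from a possibly unaligned address.
 * The source stays a character pointer, so no unaligned long long * is ever
 * formed and the compiler has no alignment assumption to exploit. */
static long long load_ll_unaligned(const unsigned char *src)
{
    long long value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical helper: writes a long long to a possibly unaligned address. */
static void store_ll_unaligned(unsigned char *dest, long long value)
{
    memcpy(dest, &value, sizeof value);
}
```

On targets that allow unaligned access, compilers commonly reduce such a memcpy to a single load or store, but that is an optimization and not something the standard promises.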

Urbai answered 10/12, 2020 at 22:9 Comment(0)
