What does it mean that void* has the same representation and memory alignment as char*?
B

3

5

I've been reading some articles about void* type pointers and found this requirement from the Standard.

6.2.5.27:

A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.39) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements.

I see that the Standard does not guarantee all pointer types have the same length, so the bottom line here is that a void* pointer has the same length and alignment rules as char*, right?

What I don't get is the footnote 39), which says

The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

My questions are:

  1. What does it mean by "interchangeability"? Does it say the argument and the return values of a function void* Func(void*) can both be char*?

  2. If so, is it an implicit conversion made by the compiler?

  3. And what is it about the members of unions? I really don't get a grasp of the meaning of this. Can anyone give me a simple example?

Bohn answered 10/1, 2021 at 9:17 Comment(5)
Without going through the details, it means a pointer is a pointer is a pointer. In C any type pointer can be assigned to void* and back without a cast. That is the basis for the interchangeability. No implicit conversion takes place, a void* pointer is simply a pointer without a specific type. And, since type controls pointer arithmetic, you can't do pointer arithmetic on void* pointers. Just like you cannot dereference a void* pointer because the type information is missing -- it would be an incomplete type. So you have to assign or cast a void pointer before dereferencing.Bounteous
@DavidC.Rankin Those are all true statements, but I don't think they're what the quoted passage is about. It's about void* and char* having the same representation. void* doesn't have to have the same representation or alignment as other pointer types.Teeming
I take that to mean they both are the same size and share the same alignment requirements. (this also dovetails with the strict aliasing rule -type compatible and char* exception) I haven't done a critical interpretation of the standard section, just more a practical discussion of how that section doesn't hold any hidden gotchas.Bounteous
I have generally regarded this as the standard saying that these interchangeable types are “compatible” for the purposes of satisfying function call requirements but giving up on working it into the official language of the specification. For example, you could define void *foo(void *p) in one translation unit and declare char *foo(char *p) in another, and call it using the latter, and it would work because the types are interchangeable. But it is undefined according to the normative text of the C standard, aside from this passage in C 2018 6.2.5 28 about the same representation.Foghorn
The question cites the passage as 6.2.5.27, which I presume means clause 6.2.5 paragraph 27 (you should not use that format as it fails to distinguish clause 6.5 paragraph 1 from clause 6.5.1), but I do not find it at 6.2.5 27 in any official version of the C standard. In 1999, it was paragraph 26. In 2011, it was paragraph 28 (although I am looking at a draft for 2011, but I think it was the last one before release).Foghorn
S
7

In C any data pointer can be passed to a function that expects a void * and a void * can be stored to any pointer type. There is an implicit conversion between void * and other pointer types. But this does not mean that this conversion is harmless. On some architectures where void * and int * have a different representation, converting from int * to void * and then back to int * is specified as producing the same pointer value, but the converse does not hold: converting a void * to int * and back to void * may produce a different value, especially if the void * was not obtained by converting an int *.

Interchangeability means that this implicit conversion does not change the representation of the pointer. The conversion can be performed successfully in both directions: converting a character pointer to void * and back produces the same pointer, and vice versa.

Here is an example:

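#include <assert.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    char *s = "abc";
    char *s1;
    void *p;
    void *p1;

    /* Same representation implies same size. */
    assert(sizeof(p) == sizeof(s));
    /* Copy the bytes of the char * directly into a void *. */
    memcpy(&p, &s, sizeof(p));
    /* Ordinary implicit conversion from char * to void *. */
    p1 = s;
    /* Both routes must yield the same pointer value. */
    assert(p == p1);
    /* Copy the bytes of the void * back into a char *. */
    memcpy(&s1, &p1, sizeof(s1));
    assert(s == s1);
    return 0;
}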

Note however that this does not imply that !memcmp(&p1, &s, sizeof(p1)) because pointers could have padding bits. Neither can you violate the strict aliasing rule by casting through a void *:

  • float f = 1.0; unsigned int i = *(int *)(void *)&f; is incorrect: it reads a float object through an lvalue of type int.
  • float f = 1.0; unsigned int i; memcpy(&i, &f, sizeof(i)); is correct if sizeof(int) == sizeof(float), but may produce a trap representation.
Shorthorn answered 10/1, 2021 at 9:39 Comment(0)
F
2
  1. What does it mean by "interchangeability"? Does it say the argument and the return values of a function void* Func(void*) can both be char*?

Yes, that is what it says, but it is non-normative text that conflicts with the normative text of the standard. Let’s discuss question 2 and then come back to this.

  2. If so, is it an implicit conversion made by the compiler?

No, not in the situations intended to be addressed by this note.

If there is a visible declaration of void *Func(void *);, and you execute:

char *p = something;
char *q = Func(p);

then the argument p is converted to void * and the returned value is converted to char *. But these conversions occur as part of the normal operations of function calls and assignments; they have nothing to do with the types having the same representation or being interchangeable. For example, if you executed code like the above but with int * instead of char *, the conversions would occur between int * and void * even if they do not have the same representations and are not interchangeable. The argument conversion is made because the compiler knows the parameter type of Func, so it performs the conversion as required by the rules for function calls, and the assignment conversion is made because the compiler knows the type of the destination of the assignment, so it performs the conversion as required by the rules for assignment.

However, suppose we have this code:

char *Func(char *);
char *p = something;
char *q = Func(p);

but Func is in fact defined in its library source code as void *Func(void *);. Then the rule in C 2018 6.2.5 28 (see footnote 1) applies. In the calling code, the compiler is told the parameter and the return type are char *, so no conversion is performed in either case. When passing the char * argument, the compiler passes exactly the bytes that represent a char *. In the receiving function, the code expects a void *. Since the bytes representing a char * are exactly the same as the bytes representing a void *, with the same meaning in regard to the represented address, this works: The function receives the bytes it expects to receive, with the intended meaning. Similarly, when the function returns the bytes for a void * and the calling code interprets those bytes as a char *, it works because the bytes are the same, with the same meaning.

Getting back to question 1, this example where Func is called using the type char *Func(char *) but is defined using the type void *Func(void *) violates the normative part of the C standard. C 2018 6.5.2.2 6 says:

If the function is defined with a type that includes a prototype, and… the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined.

char * is not compatible with void *, so the behavior is not defined by this rule. However, if the calling code is in one translation unit and the called code is in another translation unit, and no information about the calling code or the called function (notably no type information) is passed between translation units except for linking the name to the function, then it is impossible for the C implementation to distinguish our example code from code in which the function is called using a type compatible with its definition. In particular, the fact that char * has the same representation as void * means that the result of compiling the calling code must be identical whether it uses char *Func(char *) or void *Func(void *) (given the caveat that no type information is passed between translation units), and it means that the result of compiling the function definition must be identical whether it is defined using char * or void *. In other words, a rule of the C standard says the behavior is not defined, but it is logically impossible in this situation for the compiler to compile the example code differently from the code with defined behavior.

I conjecture that this note in the standard may have been the result of the committee, or at least one or more members of it, wanting to say that, at least in some senses, char * could be used in place of void * and vice-versa, but that the committee did not have the time or motivation or other opportunity to draft formal language for this and make it a normative part of the standard, so it settled for making it a note.

  3. And what is it about the members of unions? I really don't get a grasp of the meaning of this. Can anyone give me a simple example?

Consider this union:

union foo
{
    void  *v;
    char  *c;
    float *f;
} u;

When we write into one union member, as with u.v = &a;, and read from another union member, as with char *p = u.c;, the bytes in the union are reinterpreted in the new type (C 2018 6.5.2.3 3 and note 99). Since void * and char * have the same representation, this reinterpretation must produce the same value. Thus, we are guaranteed that:

char a;
u.v = &a;
printf("%d\n", u.c == &a);

prints “1”. On the other hand, we are not guaranteed that for this code:

float f;
u.v = &f;
printf("%d\n", u.f == &f);

In this code, when &f is converted to void *, a void * might have a different representation from a float *, so the bytes representing &f may be different from the bytes representing (void *) &f. The latter are the bytes stored in u.v. When those bytes are read as u.f and reinterpreted as a float *, they might represent a different value, so the comparison might not evaluate as true.

Footnote

1 The question cites “6.2.5.27,” but the quoted passage is found in clause 6.2.5, paragraph 28, of the official 2018 C standard. The note cited as note 39 is found as note 49.

Foghorn answered 5/1, 2023 at 1:30 Comment(5)
Good write-up and circle back to the question. One thing I'm unclear on, in your discussion of C 2018 6.5.2.2 6, you state (beginning the partial paragraph below the example) "char * is not compatible with void *". What I'm unclear on is whether this is limited to your specific example and context there, or if I'm missing your intent in light of C 2018 6.5(7) (last bullet in particular). The confusion being how would the prototype of void* result in UB due to the later definition with char* if void* and char* are expressly type-compatible? (I may just be misreading the answer?)Bounteous
I guess the point being made is there can be no variance between the prototype and definition without invoking UB even if in separate translation units and the compiler has no way to tell? (I think that's what you meant, and 6.5(7) would not be relevant there)Bounteous
@DavidC.Rankin: char * and void * are not type-compatible. The intent of the rules about type compatibility is that two types are compatible only if they can be completed to the same type. int and int are compatible since they can be completed (by no operation) to the same type, int. int [3] and int [] are compatible since they can both be completed to int [3]. char * and void * are complete, so they cannot be completed to the same type.Foghorn
@DavidC.Rankin: The last item of 6.5 7 says you can access (the bytes of) any object using a character type. Passing a void * argument for void *Func(void *); to a char * parameter defined in another translation unit as char *Func(char *); is not accessing the void * as a character type; it is accessing it as a char * type, which is a pointer type, not a character type. Accessing something as a character type can be done by converting its address to char * and then dereferencing that with *. That accesses with char, not with char *.Foghorn
Thank you. I was reading your answer wrong. The key is the "completed" context. That was the point it was missing.Bounteous
C
-2

A pointer is just an address in memory. You can think of memory as one large contiguous array of bytes (e.g. a 32-bit process has a 4 GB address space, although the process usually cannot use all of it, depending on the system).

That means the value of a pointer is actually an integer representing the zero-based index of a byte in memory (e.g. a pointer with value 0 refers to the first byte in memory, but in reality you cannot dereference this address because it is a null pointer).

When you dereference a pointer, you read from or write to that address. The size of the access depends on the type of the pointer. If the pointer is an int * and int is 32 bits (4 bytes) on that system, the access reads or writes 4 bytes starting at that address. Alignment describes where a value may be placed in memory. For example, if a value requires 16-byte alignment, its starting address must be a multiple of 16.

What I explain here is just a high-level view of pointers, which should be enough to get started. In reality there is a lot more to it, such as memory protection, paging, etc.

Cache answered 10/1, 2021 at 9:48 Comment(1)
This is a simplistic memory model that is quite common nowadays. The language in the C Standard quoted by the OP addresses non-trivial cases where different pointer types might not have the same representation or sizes. Such architectures might not be POSIX compliant because all pointers must have the same size for POSIX, but nevertheless have been around for a long time.Shorthorn
