Do C and C++ differ on the legality of aligning pointers at runtime?

Assuming alignment is a power of 2 of type uintptr_t, the next properly aligned address can be computed with this expression:

(address + alignment - 1u) & ~(alignment - 1u)

This is used in custom memory allocators.

However, in order to perform this operation, the address needs to be cast to uintptr_t, and the result needs to be cast back to a pointer.
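For concreteness, here is a minimal sketch of the expression above with the casts written out in C++. The names is_power_of_two, align_up_address and align_up_pointer are made up for illustration. The sketch assumes the implementation provides std::uintptr_t (it is optional) and maps pointers to integers linearly, which is common but not something the standard promises.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `alignment` is a nonzero power of two.
constexpr bool is_power_of_two(std::uintptr_t alignment) {
    return alignment != 0 && (alignment & (alignment - 1u)) == 0;
}

// The expression from the question, applied to a plain integer address.
// Only meaningful when `alignment` is a power of two.
constexpr std::uintptr_t align_up_address(std::uintptr_t address,
                                          std::uintptr_t alignment) {
    return (address + alignment - 1u) & ~(alignment - 1u);
}

// Hypothetical wrapper showing the two casts the question mentions.
// ASSUMPTION: the pointer <-> integer mapping is linear; the standard
// only calls it implementation-defined.
inline void* align_up_pointer(void* p, std::size_t alignment) {
    assert(is_power_of_two(alignment));
    const auto address = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>(align_up_address(address, alignment));
}
```

The integer arithmetic itself is fully defined; what the result means once it is converted back to a pointer is the part the question is really about.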

Do C and C++ differ with respect to the legality and correctness of using such an expression?

Earthwork answered 23/10 at 23:36 Comment(22)
What language is that? C and C++ are different languages, and in particular are very different on this kind of topic.Chinkiang
@Chinkiang The ideal answer would point out the difference between the two languages in this regard (if any). I'm aware that C and C++ differ w.r.t. union rules, but not this one.Earthwork
Well, in c++ such pointer manipulation is simply illegal, so the entire premise of your question is invalid. On the other hand c++ provides the alignas keyword for aligning data.Chinkiang
Choose one language tag and delete the other. If you want to ask about both languages, you can enter a separate question for each. Particularly with a language-lawyer tag, you should not conflate C and C++.Evenson
Additionally, you should provide further information about what you are trying to do. Upon conversion to uintptr_t, most C implementations provide a meaningful value that can be used to calculate an aligned address (in an implementation-specific way, not necessarily portable). However, using that address to create or access an object in that location is another matter involving considerably more details about language rules. Unless all you want is an address that is not usable for much, you need to specify what you want to do with it.Evenson
@EricPostpischil This isn't the consensus opinion on META: "Now, there are reasons why a given question might warrant both [c] and [c++] tags: ... the question is about differences between the two languages" meta.https://mcmap.net/q/470435/-access-table-data-to-excelEarthwork
Well thing is, on the specific topic of lifetimes, asking the difference between C and C++ is a bit like asking the difference between a car and a fish. They're just not the same, but there is no one specific difference to point at.Chinkiang
@EricPostpischil Naturally, the resulting address would be used for things. This sort of pointer manipulation is often used in custom allocators.Earthwork
But you're just asking a question for both C and C++, and then claiming it's about the difference. There's nothing at all about your question that can't be asked and answered in two separate questions instead.Fallacious
Nothing in your question asks about the differences between C and C++. It asks whether pointers can be aligned without violating strict aliasing. Nothing in the question says anything like “What are the differences between C and C++ about how an aligned pointer must be constructed?” Your question is on the way to be closed, voted down, and possibly deleted. You can argue about that and suffer the consequences, or you can solve it by doing as you have been advised: Select one tag and add the additional information requested. One way may get you answers. The other will not.Evenson
@EricPostpischil When someone asks "is X legal in A and B", he automatically asks about whether the answer is different for A and for B (Unless you go out of your way to misunderstand this as asking about the logical conjunction). So it was already a question about the difference. But I rephrased it, so I hope it's even clearer now.Earthwork
It depends on the point of view if C and C++ are different languages. C++ started out as an extension to C with the idea that "every valid C program is also a valid C++ program". Even though this is not true anymore, there is a significant opportunity to write code that is valid in both languages and thus it can be relevant to ask about the behavior in both languages together. C++ and C are not different like C and Python. When closing a question, please consider that everyone is prevented from contributing, including those for whom the question makes sense.Ballman
@EricPostpischil The tag usage policy is that if the question is explicitly about the difference between C and C++, then using both tags is fine. We can't dismiss that as "one question about C and one question about C++" because the core issue might be code compatibility between the languages. The strict aliasing rule for example is subtly different between the languages, which may or may not matter in the specific case.Jumper
@Lundin: The question at the time the comments were written did not ask about differences between the two languages.Evenson
@Lundin: Sadly, in pursuit of stubborn insistence on using both tags and not accepting advice, OP has edited their question into something useless. Now it just asks about arithmetic, for which any differences are trivial. OP’s original question involved object model and aliasing issues, for which the details are substantial and different between the languages. Splitting the question and asking about the aliasing issues in implementing memory allocation routines could have gotten them useful answers. The current form of the question likely will not.Evenson
@MWB, "... the legality of using such an expression" is one thing (does it compile?), yet the correctness is another (does it align?). The expression makes some sense with the common linear mapping of addresses to uintptr_t. Yet math on uintptr_t is not specified by the standard to be equivalent to math on a pointer (even a char *). The unposted larger code functionality is perhaps reasonable, yet implementation specific.Cosmetic
You keep tweaking irrelevant things in the question. The expression to calculate alignment is not an issue in C or C++. It is a legal expression, and given an address in address and an alignment requirement in alignment, it will calculate the desired address, within the representable bounds of the type. None of that is a problem in writing a custom memory allocator…Evenson
… There are three problems that you should be asking about: Is the conversion from a pointer to uintptr_t sufficiently specified, is the conversion from uintptr_t to a pointer sufficiently specified, and are the aliasing rules satisfied with regard to the client using the allocated memory and later releasing it to the allocator, which may reuse it for other purposes? Those are the questions that are critical for implementing memory allocation routines. The alignment calculation is trivial.Evenson
The answers for C are no, no, and no. It is impossible to implement memory allocation routines, notably those of malloc and free, using only strictly conforming C. It can only be done using implementation-specific characteristics, such as guarantees about the pointer conversions. A primary way the aliasing issues are resolved is by compiling the memory allocation routines separately from their clients. The C standard does not specify it, but aliasing issues cannot cross translation units that are kept semantically separate through compilation and linking.Evenson
But the question you have insisted upon writing does not solicit these answers or other information that is useful and critical for implementing memory allocation routines.Evenson
@EricPostpischil But I am asking those 3 questions. Note the word "correctly" in the question. In any case, if you think this can be worded better, feel free to edit.Earthwork
@EricPostpischil "aliasing issues cannot cross translation units" -- Even with LTO?Earthwork
C
3

Do C and C++ differ with respect to the legality and correctness of using such an expression?

In both C and C++ the semantics of your code are implementation-defined.

The only use of a pointer->integer cast that is guaranteed in C++ to work is to cast the integer result back to the original pointer type, in which case it produces the original pointer value. (Note that "value" here includes pointer provenance. If there are multiple objects at the address or if there is an object ending at the address immediately before the casted one, then there is no guarantee that the cast can be used to obtain a pointer to a different object than it originally pointed to or one-past.)
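As a small illustration of that one guaranteed use, here is a sketch of a round trip through std::uintptr_t. The function name round_trip is invented for this example, and the sketch assumes the implementation provides std::uintptr_t at all.

```cpp
#include <cstdint>

// Hypothetical helper: converts a pointer to an integer and straight back
// to the same pointer type. In C++ this is the conversion pattern that is
// specified to give back the original pointer value.
template <typename T>
T* round_trip(T* p) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    // No arithmetic is done on `bits`; once the integer is modified,
    // the result of the cast back is implementation-defined.
    return reinterpret_cast<T*>(bits);
}
```

For example, with int x = 42, round_trip(&x) compares equal to &x and can be dereferenced to read x.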

In C, there are no guarantees made about the result of the cast at all, except for casting null pointer constants (i.e. 0) to pointers (but only in that direction, not the other!). In C the existence of uintptr_t guarantees that a cast from a pointer to it at least doesn't have undefined behavior or result in a trap representation. Otherwise not even that would be excluded.

C++ has std::align for this purpose, which does not rely on the implementation-defined semantics. There is no portable equivalent for C as far as I am aware.
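A minimal sketch of how std::align might be used in a bump-style allocator follows. The helper name take_aligned and the cursor/space bookkeeping are assumptions made for this example; only std::align itself comes from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: carves `size` bytes aligned to `alignment` out of
// the region that starts at `cursor` and is `space` bytes long.
// Returns nullptr (leaving cursor and space untouched) if it does not fit.
inline void* take_aligned(void*& cursor, std::size_t& space,
                          std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // std::align moves `cursor` up to the aligned address and shrinks
    // `space` by the number of bytes skipped, or returns nullptr.
    void* result = std::align(alignment, size, cursor, space);
    if (result == nullptr) {
        return nullptr;
    }
    // Step past the block that was just handed out.
    cursor = static_cast<unsigned char*>(result) + size;
    space -= size;
    return result;
}
```

No integer cast appears in user code here, so the implementation-defined pointer-to-integer mapping stays hidden inside the library.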

Also note that even though in practice the calculation you show will work as expected, how to actually use the result in order to access memory by some type without UB or to compare the pointer result (after a cast) to other pointer values is a much more complex problem. Here C and C++ have differing object models and additionally there are pointer provenance issues to consider in both languages.

In particular, for C++, even with the portable result of std::align it is not specified to point to any particular object. It is only returning a pointer value representing the desired address. In order to access through that pointer, generally a cast to the target pointer type and std::launder is required if an object of the target type is already alive at the memory address or a placement-new if there isn't yet. (And even then there are reachability requirements for std::launder to have defined behavior. For example this can't be used to access a complete object from a one-past-the-end pointer to an object located before that object in memory.) All of this would technically be the case even if std::align is only used for arithmetic in a character buffer, although there might be an underspecification of the function here as well.
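To make the "placement-new if there isn't yet" case concrete, here is a sketch that aligns inside a byte buffer and then starts an object's lifetime there. Sample and make_sample are hypothetical names for this example, and the sketch only covers the case where no object of the target type is alive at that address yet.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical example type.
struct Sample {
    int id;
    double weight;
};

// Hypothetical helper: finds suitably aligned storage in [cursor, cursor + space)
// and creates a Sample there. Returns nullptr if the storage does not fit.
inline Sample* make_sample(void*& cursor, std::size_t& space,
                           int id, double weight) {
    void* where = std::align(alignof(Sample), sizeof(Sample), cursor, space);
    if (where == nullptr) {
        return nullptr;
    }
    // Placement new starts the lifetime of a Sample at `where`; the pointer
    // it returns is the one to use for access, not a cast of `where`.
    Sample* s = ::new (where) Sample{id, weight};
    cursor = static_cast<unsigned char*>(where) + sizeof(Sample);
    space -= sizeof(Sample);
    return s;
}
```

Using the pointer returned by the new-expression avoids the question of whether a cast of the aligned address points to the object, which is where std::launder would otherwise come in.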

Converse answered 25/10 at 11:44 Comment(11)
In C, the only aspect of integer-to-pointer conversion that is guaranteed to work is that a round-tripped pointer will compare equal to the original. Given that most objects will have some other object immediately following them, and a pointer "one past" an object is specified to compare equal to a pointer to an immediately following object, but not be usable to access the latter object, a Deathstation 9000 could have a round-trip conversion yield whatever kind of pointer would not match the way a program would try to use the result. Actually, in all but the latest clang...Llewellyn
...converting to uintptr_t a "one past" pointer and a pointer of unknown provenance, and then performing a sequence of calculations that would show that the uintptr_t values must have been equal, would result in clang assuming it was impossible for the latter pointer to point to the object following the "one past" pointer, even without any integer-to-pointer conversions.Llewellyn
@Llewellyn I am more familiar with C++ (unfortunate that the question asks about both) and in C++, at least the pointer provenance is encoded in the "value" of a pointer by means of pointer values pointing to specific objects rather than just an address.Converse
@Llewellyn Now that I look at the most recent C draft, I do not even see any guarantee that the round-trip cast will result in values that compare equal in C. Only the null pointer constant to pointer cast seems to be explicitly defined.Converse
It's a shame the C and C++ standards embody a broken abstraction model, rather than acknowledging that every region of storage that contains nothing that isn't a PODS simultaneously contains all PODS that will "fit", but that for any conflicting pair of accesses (pair of accesses to the same storage which contain at least one write), either certain kinds of evidence must exist suggesting they might be the same storage, or something that occurs between them must force one to occur before the other. This abstraction model would be better for programmers and compilers alike, with...Llewellyn
...the caveat that compilers whose early stages presently discard evidence that should suggest that things might alias would need to retain it in order to exploit cases where no such evidence exists. The present messy situation exists because in the days before gcc implemented type-based aliasing analysis, its early stages discarded information that would have allowed TBAA to be implemented sanely, and rather than fix that the authors tried to kludge the downstream stages to make inferences based upon what the upstream stages gave them.Llewellyn
It would have been obvious when the Standard was written that float test(float *p1, float *p2) { *p1 = 1.0; *(unsigned*)p2= 1.0f; return *p1;} should allow for aliasing even if float test(float *p1, unsigned *p2) { *p1 = 1.0; *p2= 1.0f; return *p1;} does not, but if an upstream compilation stage treats both as equivalent to float test(void *p1, void *p2) { *(float*)p1 = 1.0; *(unsigned*)p2= 1.0f; return *(float*)p1;}, a downstream stage given that last form would have no way of making the distinction.Llewellyn
Such a model would make life easier for programmers, because the question of whether accesses to an object could be consolidated could be resolved by looking for specific things in specific places, and situations where incorrect consolidation might be a danger could be handled by having programmers ensure that a compiler that looks where required would find evidence of the specified forms.Llewellyn
…and C90 did not even have a uintptr_t; it was introduced in C99.Abatement
intptr_t and uintptr_t are a bit of a misfeature if pointers aren't round trippable...Camelback
@Camelback It is difficult to specify the behavior of these casts properly because a single numeric value is not enough to encode all the information that a compiler needs about pointer values in order to do aliasing analysis as compilers currently do. You can find a lot of discussion and papers on the topic of "pointer provenance". intptr_t and uintptr_t are only meant to guarantee an integral type that is large enough to hold addresses.Converse
S
4

Strict aliasing is irrelevant here. But the value produced by casting a pointer to an integer is implementation-defined. Use std::align to do this portably.

Slightly answered 23/10 at 23:39 Comment(5)
Is there an equivalent solution for C? OP seems concerned with both.Lakenyalaker
You should probably warn them that casting that pointer to anything but char* or std::byte* and then dereferencing it is undefined behavior. You have to go through placement new to create an object in the storage.Chinkiang
@Lakenyalaker There should only be one language per question (unless it is clearly about inter-operability between the languages)Inexactitude
What if you want to bump the pointer downwards? See here for why it may be better: fitzgen.com/2019/11/01/always-bump-downwards.htmlEarthwork
@Earthwork to align down just subtract alignment - 1 from the pointer before aligning up with std::align.Slightly
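The align-down trick from the last comment could be sketched roughly as follows. The name align_down and the `first` parameter (the start of the buffer, used only for a bounds check) are assumptions added for this example. The bounds check is deliberately conservative: it refuses whenever fewer than alignment - 1 bytes precede p, even if p happens to be aligned already.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: returns the highest address <= p that is a multiple
// of `alignment`, or nullptr if stepping back alignment - 1 bytes would
// leave the buffer that starts at `first`.
inline void* align_down(unsigned char* first, unsigned char* p,
                        std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(first <= p);
    const std::size_t slack = alignment - 1;
    if (static_cast<std::size_t>(p - first) < slack) {
        return nullptr;  // conservative: not enough room to step back
    }
    void* q = p - slack;        // subtract alignment - 1 first...
    std::size_t space = slack;  // ...the aligned address lies in [q, p]
    return std::align(alignment, 0, q, space);  // ...then align up
}
```

Aligning p - (alignment - 1) upwards lands on the largest multiple of the alignment that does not exceed p, which is what a downward-bumping allocator needs.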