Union with const and non-const members

I'm writing some library code that exposes a const pointer to users but during certain operations I need to change where this pointer points (behind the scenes switcheroo tricks). One idea I had to solve this problem without encountering UB or strict-aliasing violations was to use a union with a const member:

// the pointed-to objects (in production code, these are actually malloc'd blocks of mem)
int x = 0, y = 7;

typedef union { int * const cp; int * p; } onion;
onion o = { .cp = &x };
printf("%d\n", *o.cp);   //  <---------------------- prints: 0
o.p = &y;
printf("%d\n", *o.cp);   //  <---------------------- prints: 7

But I don't know if this is well-defined or not... anybody know if it is (or isn't) and why?


EDIT: I think I muddied the waters by mentioning I was building a library as lots of people have asked for clarifying details about that rather than answering the much simpler question I intended.

Below, I've simplified the code by changing the type from int* to just int and now my question is simply: is the following well-defined?

typedef union { int const cp; int p; } onion;
onion o = { .cp = 0 };
printf("%d\n", o.cp);   //  <---------------------- prints: 0
o.p = 7;
printf("%d\n", o.cp);   //  <---------------------- prints: 7
Vanny answered 3/3, 2021 at 14:30 Comment(14)
Apologies to the 1st 2 people who answered & then had to hide their answers: I had a typo in my post that changed the question significantlyVanny
When you use "const pointer" are you specifically using int * const cp to mean cp is const or did you want const int * cp meaning cp points to const data?Madoc
I meant: int * const cp (the typo was: const int * cp)Vanny
Please show in your question how your library exposes the pointer. I guess you want the pointer to be modifiable only inside the library and not by the user of the library. Instead of providing a global pointer variable, I suggest using a getter function that returns the pointer to the user of the library. Keep in mind that code using the library might still hold a copy of the old pointer value after you have changed it internally. This can also happen due to compiler optimization, because the const tells the compiler that the value will not change later.Shellbark
@Shellbark good question: I understand that a pointer to a struct or union can be converted to a pointer to its first member and vice versa (which now makes me realize the order of my union members is reversed... I'll fix that after this comment). I was thinking I could pass them the address of the union cast to the address of the 1st member. Not sure what that would look like... maybe: (int * const *)&myunionVanny
A pointer to a struct or union is normally the same address as a pointer to the first element (of any variant in case of a union), but it is still not clear how you want to use this. Show an example of what the interface and the internal part of the library would look like and how the interface should be used. Explain what you want to achieve. Please edit your question to add all clarification or requested information. This looks like an XY problem. meta.stackexchange.com/questions/66377/what-is-the-xy-problem Shellbark
Why not simply static int *internal_pointer; in your library and int * get_pointer(void) { return internal_pointer; } as the interface to get a copy of the pointer. This way the user cannot modify the pointer value in the library, but the library code can.Shellbark
Rather than passing them a type-casted address of the union - (int * const *)&myunion, you can pass them the address of the cp member which requires no type-cast - &myunion.cp.Faubert
You should clarify what you are trying to do. Given any const int *cp that points to x, which was not defined const, it is defined behavior to convert cp to int * and use it to access x. This is because, in order to set cp to point to x, there first had to be an int * (without const) pointing to x (such as &x). So that int * has been converted to const int *, and that is the value that ended up in cp. Then C 2018 6.3.2.3 7 says that when a pointer is converted back to its original type, it equals the original pointer. Thus (int *) cp gives us the original &x.Supernatant
This question notwithstanding -- if you're intending a followup that involves casting a pointer to non-union to be pointer to union with one of these as members then that's a whole new can of wormsMellifluent
@M.M, noted... I'll post that followup in another question that references this one as backgroundVanny
This won't be relevant to that, as the rules focus on modification of a const object (not the const-ness of any intermediate expressions)Mellifluent
@Mellifluent oh, I see. Well, in that case I guess I'll let this (revised) question stand on its own merits (as I'm still interested in the answer), but then ask an entirely new one later with the full details of my pointer-based library. (I'll also wait a full 24 hr cycle for previous respondents to update their answer, if they wish, before choosing one.) ThxVanny
You can always choose an answer now and change the choice laterMellifluent
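
The getter approach suggested in the comments above could look roughly like the sketch below. It is only an illustration: block_a, block_b and lib_switch_block are hypothetical names standing in for the library's malloc'd blocks and its internal "switcheroo" operation; only internal_pointer and get_pointer come from the comment itself.

```c
/* Hypothetical library internals: the pointer is private to this file,
   so only library code can change where it points. */
static int block_a = 0;
static int block_b = 7;
static int *internal_pointer = &block_a;

/* Public interface: callers receive a copy of the pointer value and have
   no way to modify the library's own pointer. */
int *get_pointer(void)
{
    return internal_pointer;
}

/* Hypothetical internal operation that retargets the pointer. */
void lib_switch_block(void)
{
    internal_pointer = (internal_pointer == &block_a) ? &block_b : &block_a;
}
```

As the same comment warns, a caller that stored the result of an earlier get_pointer() call keeps the old value, so callers would need to call the getter again after any operation that may switch blocks.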

I think this is undefined as per C11 6.7.3 (equivalent paragraph is in all versions of the standard):

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.

o.cp is undoubtedly an object defined with a const-qualified type.

The modification of o.p does seem to me to count as an attempt to modify o.cp, since that is exactly why we are doing it!

Mellifluent answered 4/3, 2021 at 0:50 Comment(13)
Thx @M.M. What about for C99? (I should've been explicit, in the question, about the version of the standard I'm using... and it's becoming obvious I need to improve my question writing skills!)Vanny
@Vanny it's the same in all versions, I just used this as a concrete referenceMellifluent
Is union U { int i; float f; } u; u.i = 0; UB because when assigning to u.i, we also modify u.f, and we do it through an lvalue of int type thus violating the so-called strict aliasing rule[s]?Moldy
@LanguageLawyer Unrelated to this Q/A; post as a new question if you can't find a duplicateMellifluent
It is related to your answer. Since you don't disclose why The modification of o.p does seem to me to count as an attempt to modify o.cp, I'd like to know when modification of one union member counts as a modification of the other member[s] and when it doesn't.Moldy
@LanguageLawyer How about making a point directly instead of using some roundabout trolling? I'd say that modifying one member of a union always counts as an attempt to modify the other members. (NB. Will not be responding to attempts to derail the topic into strict aliasing)Mellifluent
an object defined with a const-qualified type I'm not sure if int * const cp; inside the union is a definition. The filthy standard says A definition of an identifier is a declaration for that identifier that: for an object, causes storage to be reserved for that object; and I don't think int * const cp; per se causes storage reservation. It is like top- or block-level extern int x; declaration which is not a definition.Moldy
@LanguageLawyer If we take that view, then we're also allowing struct { int const x; } s = { 0 }; *(int *)&s.x = 5;. I wouldn't say that was the intent, but I could be wrong. Again, that would be worth its own Q/A rather than being a comment discussion; then we could revisit this one afterwardsMellifluent
@LanguageLawyer int * const cp; inside a definition of a union type is not a definition of any object, but any definition of an object having that union type comprises a definition of a member object having that const-qualified type.Mahone
@JohnBollinger any definition of an object having that union type comprises a definition of a member object having that const-qualified type Any wording supporting this?Moldy
@LanguageLawyer, the same paragraph 6.7/5 that you already quoted. But if you seriously want to debate that then I'm going to second M.M's suggestion to pose a new question.Mahone
@M.M: Interpreting the Standard as defining that construct, at least in cases where a freshly-cast pointer is used for access, makes it possible to guard fields against being written "accidentally" even in cases where there must be some means of updating them. Treating as UB actions which would write top-level objects that are qualified as const makes it possible for implementations to place such objects in non-writable areas of memory. What advantage would there be to allowing an implementation to process your snippet in any fashion different from how it would behave absent const?Bazar
I don't think this rule applies here. The memory region of o.cp would have to be const, but it cannot be, because it is shared with o.p, which is not const-qualified. I agree that o.cp is a const-qualified lvalue expression, but I am not convinced that the object it designates is actually const-qualified.Kaon

The Standard uses the term "object" to refer to a number of concepts, including:

  1. an exclusive association of a region of storage of static, automatic, or thread duration to a "stand-alone" named identifier, which will hold its value throughout its lifetime unless modified using an lvalue or pointer derived from it.

  2. any region of storage identified by an lvalue.

Within block scope, a declaration struct s1 { int x,y; } v1; will cause the creation of an object called v1 which satisfies the first definition above. Within the lifetime of v1, no other named object which satisfies that definition will be observably associated with the same storage. An lvalue expression like v1.x would identify an object meeting the second definition, but not the first, since it would identify storage that is associated not just with the lvalue expression v1.x, but also with the named stand-alone object v1.

I don't think the authors of the Standard fully considered, or reached any sort of meaningful consensus on, the question of which meaning of "object" is described by the rule:

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.

It would certainly make sense that if an object of the first kind is defined with a const qualifier, the behavior of code that tries to modify it would be outside the Standard's jurisdiction. If one interprets the rule as applying more broadly to other kinds of objects as well, then actions that modify such objects within their lifetime would also fall outside the Standard's jurisdiction, but the Standard really doesn't meaningfully describe the lifetime of objects of the second type as being anything other than the lifetime of the underlying storage.

Interpreting the quoted text as applying only to objects of the first kind would yield clear and useful semantics; trying to apply it to other kinds of objects would yield semantics that are murkier. Perhaps such semantics could be useful for some purposes, but I don't see any advantage versus treating the text as applying to objects of the first type.
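
As a rough illustration of the distinction drawn above, the sketch below uses a stand-alone object v1 that is not defined const, plus a const-qualified lvalue that merely identifies part of its storage. The function name sum_through_const_view and the variable view are made up for this example. Under either reading of "object", this particular code is uncontroversial, because nothing here is defined with a const-qualified type; only the access path is const.

```c
struct s1 { int x, y; };

int sum_through_const_view(void)
{
    struct s1 v1 = { 1, 2 };   /* named, stand-alone object (first kind) */
    const int *view = &v1.x;   /* const-qualified path to the storage of v1.x */

    v1.x = 40;                 /* modified via a non-const lvalue: v1 is not const */

    /* Reading through the const-qualified lvalue sees the updated value. */
    return *view + v1.y;
}
```

The union in the question differs in that the const qualifier appears on a member in the type definition itself, which is where the two readings of the rule diverge.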

Bazar answered 9/3, 2021 at 21:9 Comment(12)
Where do you find objects of the first kind in the C standard?Zecchino
@n.m.couldbeanAI: Named objects of static, automatic, or (later) thread duration would satisfy that description. If a program declares int x,y;, it must behave as though disjoint regions of storage are allocated to x and y, and to nothing else, within the lifetimes of x and y. An implementation might "borrow" storage for automatic objects before they are first written, or after the last time they will ever be read, but that would not observably affect program behavior.Bazar
@n.m.couldbeanAI: The aspect of that which is relevant to the root question here is that if a named object is declared const, it cannot be part of a higher-level object that isn't. By contrast, an lvalue of the second kind might be const-qualified, but identify the same storage as a non-const object; except for some scenarios involving restrict, the association of a const-qualified lvalue with the storage associated with a non-const object would not interfere with the ability of the object to be modified by means not involving that lvalue.Bazar
This is all fine and dandy but what wording in the C++ standard supports this? I cannot find anything like that in the text.Zecchino
@n.m.couldbeanAI: The question was about C, rather than C++. The C standard does not recognize any concept of objects lifetime separate from the lifetime of containing storage. If some type struct foo has a const-qualified member, and a program sets a static struct foo* p to point to some region of storage, and never happens to use p again, the Standard would have no way of recognizing when the storage would cease to have any association with struct foo, except in implementations that would opt to make associations of storage with types permanent for the lifetime of the storage.Bazar
Sorry, "C++" was just a typo, I mean C. And I am not talking about the lifetime or any such thing. Which wording in the standard supports existence of two separate kinds of objects? v1 designates an object, v1.x also designates an object, I don't see what allows you to single out one or the other as more privileged.Zecchino
@n.m.couldbeanAI: Nothing in the Standard suggests any intention that adding members to a union but not using them would impair the ability to use other members. Further, if union U u; contains a member m of type T, the address of u.m is specified as being the same as the address of u, which would imply that the construct *((T*)&u) would be equivalent to u.m even though it never uses that member.Bazar
The intentions may be good but actual wording not so much. If there is wording that contradicts the stated intention, one should file a defect report. In your view, is a defect report warranted here?Zecchino
@n.m.couldbeanAI: If the C Standard is not intended to fully and unambiguously specify all corner cases that implementations should process meaningfully, its failure to specify this one is not a defect. If it is intended to fully specify all corner cases, it is grossly deficient in many ways far more serious than this, making this defect essentially irrelevant. As it is, the biggest problem is the lack of any articulated consensus as to how complete the Standard is intended to be.Bazar
If the standard doesn't specify behaviour, it's OK (undefined by omission). If it specifies behaviour that is different from the obvious intention, it's a defect. Is the one under discussion a defect? I'm not asking you to rank its urgency or grossness (the standard is full of such defects and I'm not on a quest to fix them).Zecchino
@n.m.couldbeanAI: The Standard does specify the behavior by specifying that various operations are transitively equivalent. The only problem is that the Standard will sometimes explicitly specify the behavior of action #1, specify that #2 is equivalent to #1, #3 is equivalent to #2, etc. but then specify action #N as invoking UB even though it is also transitively equivalent to #1, whose behavior was unambiguously defined. If one interprets such situations as allowing implementations to deviate from the defined behavior of action #N in cases where this would make an implementation...Bazar
...more useful to its customers, then this need not be viewed as implying a contradiction. Even if the transitive equivalence held, one could say that the behavioral specification describes how things should behave when an implementation would have no reason to process a program in any other way.Bazar

Every programming book I've had told me the following.

static const int x = 7;
int *px = (int *)&x;

is not defined once you write through px, but

static int x = 7;
const int *px1 = &x;
int *px2 = (int *)px1;

is defined. That is, you can always cast away the const-ness if the originating pointer (here the &x) wasn't const.
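
A minimal sketch of the defined case, extended with an actual write (the function name write_via_recovered_pointer is invented for this example):

```c
static int x = 7;            /* the object itself is defined WITHOUT const */

int write_via_recovered_pointer(int value)
{
    const int *px1 = &x;     /* adds const to the access path only */
    int *px2 = (int *)px1;   /* converts back to the original pointer type */

    *px2 = value;            /* fine: the object x was not defined const */
    return x;
}

/* By contrast, had x been defined as `static const int x = 7;`, the same
   cast would still compile, but the write through px2 would be undefined
   behavior, and x might even live in read-only memory. */
```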

Here I'm leaning on the lack of a contrary opinion from any quality source and not bothering to look up the standard (for which I'm not going to pay).

However you're trying to export something const that isn't const. That is actually valid. The language allows for

extern int * const p;

to be writable behind the scenes. The way to arrange this so that the file with the definition doesn't see it as const is to define it as int *p; and carefully not include the declaration in the file containing the definition. This allows you to cast away the const with impunity. Writing to it would look like:

int x;

    *((int **)&p) = &x;

Old compilers used to reject extern const volatile machine_register; but modern compilers are fine.

Lysias answered 3/3, 2021 at 15:4 Comment(6)
Casting const to non-const is fine. Problems come from accessing the stored value of const qualified object through unqualified type.Alleyne
@user694733: Made some adjustments.Lysias
This answer doesn't seem to address the question at all (which is about the behaviour of a union)Mellifluent
@M.M: Question is how to export a const variable while internally being able to write to it, and suggests a union as one possible answer. I don't suggest a union as an answer because I think he's trying to use the union to force through a typecast that isn't needed, and I couldn't come up with a pathway where the union wasn't completely superfluous.Lysias
@Joshua: On common platforms, an implementation that specifies that it processes exported and imported definitions according to the platform's documented Application Binary Interface without caring about how exported symbols would be used or how imported symbols were actually defined, will consequently extend the semantics of the language to allow the symbol to be defined and exported without a const qualifier, but other compilation units import via declarations that include a const qualifier. Because some platforms use different kinds of linker symbols for things in read-only...Bazar
...storage than for things in RAM, mandating tolerance for such a difference in qualifiers would make the standard impractical on some platforms. On the flip side, when the Standard was written most implementations already supported the semantics at issue without any mandate from the Standard, so there was no perceived need to mandate that the platforms that could process a construct usefully do what they were already doing.Bazar

If the interface is a const-declared pointer such as int *const (like you've indicated in your comment), then there's nothing you can do to change it without triggering UB.

If you're storing an int * somewhere (e.g., as a static int *ip;) and are exposing its address via an int *const * pointer (e.g., int *const *ipcp = &ip;), then you can simply cast back to (int **) (the original type of &ip from the example I gave) and use that to access the int * pointer.
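
A sketch of that second arrangement follows. The names first, get_view and retarget are hypothetical and not part of any real library; the point is only that ip itself is a plain int *, so converting the exposed int *const * back to int ** and writing through it modifies an object that was never defined const.

```c
static int first = 0;
static int *ip = &first;   /* the real object: a plain, non-const int * */

/* What users are handed: they can read *view, but assigning to *view is a
   compile-time error because the pointed-to pointer is const-qualified. */
int *const *get_view(void)
{
    return &ip;            /* implicit conversion from int ** to int *const * */
}

/* Library side: recover the original pointer type and retarget ip. */
void retarget(int *const *view, int *target)
{
    int **writable = (int **)view;  /* back to the original type of &ip */
    *writable = target;             /* defined: ip was not defined const */
}
```

A user holding the view would read the current target as **view, and would see the new target after the library calls retarget.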

Deerskin answered 3/3, 2021 at 15:20 Comment(0)
