How do I know which member of a union is in use?

Asked 16/11, 2010 at 13:17 Answered 3/7, 2018 at 14:53

If I declare a Union as:

union TestUnion
{
    struct 
    {
      unsigned int Num;
      unsigned char Name[5];
    }TestStruct;
    unsigned char Total[7];
};

Now, how can I know whether Total or TestStruct is in use?

I am using C. I was revisiting unions and structures when this question came to mind. sizeof can't be used, as both members are the same size, i.e. 7 bytes. (And here comes another question.)

When I filled only Total with the character 'a' and tried sizeof(TestUnionInstance), it returned 12 (the size of a char is 1 byte, right?). So I isolated the structure and found that its size is 12 bytes, not 5+2=7 bytes. Strange! Can anybody explain?

P.S. I am using Visual Studio 2008.

Drice answered 16/11, 2010 at 13:17 Comment(0)

You can't. That's part of the point of unions.

If you need to be able to tell, you can use something called a tagged union. Some languages have built-in support for these, but in C, you have to do it yourself. The idea is to include a tag along with the union which you can use to tell which version it is. Like:

enum TestUnionTag {NUM_NAME, TOTAL};

struct {
    enum TestUnionTag tag;
    union {
        struct {
            unsigned int Num;
            unsigned char Name[5];
        } TestStruct;
        unsigned char Total[7];
    } value;
} TestUnion;

Then in your code, you make sure you always set the tag to say how the union is being used.

About the sizeof: the struct is 12 bytes because there are 4 bytes for the int (most modern compilers have a 4-byte int; with Visual Studio a long int is 4 bytes as well), then five bytes for the chars and three bytes of padding (in practice the padding comes after the chars). The padding is there so that the size of the struct is a multiple of the int's alignment (4 bytes here), which keeps Num aligned in every element of an array of these structs. Because the struct is 12 bytes long, the union has to be 12 bytes long to hold it; the union doesn't change size according to what's in it.

Palimpsest answered 16/11, 2010 at 13:27 Comment(1)

The padding comes after the char, not before. Strictly speaking an implementation could put padding before it, but then it would also have to put the same amount of padding before it if you changed from char [5] to char[6], which would be pretty nonsensical. C requires structures with a common initial sequence of elements to be compatible. – Crucial 16/11, 2010 at 13:55

The member to use is the one you last wrote to; the other(s) are off limits. You know which member you last wrote to, don't you? After all, it was you who wrote the program :-)

As for your secondary question: the compiler is allowed to insert 'padding bytes' in the structure to avoid unaligned accesses and make it more performant.

Example of a possible distribution of bytes inside your structure:

Num    |Name     |pad
- - - -|- - - - -|x x x
0 1 2 3|4 5 6 7 8|9 a b

Faddish answered 16/11, 2010 at 13:28 Comment(6)

+1 for taking the time to show the memory layout of the object. – Nabors 16/11, 2010 at 13:31

Actually, sometimes you just need to access the other members. One important function of unions is providing different views to the same data. – Grape 16/11, 2010 at 13:34

As for the "off-limits" comment, it should be noted that as far as the compiler is concerned nothing is off-limits. The programmer has to manually enforce any such policies. – Grape 16/11, 2010 at 13:36

@Grape - The standard defines this behavior as undefined, therefore it is off-limits unless you're explicitly relying on platform-dependent behavior. In practice I know what int is_little_endian(void) { union { int i; char c[sizeof(int)]; } u; u.i = 1; return u.c[0] == 1; } will do, but by the standard that code is UB and is evil. – Nabors 16/11, 2010 at 13:48

It's undefined WRT endianness and structure memory layout. What IS defined is that all "alternatives" for a union will start at the same address, which allows people who know what they are doing to perform little miracles :-) – Grape 16/11, 2010 at 14:3

Actually, accessing a different union member than the member last written to, is implementation defined (6.5.2.2/5). – Faddish 16/11, 2010 at 14:13

Short answer: there is no way except by adding an enum somewhere in your struct outside the union.

enum TestUnionPart
{
  TUP_STRUCT,
  TUP_TOTAL
};

struct TestUnionStruct
{
  enum TestUnionPart Part;
  union
  {
    struct
    {
      unsigned int Num;
      unsigned char Name[5];
    } TestStruct;
    unsigned char Total[7];
  } TestUnion;
};

Now you'll need to control creation of your union to make sure the enum is correctly set, for example with functions similar to:

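/* Needs <string.h> for memcpy. name must point to at least 5 bytes. */
void init_with_struct(struct TestUnionStruct* tus, unsigned int num, unsigned char const * name)
{
  tus->Part = TUP_STRUCT;
  tus->TestUnion.TestStruct.Num = num;
  memcpy(tus->TestUnion.TestStruct.Name, name, sizeof(tus->TestUnion.TestStruct.Name));
}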

Dispatch on the correct values is now a single switch:

void print(struct TestUnionStruct const * tus)
{
  switch (tus->Part)
  {
    case TUP_STRUCT:
      /* The precision bounds the read: the arrays need not be NUL-terminated */
      printf("Num = %u, Name = %.5s\n",
             tus->TestUnion.TestStruct.Num,
             (char const *) tus->TestUnion.TestStruct.Name);
      break;
    case TUP_TOTAL:
      printf("Total = %.7s\n", (char const *) tus->TestUnion.Total);
      break;
    default:
      /* Compiler can't make sure you'll never reach this case */
      assert(0);
  }
}

As a side note, I'd like to mention that these constructs are best handled in languages of the ML family.

type test_struct = { num: int; name: string }
type test_union = Struct of test_struct | Total of string

Fleece answered 16/11, 2010 at 13:33 Comment(0)

First, sizeof(int) on most architectures nowadays is going to be 4. If you want 2 you should look at short, or int16_t in the stdint.h header in C99 if you want to be specific.

Second, C compilers add padding bytes so that the size of each struct is a multiple of its strictest member alignment (4 here, for the unsigned int). So your struct looks like this:

+---+---+---+---+---+---+---+---+---+---+---+---+
|      Num      |   N   a   m   e   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+

There are 3 padding bytes at the end. Otherwise, the next struct in an array would have its Num field in an awkwardly-aligned place, which would make it less efficient to access.

Third, the sizeof a union is going to be at least the sizeof its largest member, rounded up for alignment. Even if all that space isn't used, sizeof is going to return that same result.

You need, as other answers have mentioned, some other way (like an enum) to determine which field of your union is used.

Nabors answered 16/11, 2010 at 13:30 Comment(0)

There is no way to tell. You should have some additional flags (or other means external to your union) saying which of the union parts is really used.

Capitol answered 16/11, 2010 at 13:26 Comment(0)

Another example of including the union with an enum to determine what is stored. I found it much more clear and to the point.

from: https://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/Structures.html

author: Dr. John T. Bell

In order to know which union field is actually stored, unions are often nested inside of structs, with an enumerated type indicating what is actually stored there. For example:

typedef struct Flight {
    enum { PASSENGER, CARGO } type;
    union {
        int npassengers;
        double tonnages;  // Units are not necessarily tons.
    } cargo;
} Flight;

Flight flights[ 1000 ];

flights[ 42 ].type = PASSENGER;
flights[ 42 ].cargo.npassengers = 150;

flights[ 20 ].type = CARGO;
flights[ 20 ].cargo.tonnages = 356.78;

Michelinemichell answered 3/7, 2018 at 14:53 Comment(1)

Yes! The accepted answer, by Tom Anderson, says exactly the same. – Drice 4/7, 2018 at 3:43
