ANSI C unions - are they really useful?
R

10

5

From a response to some question yesterday, I learned that it is nonportable and unsafe to write into one union member and read the value from another member of a different type, assuming underlying alignment of the members. So after some research I found a written source that repeats this claim and specifies a popular example - using union of int and float to find the binary representation of a float.

So, understanding that this assumption is not safe, I wonder - except for saving memory (duh...) what real use is there to unions?

Note: that is, under Standard C. Clearly, for a specific implementation, the rules are known in advance and can be taken advantage of.

EDIT: the word "unsafe", due to its associations in recent years, is probably a bad choice of wording, but I think the intention is clear.

EDIT 2: Since this point repeats in the answers - saving memory is a valid argument. I wanted to know if there was something beyond that.

Recountal answered 12/8, 2010 at 3:0 Comment(11)
please link to the cited question and to the written sourceTammany
I actually don't remember the question itself (this was not the subject there, but it was raised in one of the comments). The source is "C: A Reference Manual", 5th Ed., Harbison/SteeleRecountal
Is this it? #3444251 If so, see my comments under that same answer.Hysterectomize
Do remember that the standard came along after a lot of de-facto usage of C. So the original intent of the union may well have been type-punning, and that may have been its widespread usage for a long time, but they didn't want to enforce it in the standard.Chuu
There is a pretty big difference between undefined behavior and implementation-defined behavior. Neither is portable, but the latter is very useful when you know your implementation's details.Leading
@Hysterectomize - yes, that was the question, thanks. Your comment there (the way I understand it) just means that the only reason to use unions is to save memory.Recountal
@Evan Teran - from the perspective of standard C, implementation defined is similar to undefined. Anyway, I did not say it is undefined behavior in my question.Recountal
@Chuu - I don't agree; if you write strictly conforming ANSI C (say, for portability purposes), then what use is there for unions? That was my intention in the question.Recountal
@ysap: indeed you didn't imply it was UB; I was just trying to point out that sticking to the standard != portable. You may use unions usefully without violating the standard to do things such as view the bit patterns of floats. It just isn't portable. Keep in mind that portability, while a nice goal, isn't always necessary. Suppose you only need to target Linux and Windows. You can easily find out whether unions behave as expected on both and write code that is portable in the sense that it works on all expected architectures.Leading
Also I do disagree with the "unsafe" claims made in your post. unportable sure, but can easily be used safely and reliably.Leading
@Evan Teran - OK, unsafe in the contemporary sense is probably a bad choice of wording.Recountal
B
10

Yes.

They provide a way of creating generic containers. To get polymorphic behavior, though, you must implement a vtable or type switching yourself...

They are, however, one of those features that you use only when you need them, and you need them rather rarely.
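As a rough illustration of the "type switching" idea, here is a minimal sketch of a tagged generic value. The names (value_kind, struct value, value_format) are made up for this example, and snprintf assumes C99 or later rather than strict ANSI C.

```c
#include <stdio.h>

/* Hypothetical tag values; the names are illustrative only. */
enum value_kind { VALUE_INT, VALUE_DOUBLE, VALUE_STRING };

/* A generic value: the tag records which union member is currently valid. */
struct value {
    enum value_kind kind;
    union {
        int i;
        double d;
        const char *s;
    } as;
};

/* Hand-written type switch: dispatch on the tag and read only the
   member that was last written. Returns what snprintf returns,
   or -1 for an unknown tag. */
int value_format(const struct value *v, char *buf, size_t size)
{
    switch (v->kind) {
    case VALUE_INT:    return snprintf(buf, size, "%d", v->as.i);
    case VALUE_DOUBLE: return snprintf(buf, size, "%.2f", v->as.d);
    case VALUE_STRING: return snprintf(buf, size, "%s", v->as.s);
    }
    return -1;
}
```

Because only the member named by the tag is ever read, nothing here depends on how the members overlap in memory.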

Belindabelisarius answered 12/8, 2010 at 3:4 Comment(4)
OK, I think I understand what you mean. However, since you are required to implement the polymorphism mechanism yourself, what advantage does it have over declaring variables of the different types and accessing them through a pointer which is switched by the mechanism (other than to save memory)?Recountal
Also, was this the intention of the writers of the standard, or is it a smart use case that was developed later?Recountal
Save memory (and it is worth thinking about the characteristics of 'big' machines around the time c was written: main memory measured in kilowords on shared machines), simplified management of complex structures, one character less typing on each access, having the ability to use the memory anyway you want to, and don't neglect the ability to do those "undefined" things 'cause not every program needs to be portable.Belindabelisarius
I'm sure K&R just had in mind what smart assembly coders did, which was squeeze every byte out when it counted.Belindabelisarius
F
3

Yes, unions can be nonportable and unsafe, but they have their uses. For example, a union can speed things up by eliminating the need to cast a uint32 to char[4]. This could come in handy if you are trying to route by IP address in software, but then your processor's byte order has to match network order. Think of unions as an alternative to casting, with fewer machine instructions. Casting has similar drawbacks.
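A minimal sketch of the idea, assuming a platform that provides uint32_t. The names union ipv4, host_is_network_order and ipv4_octet are invented for illustration. The union view is cheap but depends on byte order, so a shift-based accessor is shown next to it for comparison.

```c
#include <stdint.h>

/* Overlays a 32-bit address with its four bytes. Reading the bytes
   through an unsigned char array is permitted; which byte comes
   first depends on the machine's byte order. */
union ipv4 {
    uint32_t word;
    unsigned char octet[4];
};

/* Returns 1 if the host stores the most significant byte first
   (that is, matches network byte order), 0 otherwise. */
int host_is_network_order(void)
{
    union ipv4 probe;
    probe.word = 0x01020304u;
    return probe.octet[0] == 0x01;
}

/* Byte-order-independent alternative: extract octet n
   (0 = most significant) by shifting. */
unsigned char ipv4_octet(uint32_t addr, int n)
{
    return (unsigned char)((addr >> (24 - 8 * n)) & 0xFFu);
}
```

On a little-endian host the union's octet[0] holds the last octet of the address, which is exactly the caveat about network order mentioned above.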

Foah answered 12/8, 2010 at 3:17 Comment(1)
As I mentioned in the question, clearly unions can be very useful for a specific implementation. You are making specific assumptions about the underlying storage. My question is whether they are useful from the perspective of standard C. dmckee gives a reasonable argument for that.Recountal
A
3

Even if unions don't offer much in immediate usefulness (reduced memory usage aside), one advantage of using a union over dumping all of its members into a struct is that it makes the intended semantics clear: only one value (or set of values if it's a union of structs) is valid at any given time. It documents itself better.

The mutual exclusivity of members would be less obvious if you instead made all the union members separate members of a struct. Additionally, you'd still have the same problem of ill-defined behavior if you read a member that wasn't previously written to, but now you need to account for the application's semantics too (did it initialize all unused members to 0? did it leave them as garbage?), so in that sense, why wouldn't you use a union?

Asymmetry answered 12/8, 2010 at 3:28 Comment(4)
I think that's only partially correct - my understanding is that members of the same type are guaranteed to overlay each other, so no mutex here.Recountal
@ysap: That's true, but it's an implementation detail of unions that is unrelated to my point.Asymmetry
@jamesdin - well your point is that "due to the unknown arrangement of the members, do assume exclusiveness on access" (or isn't it?). My comment was that this is only partially true.Recountal
@ysap: My point is that a struct by itself conveys no semantic information about how its members are supposed to be used relative to each other. I didn't say anything about "unknown arrangements".Asymmetry
C
3

The question contains a constraint that might disallow a valid answer...

You ask about real usage under the standard, but "real usage" may be allowing a knowledgeable programmer to exploit implementation defined behaviour in ways that the standards committee didn't want to anticipate or enumerate. And I don't mean that the standards committee had a particular behaviour in mind, but that they explicitly wanted to leave the ability there to be exploited in a useful way.

In other words: unions don't have to be useful under standard-defined behaviour to be useful in general; they could simply be there to allow someone to exploit the quirks of their target machine without resorting to assembly.

There could be a million useful ways to use them on the various machines available in implementation-defined ways, and zero useful ways to use them in a strictly portable way, but those million implementation-defined usages are reason enough to standardise their existence.

I hope that makes sense.

Chuu answered 12/8, 2010 at 3:38 Comment(2)
Yes, it does make sense, thanks. I am not completely sure at this time that I agree with this as answering my question.Recountal
@Recountal - it doesn't really, but I think it was more an answer than a comment :)Chuu
C
3

Even discounting a specific implementation where the alignment and packing are known, unions can still be useful.

They allow you to store one of many values into a single block of memory, along the lines of:

typedef struct {
    int type;
    union {
        type1 one;
        type2 two;
    };  /* anonymous union (C11); give it a name for C89/C99 */
} unioned_type;

And yes, it is non-portable to expect to be able to store your data into one and read it from two. But if you simply use the type to specify what the underlying variable is, you can easily get at it without having to cast.

In other words:

unioned_type ut;
ut.type = 1;
ut.one = myOne;
// Don't use ut.two here unless you know the underlying details.

is fine assuming you use type to decide that a type1 variable is stored there.

Cytotaxonomy answered 12/8, 2010 at 3:39 Comment(6)
If I understand your example, it is actually similar to the example given in the reference book I mentioned above. Then, the argument boils down to save memory.Recountal
+1 — This is probably the most useful way to use a union within the bounds of the standard.Chuu
OK, after you update, I am sure it is actually similar to that example.Recountal
Yes, and saving memory was a very important consideration in the early days. C was sometimes described as a language with all the power of assembler with all the readability of ... assembler :-) The other thing you must understand was that the initial mandate of ANSI C was to mostly codify existing practices, not standardise a new language. In each subsequent standard, ANSI and ISO, they've been very careful not to break existing code unnecessarily.Cytotaxonomy
It's not just an "early days" consideration — it's important on embedded devices, which is still a massive use-case for C. And even embedded devs might want to upgrade or change their compiler at some point.Chuu
@ysap, I don't doubt it's similar to quite a lot of examples. There's a fairly limited number of ways to do it.Cytotaxonomy
M
3

Here is one legitimate portable use of unions:

struct arg {
    enum type t;
    union {
        intmax_t i;
        uintmax_t u;
        long double f;
        void *p;
        void (*fp)(void);
    } v;
};

Coupled with type information in t, struct arg can portably contain any numeric or pointer value. The whole struct is likely to be 16-32 bytes in size, compared to 40-80 bytes if a union had not been used. The difference would be even more extreme if I wanted to keep each possible original numeric type separately (signed char, short, int, long, long long, unsigned char, unsigned short, ...) rather than converting them up to the largest signed/unsigned/floating point type before storing them.

Also, while it is not "portable" to assume anything about the representation of types other than unsigned char, the standard permits using a union with unsigned char, or casting a pointer to unsigned char *, to access an arbitrary data object that way. If you write that information to disk, it won't be portable to other systems that use different representations, but it can still be useful at runtime - for example, implementing a hash table to store double values. (Anyone want to correct me if padding bit issues make this technique invalid?) If nothing else, it can be used to implement memcpy (not very useful, since the standard library provides a much better implementation) or, more interestingly, a memswap function that swaps two arbitrary-size objects with bounded temporary space. This has drifted a little outside the usage domain of unions and into unsigned char * cast territory, but it's closely related.
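For what it's worth, the swap idea can be sketched roughly as follows. This is one possible implementation, not a canonical one: the function is called swap_objects here because names beginning with mem are reserved for the library, and the 64-byte chunk size is an arbitrary choice.

```c
#include <stddef.h>
#include <string.h>

/* Swaps two non-overlapping objects of n bytes each, using a
   fixed-size temporary buffer regardless of n. The objects are
   accessed as unsigned char, which is allowed for any object type. */
void swap_objects(void *a, void *b, size_t n)
{
    unsigned char *pa = a;
    unsigned char *pb = b;
    unsigned char tmp[64];   /* chunk size is an arbitrary choice */

    while (n > 0) {
        size_t chunk = n < sizeof tmp ? n : sizeof tmp;
        memcpy(tmp, pa, chunk);
        memcpy(pa, pb, chunk);
        memcpy(pb, tmp, chunk);
        pa += chunk;
        pb += chunk;
        n -= chunk;
    }
}
```

The code never interprets the bytes it moves, so it makes no assumption about how any particular type is represented.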

Maganmagana answered 12/8, 2010 at 4:16 Comment(2)
R, the first half of your answer is clear - save memory. What I'm not sure I understand is the second half. How can an unsigned char member be used to access other members in a predictable way?Recountal
The values aren't "predictable" without knowing the implementation, but they're implementation-defined. As long as your code doesn't make assumptions about what those values are, but merely uses them internally, you're fine. Another possible application would be making a byte-by-byte comparison function for use with qsort when you don't care that the sorting has any relation to the natural numeric ordering of the original type, just that it's well-defined and that the results are reproducible.Maganmagana
M
2

Using a union for type punning is non-portable (though not particularly less portable than any other method of type punning).

OTOH, a parser, for one example, typically has a union to represent values in expressions. [Edit: I'm replacing the parser example with one I hope is a bit more understandable]:

Let's consider a Windows resource file. You can use it to define resources like menus, dialogs, icons, etc. Something like this:

#define mn1 2

mn1 MENU
{
    MENUITEM "File", -1, MENUBREAK
}

ico1 "junk.ico"

dlg1 DIALOG 100, 0, 0, 100, 100 
BEGIN
    FONT 14, "Times New Roman"
    CAPTION "Test Dialog Box"
    ICON ico1, 700, 20, 20, 20, 20
    TEXT "This is a string", 100, 0, 0, 100, 10
    LTEXT "This is another string", 200, 0, 10, 100, 10
    RTEXT "Yet a third string", 300, 0, 20, 100, 10
    LISTBOX 400, 20, 20, 100, 100
    CHECKBOX "A combobox", 500, 100, 100, 200, 10
    COMBOBOX 600, 100, 210, 200, 100
    DEFPUSHBUTTON "OK", 75, 200, 200, 50, 15
END

Parsing the MENU gives a menu definition; parsing the DIALOG gives a dialog definition, and so on. In the parser we represent that as a union:

%union { 
        struct control_def {
                char window_text[256];
                int id;
                char *class;
                int x, y, width, height;
                int ctrl_style;
        } ctrl;

        struct menu_item_def { 
                char text[256];
                int identifier;
        } item;

        struct menu_def { 
                int identifier;
                struct menu_item_def items[256];
        } mnu;

        struct font_def { 
                int size;
                char filename[256];
        } font;

        struct dialog_def { 
                char caption[256];
                int id;
                int x, y, width, height;
                int style;
                struct menu_def *mnu;
                struct control_def ctrls[256];
                struct font_def font;
        } dlg;

        int value;
        char text[256];
};

Then we specify the type that will be produced by parsing a particular type of expression. For example, a font definition in the file becomes a font member of the union:

%type <font> font

Just to clarify, the <font> part refers to the union member that's produced and the second "font" refers to a parser rule that will yield a result of that type. Here's the rule for this particular case:

font: T_FONT T_NUMBER "," T_STRING { 
    $$.size = $2; 
    strcpy($$.filename,$4); 
};

Yes, in theory we could use a struct instead of a union here -- but beyond wasting memory, it just doesn't make sense. A font definition in the file only defines a font. It would make no sense to have it produce a struct that included a menu definition, icon definition, number, string, etc. in addition to the font it actually defines. [end of edit]

Of course, using unions to save memory is rarely very important anymore. While it may generally seem rather trivial now, back when 64 KB of RAM was a lot, the memory savings meant a lot more.

Merrow answered 12/8, 2010 at 3:45 Comment(4)
Sorry, Jerry, I did not understand your example. Could you please give a concrete example of how the union you defined is related to the expressions it supposedly represents?Recountal
@ysap: I was afraid of that -- I'll try to write up a small grammar to show more about how it works.Merrow
Thanks for clarifying the use case. I think I understood the message. You basically use the union as a generic container. Then, the type of the object that actually occupies the union is stored in a tag (I assume it is the value or text member/s). So, like other examples in other answers, this is basically an explicit implementation of polymorphism.Recountal
Actually no, it doesn't use a tag -- but the parser generator keeps track of the <type> tags you've given, and it'll give an error if you try to mix up members (e.g., since the font rule produces a font result, it can only assign to the font member of the union). Yes, I guess you could view it as vaguely similar to polymorphism, but not really exactly.Merrow
L
1

One way to use unions that I came across is data hiding.

Say you have a struct that serves as a buffer.

Then, by declaring a different union over that struct in different modules, you can access the contents of the buffer in different ways, or not at all, depending on the union declared in that particular module.

EDIT: here's an example

struct X
{
  int a;
};

struct Y
{
  int b;
};

union Public
{
   struct X x;
   struct Y y;
};

Here, whoever uses union Public can treat it as either struct X or struct Y.

so given a function:

void foo(union Public *arg)
{   
...

you can access both struct X or struct Y

but then you want to limit the access so that user doesn't know about X

the union name stays the same but the struct X part is not available (through header)

void foo(union Public *arg)
{
   // Public is still available but struct X is gone, 
   // user can only cast to struct Y

   struct Y* p = (struct Y*)arg;
...
Lankton answered 12/8, 2010 at 3:20 Comment(4)
::blink:: That's clever, but...it's also security through obscurity, so probably not a great idea. Also sensitive to packing issues, so implementation dependent.Belindabelisarius
I'm not sure I see your point. Can you give an example for the definition of such buffer?Recountal
@dmckee - you confirmed my suspicion that this use case IS implementation specific!Recountal
@dmckee, proper encapsulation is not "security through obscurity". Obviously a malicious caller could even do something like *(char *)rand() = rand(); in C, so there's no security benefit to hiding your implementation. On the other hand, hiding implementation details does strongly discourage other people using your code from poking at the internals in ways that will break when you later tweak the implementation.Maganmagana
J
0

Consider a hardware control register with different bit fields. By setting values in these bit fields, we can control the different functions that the register exposes.

By using a union data type, we can modify either the entire content of the register or a particular bit field of the register.

For example, consider a union data type as follows:

/* Data1 Bit Definition */
typedef union 
{
    struct STRUCT_REG_DATA
    {
        unsigned int u32_BitField1  : 3;
        unsigned int u32_BitField2  : 2;
        unsigned int u32_BitField3  : 1;
        unsigned int u32_BitField4  : 2;                
    } st_RegData;

    unsigned int u32_RegData;

} UNION_REG_DATA;

To modify the entire content of the register:

UNION_REG_DATA  un_RegData;
un_RegData.u32_RegData = 0x77;

To modify a single bit field (for example, BitField3):

un_RegData.st_RegData.u32_BitField3 = 1;

Both are reflected in the same memory. This value can then be written to the hardware control register.

Jounce answered 13/8, 2010 at 6:40 Comment(1)
@barati21 - That is exactly the point of the question - you should not do that if you want to guarantee portability. The underlying layout of union members in memory is not defined in the standard. It is implementation defined.Recountal
F
0

Here is a practical example:

There are microcontrollers whose non-volatile memories store data in byte blocks. How could you easily store an array of floats in such a memory? On these targets a float is 32 bits (4 bytes) long (C itself does not guarantee this, hence the sizeof alternative in the comment), so:

union float_uint8
{
    uint8 i[KNFLOATS*4]; //or KNFLOATS*sizeof(float)
    float f[KNFLOATS];
};

Now you can store and address the floats through variables or pointers of type union float_uint8 and, with a loop, easily write them to memory as individual bytes without any conversion or decomposition. The same applies when reading the memory back. You don't even need to know how floats are decomposed into bytes to store or recover the data.
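The loop described here might look something like the sketch below. It is only an illustration under stated assumptions: nvm_write_byte and nvm_read_byte are hypothetical stand-ins for a real driver (backed by a plain array), KNFLOATS is given an arbitrary value, and unsigned char replaces the nonstandard uint8.

```c
#include <stddef.h>

#define KNFLOATS 4   /* illustrative count, not from the original */

union float_bytes {
    unsigned char b[KNFLOATS * sizeof(float)];
    float f[KNFLOATS];
};

/* Stand-in for byte-addressed non-volatile memory; a real driver
   would replace these two hypothetical functions. */
static unsigned char fake_nvm[KNFLOATS * sizeof(float)];

static void nvm_write_byte(size_t addr, unsigned char value)
{
    fake_nvm[addr] = value;
}

static unsigned char nvm_read_byte(size_t addr)
{
    return fake_nvm[addr];
}

/* Writes the floats to memory one byte at a time. */
void store_floats(const union float_bytes *src)
{
    size_t i;
    for (i = 0; i < sizeof src->b; i++)
        nvm_write_byte(i, src->b[i]);
}

/* Reads the bytes back; the floats reappear through the f member. */
void load_floats(union float_bytes *dst)
{
    size_t i;
    for (i = 0; i < sizeof dst->b; i++)
        dst->b[i] = nvm_read_byte(i);
}
```

The stored bytes are only meaningful to an implementation with the same float representation, which is fine when the same device both writes and reads them.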

This example is extracted from my own work. So yes, they are useful.

Fragment answered 26/8, 2018 at 15:35 Comment(0)
