How to create type safe enums?

4

60

Achieving type safety with enums in C is problematic, since they are essentially just integers. Enumeration constants are in fact defined to be of type int by the standard.
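A minimal sketch of that claim, assuming a C11 compiler. The IS_INT macro and the two function names are made up for illustration; they are not part of the question.

```c
#include <assert.h>

typedef enum { BLUE, RED } color_t;

/* Hypothetical helper: evaluates to 1 when the expression has type int. */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* The enumeration constant BLUE has type int, not color_t. */
void check_constant_type(void)
{
    assert(IS_INT(BLUE));
}

/* Any int converts silently to the enumerated type, so this compiles. */
color_t from_int(int raw)
{
    color_t c = raw;   /* the standard requires no diagnostic here */
    return c;
}
```

Note that the type of a color_t variable itself is implementation-defined (some integer type compatible with the enum), so only the constant is checked here.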

To achieve a bit of type safety I do tricks with pointers like this:

typedef enum
{
  BLUE,
  RED
} color_t;

void color_assign (color_t* var, color_t val) 
{ 
  *var = val; 
}

Because pointers have stricter type rules than values, this prevents code such as:

int x; 
color_assign(&x, BLUE); // compiler error

But it doesn't prevent code like this:

color_t color;
color_assign(&color, 123); // garbage value

This is because an enumeration constant is essentially just an int, and any int can be implicitly assigned to an enumeration variable.

Is there a way to write such a function or macro color_assign, that can achieve complete type safety even for enumeration constants?

Spousal answered 27/3, 2017 at 9:54 Comment(2)
Please have a look at my answer below: #39725831 – Remindful
@Remindful It's quite similar to some of the struct versions posted below. – Spousal
S
58

It is possible to achieve this with a few tricks. Given

typedef enum
{
  BLUE,
  RED
} color_t;

Then define a dummy union which won't be used by the caller, but contains members with the same names as the enumeration constants:

typedef union
{
  color_t BLUE;
  color_t RED;
} typesafe_color_t;

This is possible because enumeration constants and member/variable names reside in different namespaces.
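As a small illustration of the name space rule (a sketch only; pick_red is a hypothetical function name), the same identifier can appear on both sides of an assignment with two different meanings:

```c
#include <assert.h>

typedef enum { BLUE, RED } color_t;

/* Member names live in the union's own name space, so reusing the
 * identifiers BLUE and RED here does not clash with the constants. */
typedef union
{
  color_t BLUE;
  color_t RED;
} typesafe_color_t;

color_t pick_red(void)
{
  typesafe_color_t tmp;
  tmp.RED = RED;          /* left: member RED, right: enumeration constant RED */
  assert(tmp.RED == RED);
  return tmp.RED;
}
```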

Then make some function-like macros:

#define c_assign(var, val) (var) = (typesafe_color_t){ .val = val }.val
#define color_assign(var, val) _Generic((var), color_t: c_assign(var, val))

These macros are then called like this:

color_t color;
color_assign(color, BLUE); 

Explanation:

  • The C11 _Generic keyword ensures that the enumeration variable is of the correct type. However, this can't be used on the enumeration constant BLUE because it is of type int.
  • Therefore the helper macro c_assign creates a temporary instance of the dummy union, where the designated initializer syntax is used to assign the value BLUE to a union member named BLUE. If no such member exists, the code won't compile.
  • The union member of the corresponding type is then copied into the enum variable.

We don't actually need the helper macro; I just split the expression for readability. It works just as well to write

#define color_assign(var, val) _Generic((var), \
color_t: (var) = (typesafe_color_t){ .val = val }.val )
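To make the mechanics concrete, here is a sketch that puts the macro next to its hand-written expansion. The two function names are hypothetical and exist only so the result can be checked; a C11 compiler is assumed.

```c
#include <assert.h>

typedef enum { BLUE, RED } color_t;

typedef union
{
  color_t BLUE;
  color_t RED;
} typesafe_color_t;

#define color_assign(var, val) _Generic((var), \
color_t: (var) = (typesafe_color_t){ .val = val }.val )

color_t assign_with_macro(void)
{
  color_t color = BLUE;
  color_assign(color, RED);
  return color;
}

color_t assign_by_hand(void)
{
  color_t color = BLUE;
  /* What color_assign(color, RED) becomes once _Generic has selected
   * the color_t branch: a compound literal of the union type whose
   * member RED is initialized with the constant RED, then read back. */
  color = (typesafe_color_t){ .RED = RED }.RED;
  assert(color == RED);
  return color;
}
```

Because the second macro argument is pasted in as a member name, only identifiers that exist as union members can be used there.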

Examples:

color_t color; 
color_assign(color, BLUE);// ok
color_assign(color, RED); // ok

color_assign(color, 0);   // compiler error 

int x;
color_assign(x, BLUE);    // compiler error

typedef enum { foo } bar;
color_assign(color, foo); // compiler error
color_assign(bar, BLUE);  // compiler error

EDIT

Obviously the above doesn't prevent the caller from simply typing color = garbage;. If you wish to entirely block the possibility of using such assignment of the enum, you can put it in a struct and use the standard procedure of private encapsulation with "opaque type":

color.h

#include <stdlib.h>

typedef enum
{
  BLUE,
  RED
} color_t;

typedef union
{
  color_t BLUE;
  color_t RED;
} typesafe_color_t;

typedef struct col_t col_t; // opaque type

col_t* col_alloc (void);
void   col_free (col_t* col);

void col_assign (col_t* col, color_t color);

#define color_assign(var, val)   \
  _Generic( (var),               \
    col_t*: col_assign((var), (typesafe_color_t){ .val = val }.val) \
  )

color.c

#include "color.h"

struct col_t
{
  color_t color;
};

col_t* col_alloc (void) 
{ 
  return malloc(sizeof(col_t)); // (needs proper error handling)
}

void col_free (col_t* col)
{
  free(col);
}

void col_assign (col_t* col, color_t color)
{
  col->color = color;
}

main.c

col_t* color;
color = col_alloc();

color_assign(color, BLUE); 

col_free(color);
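With the struct hidden, callers also need some way to read the value back, and the allocation should be checked. The following single-file sketch collapses color.h and color.c for brevity and adds a hypothetical accessor col_get plus an assumed default color; neither is part of the code above.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { BLUE, RED } color_t;
typedef union { color_t BLUE; color_t RED; } typesafe_color_t;

typedef struct col_t col_t;          /* opaque to callers */
struct col_t { color_t color; };     /* would live in color.c */

/* Returns NULL on allocation failure; the caller must check. */
col_t* col_alloc (void)
{
  col_t* col = malloc(sizeof *col);
  if (col != NULL)
    col->color = BLUE;               /* assumed default value */
  return col;
}

void col_free (col_t* col) { free(col); }

void col_assign (col_t* col, color_t color)
{
  assert(col != NULL);
  col->color = color;
}

/* Hypothetical accessor for reading the stored value. */
color_t col_get (const col_t* col)
{
  assert(col != NULL);
  return col->color;
}

#define color_assign(var, val) \
  _Generic( (var), \
    col_t*: col_assign((var), (typesafe_color_t){ .val = val }.val) )

/* Returns 1 if RED survives a store and load, 0 otherwise. */
int demo_round_trip (void)
{
  col_t* c = col_alloc();
  if (c == NULL)
    return 0;
  color_assign(c, RED);
  int ok = (col_get(c) == RED);
  col_free(c);
  return ok;
}
```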
Spousal answered 27/3, 2017 at 9:54 Comment(7)
This is really cute, although it won't catch some mistakes: int zonk(int x) {color_t color; color = x; return color;} – Selfrespect
@Selfrespect You will obviously have to disallow direct assignments. That can be achieved by, for example, embedding the enum in a struct and then making the struct an opaque type. – Spousal
@Selfrespect I added an example with private encapsulation which blocks direct assignments. – Spousal
Am I missing something, or is it impossible to use color_assign with a value from a variable, typed or otherwise? Since the macro also uses the "expression" as the field name. How do you actually do anything with these values in a type-safe way? – Inclusive
@Leushenko Yes it is impossible because of the (typesafe_color_t){ .val = val }. If you type color_assign(color, 0); then it will expand to (typesafe_color_t){ .0 = 0 } which is gibberish and won't compile. – Spousal
What does the function-like macro #define c_assign(var, val) (var) = (typesafe_color_t){ .val = val }.val do? Can you explain it in a bit more detail? – Melitamelitopol
@Melitamelitopol It creates a temporary variable of the union type through a so-called compound literal (C99 feature), then initializes a specific member of this temporary union variable (through designated initializers, another C99 feature). If no member with a matching name exists in the union, then the code won't compile. If the member matches, as in the case with RED, the union member RED will get assigned the value RED. By typing .val in the end, the code accesses that very member and copies it into the destination variable. In practice I believe most of this code will get optimized away. – Spousal
D
9

The top answer's pretty good, but it has the downsides that it requires a lot of the C99 and C11 feature set in order to compile, and on top of that, it makes assignment pretty unnatural: You have to use a magic color_assign() function or macro in order to move data around instead of the standard = operator.

(Admittedly, the question explicitly asked about how to write color_assign(), but if you look at the question more broadly, it's really about how to change your code to get type-safety with some form of enumerated constants, and I'd consider not needing color_assign() in the first place to get type-safety to be fair game for the answer.)

Pointers are among the few shapes that C treats as type-safe, so they make a natural candidate for solving this problem. So I'd attack it this way: Rather than using an enum, I'd sacrifice a little memory to be able to have unique, predictable pointer values, and then use some really hokey funky #define statements to construct my "enum" (yes, I know macros pollute the macro namespace, but enum pollutes the compiler's global namespace, so I consider it close to an even trade):

color.h:

typedef struct color_struct_t *color_t;

struct color_struct_t { char dummy; };

extern struct color_struct_t color_dummy_array[];

#define UNIQUE_COLOR(value) \
    (&color_dummy_array[value])

#define RED    UNIQUE_COLOR(0)
#define GREEN  UNIQUE_COLOR(1)
#define BLUE   UNIQUE_COLOR(2)

enum { MAX_COLOR_VALUE = 2 };

This does, of course, require that you have just enough memory reserved somewhere to ensure nothing else can ever take on those pointer values:

color.c:

#include "color.h"

/* This never actually gets used, but we need to declare enough space in the
 * BSS so that the pointer values can be unique and not accidentally reused
 * by anything else. */
struct color_struct_t color_dummy_array[MAX_COLOR_VALUE + 1];

But from the consumer's perspective, this is all hidden: color_t is very nearly an opaque object. You can't assign anything to it other than valid color_t values and NULL:

user.c:

#include <stddef.h>
#include "color.h"

void foo(void)
{
    color_t color = RED;    /* OK */
    color = GREEN;          /* OK */
    color = NULL;           /* OK */
    color = 27;             /* Error/warning */
}

This works well in most cases, but it does have the problem of not working in switch statements; you can't switch on a pointer (which is a shame). But if you're willing to add one more macro to make switching possible, you can arrive at something that's "good enough":

color.h:

...

#define COLOR_NUMBER(c) \
    ((c) - color_dummy_array)

user.c:

...

void bar(color_t c)
{
    switch (COLOR_NUMBER(c)) {
        case COLOR_NUMBER(RED):
            break;
        case COLOR_NUMBER(GREEN):
            break;
        case COLOR_NUMBER(BLUE):
            break;
    }
}
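One caveat: a difference of two addresses is not a strict integer constant expression, so not every compiler is obliged to accept COLOR_NUMBER(RED) as a case label. A more portable sketch keeps the indices in an enum and switches on those. The *_INDEX names, COLOR_COUNT and color_name are hypothetical additions, and everything is collapsed into one file here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct color_struct_t *color_t;
struct color_struct_t { char dummy; };

/* Hypothetical index names mirroring the numbers given to UNIQUE_COLOR. */
enum { RED_INDEX, GREEN_INDEX, BLUE_INDEX, COLOR_COUNT };

struct color_struct_t color_dummy_array[COLOR_COUNT];

#define UNIQUE_COLOR(value) (&color_dummy_array[value])
#define RED    UNIQUE_COLOR(RED_INDEX)
#define GREEN  UNIQUE_COLOR(GREEN_INDEX)
#define BLUE   UNIQUE_COLOR(BLUE_INDEX)

#define COLOR_NUMBER(c) ((c) - color_dummy_array)

const char *color_name(color_t c)
{
    if (c == NULL)                   /* NULL must be handled before subtracting */
        return "none";

    /* Precondition: c points into color_dummy_array. */
    assert(COLOR_NUMBER(c) >= 0 && COLOR_NUMBER(c) < COLOR_COUNT);

    switch (COLOR_NUMBER(c)) {
        case RED_INDEX:   return "RED";
        case GREEN_INDEX: return "GREEN";
        case BLUE_INDEX:  return "BLUE";
        default:          return "?";
    }
}
```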

Is this a good solution? I wouldn't call it great, since it both wastes some memory and pollutes the macro namespace, and it doesn't let you use enum to automatically assign your color values, but it is another way to solve the problem that results in somewhat more natural usages, and unlike the top answer, it works all the way back to C89.

Dode answered 27/3, 2017 at 15:56 Comment(8)
Using C11 features is not a legitimate downside. – Inclusive
It is a downside if your compiler doesn't support C11 features. I won't name any names (coughMicrosoftcough) but there are a number of "C" compilers out there that can't handle C11. – Dode
Interesting idea. You should consider hiding away the struct definition entirely though, so nobody gets the idea of using it or accessing the members. This can be done with opaque types as shown with the edit in my answer. Also, if you declare the struct const you don't waste space in .bss but rather in .rodata or something like that. And what about color_t color = 0;? Or worse: any expression that evaluates to 0 in compile time. – Spousal
const is definitely a good idea; putting it in the text/read-only segments is worth a little more effort. That said, in similar scenarios, I've often named the internal property opaque_, which certainly isn't foolproof, but which has been more than good enough in the past to keep dirty hands out of the cookie jar. – Dode
As for the zero issue, that's an issue, to be sure, but it's an issue that's shared with every other pointer type all over C: Yes, you can write color = 0, but it's not the same as color = BLACK, any more than string = 0 is the same as string = "". A switch statement can even identify NULL and either handle it in its default case, or even have a special case for NULL itself. You end up with an enum that effectively has an additional non-value, but considering how many real-world enums have something like DEFAULT = 0 already, I don't consider that a detriment to the technique. – Dode
The problem here is that you hide away the pointer behind a typedef, so that the user might not realize it is a pointer. If they think it is an enum and the first color in the enum is RED, it makes sense to write color = 0, or more likely color = some_int_index;. Turns far worse if non-standard compilers are used - there are plenty of bad compilers (such as a certain C89-cough-cough compiler) that allow implicit assignment from integers to pointers. Then incorrect assignments would be tolerated, both at compile-time and in run-time. – Spousal
It's true that you'd get weird behavior if you just assume that BLACK = 0 or something like it. But you get weird behavior in nearly every API if you make wild assumptions about it. Bad old compilers that let you write color = 5 will certainly make the issue worse, but there's one mitigating factor here, which is that 5 won't map to anything. The pointers are up in the text or data segment somewhere, so even if the compiler lets you write 5, no if or switch statements will match it. Even if your bad old compiler can't save you, the API will at least treat your 5 as garbage. – Dode
And, of course, if the API treats your inputs as garbage, that does result in StackOverflow questions like "How come no matter which color I choose, it always draws black pixels?" But those are really easy questions to answer :-) – Dode
I
8

One could enforce type safety with a struct:

struct color { enum { THE_COLOR_BLUE, THE_COLOR_RED } value; };
const struct color BLUE = { THE_COLOR_BLUE };
const struct color RED  = { THE_COLOR_RED  };

Since color is just a wrapped integer, it can be passed by value or by pointer as one would do with an int. With this definition of color, color_assign(&val, 3); fails to compile with:

error: incompatible type for argument 2 of 'color_assign'

     color_assign(&val, 3);
                        ^

Full (working) example:

#include <stdio.h>

struct color { enum { THE_COLOR_BLUE, THE_COLOR_RED } value; };
const struct color BLUE = { THE_COLOR_BLUE };
const struct color RED  = { THE_COLOR_RED  };

void color_assign (struct color* var, struct color val) 
{ 
  var->value = val.value; 
}

const char* color_name(struct color val)
{
  switch (val.value)
  {
    case THE_COLOR_BLUE: return "BLUE";
    case THE_COLOR_RED:  return "RED";
    default:             return "?";
  }
}

int main(void)
{
  struct color val;
  color_assign(&val, BLUE);
  printf("color name: %s\n", color_name(val)); // prints "BLUE"
}

Play with it online (demo).
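Since structs support plain assignment, the wrapper can also be used without a color_assign() function at all. A minimal sketch follows; color_equal and demo_plain_assignment are hypothetical names, and the comparison helper is needed because structs cannot be compared with ==.

```c
#include <assert.h>
#include <stdbool.h>

struct color { enum { THE_COLOR_BLUE, THE_COLOR_RED } value; };
const struct color BLUE = { THE_COLOR_BLUE };
const struct color RED  = { THE_COLOR_RED  };

/* Hypothetical helper: compares the wrapped values. */
bool color_equal(struct color a, struct color b)
{
  return a.value == b.value;
}

bool demo_plain_assignment(void)
{
  struct color c = BLUE;        /* initialization from the public constant */
  assert(color_equal(c, BLUE));
  c = RED;                      /* plain struct assignment */
  /* c = 3; would not compile: an int is not assignable to struct color */
  return color_equal(c, RED);
}
```

The type check still holds for the struct as a whole, although nothing stops a caller from writing to c.value directly unless the struct is made opaque.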

Ignorance answered 27/3, 2017 at 10:57 Comment(3)
I believe this forces you to use different names for the enums and the const structs though. – Spousal
@Spousal It does: each color gets a private (or internal) name (enum) and a public one (const struct). I don't see it as a drawback. – Ignorance
I really like your solution, but I think it can be made even simpler, because the assignment operator works with structs. This means that you could get rid of the color_assign() function completely, and perform a "right to the point" hardcore struct color mycol; mycol=BLUE; (or am I missing something and some safety would be lost by going that way?) – Multifoil
C
7

Ultimately, what you want is a warning or error when you use an invalid enumeration value.

As you say, the C language cannot do this. However you can easily use a static analysis tool to catch this problem - Clang is the obvious free one, but there are plenty of others. Regardless of whether the language is type-safe, static analysis can detect and report the problem. Typically a static analysis tool puts up warnings, not errors, but you can easily have the static analysis tool report an error instead of a warning, and change your makefile or build project to handle this.

Cesaria answered 27/3, 2017 at 12:18 Comment(5)
Obviously static analysers are always an option, for example a MISRA-C:2012 checker would catch enum type issues. The main problem with all static analysers on the market is that they are so full of bugs/"false positives", that they are not very useful. If you can force a compiler diagnostic by any standard C compiler, that's always the preferred solution. – Spousal
@Spousal My experience of static analysis isn't that it's full of bugs, but that idiomatic C will frequently break coding standards - "if(ptr)" as a check for non-NULL, for example. Much of the effort of static analysis does have to go into refining your ruleset. OTOH, once you've done that, then you have a very powerful tool which really will improve your code. – Cesaria
@Spousal Adding redundant functions and macros to code seems to be increasing complexity, ultimately reducing the code quality. The time spent implementing and reworking previous code IMHO is better spent using the static analysis tools. – Forego
@Cesaria if(ptr) is a sloppy but widespread practice rather than idiomatic C, which would be if(ptr != NULL). Anyway, this isn't why most static analysers are bad, but rather scenarios such as type x; type_init(&x); And then you get "warning! x is not initialized when passed to the function!". Yes... thanks for letting me know that my variable isn't initialized, before it is initialized. As in, a failure to properly analyse across translation units. – Spousal
@B.Wolf Ideally you will have multiple ways of bug prevention. If you have compile-time assertion and manual code review and static analysis, you improve code quality much more than if you don't have all of those. – Spousal
