A proper way of associating enums with strings
Let's say I have a number of strings I use often throughout my program (to store state and things like that). String operations can be expensive, so whenever addressing them I'd like to use an enumeration. I've seen a couple solutions so far:

typedef enum {
    STRING_HELLO = 0,
    STRING_WORLD
} string_enum_type;

// Must be in sync with string_enum_type
const char *string_enumerations[] = {
    "Hello",
    "World"
};

The other one I encounter quite often:

typedef enum {
    STRING_HELLO,
    STRING_WORLD
} string_enum_type;

const char *string_enumerations[] = {
    [STRING_HELLO] = "Hello",
    [STRING_WORLD] = "World"
};

What are the pros and cons of these two methods? Is there a better one?

Zetana answered 26/2, 2018 at 10:39 Comment(7)
The second is better, as it's independent of the enum values: changes to the enum will not require you to reorder the array entries to match.Bonehead
You want to operate on state handles. I recommend using pointers to static state description data as state handles. A state description could be a string, but a struct that stores the state name and other relevant info is probably better. You only need to compare pointers, not the strings themselves.Xerophthalmia
The second method is not only better, but the first method is downright dangerous because the enum and the strings can easily get out of sync.Bushweller
@MichaelWalz - they can also get out of sync with the second method (e.g. if an enum value is not used as a designator, or the array initialisation includes additional elements). Admittedly there is more of a visual cue for the programmer when that happens.Supranatural
[STRING_WORLD] = "World" is going to waste a lot of space should someone set STRING_WORLD = INT_MAX in the enum definition...Dunkle
True story: A few months ago I was in a hurry and so chose the first method in a program I'm writing at work. Twice since then I've managed to add items to the enum, but forgot to add them to the string table. In both cases this led to massive confusion which cost me significant time. My conclusion: don't use the first method.Phobia
Possible duplicate of Translate error codes to string to displayConfessedly
B
14

The only advantage of the former is that it's backwards-compatible with ancient C standards (designated initializers were only added in C99).

Apart from that, the latter alternative is superior, as it ensures data integrity even if the enum is modified or items change places. However, it should be completed with a check to ensure that the number of items in the enum corresponds with the number of items in the look-up table:

typedef enum {
    STRING_HELLO,
    STRING_WORLD,
    STRING_N  // counter
} string_enum_type;

const char *string_enumerations[] = {
    [STRING_HELLO] = "Hello",
    [STRING_WORLD] = "World"
};

_Static_assert(sizeof string_enumerations/sizeof *string_enumerations == STRING_N,
               "string_enum_type does not match string_enumerations");

The above is the best method for a simple "enum - lookup table" coupling. Another option would be to use structs, but that's more suitable for more complex data types.
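As the comments below point out, the static assert only catches a missing last entry, and the array itself does no bounds checking. A minimal sketch of how the table could be wrapped to cover both cases follows. The names string_enum_to_str and string_table_is_complete are hypothetical helpers made up for this illustration, not part of the original code:

```c
#include <stddef.h>

typedef enum {
    STRING_HELLO,
    STRING_WORLD,
    STRING_N  /* counter, must stay last */
} string_enum_type;

static const char *const string_enumerations[] = {
    [STRING_HELLO] = "Hello",
    [STRING_WORLD] = "World"
};

_Static_assert(sizeof string_enumerations / sizeof *string_enumerations == STRING_N,
               "string_enum_type does not match string_enumerations");

/* Hypothetical helper: returns NULL for out-of-range values and for
   slots that were never given an initializer. */
const char *string_enum_to_str(string_enum_type e)
{
    if ((unsigned)e >= STRING_N) {
        return NULL;
    }
    return string_enumerations[e];
}

/* Hypothetical start-up check: returns 1 if every slot holds a string,
   0 if a designated initializer was forgotten somewhere in the middle. */
int string_table_is_complete(void)
{
    for (size_t i = 0; i < sizeof string_enumerations / sizeof *string_enumerations; i++) {
        if (string_enumerations[i] == NULL) {
            return 0;
        }
    }
    return 1;
}
```

Calling string_table_is_complete() once at start-up (or in a unit test) would catch a forgotten entry that the static assert cannot see, since omitted array elements are initialized to null pointers.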


And finally, more as a side-note, the 3rd version would be to use "X macros". This is not recommended unless you have specialized requirements regarding code repetition and maintenance. I'll include it here for completeness, but I don't recommend it in the general case:

#define STRING_LIST          \
 /* index         str    */  \
  X(STRING_HELLO, "Hello")   \
  X(STRING_WORLD, "World")


typedef enum {
  #define X(index, str) index,
    STRING_LIST
  #undef X
  STRING_N // counter
} string_enum_type;


const char *string_enumerations[] = {
  #define X(index, str) [index] = str,
    STRING_LIST
  #undef X
};

_Static_assert(sizeof string_enumerations/sizeof *string_enumerations == STRING_N,
               "string_enum_type does not match string_enumerations");
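
One case where the X macro list may pay off is when more than one piece of code has to be generated from the same list. As a rough sketch, a reverse lookup (string to enum) could be expanded from STRING_LIST as well. The function name string_enum_from_str and the choice of returning STRING_N for "not found" are assumptions made for this example:

```c
#include <string.h>

#define STRING_LIST          \
 /* index         str    */  \
  X(STRING_HELLO, "Hello")   \
  X(STRING_WORLD, "World")

typedef enum {
  #define X(index, str) index,
    STRING_LIST
  #undef X
  STRING_N /* counter */
} string_enum_type;

/* Hypothetical reverse lookup: each list entry expands to one strcmp test.
   Returns STRING_N when no entry matches. */
string_enum_type string_enum_from_str(const char *s)
{
  #define X(index, str) if (strcmp(s, str) == 0) { return index; }
    STRING_LIST
  #undef X
  return STRING_N;
}
```

This is a linear search, so it only makes sense for short lists, but adding a new entry to STRING_LIST updates the enum and the lookup together.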
Balneal answered 26/2, 2018 at 12:23 Comment(9)
"However, it should be completed with a check to ensure that the number of items in the enum corresponds with the number of items in the look-up table:" -- Perhaps worth noting is that this check is slightly more reliable with the OP's first approach. With the second, if an initialiser other than the last is missing, it will be undetectable by this check.Quebec
@hvd True, neither version is foolproof. For example, if someone assigns a value explicitly to one enum item (other than 0 for the first one), then both versions fail. To protect against the issue you mention, I suppose one could add a run-time check such as for(size_t i=0; i<sizeof string_enumerations/sizeof *string_enumerations; i++) { if(string_enumerations[i] == NULL){ halt_and_catch_fire(); } }. Or use a NULL sentinel value and expect that one to be present at index STRING_N but not earlier.Balneal
Instead of const char *string_enumerations[] and a _Static_assert check, why not force the array size with const char *string_enumerations[STRING_N]?Purposeless
@chux Because it doesn't add anything. C can protect against an initializer list that's too large, but not against one that's too small. So if you set the fixed size, but forget one array initializer, the program would compile cleanly. Although I guess leaving out the array size makes more sense in the version without designated initializers, as the size of the array depends completely on the amount of initializers used.Balneal
"forget one array initializer" is not prevented much in this code either. const char *string_enumerations[] = { [42] = "Hello", }; still makes for an array size of 43. One element points to a string and the others are null pointers.Purposeless
So " array depends completely on the amount of initializers used" is more like the array element count is the greatest enum used + 1.Purposeless
A completely different compile-time check can be obtained, under some compilers at least, by using a function and a switch statement, as in my answer. (Whether such a function is appropriate to the original question is another story.)Phobia
@SteveSummit Such a switch will get optimized to some manner of look-up table anyhow. The gcc warning will only come if you omit the default, which is a bad thing to do, because that will also eliminate the out-of-bounds checking. Suppose for example that the programmer by accident passes an enumeration constant belonging to another enum. C has no built-in type safety for enums.Balneal
@Balneal Having the switch turn into a lookup table is fine. (That's kind of the point!) But you make a very good observation about the potentially confounding influence of a default case. See my answer for a note on that.Phobia
P
3

Another possibility might be to use a function, instead of an array:

const char *enumtostring(string_enum_type e) {
    switch(e) {
        case STRING_HELLO: return "hello";
        case STRING_WORLD: return "world";
    }
}

gcc, at least, will warn (via -Wswitch, which -Wall enables) if you add an enum value but forget to add the matching switch case.

(I suppose you could try making this sort of function inline, as well.)


Addendum: The gcc warning I mentioned applies only if the switch statement does not have a default case. So if you want to print something for out-of-bounds values that somehow creep through, you could do that, not with a default case, but with something like this:

const char *enumtostring(string_enum_type e) {
    switch(e) {
        case STRING_HELLO: return "hello";
        case STRING_WORLD: return "world";
    }
    return "(unrecognized string_enum_type value)";
}

It's also nice to include the out-of-bounds value:

    static char tmpbuf[50];
    snprintf(tmpbuf, sizeof(tmpbuf), "(unrecognized string_enum_type value %d)", e);
    return tmpbuf;

(This last fragment has a couple of additional limitations, but this addendum is getting long already, so I won't belabor the point with them just now.)
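
Presumably the limitations include the fact that the static buffer is shared: a second call overwrites the text returned by the first, and the function is not safe to call from several threads at once. One way around that, sketched here under the assumption that callers are willing to pass in their own buffer, is a variant like the following. The name enumtostring_r and its extra parameters are made up for this illustration:

```c
#include <stdio.h>
#include <stddef.h>

typedef enum {
    STRING_HELLO,
    STRING_WORLD
} string_enum_type;

/* Hypothetical variant: buf and bufsize are supplied by the caller and are
   only written to when e is not a recognized value. */
const char *enumtostring_r(string_enum_type e, char *buf, size_t bufsize)
{
    switch (e) {
        case STRING_HELLO: return "hello";
        case STRING_WORLD: return "world";
    }
    /* No default case above, so gcc can still warn about missing enum values. */
    snprintf(buf, bufsize, "(unrecognized string_enum_type value %d)", (int)e);
    return buf;
}
```

The returned pointer is either a string literal or the caller's own buffer, so the result stays valid for as long as the caller keeps that buffer around.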

Phobia answered 26/2, 2018 at 12:48 Comment(4)
This code can be improved significantly by changing the parameter to (const string_enum_type* e) and then switch(*e). That way you guarantee that an enum of correct type is passed, and not just any random int. That way, you outsource the out-of-bounds check to the caller.Balneal
@Balneal Interesting idea, although I'm not sure even that would be sufficient. I've added a note to the answer to cover the out-of-bounds case.Phobia
It would be enough to protect against the wrong enumerated type, as long as it is passed with a pointer. See How to create type safe enums? for various tips & tricks on that topic.Balneal
@Balneal As I said, interesting idea, and this isn't the place for a long discussion on this, but the pointer technique by itself doesn't even protect against string_enum_type x = 42; const char *p = enumptostring(&x), let alone more exotic transgressions. (Yes, I see all the valiant attempts at protection in the linked thread, but my point is that an author of enumtostring() can't necessarily depend on all that.)Phobia
N
0

Another possibility is to use #defines.

In spite of the many cons of its use, the main benefit is that #defines take up no space unless they are used...

#define STRING_HELLO "Hello"
#define STRING_WORLD "World"
Nu answered 26/2, 2018 at 11:10 Comment(3)
And how exactly do you associate enum number STRING_HELLO with the string "Hello" here? How do you iterate across the different alternatives?Balneal
@Balneal I was just giving another alternative without using enums. You're right about the flaw when iteratingNu
Not really an alternative that works for OP; as stated he wants the program to avoid using strings, not just the programmer.Brook
