What are the common undefined/unspecified behaviors in C that you run into? [closed]

11

73

An example of unspecified behavior in the C language is the order of evaluation of arguments to a function. It might be left to right or right to left, you just don't know. This would affect how foo(c++, c) or foo(++c, c) gets evaluated.
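As a minimal sketch of the unspecified argument order, the example below logs the order in which two calls run. The names tag, sum2, unspecified_order and fixed_order are hypothetical helpers made up for illustration. Function calls do not interleave, so this is only unspecified, not undefined.

```c
/* Hypothetical helper state: records the order in which tag() is called. */
static int call_log[2];
static int call_count = 0;

static int tag(int id)
{
    call_log[call_count++] = id;
    return id;
}

static int sum2(int a, int b)
{
    return a + b;
}

/* The order of the two tag() calls is unspecified: the log may read
   {1, 2} or {2, 1}, and either is a conforming result. */
int unspecified_order(void)
{
    call_count = 0;
    return sum2(tag(1), tag(2));
}

/* Putting the calls in separate statements fixes the order. */
int fixed_order(void)
{
    call_count = 0;
    int first = tag(1);   /* always runs first */
    int second = tag(2);  /* always runs second */
    return sum2(first, second);
}
```

Both functions return 3, but only after fixed_order() is the content of call_log guaranteed.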

What other unspecified behavior is there that can surprise the unaware programmer?

Donor answered 19/9, 2008 at 0:30 Comment(1)
foo(c++, c) and foo(++c, c) are both undefined behavior, which completely trumps unspecified. – Brynnbrynna
91

A language lawyer question. Hmkay.

My personal top 3:

  1. violating the strict aliasing rule

  2. violating the strict aliasing rule

  3. violating the strict aliasing rule

    :-)

Edit: Here is a little example that does it wrong twice

(assume 32-bit ints and little-endian byte order):

float funky_float_abs (float a)
{
  unsigned int temp = *(unsigned int *)&a;
  temp &= 0x7fffffff;
  return *(float *)&temp;
}

That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.

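However, accessing an object through a pointer to a different, incompatible type is not valid C. The compiler may assume that pointers to different types don't point to the same chunk of memory. The exception is a pointer to a character type (char, signed char or unsigned char; signedness does not matter), which may alias any object. A void* may point to anything as well, but it cannot be dereferenced, so it must be converted to another pointer type before use.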

In the case above I do that twice. Once to get an int-alias for the float a, and once to convert the value back to float.
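To illustrate what the compiler is allowed to assume, here is a small sketch (the function name store_both is made up for illustration). Because an int* and a float* are assumed not to refer to the same object, an optimizer may return the constant 1 without re-reading *ip after the float store.

```c
/* Under the strict aliasing rule, ip and fp are assumed to point to
   different objects, so the compiler may treat the return value as the
   constant 1 and never reload *ip after writing through fp. */
int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 0.0f;
    return *ip;
}
```

Called with two distinct objects this is fine. Passing the same address through both parameters (via a cast) would be undefined, and the observed result may then differ between optimization levels.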

There are three valid ways to do the same.

Use a char pointer during the cast. Pointers to character types may alias anything, so they are safe.

float funky_float_abs (float a)
{
  float temp_float = a;
  // valid, because it's a char pointer. These are special.
  unsigned char * temp = (unsigned char *)&temp_float;
  temp[3] &= 0x7f;
  return temp_float;
}

Use memcpy. memcpy copies the object representation byte by byte, so no object is ever accessed through a pointer of the wrong type.

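#include <string.h>

float funky_float_abs (float a)
{
  unsigned int i;
  float result;
  memcpy (&i, &a, sizeof i);            // copy the float's bytes into the integer
  i &= 0x7fffffff;                      // clear the sign bit
  memcpy (&result, &i, sizeof result);  // copy the bytes back into a float
  return result;
}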

The third valid way: use unions. This is explicitly not undefined since C99:

float funky_float_abs (float a)
{
  union 
  {
     unsigned int i;
     float f;
  } cast_helper;

  cast_helper.f = a;
  cast_helper.i &= 0x7fffffff;
  return cast_helper.f;
}
Opportune answered 19/9, 2008 at 0:33 Comment(5)
aehm. I mentioned that I assume 32 bit ints and little endian. Btw - the union usage is still undefined behaviour not because of the IEEE bit representation but simply because you are (in theory) not allowed to write into field f and read from field i. – Opportune
onebyone, it's undefined behavior even if the implementation uses IEEE. The point is it reads from a different member than was last written to. – Roselani
csci.csusb.edu/dick/c++std/cd2/basic.html#basic.lval bullet 15 seems to imply that type punning through a union is safe. The wording in the C standard is identical. – Mycetozoan
the C99 standard allows type punning through unions; see footnote 82, which was added with TC3: "If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation." – Quadrifid
I don't understand what is special about char* and void* (other than being byte aligned). Can't we rewrite the first funky function as below? float funky_float_abs (float a) { unsigned int * temp = (unsigned int) a; temp = (unsigned int*)((int)temp & 0x7fffffff); return (float)temp; } – Poulterer
33

My personal favourite undefined behaviour is that if a non-empty source file doesn't end in a newline, behaviour is undefined.

I suspect it's true though that no compiler I will ever see has treated a source file differently according to whether or not it is newline terminated, other than to emit a warning. So it's not really something that will surprise unaware programmers, other than that they might be surprised by the warning.

So for genuine portability issues (which mostly are implementation-dependent rather than unspecified or undefined, but I think that falls into the spirit of the question):

  • char is not necessarily (un)signed.
  • int can be any size from 16 bits.
  • floats are not necessarily IEEE-formatted or conformant.
  • integer types are not necessarily two's complement, and integer arithmetic overflow causes undefined behaviour (modern hardware won't crash, but some compiler optimizations will result in behavior different from wraparound even though that's what the hardware does. For example if (x+1 < x) may be optimized as always false when x has signed type: see -fstrict-overflow option in GCC).
  • "/", "." and ".." in a #include have no defined meaning and can be treated differently by different compilers (this does actually vary, and if it goes wrong it will ruin your day).
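As a hedged sketch of the signed overflow point above: instead of testing x+1 < x after the fact, which the compiler may fold to false, the check can be done before the addition using only operations that cannot overflow. The function name add_would_overflow is made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. The subtractions below stay
   within the range of int, so the check itself has defined behaviour. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}
```

A caller would test add_would_overflow(x, 1) first and only compute x + 1 when it returns false.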

Really serious ones that can be surprising even on the platform you developed on, because behaviour is only partially undefined / unspecified:

  • POSIX threading and the ANSI memory model. Concurrent access to memory is not as well defined as novices think. volatile doesn't do what novices think. Order of memory accesses is not as well defined as novices think. Accesses can be moved across memory barriers in certain directions. Memory cache coherency is not required.

  • Profiling code is not as easy as you think. If your test loop has no effect, the compiler can remove part or all of it. inline has no defined effect.

And, as I think Nils mentioned in passing:

  • VIOLATING THE STRICT ALIASING RULE.
Tram answered 19/9, 2008 at 1:18 Comment(3)
Steve - I encountered exactly what you described (the ending newline problem) in the early 90s w/ the Microtec compiler for the 68K family. I thought the tool was buggy, but I just added the newline "to work around the stupid tool". Unlike my overconfident co-worker (see my other comment on this subject), I wasn't so cocksure that I'd write a defect report... good thing I didn't. – Linseed
Signed integer overflow being undefined isn't just pedantry; at least GCC applies optimizations on the assumption that it never happens, such as 'if (a + 1 > a)' always passing and never detecting wraparound. – Worthwhile
@BCoates: I have no problem with integer overflows yielding partially-indeterminate values, which would be sufficient semantics to justify GCC's optimization in the indicated case. Unfortunately, some compiler writers seem to think integer overflow should negate laws of time and causality (time I could perhaps live with, if code got resequenced on the assumption that it wouldn't overflow; negation of causality should IMHO be regarded as insanity, but alas not everyone agrees). – Sophisticated
21

My favorite is this:

// what does this do?
x = x++;

To answer some comments, it is undefined behaviour according to the standard. Seeing this, the compiler is allowed to do anything, up to and including formatting your hard drive. See for example this comment here. The point is not whether you can form a reasonable expectation of some behaviour: because of the way the standard defines sequence points, this line of code is actually undefined behaviour.

For example, if we had x = 1 before the line above, then what would the valid result be afterwards? Someone commented that it should be

x is incremented by 1

so we should see x == 2 afterwards. However, this is not actually true: you will find some compilers that have x == 1 afterwards, or maybe even x == 3. You would have to look closely at the generated assembly to see why this might be, but the differences are due to the underlying problem. Essentially, I think this is because x is modified twice with no sequence point in between, so the compiler is allowed to perform the increment from x++ and the store from x = in any order it likes.
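For comparison, here is a sketch of the two things someone writing x = x++ probably means, each spelled out so that x is modified only once per statement. The function names are hypothetical.

```c
/* Plain increment: a single modification of x. */
int increment(int x)
{
    x++;
    return x;
}

/* "Yield the old value, then increment", written with a temporary so
   each statement modifies at most one object once. */
int old_value_then_increment(int *x)
{
    int old = *x;
    *x = old + 1;
    return old;
}
```

Starting from x == 1, increment gives 2, and old_value_then_increment returns 1 while leaving x at 2.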

Durkee answered 19/9, 2008 at 0:33 Comment(9)
X is incremented by 1. You assigned x to itself and then incremented it. It is equivalent to x++; – Fifi
this does x = x; x += 1; so yeah, like charles graham says. I wouldn't call this unspecified – Mika
Modifying a variable more than once between two sequence points is explicitly stated as undefined behaviour in both standard C and C++. – Backbreaker
I'm cracking up laughing right now at the thought of someone writing a C compiler that formats your hard drive upon seeing x = x++ because it's undefined in the standard :-) – Tyrolienne
+1, especially for the "formatting hard drive part". Actually, for people who code like this, formatting the hard drive might save future generations of maintenance programmers a lot of grief... – Afterglow
2 things: 1) it's absolutely undefined behavior; about 15 years ago I debated with someone in my group who wrote a defect report to the compiler vendor (yikes!) when he wrote this exact code (except he used "i" instead of "x") and "i" was stuck at 1; and 2) I laughed when I read the part about formatting the hard drive, probably because that's the kind of thing I'd say too. – Linseed
I would say that x is incremented, then assigned its previous value, because x++ returns that and has precedence over assignment. But yeah it's undefined... As many things in the language (making a lot of headache...) – Hourigan
On this tutorial page (bottom) it is clearly stated that ++ has precedence over =. So what exactly is the Big Deal? cplusplus.com/doc/tutorial/operators – Deformed
@FredCailloux Your tutorial doesn't really go into enough detail on the problem of undefined behaviour caused by sequence point misuse. This seems to be a better reference: en.cppreference.com/w/cpp/language/eval_order – Durkee
20

Dividing something by a pointer to something. Just won't compile for some reason... :-)

result = x/*y;
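A sketch of the version that does compile, assuming x is an int and y points to a non-zero int (the function name is made up for illustration):

```c
int divide_by_pointee(int x, const int *y)
{
    /* The space keeps the division and dereference operators from
       merging into a comment opener; x / (*y) works as well. */
    return x / *y;
}
```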
Hellbent answered 19/9, 2008 at 0:44 Comment(3)
Haha nice one, I am writing it down :-) – Kindrakindred
because '/*' is treated as a comment, just add a space between '/' and '*' and it should work (at least it works on my gcc 8.1.1). – Counsel
What kind of unbelievably hilarious and incorrect answer is this? It doesn't answer the question at all and suggests wrong assumptions about C code. The given code is a syntax error. It has nothing to do with undefined behavior. - Maybe you meant to divide a value of a basic type by a pointer value, but that isn't what you have shown. Dividing a value of a basic type by a dereferenced pointer isn't incorrect either, for example: double x = 2; int z = 1, *y; y = &z; int result = x / *y; - This answer needs either to be drastically edited or urgently deleted. -1 – Rhizocarpous
11

Another issue I encountered (which is defined, but definitely unexpected).

char is evil.

  • signed or unsigned depending on what the compiler feels
  • not mandated as 8 bits
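Both properties can be queried through <limits.h>. The sketch below uses hypothetical function names and also shows one common consequence, which is an addition to the answer rather than something it states: the <ctype.h> functions expect a value representable as unsigned char (or EOF), so a plain char is converted before the call.

```c
#include <ctype.h>
#include <limits.h>

/* CHAR_MIN is 0 when plain char is unsigned and negative when it is
   signed. CHAR_BIT is the number of bits in a char and is at least 8. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Counts alphabetic characters. The cast avoids passing a negative
   value to isalpha() on implementations where plain char is signed. */
int count_alpha(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```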
Countrywide answered 19/9, 2008 at 7:58 Comment(4)
Well, it's not evil if you use it for what it is meant for, i.e. for characters... – Afterglow
Actually, there are three different types of char: char, unsigned char and signed char. They are explicitly distinct types. – Intrusive
You must use (pointers to or arrays of plain) char when dealing with strings. Many standard library functions (like all the str*() functions) take pointers to char and giving them anything else requires ugly casts. – Courier
Who said anything about strings? Embedded programmers sometimes play with variable size for efficiency. Assuming anything about char doesn't work cross-platform. Calling library functions aimed at strings, but defined when a string was simply a char* and Unicode had not been invented may be OK, but if I'm going to be blunt... not writing programs to at least support Unicode characters is stupid. – Countrywide
11

I can't count the number of times I've corrected printf format specifiers to match their argument. Any mismatch is undefined behavior.

  • No, you must not pass an int (or long) to %x - an unsigned int is required
  • No, you must not pass an unsigned int to %d - an int is required
  • No, you must not pass a size_t to %u or %d - use %zu
  • No, you must not print a pointer with %d or %x - use %p and cast to a void *
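The rules above can be sketched as follows, with each argument matching its conversion specifier. The function names are hypothetical, and snprintf is used instead of printf only so that the result ends up in a buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats an int with %d, an unsigned int with %x and a size_t with %zu.
   Returns the number of characters snprintf would have written. */
int format_values(char *buf, size_t buflen,
                  int i, unsigned int u, size_t n)
{
    return snprintf(buf, buflen, "%d %x %zu", i, u, n);
}

/* Pointers are printed with %p after conversion to void *. The exact
   text produced by %p is implementation-defined. */
int format_pointer(char *buf, size_t buflen, int *p)
{
    return snprintf(buf, buflen, "%p", (void *)p);
}
```

With GCC or Clang, -Wall (which enables -Wformat) reports most mismatches at compile time.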
Courier answered 15/8, 2013 at 19:31 Comment(1)
The standard implies (in a non-normative footnote) that passing an int to %x, or an unsigned int to %d, is ok as long as the value is within the range of both types. Still, I prefer to avoid it. – Rb
9

I've seen a lot of relatively inexperienced programmers bitten by multi-character constants.

This:

"x"

is a string literal (which is of type char[2] and decays to char* in most contexts).

This:

'x'

is an ordinary character constant (which, for historical reasons, is of type int).

This:

'xy'

is also a perfectly legal character constant, but its value (which is still of type int) is implementation-defined. It's a nearly useless language feature that serves mostly to cause confusion.
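The types described above can be checked with sizeof. This is a small sketch with made-up function names, and it holds for C only, since C++ gives 'x' the type char.

```c
/* In C, the character constant 'x' has type int, so its size is
   sizeof(int), not 1. */
int char_constant_has_int_size(void)
{
    return sizeof('x') == sizeof(int);
}

/* The string literal "x" is an array of two chars: 'x' and the
   terminating null character. */
int string_literal_size(void)
{
    return (int)sizeof("x");
}
```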

Rb answered 15/8, 2013 at 20:19 Comment(3)
It was useful when writing C on the Macintosh, which frequently used a 32-bit integer to hold four-character file types, application signatures, etc., though trigraphs would rather obnoxiously mess up '????'. – Sophisticated
This is especially dangerous with overloaded functions that take in char* as well as char. I've seen many people bitten by it (example) – Glitter
The question is about C, not C++. There are no overloaded functions. – Rb
8

A compiler doesn't have to tell you that you're calling a function with the wrong number of parameters/wrong parameter types if the function prototype isn't available.
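A minimal sketch of what a visible prototype buys you, using the hypothetical functions half and half_of_three: with the prototype in scope the compiler knows the parameter type, so it converts the int argument and can diagnose a wrong argument count.

```c
/* Prototype: tells the compiler that half() takes exactly one double. */
double half(double value);

double half_of_three(void)
{
    /* The int constant 3 is converted to 3.0 because the prototype is
       visible. Without it, the int would be passed unconverted. */
    return half(3);
}

double half(double value)
{
    return value / 2.0;
}
```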

Enervate answered 19/9, 2008 at 2:40 Comment(2)
Yes. Benevolent compilers however will usually help you with a warning... – Afterglow
As of C99, calling a function with no visible declaration requires a diagnostic. That declaration doesn't have to be a prototype (i.e., a declaration that specifies the types of the parameters), but it always should be. (Variadic functions like printf can still be problematic.) – Rb
6

The clang developers posted some great examples a while back, in a post every C programmer should read. Some interesting ones not mentioned before:

  • Signed integer overflow - no it's not ok to wrap a signed variable past its max.
  • Dereferencing a NULL Pointer - yes this is undefined, and might be ignored, see part 2 of the link.
Washtub answered 1/2, 2012 at 22:18 Comment(0)
2

The EE's here just discovered that a>>-2 is a bit fraught.

I nodded and told them it was not natural.
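For context, shifting by a negative count, or by a count greater than or equal to the width of the promoted left operand, is undefined behaviour in C. A hedged sketch of a guarded shift follows. The function name is made up, and the width calculation assumes unsigned int has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Shifts value right by count bits and stores the result in *out.
   Returns false, leaving *out untouched, for counts that would be
   undefined: negative, or at least the width of unsigned int. */
bool shift_right_checked(unsigned int value, int count, unsigned int *out)
{
    /* Assumption: no padding bits, so width == sizeof * CHAR_BIT. */
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    if (count < 0 || count >= width)
        return false;
    *out = value >> count;
    return true;
}
```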

Valiant answered 19/9, 2008 at 2:44 Comment(0)
1

Be sure to always initialize your variables before you use them! When I had just started with C, that caused me a number of headaches.
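As a small illustration (the function name sum_ints is hypothetical): an automatic variable without an initializer holds an indeterminate value, so an accumulator has to be given a starting value before it is read.

```c
#include <stddef.h>

/* Sums count ints. total is initialized explicitly: without "= 0" the
   first "total += ..." would read an indeterminate value. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```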

Neri answered 19/9, 2008 at 0:32 Comment(1)
Not always, it depends on what you do. But, yes, if you do not know what you are doing, you should initialize it. Moreover, the compiler would just optimize the code if it is redundant to initialize the variable. – Curb
