Undefined behavior: when attempting to access the result of a function call

The following compiles and prints "string" as output.

#include <stdio.h>

struct S { int x; char c[7]; };

struct S bar() {
    struct S s = {42, "string"};
    return s;
}

int main()
{
    printf("%s", bar().c);
}

Apparently this invokes undefined behavior according to

C99 6.5.2.2/5 If an attempt is made to modify the result of a function call or to access it after the next sequence point, the behavior is undefined.

I don't understand the part about the "next sequence point". What's going on here?

Rainmaker answered 7/12, 2012 at 1:36 Comment(3)
Where did you get that this is UB from?Lazaro
The next sequence point is at the end of the full expression, aka the semicolon (;). There's no undefined behaviour here.Horrorstruck
@Luchian: My doubt arose when I read "or to access it after the next sequence point, the behavior is undefined". How is it possible to access the result of a function call after its sequence point?Rainmaker
C
12

You've run into a subtle corner of the language.

An expression of array type is, in most contexts, implicitly converted to a pointer to the first element of the array object. The exceptions, none of which apply here, are:

  • When the array expression is the operand of a unary & operator (which yields the address of the entire array);
  • When it's the operand of a unary sizeof or (as of C11) _Alignof operator (sizeof arr yields the size of the array, not the size of a pointer); and
  • When it's a string literal in an initializer used to initialize an array object (char str[6] = "hello"; doesn't convert "hello" to a char*.)

(The N1570 draft incorrectly adds _Alignof to the list of exceptions. In fact, for reasons that are not clear, _Alignof can only be applied to a type name, not to an expression.)

Note that there's an implicit assumption: that the array expression refers to an array object in the first place. In most cases, it does (the simplest case is when the array expression is the name of a declared array object) -- but in this one case, there is no array object.

If a function returns a struct, the struct result is returned by value. In this case, the struct contains an array, giving us an array value with no corresponding array object, at least logically. So the array expression bar().c decays to a pointer to the first element of ... er, um, ... an array object that doesn't exist.

The 2011 ISO C standard addresses this by introducing "temporary lifetime", which applies only to "A non-lvalue expression with structure or union type, where the structure or union contains a member with array type" (N1570 6.2.4p8). Such an object may not be modified, and its lifetime ends at the end of the containing full expression or full declarator.

So as of C2011, your program's behavior is well defined. The printf call gets a pointer to the first element of an array that's part of a struct object with temporary lifetime; that object continues to exist until the printf call finishes.

But as of C99, the behavior is undefined -- not necessarily because of the clause you quote (as far as I can tell, there is no intervening sequence point), but because C99 doesn't define the array object that would be necessary for the printf to work.

If your goal is to get this program to work, rather than to understand why it might fail, you can store the result of the function call in an explicit object:

const struct S result = bar();
printf("%s", result.c);

Now you have a struct object with automatic, rather than temporary, storage duration, so it exists during and after the execution of the printf call.

Cordilleras answered 7/12, 2012 at 2:3 Comment(9)
Logically, I'd suggest that this is just a wording clarification, rather than a real change. I find it difficult to believe that the C99 Standard cannot be interpreted to say that bar().c is not an object.Etiquette
@DeadMG: I'm afraid you've lost me with the triple-negative. Are you saying that bar().c is an object in C99?Cordilleras
@Keith: the standard defines 'object' as "region of data storage in the execution environment, the contents of which can represent values". So I think the value returned from a function can still be considered an object - just one that cannot be modified and can only be accessed (only has a lifetime) until the next sequence point (per C99 6.5.2.2/5).Beery
@MichaelBurr: Sensible, but not what the C99 standard says. This: int func(void) { return 42; } does not result in an object of type int, only a value of type int. In the absence of a specific statement (like the one added in the 2011 standard), it's the same for a function returning a struct: calling the function gives you a value of the return type, not an object of the return type. C99 defines the lifetimes (storage duration) of objects in 6.2.4; it says nothing about objects whose lifetime ends at a sequence point. It's an omission in C99, corrected in C2011.Cordilleras
@Keith: with that line of thinking, printf("%d", bar().x); would also be undefined behavior since the . operator "designates a member of a structure or union object" (C99 6.5.2.3/3). That would imply that the Example 1 given in 6.5.2.3/6 is incorrect. So there's not just an omission in C99, but a related error in a non-normative example.Beery
@MichaelBurr: Interesting. Personally, I think that's an inconsistency in C99 (and in C90 and C2011). The following sentence strongly implies that the prefix needn't be an lvalue -- i.e., that it needn't designate an object. I think it should say that it "designates a member of a structure or union value".Cordilleras
@MichaelBurr: The C99 Rationale says "Since the language now permits structure parameters, structure assignment and functions returning structures, the concept of a structure expression is now part of the C language." This refers to a change made between K&R C and ANSI C89. My guess is that the committee just overlooked that clause.Cordilleras
The Committee overlooked a fair number of things, but most of them didn't matter until decades later, since programmers and compiler writers at the time both recognized the value in trying to interpret the Standard in such fashion as to make the resulting programming language as useful as possible. IMHO, the best way to handle this case would have been to say that compiler writers may, at their option, either refuse to accept attempts to explicitly or implicitly take the address of a returned structure or regard such a structure as an lvalue with extended lifetime. That would avoid imposing new...Proparoxytone
...obligations on compilers for platforms where the overlapping object lifetimes would pose difficulties, while allowing compilers which can handle such constructs to do so with code that's written to take advantage of them [which would not be possible if such code were considered a constraint violation]Proparoxytone
E
5

The sequence point occurs at the end of the full expression, i.e., when printf returns in this example. There are other cases where sequence points occur.

Effectively, this rule states that function temporaries do not live beyond the next sequence point, which in this case occurs well after its use, so your program has quite well-defined behaviour.

Here's a simple example of not well-defined behaviour:

char* c = bar().c; *c = 5; // UB

Here, the sequence point is reached once c has been initialized, at which point the memory it points to is destroyed; we then dereference c, resulting in UB.

Etiquette answered 7/12, 2012 at 1:50 Comment(0)
B
5

In C99 there is a sequence point at the call to a function, after the arguments have been evaluated (C99 6.5.2.2/10).

So, when bar().c is evaluated, it results in a pointer to the first element in the char c[7] array in the struct returned by bar(). However, that pointer gets copied into an argument (a nameless argument as it happens) to printf(), and by the time the call is actually made to the printf() function the sequence point mentioned above has occurred, so the member that the pointer was pointing to may no longer be alive.

As Keith Thompson mentions, C11 (and C++) make stronger guarantees about the lifetime of temporaries, so the behavior under those standards would not be undefined.

Beery answered 7/12, 2012 at 2:32 Comment(2)
So, would this be undefined behavior in C99 as well if I had int foo() { return 42; } int main() { printf("%d", foo()); } for example because the function foo returns an rvalue which may no longer be available after that sequence point?Rainmaker
@cpx: no, there's no undefined behavior there - a copy of the value returned by foo() is what gets passed as the argument to printf().Beery
