Can a macro be defined to provide function-like sequencing?

Consider the following C code:

const int array[100];
int i = 0;
int x;

int get_next(void) {
    return array[i++];
}

int foo(void) {
    return get_next() + get_next();
}

Assuming that i == 0 when foo() is called, its effects are well-defined, as I understand it: when it returns, i has the value 2, and the return value is array[0] + array[1]. One of the two calls to get_next() will return array[0] and the other will return array[1]; it is unspecified which is which, but since the + operator is commutative, it doesn't matter.

In particular, with reference to C17 6.5p2

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

the above code does not suffer UB, since the return array[i++]; in each execution of get_next() is a full expression, and therefore there is a sequence point between them.
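As a minimal sketch of the function-based version described above, the following uses hypothetical sample values (the question's array is all zeros) and adds a bounds check that is not in the original. The names N and the sample data are assumptions made for illustration only.

```c
#include <assert.h>

enum { N = 4 };                                 /* hypothetical size */
static const int array[N] = {10, 20, 30, 40};   /* hypothetical sample values */
static int i = 0;

static int get_next(void) {
    assert(i < N);       /* illustrative guard against reading past the end */
    return array[i++];   /* full expression, so a sequence point follows it */
}

static int foo(void) {
    /* The two calls are indeterminately sequenced rather than unsequenced:
       one call runs to completion before the other starts, in either order. */
    return get_next() + get_next();
}
```

Starting from i == 0, a call to foo() should yield array[0] + array[1] whichever call runs first, and leave i equal to 2.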

Now suppose instead that, for some reason, I wish to define get_next() as a function-like macro, without involving any actual function calls. Can this be done in standard C while preserving the above semantics, and in particular, ensuring that get_next() + get_next() remains free of UB?

  1. The naive definition #define get_next() (array[i++]) does not work, because get_next() + get_next() expands to (array[i++]) + (array[i++]) which clearly violates 6.5p2. There is no sequence point between the two i++ side effects, nor between the i++ side effect on the left and the value computation of i on the right or vice versa, and so the behavior is undefined. This is very similar to the examples discussed in "Why are these constructs using pre and post-increment undefined behavior?"

  2. Eric Postpischil suggested in a comment that we could do #define get_next() (0, array[i++]) but I don't think that works either. Now get_next() + get_next() expands to (0, array[i++]) + (0, array[i++]), and although there is a sequence point between each 0 value computation and the corresponding i++ side effect, there still isn't any sequence point between the two i++ side effects, nor between one i++ side effect and the opposite i value computation.

  3. We could of course define get_next() to involve some call to an actual function (possibly inline) in which the i++ is performed, but as mentioned, I want to know if this can be avoided.

  4. We could also use language extensions like gcc's ({ ... }) but I want to stick to strict ISO C.
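For comparison, option 3 can be sketched roughly as follows: the macro keeps its name and call syntax but forwards to a static inline function, so the function-call sequence points are retained. The helper name get_next_impl, the size N, and the sample values are hypothetical, and this is exactly the approach the question hopes to avoid, shown only as a baseline.

```c
#include <assert.h>

enum { N = 4 };                                 /* hypothetical size */
static const int array[N] = {10, 20, 30, 40};   /* hypothetical sample values */
static int i = 0;

/* Hypothetical helper: the increment happens inside a real function call. */
static inline int get_next_impl(void) {
    assert(i < N);       /* illustrative bounds check */
    return array[i++];
}

/* The macro only forwards to the function, so each expansion is a call. */
#define get_next() (get_next_impl())

static int foo(void) {
    /* Expands to (get_next_impl()) + (get_next_impl()): two function calls,
       indeterminately sequenced with respect to each other. */
    return get_next() + get_next();
}
```

Whether the compiler actually inlines the call is a quality-of-implementation matter; the sequencing guarantee comes from the function-call semantics, not from the inlining.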

Are there any other options that I have missed?

As this is a language-lawyer question, I would appreciate it if any proposed solution could be accompanied by a careful justification that its behavior is well-defined by the Standard.


This question is motivated by the observation that with the traditional macro definition of getchar() as something like

#define getchar() (stdin->i == stdin->buflen ? __refill_buffer(stdin) : stdin->buf[stdin->i++])

then evaluating getchar() + getchar() would cause UB unless the compiler took special steps to avoid it. (Note that the ?: doesn't help with the sequencing between the two invocations, any more than (0, array[i++]) did.) I think that such an implementation is still conforming; although 7.1.4p3 guarantees a sequence point before a library function returns, this is vitiated by 7.1.4p1, which allows any library function to be defined as a macro, and footnote 190, which emphasizes that "Such macros might not contain the sequence points that the corresponding function calls do."

To avoid potential UB, the portable application programmer would have to #undef getchar or write (getchar)() + (getchar)() to suppress the macro definition and get the function. Perhaps more alarmingly, the same seems to be true of expressions like sqrt(2.0) + sqrt(3.0), and it makes me start to wonder if this is a defect.
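The parenthesized-name workaround might look like the sketch below. It uses getc on an explicit stream rather than getchar so that it can be exercised without touching stdin; the same parenthesization applies to getchar. The function name sum_two_chars is hypothetical, and EOF handling is left out for brevity.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: sums the next two characters read from fp.
   Enclosing the name in parentheses suppresses any function-like macro
   definition of getc, so both operands are genuine function calls. */
static int sum_two_chars(FILE *fp) {
    assert(fp != NULL);
    /* Two real calls: indeterminately sequenced, hence no unsequenced
       side effects on the stream's internal state. EOF is not checked. */
    return (getc)(fp) + (getc)(fp);
}
```

The order in which the two characters are read remains unspecified, which is harmless here only because addition is commutative.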

But for now, I am curious if there is any "safe" way for a library implementer to modify the traditional macro implementation of getchar() and company, such that getchar() + getchar() can become well-defined, without having to extend the compiler itself beyond what ISO C promises.


Edit: Here's a candidate, though it's absolutely awful, so have your barf bags ready. A full declarator for a variably modified type is a full expression and thus is followed by a sequence point. So what about

#define get_next() (array[sizeof(char[i++])])

Does the sequence point follow the side effect as well as the value computation? gcc complains but clang doesn't.

Haywood answered 31/5, 2024 at 3:20 Comment(15)
The answer is no, because a macro definition is nothing more than text substitution before the language gets to begin its sequencing. The only way is as you already observed: to redefine those macros to reference a function, for which the sequencing is defined. You can get much the same effect by declaring the functions inline and directing the compiler to obey that directive. – Hardworking
There is a sequence point when the arguments (if any) to the function have been evaluated and the function designator has been evaluated, but before the function call takes place; there is another sequence point before the function returns. For most practical purposes, you can't simulate those sequence points reliably with a macro. – Lexicostatistics
Why not use static inline int get_next(void) { … }? – Lexicostatistics
Yeah, I certainly could; that's #3 on my list. But as I mentioned at the end, even in this day and age, many standard libraries still implement things as macros, and I was wondering if there's a "minimally invasive" way they could fix the sequencing issue without having to argue about that design decision. – Haywood
@Dúthomhas: Well, the nature of text substitution doesn't make it inherently impossible; there certainly could exist a syntax that makes this possible, like gcc's ({ ... }) extension which I mentioned. So I wondered if perhaps there was some clever way to get it with standard C features. – Haywood
No, we can't; it will not pass code review. – Peyton
@n.m.couldbeanAI: I didn't ask if there was an option that would pass code review - I asked if there was an option :) I'm frankly just as interested in obscure properties of the language as in practical solutions. (Speaking of code review, will you start rejecting code like sqrt(2.0) + sqrt(3.0)? Because it seems to me that you should.) – Haywood
I think the declarator method is the only way to do it. Examination of all the operators shows only limited sequencing. There is the operand sequencing in the comma operator, which we already know is insufficient, and the sequencing in the conditional and Boolean operators is similarly insufficient. Sequencing in various aspects of a function call is either insufficient or fails to satisfy the desire not to have a function call. – Guelph
An option that wouldn't pass code review is an oxymoron. There is no such thing. Either it passes, or it's not an option. sqrt(2.0) + sqrt(3.0) is OK. It is clear as day what the author intended and there is no simpler way to express it. Why would I reject it? – Peyton
@n.m.couldbeanAI: Because I believe that it potentially causes undefined behavior, as I mentioned at the end of my question. – Haywood
There is no potential UB in this particular expression, regardless of whether sqrt is implemented as a macro. If it's sqrt(anything with side effects) then it's an instant no, even if it's a full expression. PS I think I would probably ask to rewrite get_next() + get_next() too. – Peyton
getchar() or get_next()? In the case of getchar() specifically, it could perhaps be implemented as some flavour of scanf("%c"), and then there's a special loophole: there is a sequence point "after the actions associated with each formatted input/output function conversion specifier". – Farwell
Btw isn't sizeof(char[i++]) invalid when i is zero? There was some other question recently about a VLA size evaluating to zero at run time. I wouldn't expect a compiler to do anything well-defined here: it may not need to produce a diagnostic, but it need not produce well-defined behavior either. – Farwell
Since the definition const int array[100]; is at file scope, every element of the array will be zero for all time, so the result of any (valid) array access will be zero. I assume that the 'real life' scenario this is based on doesn't have this feature. If it did, you'd only have to worry about incrementing i, and (i += 2, 0) would do the job of two invocations of get_next(). I assume the 'real life' code also takes steps somewhere along the line to ensure no out-of-bounds access to the array. – Lexicostatistics
@Lundin: to avoid the potential UB for i == 0, one could write #define get_next() (array[sizeof(char[++i]) - 1]) – Brigitte

To sequence the i++ operations, you can try and separate them from the pointer dereference operation:

#define get_next()  (i++, array[i - 1])

or more obscure:

#define get_next()  array[i++, i - 1]

But as commented by @NateEldredge and @Lundin, this does not remove the UB: the expressions i++ and i - 1 from one expansion of get_next() are not sequenced with respect to the same expressions from the other expansion, so get_next() + get_next() is still undefined.

This macro is fragile as it references identifiers i and array which may themselves be macros with side effects, hence potentially producing a different value than array[i++].
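As a sketch of where the comma-operator macro is safe, the following keeps each expansion in its own full expression by storing the results in locals before adding them. This is a caller-side workaround, not a fix to the macro itself, and it fixes the evaluation order to left-to-right. The size N, the sample values, and the bounds check are assumptions added for illustration.

```c
#include <assert.h>

enum { N = 4 };                                 /* hypothetical size */
static const int array[N] = {10, 20, 30, 40};   /* hypothetical sample values */
static int i = 0;

/* The comma operator places a sequence point between i++ and array[i - 1]. */
#define get_next() (i++, array[i - 1])

static int foo(void) {
    assert(i + 2 <= N);    /* illustrative bounds check */
    int a = get_next();    /* each initializer is a full expression, */
    int b = get_next();    /* so the two expansions are sequenced    */
    return a + b;
}
```

Writing get_next() + get_next() with this macro would still be undefined, for the reason given above.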

Brigitte answered 31/5, 2024 at 4:32 Comment(2)
I really do want to avoid the UB, and I don't see how this could fix it, not any more than (0, array[i++]) would. – Haywood
(i++, array[i - 1]) + (i++, array[i - 1]) is very much UB in multiple ways. – Farwell
