Consider the following C code:
    const int array[100];
    int i = 0;
    int x;

    int get_next(void) {
        return array[i++];
    }

    int foo(void) {
        return get_next() + get_next();
    }
Assuming that i == 0 when foo() is called, its effects are well-defined, as I understand it: when it returns, i has the value 2, and the return value is array[0] + array[1]. One of the two calls to get_next() will return array[0] and the other will return array[1]; it is unspecified which is which, but since the + operator is commutative, it doesn't matter.
In particular, with reference to C17 6.5p2:

    If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

the above code does not suffer UB, since the return array[i++]; in each execution of get_next() is a full expression, and there is therefore a sequence point between the two i++ side effects.
Now suppose instead that, for some reason, I wish to define get_next() as a function-like macro, without involving any actual function calls. Can this be done in standard C while preserving the above semantics, and in particular, ensuring that get_next() + get_next() remains free of UB?
The naive definition

    #define get_next() (array[i++])

does not work, because get_next() + get_next() expands to

    (array[i++]) + (array[i++])

which clearly violates 6.5p2. There is no sequence point between the two i++ side effects, nor between the i++ side effect on the left and the value computation of i on the right (or vice versa), and so the behavior is undefined. This is very similar to the examples discussed at "Why are these constructs using pre and post-increment undefined behavior?".

Eric Postpischil suggested in a comment that we could do
    #define get_next() (0, array[i++])
but I don't think that works either. Now get_next() + get_next() expands to

    (0, array[i++]) + (0, array[i++])

and although there is a sequence point between each 0 value computation and the corresponding i++ side effect, there still isn't any sequence point between the two i++ side effects, nor between one i++ side effect and the opposite i value computation.

We could of course define get_next() to involve some call to an actual function (possibly inline) in which the i++ is performed, but as mentioned, I want to know if this can be avoided. We could also use language extensions like gcc's ({ ... }), but I want to stick to strict ISO C.
Are there any other options that I have missed?
As this is a language-lawyer question, I would appreciate it if any proposed solution could be accompanied by a careful justification that its behavior is well-defined by the Standard.
This question is motivated by the observation that, with the traditional macro definition of getchar() as something like

    #define getchar() (stdin->i == stdin->buflen ? __refill_buffer(stdin) : stdin->buf[stdin->i++])

evaluating getchar() + getchar() would cause UB unless the compiler took special steps to avoid it. (Note that the ?: doesn't help with the sequencing between the two invocations, any more than (0, array[i++]) did.) I think that such an implementation is still conforming: although 7.1.4p3 guarantees a sequence point before a library function returns, this is vitiated by 7.1.4p1, which allows any library function to be defined as a macro, and by footnote 190, which emphasizes that "Such macros might not contain the sequence points that the corresponding function calls do."
To avoid potential UB, the portable application programmer would have to #undef getchar, or write (getchar)() + (getchar)(), to suppress the macro definition and get the actual function. Perhaps more alarmingly, the same seems to be true of expressions like sqrt(2.0) + sqrt(3.0), and it makes me start to wonder if this is a defect.
But for now, I am curious whether there is any "safe" way for a library implementer to modify the traditional macro implementation of getchar() and company, such that getchar() + getchar() can become well-defined, without having to extend the compiler itself beyond what ISO C promises.
Edit: Here's a candidate, though it's absolutely awful, so have your barf bags ready. A full declarator for a variably modified type is a full expression and thus is followed by a sequence point. So what about
    #define get_next() (array[sizeof(char[i++])])
Does the sequence point follow the side effect as well as the value computation? gcc complains but clang doesn't.
Comments:

…inline and directing the compiler to obey that directive. – Hardworking

static inline int get_next(void) { … }? – Lexicostatistics

…the ({ ... }) extension which I mentioned. So I wondered if perhaps there was some clever way to get it with standard C features. – Haywood

…sqrt(2.0) + sqrt(3.0)? Because it seems to me that you should.) – Haywood

sqrt(2.0) + sqrt(3.0) is OK. It is clear as day what the author intended and there is no simpler way to express it. Why would I reject it? – Peyton

…sqrt is implemented as a macro. If it's sqrt(anything with side effects) then it's an instant no, even if it's a full expression. PS I think I would probably ask to rewrite get_next() + get_next() too. – Peyton

getchar() or get_next()? In the case of getchar() specifically, it could perhaps be implemented as some flavour of scanf("%c"), and then there's a special loophole: there is a sequence point "after the actions associated with each formatted input/output function conversion specifier". – Farwell

…sizeof(char[i++]) invalid when i is zero? There was some other question recently about VLA evaluation to zero at run time. I wouldn't expect a compiler to do anything well-defined here: it may not need to produce a diagnostic, but it need not produce well-defined behavior either. – Farwell

…const int array[100]; is at file scope, every element of the array will be zero for all time, so the result of any (valid) array access will be zero. I assume that the 'real life' scenario this is based on doesn't have this feature. If it did, you'd only have to worry about incrementing i, and (i += 2, 0) would do the job of two invocations of get_next(). I assume the 'real life' code also takes steps somewhere along the line to ensure no out-of-bounds access to the array. – Lexicostatistics

…i == 0, one could write #define get_next() (array[sizeof(char[++i]) - 1]) – Brigitte