"*s = 0" being optimized out. Possible GCC 13 bug? Or some undefined behaviour?
Asked Answered
C

1

9

With GCC 13.2, the output of the following code depends on the optimization level:

#include <ctype.h>
#include <stdio.h>

char *SkipAName(char *s) {
  if (('A' <= *s && *s <= 'Z') || ('a' <= *s && *s <= 'z') || *s == '_' ||
      *s == '$') {
    if (*s == '$') {
      s++;
    }
    while (isalnum(*s)) {
      s++;
    }
    if (*s == '_') {
      s++;
    }
  }
  return s;
}

int TestName(char *name) {
  while (*name) {
    name++;
  }
  return 0;
}

int StrICmp(char *s1, char *s2) {
  while (*s1 && tolower(*s1) == tolower(*s2)) {
    s1++;
    s2++;
  }
  return tolower(*s1) - tolower(*s2);
}

int DoTable(char *s) {
  char *name, c;

  do {
    name = s;
    s = SkipAName(s);
    c = *s;
    *s = 0;
    TestName(name);
    *s = c;
    if (*s == '(') {
      break;
    }
    if (*s != ',') {
      printf("Error 1\n");
      return 1;
    }
    *s = 0;

    if (StrICmp(name, "sparse") == 0) {
    } else {
      printf("Error 2\n");
      return 1;
    }
    *s++ = ',';
    while (*s == ',') {
      s++;
    }
  } while (*s);

  printf("OK\n");
  return 0;
}

int main() {
  char buf[] = "sparse,C(1)";
  DoTable(buf);
  return 0;
}
$ gcc-13 -O0 test.c && ./a.out
OK
$ gcc-13 -O1 test.c && ./a.out
OK
$ gcc-13 -O2 test.c && ./a.out
Error 2
$ gcc-13 -O3 test.c && ./a.out
Error 2

The code comes from this project and I have tried to make a minimal reproducible example; that is why this code may look awkward. See also this issue.

I am wondering if this is a bug in GCC 13 or if I have hit some undefined behaviour. With Compiler Explorer, it looks like somehow *s = 0 at line 52 is optimized away. The code works fine with GCC 12.3.

Coextensive answered 2/11, 2023 at 5:37 Comment(9)
I can't see how this can be anything but a bug. That *s = 0; line is important and has a clearly visible effect. Thus, it should not be optimized out. I can only guess that the compiler is getting 'confused' by the earlier occurrence of the same C code.Christology
Note also that commenting out the 4 lines immediately following s = SkipAName(s); (which, in your reduced code, have no nett effect) seems to fix the compiler's output.Christology
TestName isn't really doing anything. But on my system using -O2, the output change if I comment out the call of TestNamePomatum
Some lines in the above example should have no effect, but eliminating them changes the output, so I was not able to remove these lines from the original code (github.com/vermaseren/form/blob/…).Coextensive
@SupportUkraine I think while (*s1 && tolower(*s1) == tolower(*s2)) { ... should be OK for null-terminated strings s1 and s2. For example, s1 = "abc" and s2 = "a" lead to the return value 'b' - 0.Coextensive
@Support Ukraine Re "isn't good if s1 is longer than s2", No, it's fine. You'd end up comparing something nonzero to tolower(0) which is 0 itself. As such, the == will return 0, exiting the loop.Cater
It doesn't need to execute *s = 0 if it can conclude that this memory location will never be used anyway. Which the compiler may be able to predict by inlining StrICmp or similar optimization. I can't be bothered to reverse-engineer all of these optimizations... That being said, gcc 13 is a complete bug fest and very far from a stable release. It shouldn't be used.Klaraklarika
It is similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=111519, which will be fixed in GCC 13.3Curagh
@Curagh Indeed gcc(trunk) on Godbolt adds a mov BYTE PTR [rbp+0], 0 which wasn't there in 13.2. gcc 13.2 has taken "the weekly gcc bug" to "the daily gcc bug". I remember a time when we only had "the yearly gcc bug"... Everyone should seriously consider backing down to gcc 7 or so which is afaik the last stable version with (C11) standard compliance.Klaraklarika
V
6

I am wondering if this is a bug in GCC 13 or if I have hit some undefined behaviour.

  • The arguments to isalnum() and tolower() should be chars that have been converted to type unsigned char (before automatically being converted to int to match the parameter type). Example: isalnum((unsigned char) *s). That issue could cause a similar program to manifest UB, but the input provided by the particular program in your example will not have that effect.

  • The code assumes that the uppercase Latin letters are encoded as a contiguous range of numbers in the execution character set, and the same for the lowercase Latin letters. C does not guarantee that this will be so, and if it isn't, then the program will not work as intended. However, that issue is not at all likely to be a problem for you in your test environment.

  • SkipAName() also seems to have weird rules for what it considers a name to skip, but that's not erroneous.

  • TestName() has no caller-observable effect. But that's not erroneous, and if the observed misbehavior depends on that function and the call to it being present in the program then that's very curious.

  • StrICmp() can run off the end of the second string, thus producing UB, but that will not happen with your particular input.

  • It's a bit nasty that DoTable() modifies its input, and especially that it does not consistently reverse its changes, since they seem intended only to serve its internal purposes. Among other things, that makes it unsafe to use with string literals. But your example input is not a string literal, and it is not inherently erroneous that DoTable() modifies it.

  • I do not like DoTable()'s use of parameter s. I am not altogether against functions modifying their parameters, but I am very much for using descriptive names. It's altogether unclear what the name "s" means that is both adequately descriptive and consistent with all the uses to which it is put. But of course, that doesn't make the code erroneous.

Overall, this code has some issues, and I dislike it overall, but I see no undefined behavior in it as presented. The outputs you show for GCC 13 builds at higher optimization levels are wrong. That is, they reflect a bug in GCC 13.

Vibrant answered 2/11, 2023 at 14:32 Comment(4)
"The outputs you show for GCC 13 builds at higher optimization levels are wrong. That is, they reflect a bug in GCC 13." -- but have you verified his claims?Sapienza
What claims, @Blindy? Do you mean that one of the assignments to *s is not performed by the optimized executable? I think the OP has adequately demonstrated that already. And it is anyway not central to the question, which is fundamentally "Does this program have undefined behavior?". The point of that, of course, is that UB would be the main plausible justification for GCC builds of the program to exhibit behavior differences at different optimization levels, as the OP observes.Vibrant
Or do you mean that the compiled program indeed displays the described variation in behavior? Yes, I have confirmed (at GodBolt) that the program exhibits the described optimization-level-dependent difference in output with GCC 13.2.Vibrant
According to gcc.gnu.org/bugzilla/show_bug.cgi?id=112346, this is a GCC bug and will be fixed in 13.3.0.Coextensive

© 2022 - 2025 — McMap. All rights reserved.