What language standards allow ignoring null terminators on fixed size arrays?
B

2

20

We are transitioning C code into C++.
I noticed that the following code is well defined in C,

int main(){

  //length is valid. '\0' is ignored
  char  str[3]="abc";
}

as it is stated in Array initialization that:

"If the size of the array is known, it may be one less than the size of the string literal, in which case the terminating null character is ignored."

However, if I were to build the same code in C++, I get the following C++ error:

error: initializer-string for array of chars is too long [-fpermissive]
    char  str[3]="abc";

I'm hoping someone can expound on this.

Questions:
Is the code example valid in all C language standards?
Is it invalid in all C++ language standards?
Is there a reason that it is valid in one language but not the other?

Barnhill answered 16/6, 2016 at 14:6 Comment(24)
You are quoting all the standards yourself. Yes, it is valid C. Yes, it is ill-formed C++. What is unclear? And no, nobody knows the reason for sure, but consensus is that C++ is more strict than C in general.Credential
Have you tried to compile against different C++ standards? I can answer the first question: Yes, this is valid in all versions of ISO 9899.Martinemartineau
C and C++ are different languages. So why do you expect them to behave identically? And just making C code run in C++ is a bad idea in general. Either keep using the C code as a separate module or rewrite using C++ features/library properly. And the code shown is bad in C, too - unless you don't want a "string". In that case using an initialiser list is the better approach anyway.Osteoplastic
Oh, and there is only one language standard: C11 (similar for C++).Osteoplastic
@Olaf I suppose I meant dialect then. I wanted to know how this works across the C / C++ landscape. So it definitely doesn't work in C++. Ok.Barnhill
C++ made several potentially unsafe things invalid, including string literal to char* conversions (though that took a while), this, and void* to T* conversions.Puny
@TrevorHickey: Well, on XC8, there might be additional restrictions - There are no "dialects" in standard C.Osteoplastic
@Olaf GNU89 is a standard, too.Martinemartineau
Let's keep comments civil and avoid insults. I've cleaned things up here.Beet
@TrevorHickey No, your question is a very good one. It's just that some people (usually including me) have a problem with people trying to treat C and C++ as the same thing, but your question is actually interesting and useful.Martinemartineau
@TrevorHickey: Maybe it would have been better to do some research into what actually goes on in C when initialising an array with a string literal. The code above is allowed in C for very dangerous and strongly deprecated legacy reasons. It should not occur in modern code in general (no rule without exception, but you should have extremely good reason to - >99.99% of the code has not).Osteoplastic
@FUZxxl: GNU89 never has been standard. You might confuse this with the German "Standard" which in English means "norm" (perfect false friends).Osteoplastic
@FUZxxl GNU89 is a standard, too. I'd say it's an implementation that predates formal standards.Tradein
@FUZxxl: (That was another victim of the deletes) You missed the point of my comment. I did not mean the tags are wrong. It is just that there is no use in expecting two different languages to have the same behaviour. Identical syntax does not imply identical semantics.Osteoplastic
@AndrewHenle: Yes, gcc introduced some features which showed up in later versions of the standard. But not always with identical syntax (e.g. designated initialisers, inline, flexible array members vs. zero-length arrays). And many features never made it into the standard.Osteoplastic
@Olaf GNU89 is a standard declared by the gcc project. It has found some adoption, notably by clang.Martinemartineau
@Olaf "So why do you expect them to behave identical?" Because C++ is designed with compatibility in mind, and the standard committees try to preserve it when possible and adequate. For example, the C standard library is part of the C++ standard; most implementations use the same library binaries and header sources for both languages. That that is possible is a design feature. Deviations from this rule of thumb are rare and often regretted by committee members after the fact.Indiscrete
@PeterA.Schneider: Perhaps the best approach would be to define a macro which, if included after a string literal, would indicate that no zero termination is assumed. C or C++ compilers could define that as macro as e.g. "\&" if they did not use backslash-ampersand as an escape for any other purpose (or substitute any other character they don't use as an escape); alternatively, C compilers could simply define the macro as nothing at the cost of needlessly wasting an extra byte in contexts where the string yields a pointer, or possible compiler squawks if used for array initialization.Crispation
@PeterA.Schneider: That is plain wrong. const qualifier has different semantics, VLAs do not exist in C++ neither do FAMs. And apparently such things like the above are also not compatible. Re the libraries: That would make Go or Python also C compatible, because they can use the same libraries. Sorry, but that is nonsense!Osteoplastic
Closely related to No compiler error when fixed size char array is initialized without enough room for null terminatorPoundage
You don't ask about this, but it seems worth pointing out that the standard idiom/workaround for this in C++ is simply: char str[] = { 'a', 'b', 'c' }; // look ma, no NULDozier
@Olaf you are aware that C's const is a feature "back-ported" from C++? This seems to support rather than disprove that the committees try to preserve compatibility where possible and adequate. With respect to the origin of some of the (slightly) different const semantics you may be interested in James Kanze's illuminating post at https://mcmap.net/q/272974/-what-is-the-difference-in-const-correctness-between-c-and-c. With respect to header and library use: I don't think you can #include stdio.h in Python. That go, otoh, can use them with cgo is also by design and thus rather an argument for C++'s compatibility.Indiscrete
@PeterA.Schneider: I am very well. But I'm also aware it has different semantics - despite the fact it is a backport. Try const int i = 10; static int a[i];! Funny enough, while I cannot #include the header directly, it is no problem to call the functions from Python via ctypes. More direct for other languages. A header is not necessary to use a C function anyway, you can provide your own declarations. So what does that prove? Interestingly, you silently ignore VLAs and FAMs, introduced along with const C99.Osteoplastic
@Olaf That a language can compile non-trivial source code (e.g. header files) of another language is a proof for compatibility. (You asked what that proves: It proves what I said.) I never claimed complete upward compatibility.Indiscrete
F
16

What you see here is a difference between the C and C++ rules for initializing a character array from a string literal. In C11 §6.7.9/14 we have

An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

emphasis mine
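The three cases in the quoted paragraph can be checked with a small C sketch. The array names and the buffer_equals helper are hypothetical, chosen only for illustration, and the first declaration is expected to be rejected by a C++ compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C only: room for 'a', 'b', 'c' but not '\0', so the terminator is dropped.
   A C++ compiler rejects this declaration. */
const char exact[3] = "abc";

/* Unknown size: the terminator is included, giving 4 elements. */
const char unsized[] = "abc";

/* Larger array: terminator included, remaining elements are zero. */
const char roomy[6] = "abc";

static_assert(sizeof exact == 3, "no terminator stored");
static_assert(sizeof unsized == 4, "terminator stored");
static_assert(sizeof roomy == 6, "declared size wins");

/* Hypothetical helper: compares a fixed-size, possibly unterminated
   buffer against a literal without reading more than n bytes of buf. */
int buffer_equals(const char *buf, size_t n, const char *lit) {
    return strlen(lit) == n && memcmp(buf, lit, n) == 0;
}
```

Because exact holds no terminator, it must not be passed to strlen or printed with a plain %s; every access has to be bounded by sizeof exact, as buffer_equals does.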

So as long as the array is large enough for the string excluding the null terminator it is valid. So

char  str[3]="abc";

is valid C. In C++14, however, the rule that governs this, found in [dcl.init.string]/2, states

There shall not be more initializers than there are array elements.

It goes on to show that the following code is an error:

char cv[4] = "asdf"; // error

So in C++ you have to have enough storage for the entire string literal including the null terminator.
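For code that has to build as both C and C++, two spellings avoid the problem. The sketch below is written in the common subset of the two languages; the names no_nul, with_nul, and same_characters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Brace-enclosed character list: exactly 3 elements, no terminator.
   Accepted by both C and C++ compilers. */
const char no_nul[3] = {'a', 'b', 'c'};

/* Size deduced from the literal: 4 elements including '\0'. */
const char with_nul[] = "abc";

static_assert(sizeof no_nul == 3, "three characters, no terminator");
static_assert(sizeof with_nul == 4, "three characters plus terminator");

/* Hypothetical check: both arrays start with the same three characters. */
int same_characters(void) {
    return memcmp(no_nul, with_nul, sizeof no_nul) == 0;
}
```

The first form keeps the exact-size layout of the original C code. The second costs one extra byte but yields a proper null-terminated string.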

Forby answered 16/6, 2016 at 14:25 Comment(11)
It might be a good idea to state which standards you are citing.Martinemartineau
@FUZxxl Just did. I am quoting the current standards.Forby
@Forby You said nothing about why it is allowed in C and not allowed in C++.Cultivable
@VladfromMoscow I just reworded the C part. Does that make it clearer?Forby
It might be useful to have a look at the C++ design rationale, but I don't know where to find it.Martinemartineau
@FUZxxl Most likely it was for safety. Normally you want to have the null terminator to make it a proper cstring.Forby
I think C++ wants you to treat strings as strings, while C's always been more flexible.Greisen
@Forby If you initialize a character array, quite often you don't want the trailing NUL byte. For example, if I write a function to convert numbers to hexadecimal, I use a lookup table of the form char hextab[16] = "0123456789abcdef". Adding an extra NUL byte is wasted storage.Martinemartineau
Re "So as long as the array is large enough for the string excluding the null terminator it is valid": The standard mentions the null terminator only to make clear it's copied as well, if there is room for it. It does not imply that "as long as the array is large enough for the string excluding the null terminator it is valid". The array can be of arbitrary length; it will be initialized from the literal as far as either of them goes.Indiscrete
@PeterA.Schneider I'm not sure about that. Trying to use char foo[2] = "test"; gives a warning in gcc and clang. Using char foo[4] = "test"; issues no such diagnostic. Not sure if the compilers are being overzealous but by my reading it seems it is only well defined to drop the null terminator.Forby
A good way to think about this is simply that, in C++, the double quotes always imply a NUL terminator. So you are literally trying to initialize the array with more elements than it contains, which is a bug/error. If you don't want a NUL-terminated string, don't use double quotes. Use single quotes, which give you individual characters.Dozier
B
5

Is the code example valid in all C language standards?

Note that only one ISO standard is in effect at a time; C2011 supersedes C99, which superseded C89.

I believe it should be valid under any one of those standards, though.

Is it invalid in all C++ language standards?

Same as above, just change "valid" to "invalid".

Is there a reason that is valid in one language but not another?

Most likely, it was left valid in C so as not to break any legacy code that relied on the behavior. C++ came along about a decade or so after C and tried to address some of C's shortcomings, and this was one of the holes that got plugged.

Many modern programming languages are iterations and improvements on earlier languages; C is B with a type system, C++ is C with OO support and better type safety, Java and C# are C++ with less undefined behavior, etc.

Bant answered 16/6, 2016 at 15:32 Comment(3)
Java has almost nothing in common with C++. Curly braces are their most important common feature.Hindsight
If a structure contains room for an exact-size string, it's much more convenient to allow the element to be initialized as "INIT" than to require it be written as {'I','N','I','T'}. Best yet would have been to have a syntax to explicitly indicate a string with no terminator, but neither C nor C++ seems to have that.Crispation
@n.m. Everything about their syntax is literally identical, but it does stop there.Fer
