Is it allowed to name a global variable `read` or `malloc` in C++?

Consider the following C++17 code:

#include <iostream>
int read;
int main(){
    std::ios_base::sync_with_stdio(false);
    std::cin >> read;
}

It compiles and runs fine on Godbolt with GCC 11.2 and Clang 12.0.1, but results in a runtime error if compiled with the -static flag.

As far as I understand, there is a POSIX(?) function called read (see man read(2)), so the example above actually commits an ODR violation and the program is essentially ill-formed even when compiled without -static. GCC even emits a warning if I try to name a variable malloc: built-in function 'malloc' declared as non-function

Is the program above valid C++17? If no, why? If yes, is it a compiler bug which prevents it from running?

Lilas answered 3/10, 2021 at 11:20 Comment(20)
@Someprogrammerdude this is where the original question comes from, actually. Great source of random C++ riddles.Lilas
@Someprogrammerdude also, whether sync_with_stdio(0) is needed or not depends a lot on the contest. For example, lots of local Russian ICPC contests have very tight time limits and you'd better not use <iostream> at all because it's slow. Obviously, it's a combination of large I/O and specific compilers/default compilation flags used on such competitions.Lilas
Unfortunately, a great source of misconceptions and utterly bad code examples, too. The harm such websites are doing to C++ beginners is immense. This has nothing to do with professional programming.Skald
@Skald Yes, kinda. However, in this case this line is required for the problem to occur. It's not illegal in itself, hence the question.Lilas
(That was not to criticize this question in any way.)Skald
@BarmakShemirani fread is in the standard, so I would be less surprised, I was thinking about read(2), will clarify.Lilas
These are the rules for naming identifiers in C++: en.cppreference.com/w/cpp/language/identifiers. So yes, you can name variables "read", "fread", "min", "max". However, any name clashes that occur as a result of using libraries are yours to solve. That's why namespaces are recommended: en.cppreference.com/w/cpp/language/namespaceDiscolor
Yes, it is allowed. Neither read nor malloc are reserved identifiers according to the standard. Of course, in some contexts (e.g. where <stdlib.h> is included, which declares malloc()) having a variable named malloc would be problematical, just as it would be problematical in some contexts having both a user-declared function named foo() and a variable named foo.Carabiniere
I see. If you add #include <unistd.h> do you get a compiler error for redefinition of read?Smedley
@BarmakShemirani Yes, I do get an error in that case. However, one doesn't typically include all possible headers in a program.Lilas
There can be an argument made putting all of your program into a namespace to avoid clashes with external libraries.Sonneteer
The global namespace is the wild, wild west. The program above is valid C++17 if it has no ODR violations. Does not matter if the ODR is due to the code you supplied or due to the code the platform supplied.Barnaul
@Carabiniere "Yes, it is allowed": not per POSIX, and this question is tagged posix (at the time I write this...). POSIX reserves some identifiers regardless of header inclusion in 2.2.2 The Name Space: "... (skip a lot) ... The following identifiers are reserved regardless of the inclusion of headers: ... malloc ..." Interestingly, read is not on that list.Bambibambie
@AndrewHenle: malloc is a function from C standard library which is included in C++ standard library. And in C++ an include file is allowed to load symbols from other include files so a cautious programmer should never use a symbol defined anywhere in the standard library. But according to the standards read is not one of them...Bertha
@AndrewHenle - The question is also tagged C++ (still is, as I write this), and that was the context of my comment. The C++ standard does not require an implementation to comply with, or enforce compliance with, any of the POSIX standards/specifications. And, of course, POSIX compliance is a property of operating systems related to compatibility with unix or "unix-like" systems, rather than a property of toolchains (which aim for compliance with relevant language standards/specifications) or user software (which may or may not assume a unix-compatible host).Carabiniere
For what it's worth, Visual C crashes only when you declare read as extern "C", as I would expect (and link statically).Viscoid
@Barnaul The "no matter where a conflicting definition comes from" is correct, and I was never aware of the implications. It is pretty scary: How on Earth am I to know what names libraries define that I never explicitly link to!?Viscoid
@Peter-ReinstateMonica that is why I have started putting my programs into their own unique namespace and only having main (or other entry point) in the global namespace. There is nothing I can do wrt to 3rd-party libraries conflicting with each other.Sonneteer
@RichardCritten Seems so ... extravagant. Also scary: It's a runtime error; in a less-used code path it may stay undetected for a while.Viscoid
@Peter-ReinstateMonica The comment from Eljay above expressed it most clearly. Not writing code in the global namespace should be added to the many "best-practice" guides, lint tools, etc. that are out there.Sonneteer
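The namespace advice from the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the namespace name `contest` and the helpers `read_value` and `read_value_from` are hypothetical. The mangled symbol name in the comment assumes the Itanium C++ ABI, which GCC and Clang use on Linux.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical namespace name, chosen for illustration only.
namespace contest {

// Assuming the Itanium C++ ABI, this variable's linker symbol is a mangled
// name (_ZN7contest4readE) rather than plain `read`, so it does not collide
// with the C library's read function.
int read;

// Reads one integer from the stream into contest::read and returns it.
int read_value(std::istream& in) {
    in >> read;
    return read;
}

// Convenience helper so the sketch can be exercised without stdin.
int read_value_from(const std::string& text) {
    std::istringstream in(text);
    return read_value(in);
}

}  // namespace contest
```

In a real program only `main` would stay in the global namespace and would call something like `contest::read_value(std::cin)`.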

The code shown is valid (in all C++ Standard versions, I believe). The relevant restrictions are all listed in [reserved.names]. Since read is not declared in the C++ standard library, nor in the C standard library, nor in older versions of the standard libraries, and is not otherwise listed there, it's fair game as a name in the global namespace.

So is it an implementation defect that the program fails at run time when linked with -static? (Not a "compiler bug": the compiler piece of the toolchain is fine, and nothing forbids a warning on valid code.) It does at least work with default settings (though only because the GNU linker doesn't mind duplicated symbols in an unused object of a dynamic library), and one could argue that's all that's needed for Standard compliance.

We also have at [intro.compliance]/8

A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any well-formed program. Implementations are required to diagnose programs that use such extensions that are ill-formed according to this International Standard. Having done so, however, they can compile and execute such programs.

We can consider POSIX functions such an extension. This is intentionally vague on when or how such extensions are enabled. The g++ driver of the GCC toolset links a number of libraries by default, and we can consider that as adding not only the availability of non-standard #include headers but also adding additional translation units to the program. In theory, different arguments to the g++ driver might make it work without the underlying link step using libc.so. But good luck - one could argue it's a problem that there's no simple way to link only names from the C++ and C standard libraries without including other unreserved names.

(Does "not altering the behavior of any well-formed program" even mean that an implementation extension can't use non-reserved names for the additional libraries? I hope not, but I could see a strict reading implying that.)

So I haven't claimed a definitive answer to the question, but the practical situation is unlikely to change, and a Standard Defect Report would in my opinion be more nit-picking than a useful clarification.

Balthazar answered 3/10, 2021 at 12:9 Comment(5)
GLIBC manual says: "The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names."Aq
Also, reserved.names-3 seems to say the same.Aq
@Ruslan: True, but note that read is not a C standard function (unlike, for example, fread - or malloc, come to that).Eparch
@Eparch indeed, didn't think of it.Aq
Related: If you want to use your own function called malloc, you also need the GCC option -fno-builtin-malloc to remove the implicit definition of it as an alias for the __builtin_malloc. (That's the mechanism by which GCC is able to inline memcpy, or for malloc to know that the returned pointer doesn't alias anything, and is aligned: What improvements does GCC's `__builtin_malloc()` provide over plain `malloc()`?.) @Ruslan. This is normally relevant in kernels, which don't link glibc, and some use -fno-builtin to disable everything.Suiter

Here is some explanation on why it produces a runtime error with -static only.

The https://godbolt.org/z/asKsv95G5 link in the question indicates that the runtime error with -static is Program returned: 139. The output of kill -l in Bash on Linux contains 11) SIGSEGV (and 128 + 11 = 139), so the process was terminated by the fatal signal SIGSEGV (segmentation fault), which indicates an invalid memory reference. The reason is that the process tries to run the contents (4 bytes) of the read variable as machine code. (Eventually std::cin >> ... calls read.) Either something fails in those 4 bytes accidentally interpreted as machine code, or it fails because the memory page containing those 4 bytes is not executable.
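The 128 + 11 = 139 arithmetic above follows the shell convention of reporting a process killed by signal N as exit status 128 + N. A minimal sketch of decoding such a status follows. The function name `signal_from_shell_status` is hypothetical, and the upper bound of 256 is an assumption based on exit statuses fitting in one byte.

```cpp
#include <optional>

// Bash reports a process killed by signal N as exit status 128 + N.
// Returns the signal number for such a status, or nothing if the status
// does not look like a "killed by signal" status.
std::optional<int> signal_from_shell_status(int status) {
    constexpr int kSignalOffset = 128;
    constexpr int kStatusLimit = 256;  // assumption: statuses fit in one byte
    if (status > kSignalOffset && status < kStatusLimit) {
        return status - kSignalOffset;
    }
    return std::nullopt;
}
```

For the status in the question, `signal_from_shell_status(139)` yields 11, which is the number that kill -l lists for SIGSEGV on Linux.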

The reason why it succeeds without -static is that with dynamic linking it's possible to have multiple symbols with the same name (read): one in the program executable, and another one in the shared library (libc.so.6). std::cin >> ... (in libstdc++.so.6) links against libc.so.6, so when the dynamic linker tries to find the symbol read at program load time (to be used by libstdc++.so.6), it will look at libc.so.6 first, finding read there, and ignoring the read symbol in the program executable.

Grieve answered 4/10, 2021 at 9:43 Comment(0)
