Is it safe to "play" with parameter constness in extern "C" declarations?

Suppose I'm using some C library which has a function:

int foo(char* str);

and I know for a fact that foo() does not modify the memory pointed to by str. It's just poorly written and doesn't bother to declare str as a pointer to const.

Now, in my C++ code, I currently have:

extern "C" int foo(char* str);

and I use it like so:

foo(const_cast<char*>("Hello world"));

My question: Is it safe - in principle, from a language-lawyering perspective, and in practice - for me to write:

extern "C" int foo(const char* str);

and skip the const_cast'ing?

If it is not safe, please explain why.

Note: I am specifically interested in the case of C++98 code (yes, woe is me), so if you're assuming a later version of the language standard, please say so.

Phyte answered 10/11, 2020 at 9:38 Comment(20)
Doesn't answer your question, but a scalable solution would be to write a fooWrapper function that does the const_cast for you.Gratt
@Elliott: Indeed, but unfortunately, what I'm actually facing is a vararg function, and in order to wrap that I (sort of) need variadic templates, while I'm stuck in C++98 with this code.Phyte
Technically it's undefined behavior. Practically, I doubt you will be able to find a compiler where it does not work. It would not be safe on an (insane) architecture that used one register to pass a const char* argument and a different register to pass a char* argument to a function.Shoal
This hack relies on C not having overloads and that the linker will be happy when it finds any foo I guess?Lakisha
@KamilCuk: What is the basis for your statement that it is undefined behavior? Is it the fact that it's ABI-dependent?Phyte
The fact that int foo(const char*); and int foo(char*) are not compatible function declarations.Shoal
@KamilCuk: But we're not talking about C++ functions, we're talking about C functions. Hopefully, once C code is compiled, there is no difference between those two functions w.r.t. interaction with other code.Phyte
I do not understand. No, in C these functions are not compatible. If you wrote that code in C, it would be undefined behavior, the same way as in C++.Shoal
Do you have va_list version of your vararg function? You could create wrapper for that without variadic templates.Miss
At least with C it is not a problem. const char *s = "123"; foo((char *) s); is OK when foo() does not write to s[]. Yet C++ and C are different.Obliquely
There is still the overload way to simulate variadic: template <typename T1> void bar(T1), template <typename T1, typename T2> void bar(T1, T2), ...Eloisaeloise
@cigien: Partly. I'll clarify.Phyte
@user694733: I probably don't.Phyte
@chux-ReinstateMonica: By "it" do you mean doing what I've done? If so, please make your comment into an answer and explain why this is safe (not why const-casting is safe).Phyte
By "it", if cast was done in C, then no issues. Yet what you did is C++ and this post is primarily asking about C++. So at best my answer would only answer the C half of this C/C++ question.Obliquely
@TedLyngmo it's declared extern "C" so no need to worry about overloads...Balmoral
@Balmoral I know, so if someone changes the original function to void foo(char* str, size_t len); the declaration extern "C" int foo(const char* str); will still make it possible to link it (by the linkers I know of).Lakisha
@TedLyngmo I'm fairly sure that's undefined behaviour, but yes.Balmoral
@Balmoral I'm 100% sure it'll be UB if that function is ever called :-)Lakisha
@Phyte I'd be sure even if the signature wasn't wrong. There's nothing in what we've seen, or in the consensus most of us agree upon, that makes this defined behavior. This is not mentioned in C++. My conclusion is that it therefore must be undefined.Lakisha

Is it safe for me to write extern "C" int foo(const char* str); and skip the const_cast'ing?

No.

If it is not safe, please explain why.

-- From language side:

After reading dcl.link, I think that exactly how interoperability between C and C++ works is not fully specified, with many "no diagnostic required" cases. The most important part is:

Two declarations for a function with C language linkage with the same function name (ignoring the namespace names that qualify it) that appear in different namespace scopes refer to the same function.

Because they refer to the same function, I believe a sane assumption is that the declaration of an identifier with C language linkage on the C++ side has to be compatible with the declaration of that symbol on the C side. C++ has no concept of "compatible types"; in C++ two declarations have to be identical (after transformations), which makes the restriction stricter than in C.

From the C++ side, we read in c++draft basic#link-11:

After all adjustments of types (during which typedefs are replaced by their definitions), the types specified by all declarations referring to a given variable or function shall be identical, [...]

Because the declaration int foo(const char *str) with C language linkage in a C++ translation unit is not identical to the declaration int foo(char *str) in the C translation unit (which therefore also has C language linkage), the behavior is undefined (with the famous "no diagnostic required").

From the C side (I think this is not even needed, since the C++ side is enough to give the program undefined behavior), the most important part would be C99 6.7.5.3p15:

For two function types to be compatible, both shall specify compatible return types. Moreover, the parameter type lists, if both are present, shall agree in the number of parameters and in use of the ellipsis terminator; corresponding parameters shall have compatible types [...]

Because from C99 6.7.5.1p2:

For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.

and C99 6.7.3p9:

For two qualified types to be compatible, both shall have the identically qualified version of a compatible type [...]

So char is not compatible with const char, hence const char * is not compatible with char *, hence int foo(const char *) is not compatible with int foo(char*). Calling such a function (C99 6.5.2.2p9) would be undefined behavior (see also C99 J.2).

-- From practical side:

I do not believe you will be able to find a compiler+architecture combination where one translation unit sees int foo(const char *), the other translation unit defines a function int foo(char *) { /* some stuff */ }, and the call would "not work".

Theoretically, an insane implementation may use one register to pass a const char* argument and a different one to pass a char* argument, which I hope would be well documented in that insane architecture's ABI and compiler. In that case the wrong registers would be used for the parameters, and it would "not work".

Still, using a simple wrapper costs nothing:

static inline int foo2(const char *var) {
    // const_cast is required here: static_cast cannot remove const.
    return foo(const_cast<char*>(var));
}
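
For the variadic function mentioned in the comments under the question, the same wrapper idea could be combined with the one-overload-per-argument-count suggestion. The following is only a sketch under assumptions: the question never names the variadic function, so foo_fmt, its stand-in definition, foo_fmt_c and last_output are all hypothetical, and the stand-in uses vsnprintf (C99/C++11) purely to keep the demo short.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a variadic C library function. In a real project
// it would live in a C translation unit; it is defined here only so that the
// sketch compiles on its own. It reads fmt but never writes through it.
static char last_output[128];

extern "C" int foo_fmt(char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = std::vsnprintf(last_output, sizeof last_output, fmt, args);
    va_end(args);
    return n;
}

// C++98-style "variadic" wrappers: one overload per argument count.
// The declaration of foo_fmt stays identical to the library's, and the
// const_cast is confined to these few functions.
inline int foo_fmt_c(const char* fmt) {
    assert(fmt != NULL);
    return foo_fmt(const_cast<char*>(fmt));
}

template <typename T1>
int foo_fmt_c(const char* fmt, T1 a1) {
    assert(fmt != NULL);
    return foo_fmt(const_cast<char*>(fmt), a1);
}

template <typename T1, typename T2>
int foo_fmt_c(const char* fmt, T1 a1, T2 a2) {
    assert(fmt != NULL);
    return foo_fmt(const_cast<char*>(fmt), a1, a2);
}
```

A call such as foo_fmt_c("%d-%s", 7, "x") then needs no cast at the call site. The overload set has to be extended by hand up to the largest argument count needed, and only types that are valid to pass through an ellipsis (arithmetic types, pointers) should be used as arguments.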
Shoal answered 10/11, 2020 at 10:53 Comment(7)
"Because they refer to the same function, ... the declaration ... has to be compatible with the declaration of that symbol on C side." There is no such thing as a "declaration on the C side". There is just a binary object and a symbol in the symbol table. And the symbol is the same regardless of the parameter constness.Phyte
"the symbol int foo(const char *str) with C language linkage in a C++ translation unit is not identical to symbol int foo(char *str) declared in C translation unit" <- symbols are not signatures; the symbols are, AFAICT, identical (although perhaps this is not guaranteed by the C language standard?); so we do have compatibility. Supposedly.Phyte
s/symbol/declarations/. The declarations are not identical, that's what I meant.Shoal
"There is no such thing as a "declaration on the C side"" But... there is? You started your text with "I'm using some C library which has a function:", so there has to be a declaration int foo(char*) on the C side (note that a definition is a declaration). I tried to find out how linking is required to work between C and C++ and what the specific requirements are, but couldn't find it (exactly) in the C++ standard. So I wrote about my "sane assumption...". The assumption here is that a declaration in C source is (semantically) equal to a declaration in a C++ source with C language linkage.Shoal
I believe one may argue that this assumption is not exactly specified in the standard, to which I will say that if something is not specified, its behavior is undefined, so it's all implementation-specific anyway. "There is just a binary object and a symbol in the symbol table" assumes there exist such things as a "binary object" and a "symbol table", which are architecture-specific and unrelated to C++ as a language. I tried to approach this from an architecture-"agnostic" point of view.Shoal
The C++ compiler doesn't know that there's any C code anywhere. In fact, it doesn't even know that there's compiled code anywhere. Also, I could have changed my question to "a library implemented in assembly but compatible with C code". Also, yes, symbols are architecture-specific, but I commented on them because you said "symbols" rather than "declarations".Phyte
"I could have changed my question to "a library implemented in assembly but compatible with C code"" Sure, and in this case my conclusion would be that the C++ standard does not define how C++ (as a language) should interoperate with such a library, because that is not in the scope of the C++ language (it's undefined behavior, because.. it's not defined). C++ doesn't know what a "library implemented in assembly" is; C++ doesn't know what "a library" is or what "assembly" is. It's all specific to the particular environment that has that "library", outside of the language scope.Shoal

I think the base answer is:

Yes, you can cast away const even if the referenced object is itself const, such as the string literal in the example. Undefined behaviour is only specified to arise from an attempt to modify the const object, not from the cast itself. Those rules and their reasons for existing are 'old'. I'm sure they predate C++98.

Contrast it with volatile where any attempt to access a volatile object through a non-volatile reference is undefined behaviour. I can only read 'access' as read and/or write here.
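
To illustrate the const rule above, here is a minimal sketch. The names count_chars and read_only_use are hypothetical; count_chars merely stands in for a poorly declared function that takes char* but only reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical read-only consumer with a non-const parameter, standing in
// for the poorly declared library function.
static std::size_t count_chars(char* s) {
    return std::strlen(s);  // reads only; never writes through s
}

static std::size_t read_only_use() {
    const char* literal = "Hello world";
    // Casting away const is allowed; only a write through p would be undefined.
    char* p = const_cast<char*>(literal);
    assert(p == literal);  // the cast changes the type, not the address
    return count_chars(p);
    // p[0] = 'h';  // a write like this WOULD be undefined behaviour
}
```

The cast and the read are both well defined; the commented-out write is the only operation here that would cross into undefined behaviour.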

I won't repeat the other suggestions but here is the most paranoid solution. It's paranoid not because the C++ semantics aren't clear. They are clear. At least if you accept something being undefined behaviour is clear!

But you've described it as 'poorly written' and you want to put some sandbags round it!

The paranoid solution relies on the fact that if you are passing a constant object it will be constant for the whole execution (if the program doesn't risk UB).

So make a single copy of "hello world" lower in the call-stack or even initialised as a file scope object. You can declare it static in a function and it will (with minimal overhead) only be constructed once.

This recovers almost all of the benefits of a string literal. The lower down the call stack you put it (including file scope, i.e. global), the better. I don't know how long the lifetime of the pointed-to object passed to foo() needs to be, so the copy needs to be at least low enough in the chain to satisfy that condition. NB: C++98 has std::string, but it won't quite do here because you're still forbidden from modifying the result of c_str(). Here the semantics are defined.

#include <cstddef>
#include <cstring>
#include <iostream>

class pseudo_const{
public:
    //Makes a private, writable copy of cstr (including the terminator).
    pseudo_const(const char*const cstr): str(NULL){
        const std::size_t sz=std::strlen(cstr)+1;
        str=new char[sz];
        std::memcpy(str,cstr,sz);
    }

    //Returns a pointer to a lifetime-permanent copy of
    //the string passed to the constructor.
    //Modifying the string through this value will be reflected in all
    //subsequent calls.
    char* get_constlike() const {
        return str;
    }

    ~pseudo_const(){
        delete [] str;
    }
private:
    //Copying is disabled (C++98 style): a copy would share str and
    //delete it twice.
    pseudo_const(const pseudo_const&);
    pseudo_const& operator=(const pseudo_const&);

    char* str;
};

const pseudo_const str("hello world");

int main() {
    std::cout << str.get_constlike() << std::endl;
    return 0;
}
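
A lighter-weight variant of the "declare it static in a function" idea, for the case where the text is known at compile time, might look like the sketch below. The helper names hello_text and call_foo_without_cast are hypothetical, and foo is defined here only as a stand-in for the C library function so that the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns a pointer to a writable copy of the text that
// lives for the whole program. The array is a non-const object initialised
// from the literal, so passing it as char* needs no cast at all.
static char* hello_text() {
    static char text[] = "hello world";
    return text;
}

// Stand-in for the C library function from the question (assumption: the
// real one lives in a C translation unit and only reads str).
extern "C" int foo(char* str) {
    return static_cast<int>(std::strlen(str));
}

static int call_foo_without_cast() {
    char* s = hello_text();
    assert(s == hello_text());  // same buffer on every call
    return foo(s);
}
```

Because the array is not const, even a misbehaving foo that wrote to the buffer would stay within defined behaviour, although later callers would then see the modified text.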
Subhead answered 10/11, 2020 at 11:44 Comment(2)
We're talking about a function which: 1. Is not visible to the C++ compiler 2. Was not written in C++ 3. Doesn't change the pointed-to memory. Also, in C++ (or with relevant ABIs), function signatures are mangled into the symbol; in C they aren't - the name is the symbol.Phyte
@Phyte I'm still a tiny bit suspicious of something in C that isn't declared const. C has had const for over 30 years officially! I'll modify my answer. There are still better options. The linker may (and probably does) incorporate type information in C. I talked about identifier rather than name to avoid any assumption that overloading is achieved through 'name decoration' (AKA: name mangling). Though I'm not aware of a linker that uses a more structured approach. One day we won't need Stroustrup's hack!Subhead
