Detect use-after-move for global variables
Asked Answered
O

1

8

After some effort, I convinced both the clang compiler and clang-tidy (static analyzer) to warn of a use-after-move situation. (see https://mcmap.net/q/1473991/-detect-use-after-move-during-compilation)

int main(int, char**) {
    a_class a;
    auto b = std::move(a);
    a.f();  // warns here, for example "invalid invocation of method 'f' on object 'a' while it is in the 'consumed' state [-Werror,-Wconsumed]"
}

However, if I make the variable global (or static or lazily static), there is no more warning.

a_class a;

int main(int, char**) {
    auto b = std::move(a);
    a.f();  // no warns here!
}

See here: https://godbolt.org/z/3zW61qYfY

Is it possible to generalize some sort of use-after-move detection at compile-time for global variables? Or is it impossible, even in principle?


Note: please don't make this discussion about global object (I know it is a bad idea) or about the legality of using moved objects (I know some class are designed for that to be ok). The question is technical, about the compiler and tools to detect a certain bug-prone pattern in the program.


Full working code, compile with clang ... -Wconsumed -Werror -std=c++11 or use clang-tidy. The clang annotation (extensions) help the compiler detect the patterns.

#include<cassert>
#include<memory>

class [[clang::consumable(unconsumed)]] a_class {
    std::unique_ptr<int> p_;

public:
    [[clang::callable_when(unconsumed)]]
    void f() {}

    // private: [[clang::set_typestate(consumed)]] void invalidate() {}  // not needed but good to know
};

a_class a;

int main(int, char**) {
    // a_class a;
    auto b = std::move(a);
    a.f();  // global doesn't warn here
}

Most of the information I could find about this clang extension is from here: Andrea Kling's blog https://awesomekling.github.io/Catching-use-after-move-bugs-with-Clang-consumed-annotations/

Occlusive answered 31/10, 2022 at 16:10 Comment(33)
There is absolutely nothing wrong, whatsoever, with using a moved-from object. The C++ library guarantees that moved-from objects are left in a "valid but unspecified state", with emphasis on "valid". What's valid and well-formed for objects that are not moved, is also valid for a moved-from object. I would consider such a diagnostic, in absence of real undefined behavior, to be more of a hindrance than help. Please show a minimal reproducible example, that everyone else in the world can cut/paste, compile and run, that you claim produces this diagnostic message? Everything is speculation until then.Disgraceful
In simple cases (like above) it's is possible to detect. However with multiple compilation units, passing variables into external functions etc it can't be done. Given that programs with more than a single cpp files are the norm, it's probably not worth the time of the compiler developers.Perm
I'd recommend either make the global const or remove/hide the move constructor from the client codeFulgurous
@SamVarshavchik: That sounds wrong. Consider a std::unique_ptr and ptr->Foo(), which is obviously wrong (in the UB sense) after a move. I am not sure what you intended, but I doubt it was this.Wellbeing
It is wrong because of a lack of a nullptr check, @MSalters, not because of use after move. It would be equally wrong to do this for a not moved-from unique_ptr which is also a nullptr.Disgraceful
@Wellbeing It's not that a std:: moved from object still contains what's been moved. It's guaranteed to be in its empty state, ready to be used as is. It means you can use a std::string as an empty string directly after a move - no clear() needed.Capers
Global variable which changes state is invitation for bugs and entangled code.Chavey
Unfortunately, @TedLyngmo, a clear() is needed because you have no guarantees, whatsoever, that a moved-from std::string will be empty. If a short-string optimization is implemented, the moved-from string may be completely unchanged.Disgraceful
@SamVarshavchik Darn ... I was told that the standard library classes all gave this extra guarantee.Capers
The guarantee from the C++ library is "valid, but unspecified state". Every one of these four words have a very specific, narrow, strict meaning. Here, "unspecified" means exactly that. A practical joker can design a C++ library whose moved-from std::strings hold the complete contents of "Lord Of The Rings" (in static storage, with some kind of a "copy on write" flag set). This would be completely compliant with the C++ standard.Disgraceful
@SamVarshavchik: True, unless a specific type gives a stronger guarantee (e.g. unique_ptr does). I think you missed an Universal quantification somewhere: what's valid for all objects that aren't moved is also valid for every object that's moved-from. This explains ptr->Foo(), which is invalid when ptr is null.Wellbeing
@SamVarshavchik , if the godbolt link doesn't count as MWE I added the code to the question.Occlusive
Yes, just move to rust :))Containment
@MarekR , I am absolutely aware of " Global variable which changes state is invitation for bugs and entangled code.". This is actually to wrap a library with global state and make it less error-prone.Occlusive
@Occlusive From playing around with -Wconsumed, it seems that the implementation is quite incomplete (as also noted in the documentation). Clang is unable to diagnose the issue if even a single indirection is involved. So, a workaround might be to always use a non-reference local variable by means of a proxy and use a global getter rather than a variable, e.g. like this. Would this be viable in your use case?Neuberger
@Sedenion, nice approach, but I guess the problem persist because I can still make another proxy object by calling ` proxy A = get_a_class();` later in main. I tried modifying the code but there is still not detection that the actual global object is moved (i.e. consumed) at the start. Here it is my (failed attempt) godbolt.org/z/bcnoeKe7rOcclusive
@Occlusive Yes, but attempting to detect this becomes impossible at the latest when you access the global from more than one function. I.e. in this code, a.f() is only invalid if test1() was called before test2(). In general, there is no way for the compiler to know this. So I'd argue that the best you can do in C++ is to warn about use-after-move only within individual functions, but never across functions. So, in the proxy approach, you should call get_a_class() only once per function.Neuberger
@Sedenion, I did some experiments earlier and the clang feature was able to see consumed objects across functions. (e.g. moved inside a function, used later after the function). I know it could get eventually confused, but it works in principle across functions. So, using inside a single function is not the problem. The only limitation that I found is that the object cannot be global for it to work (either static inside a function or just a true global variable.) I think you have good arguments not to abuse or rely on this feature (yet).Occlusive
It also can't be a reference in certain cases, making any workarounds I can think of invalid.Neuberger
@Sedenion, also notice that the feature works through indirection godbolt.org/z/nqnd4eTa1 , at least partially, usage through references seems to trip it though. So yep, globals and references seem to be a limitation. Usually I don't fiddle with experimental features but I feel this one is a game changer for C++ and perhaps its only lifeline to survive Rust.Occlusive
Note also that the guarantee "valid, but unspecified state" is for standard library types.Florencia
yes, that also, but it would be useful to mark variables with arbitrary types as consumed, including standard library types, not just types marked use clang::consumable. I think this eventually will come to standard C++.Occlusive
@sSdenion To me it seems that if a function leaves a global variable in an unspecified state that is likely a bug in the program. So, in addition to detecting use after move inside a function I would be satisfied with diagnostics for move-ing the global variable and not setting the state afterwards in that function.Stereotomy
@HansOlsson, exactly, that is why I thought it could be a good idea to detect it by the compiler.Occlusive
@Occlusive So, considering all the comments above, I guess the answer to your question is unfortunately that it is simply not implemented. The most recent commit to LLVM that actually improved the warning is apparently from 2015, i.e. 7 years old. So I guess there is no broader interest in improving things any time soon.Neuberger
The code comments state that the analysis for checking consumed properties is intra-procedural. You may be able to use the clang static analyzer for CTU (cross translation unit) analysis (why I'm writing this as a comment and not an answer). Unfortunately, I can't find a flag such as -lwhole-program, which might inform clang to allow info about globals to be found. It seems like the ball gets dropped in ConsumedStmtVisitor::checkCallability(...) where VarState of CS_None dodges the warning call as well as a matching state. My guess is globals have CS_None as their state at that point.Taler
Also, it's important to note that in your godbolt example, if you declare 'a' locally and then remove the 'auto b' line (such that you declare a and then attempt to call a.f()), you'll still have an 'invalid invocation' error, because you need a constructor for a_class that uses return_typestate(unconsumed)Taler
@MarkH thank you for your observations. i am ready to accept your technical llvm explanation as an answer.Occlusive
@MarkH, I think the annotated default constructor is not necessary because how the class was declared as [[clang::consumable( unconsumed )]]. But I could be wrong.Occlusive
Two interesting things: 1) This comment that it's unmaintained, so probably unfit for use: github.com/llvm/llvm-project/issues/… 2) I tried keeping a static class unordered_set of pointers to instances, but it appears the consumed state logic couldn't follow calling methods that way like it can a direct a.invalidate() call.Taler
clang also seems to behave similarly with respect to uninitialized global variables, ex: godbolt.org/z/f4aKo3r3j . I believe the uninitialized analysis is fed in the same way as the consumed analysis. Can't find the fundamental reason why globals are excluded, though. Looking at the AST, I can see they are in the same TranslationUnitDecl, but global variables are declared in VarDecl's outside of any FunctionDecl.Taler
@MarkH, thanks for your comments. Your investigation is the closest to an answer here. i.e. the analysis is in principle possible but not currently implemented for globals in clang for some reason (maybe it is hard to implement to the point that is mildly useful).Occlusive
@MarkH, I think a global variable is always initialized (e.g. to zero) because its value is contained in the binary. That is why there no warning for the global x in your example. I think that is how compilers work if I am not mistaken,Occlusive
H
0

You could use cppcheck with the performance and style checks. With cppcheck --enable=performance,style file.cpp you could check your file.

For example in the following code (simple use-after-move):

#include <utility>

// Declare a global variable
int globalVariable = 0;

int main() {
  // Move the global variable to a local variable
  int localVariable = std::move(globalVariable);

  // Access the global variable after it has been moved
  return globalVariable;
}

it would yield [my_file.cpp:8]: (performance) Access to moved-from object 'globalVariable'. and therefore detecting the use-after-move case! :)

Heptateuch answered 5/12, 2022 at 18:1 Comment(1)
is this cppcheck 2.7?Occlusive

© 2022 - 2024 — McMap. All rights reserved.