Is there a way to get the C++ compiler to detect and warn/error about this object-lifetime error?
Asked Answered
B

1

5

Consider the toy example code below. It has a too-large-to-copy data structure MyDataStructure and a little iterator class MyDataIterator that the caller can instantiate to iterate over the data in the structure.

In the toy version of the code, the data structure contains just 10 integers (1000-1009) and the WORKING CODE section in main() iterates over them correctly, as expected.

However, the second (BROKEN CODE) section in main() doesn't iterate over them correctly; it invokes undefined behavior because it calls GetDataStructureByValue() instead of GetDataStructureByReference(), which means that the MyDataStructure reference that iter stores in its _ds private member-variable becomes a dangling reference before the first loop-iteration.

This is a bit of an insidious problem, because in non-toy code it's not always easy to tell/remember from the name of a function whether it is returning by-value or by-reference, but the former case results in a non-obvious runtime bug, while the latter case works fine.

My question is, is there any reasonable way to get a modern C++ compiler to error-out or at least warn about the problematic case? I suspect there isn't, but I'm asking anyway because I'd feel more secure if instances of this problem could be automatically flagged for me somehow.

The source code of the toy program is below, followed by an example of its output on my Mac running XCode:

#include <stdio.h>

class MyDataStructure
{
public:
   MyDataStructure()
   {
      printf("MyDataStructure CTOR %p\n", this);
      for (int i=0; i<10; i++) _data[i] = i+1000;
   }

   ~MyDataStructure()
   {
      printf("MyDataStructure DTOR %p\n", this);
      for (int i=0; i<10; i++) _data[i] = -1;  // just to make the symptoms more obvious
   }

   bool IsPositionValid(int pos) const {return ((pos >= 0)&&(pos < 10));}

   int GetValueAt(int pos) const {return _data[pos];}

private:
   int _data[10];
};

const MyDataStructure & GetDataStructureByReference()
{
   static MyDataStructure _ds;
   return _ds;
}

MyDataStructure GetDataStructureByValue()
{
   MyDataStructure _ds;
   return _ds;
}

class MyDataIterator
{
public:
   MyDataIterator(const MyDataStructure & ds) : _ds(ds), _idx(0) {/* empty */}

   bool HasValue() const {return _ds.IsPositionValid(_idx);}
   int  GetValue() const {return _ds.GetValueAt(_idx);}

   void operator++(int) {_idx++;}

   const MyDataStructure & GetDataStructure() const {return _ds;}

private:
   const MyDataStructure & _ds;
   int _idx;
};

int main(int, char **)
{
   {
      // The following line of code is okay; it counts from 1000 to 10009, as expected
      printf("WORKING CODE:\n");
      for (MyDataIterator iter(GetDataStructureByReference()); iter.HasValue(); iter++) printf("iter returned %i from %p\n", iter.GetValue(), &iter.GetDataStructure());
   }

   {
      // The following line of code is buggy because, the MyDataStructure object that (iter) keeps a reference to
      // gets destroyed before the first iteration of the loop.
      printf("\nBROKEN CODE:\n");
      for (MyDataIterator iter(GetDataStructureByValue()); iter.HasValue(); iter++) printf("iter returned %i from %p\n", iter.GetValue(), &iter.GetDataStructure());
   }

   printf("\nEND OF PROGRAM\n");
   return 0;
}

... example output:

$ ./a.out
WORKING CODE:
MyDataStructure CTOR 0x1015c4000
iter returned 1000 from 0x1015c4000
iter returned 1001 from 0x1015c4000
iter returned 1002 from 0x1015c4000
iter returned 1003 from 0x1015c4000
iter returned 1004 from 0x1015c4000
iter returned 1005 from 0x1015c4000
iter returned 1006 from 0x1015c4000
iter returned 1007 from 0x1015c4000
iter returned 1008 from 0x1015c4000
iter returned 1009 from 0x1015c4000

BROKEN CODE:
MyDataStructure CTOR 0x7ff7be93d198
MyDataStructure DTOR 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198
iter returned -1 from 0x7ff7be93d198

END OF PROGRAM
MyDataStructure DTOR 0x1015c4000
Birdsong answered 30/9, 2024 at 23:48 Comment(4)
Is -Weffc++ what you want?Balbur
Since your actual requirement is to prevent copying MyDataStructure, the obvious approach is to prevent MyDataStructure from being copyable. Before C++11, declare its copy constructor as private and don't define it. After C++11, delete (via MyDataStructure(const MyDataStructure &) = delete). Either way, the effect will be a diagnostic when defining GetDataStructureByValue() from compiling (if it is not a friend or a member of MyDataStructure).Spinthariscope
Address sanitizer should be able to detect it at runtime.Carbolated
@Spinthariscope I don't want to disallow copying MyDataStructure entirely though -- there are times when it's reasonable to do, my phrasing in the question notwithstanding -- I only want to avoid the overhead of doing so when it isn't necessary (e.g. when iterating over one)Birdsong
R
8

One easy thing to do is just prevent the iterator from being able to be created with a temporary object. If you add another constructor to MyDataIterator with the following signature

MyDataIterator(MyDataStructure &&) = delete;

Then this constructor will be the one chosen by overload resolution as it is a better match for rvalues. Since it is deleted the compiler will issue an error because you cannot call a deleted constructor. You can see it "working" (generates an error) here in this live example.

Rascally answered 1/10, 2024 at 1:12 Comment(1)
This solution is perfect, thanks! :)Birdsong

© 2022 - 2025 — McMap. All rights reserved.