Is it undefined behavior to pass a pointer to an unconstructed streambuf object to the ostream constructor?

Question

Does the following program have undefined behavior?

#include <iostream>          // std::{ostream, streambuf}

// The streambuf ctor is protected so we need a wrapper to create one.
struct mystreambuf : public std::streambuf {};

extern mystreambuf sb;       // Not yet constructed.
std::ostream os(&sb);        // Passing "invalid" pointer here?  UB?
mystreambuf sb;              // Now it is constructed.

int main() { return 0; }

It invokes the ostream constructor, passing a pointer to a streambuf object whose lifetime has not yet begun (basic.life p1). Does this constitute undefined behavior?

Attempted answer

If streambuf were a user-written class, then class.cdtor p1 would govern, which says:

For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior. [...]

This language, and its accompanying example, make it clear that merely taking the address of an unconstructed object is not undefined. As far as I can tell, passing that address as a pointer to a user-written function that only stores its value and tests it against nullptr is also not undefined.
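Below is a minimal sketch of that reasoning. The names `Buf`, `Holder` and `held_text` are hypothetical and do not come from the question. `Holder`'s constructor plays the role of a user-written `init` that only stores the pointer and compares it with `nullptr`. The pointer already has the parameter's type, so no derived-to-base conversion is involved. The buffer is only dereferenced after static initialization has finished.

```cpp
#include <string>

// Hypothetical buffer type with a non-trivial (user-provided) constructor.
struct Buf {
    Buf() : text("ready") {}
    std::string text;
};

// Hypothetical analogue of basic_ios::init: it only stores the pointer
// and compares it with nullptr; it never dereferences it.
struct Holder {
    Buf* p;
    bool good;
    explicit Holder(Buf* b) : p(b), good(b != nullptr) {}
};

extern Buf buf;   // declared here, but its lifetime has not begun yet
Holder h(&buf);   // &buf already has type Buf*: no derived-to-base conversion
Buf buf;          // initialized after h (same translation unit, definition order)

// Dereferences the stored pointer; only call this once static
// initialization is complete (e.g. from main).
const std::string& held_text() { return h.p->text; }
```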

But streambuf is a library class, so instead res.on.arguments p1 applies, which says, in part:

If an argument to a function has an invalid value (such as a value outside the domain of the function or a pointer invalid for its intended use), the behavior is undefined.

But what constitutes an "invalid value"? Presumably we have to determine the "intended use" by reading the specification of the called function. The constructor spec ostream.cons p1 says in part:

Effects: Initializes the base class subobject with basic_ios<charT, traits>::init(sb) ([basic.ios.cons]).

The spec for init basic.ios.cons p4 says:

Postconditions: The postconditions of this function are indicated in Table 127.

where Table 127 has two rows that mention sb:

Element    Value
-------    -----
rdbuf()    sb
rdstate()  goodbit if sb is not a null pointer, otherwise badbit.

So, at first glance, this would seem to suggest that sb is only stored (so that rdbuf() can return it) and tested for being nullptr; and that these together comprise its "intended use". Since both of these would be legal for user-written code to do, it is legal to pass the pointer in question, so the program has defined behavior.
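The two Table 127 rows can be observed on a real implementation. This only works with a null pointer or a fully constructed buffer, because what happens with an unconstructed one is the very thing in question. The sketch uses a hypothetical helper name. A passing check shows what one library does, not what the standard forbids.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical helper: builds a std::ostream from `sb` and checks the
// Table 127 rows: rdbuf() == sb, and rdstate() is goodbit when sb is
// non-null, badbit otherwise.
bool init_postconditions_hold(std::streambuf* sb) {
    std::ostream os(sb);
    const std::ios_base::iostate expected =
        sb ? std::ios_base::goodbit : std::ios_base::badbit;
    return os.rdbuf() == sb && os.rdstate() == expected;
}
```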

But Table 127 is merely a list of postconditions. It does not definitively assert that nothing else is in the scope of "intended use". For that, it would seem necessary to exhaustively review everything that basic_ostream and its subclasses potentially do with sb.

While attempting to do so, I find imbue at basic.ios.members p9:

Effects: Calls ios_base::imbue(loc) and if rdbuf() != 0 then rdbuf()->pubimbue(loc).

Clearly, calling rdbuf()->pubimbue(loc) before the object pointed to by rdbuf() is constructed is undefined. Do we call imbue? Not explicitly of course, and there's no particular reason to suspect an indirect call either, but the existence of this behavior arguably puts it in scope of the "intended use" of the pointer passed to the constructor, since eventually it could be used this way. Furthermore, would it necessarily be non-conforming for an implementation to call imbue on its own during the ostream constructor? I don't see why it would be, and if an implementation is free to call imbue in the constructor, then clearly we have undefined behavior. And there could be other methods that suggest other usages, as my survey was by no means complete.
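With a fully constructed buffer, you can watch `imbue` reach through the stored pointer. In this sketch, `counting_buf` is a hypothetical type that overrides the protected virtual `std::streambuf::imbue` hook, which `pubimbue` calls. The check compares the count before and after, so it does not assume that constructing the stream never calls the hook.

```cpp
#include <locale>
#include <ostream>
#include <streambuf>

// Hypothetical buffer that counts how often its imbue() hook is invoked.
struct counting_buf : std::streambuf {
    int imbue_calls = 0;

protected:
    // Called by pubimbue(), which basic_ios::imbue calls via rdbuf().
    void imbue(const std::locale& loc) override {
        ++imbue_calls;
        std::streambuf::imbue(loc);  // base version does nothing
    }
};
```

Usage: construct `counting_buf buf; std::ostream os(&buf);`, then `os.imbue(std::locale::classic())` increments `buf.imbue_calls` by one. That call dereferences the pointer originally handed to the constructor.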

Now, in a comment on an answer to a related question, indi observes that the Clang implementation of std::basic_fstream does pass a pointer to an unconstructed member object to the iostream constructor at fstream:1419:

  basic_filebuf<char_type, traits_type> __sb_;
};

template <class _CharT, class _Traits>
inline basic_fstream<_CharT, _Traits>::basic_fstream() : basic_iostream<char_type, traits_type>(&__sb_) {}

But this example is not definitive because (1) it could be a mistake, and (2) the library implementation is generally allowed to do things that would be undefined in user code. Nevertheless, it is at least weak evidence that the Clang developers think the practice does not have undefined behavior, as they have no reason in this case to write code that relies on the library's license to bend the rules, since it would be a trivial change to instead pass nullptr to the constructor and then in the body call init with the address of the (now fully constructed) member object.
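For comparison, here is a sketch of the workaround described above, with hypothetical names. It makes one deliberate substitution: instead of calling `init` a second time, the body uses the public `rdbuf(sb)` setter. That setter stores the pointer and calls `clear()`, so a non-null buffer leaves the stream in the good state.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stream that owns its buffer as a data member.
class owning_ostream : public std::ostream {
    std::stringbuf buf_;   // constructed after the std::ostream base

public:
    // The base gets a null buffer (so rdstate() == badbit for now);
    // no pointer to the not-yet-constructed buf_ is formed here.
    owning_ostream() : std::ostream(nullptr) {
        // buf_ is alive in the body. rdbuf(sb) stores it and calls
        // clear(), which sets goodbit because sb is non-null.
        rdbuf(&buf_);
    }

    std::string str() const { return buf_.str(); }
};
```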

Ultimately, it seems to me that the language specification is ambiguous, as it relies on the terms "invalid value" and "intended use" which are not clearly specified. But perhaps someone can identify a provision I have missed or an error in my interpretations.

Related questions

While researching this, I came across some existing questions that seemed related. The question How to inherit from std::ostream? has three relevant answers:

  • The (highest-voted) answer by Ben was specifically edited to avoid the potential problem by ensuring the streambuf is constructed before passing its address.

  • A more recent answer by mach6 also goes out of its way to avoid passing the unconstructed object's pointer, this time by initializing the ostream with nullptr (albeit by using a non-standard constructor that only GNU libstdc++ has, but which is easily replaced with a standard one) and then calling init afterward.

  • The answer by Henrik Heino passes the not-yet-constructed pointer. But this answer does not claim to be correct, and has one comment that says passing the pointer that way is incorrect.

From these answers and comments, I infer that quite a few knowledgeable people believe that the example at the top of this question has undefined behavior.

Meanwhile, the question Is it dangerous to pass a pointer to a subobject that is not constructed yet to a constructor of another subobject during the object construction? is very nearly the same as mine, but is marred by having some important parts of the example code missing, and involves an extraneous AnotherClass that further muddies the question. The answer by aschepler seems to say that the practice is ok in general, but not in the OP's case because of AnotherClass, but it only reasons as if all of the code were written by the user, ignoring the library aspect.

Finally, the question Is it safe to pass an unconstructed buffer to the constructor of std::ostream? is essentially the same as mine--I'm asking a duplicate! Why? In short, that question has no answers, and I think the additional research in my question makes it more likely mine can be answered, so I'm effectively submitting this with the intention of replacing that one. I asked a meta question about whether asking this duplicate is acceptable, and the consensus seems to be that it is.


I've accepted Chris Dodd's answer, but I want to elaborate a little on it, so this is a restatement of that answer in my own words.

The original example has undefined behavior because, in this line:

std::ostream os(&sb);        // Passing "invalid" pointer here?  UB?

the expression &sb has type mystreambuf*, but is being passed to a constructor that accepts std::streambuf*, and therefore must undergo derived-to-base conversion. That conversion, applied to a pointer to an unconstructed object with a non-trivial constructor, has undefined behavior since it is a "[reference] to any [...] base class of the object", which is prohibited by class.cdtor p1.

The example in that section further clarifies. Quoting the key lines from it:

struct X { int i; };
struct Y : X { Y(); };                  // non-trivial
struct A { int a; };
struct B : public A { int j; Y y; };    // non-trivial

extern B bobj;
A* pa = &bobj;                          // undefined behavior: upcast to a base class type
B bobj;                                 // definition of bobj

Moreover, this means that not only is the specific example in the question undefined, but it is in general undefined to do what the question title says, namely to "pass a pointer to an unconstructed streambuf object to the ostream constructor". That is because the std::streambuf constructor is protected, so an instance must always be a proper base class subobject, and therefore the only way to obtain a std::streambuf* is with a derived-to-base conversion.
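The two facts this argument rests on can be checked at compile time. This is a sketch assuming C++17 `<type_traits>`, and `mystreambuf` is the wrapper from the question.

```cpp
#include <streambuf>
#include <type_traits>

// The streambuf ctor is protected so we need a wrapper to create one.
struct mystreambuf : public std::streambuf {};

// std::streambuf's default constructor is protected, so a complete
// std::streambuf object cannot be created from unrelated code...
static_assert(!std::is_default_constructible_v<std::streambuf>,
              "std::streambuf is only constructible as a base subobject");

// ...so a std::streambuf* can only come from a pointer to a derived type,
// via an implicit derived-to-base conversion.
static_assert(std::is_convertible_v<mystreambuf*, std::streambuf*>);
static_assert(!std::is_same_v<mystreambuf*, std::streambuf*>);
```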

That implies that the code quoted from the Clang libc++ would have undefined behavior if it were user code, and I have filed Issue #93307 against Clang about that.
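For the namespace-scope program in the question, one way to avoid the early upcast is simply to define the buffer first. This sketch relies on non-local variables with dynamic initialization being initialized in definition order within a single translation unit. No such ordering is guaranteed across translation units.

```cpp
#include <ostream>
#include <streambuf>

// The streambuf ctor is protected so we need a wrapper to create one.
struct mystreambuf : public std::streambuf {};

mystreambuf sb;          // defined, and therefore initialized, first
std::ostream os(&sb);    // the derived-to-base conversion now applies to a live object
```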

Burrill answered 22/5 at 4:1 Comment(18)
Taking the address of an uninitialized object and then passing that pointer by value is allowed, afaik. - Carlocarload
@Carlocarload Yes, that is what I said right at the start of the "Attempted answer" section. But as the rest of the question explains, that observation alone does not, I think, answer the question. - Burrill
I think you're overcomplicating things here. I mean, we're allowed to pass an uninitialized object by reference. Similarly, we're also allowed to pass a pointer by value to an uninitialized object. - Carlocarload
The static initialization order fiasco will hunt you down eventually. Static-duration objects with dynamic initialization had better be Meyers singletons. Take care not to create a cyclic initialization dependency. - Couvade
This is totally fine: godbolt.org/z/KqT7a9vTd but this is not: godbolt.org/z/sjno8Ea81. The pointer is OK; the crucial question is whether it is dereferenced before the pointee comes into existence. - Kusin
The crucial question is whether the pointer is dereferenced. None of the proposed dupes addressed this for the case of the std::ostream constructor. - Kusin
@Kusin this answer does talk about accessing the object. It can be a dupe. - Carlocarload
This is about a library function. That alone makes it different from the dupes, as already stated in the question. - Sondra
@Carlocarload That still isn't talking about std::ostream. This question is specifically about what the standard says happens or does not happen in basic_ostream::basic_ostream(basic_streambuf<charT, traits>* sb). - Tiana
@Carlocarload If you can find a dupe about that specific constructor then it's a dupe. The answer Jerry Coffin wrote talks about what the standard requires of that specific constructor. - Tiana
I don't think the question title should be interpreted too literally. If we were to interpret it literally as you do, then I agree those are dupes. - Sondra
Nobody is disagreeing with that, but that is not what we suppose the OP to be asking. - Sondra
@Carlocarload We know that already. The question boils down to whether the ostream constructor dereferences the pointer. - Kusin
@Carlocarload I find the question title to be accurate. "Is it undefined behaviour to do X" asks whether there is any undefined behaviour in the evaluation of X, not just the directly visible parts. - Tiana
@Carlocarload No, those dupes do not solve the OP's problem. Looking at, and understanding, the definition of basic_ostream::basic_ostream(basic_streambuf<charT, traits>* sb) solves the OP's problem. - Tiana
@Kusin "please do not spam questions with such discussion..." It takes more than one person to have a discussion. You (and others) are as much to blame for the discussion here as they are. That is, you're also spamming the comment section with such discussion. - Hemidemisemiquaver
There is no apparent reason for init to dereference its argument, and available implementations don't do that; however, nothing in the standard seems to forbid it. - Dabster
@Kusin You also have the option to delete your comments. If you don't like spam in the comment section, you can delete your own comments as well. Every user can delete their comments whenever they want. Claiming that your comments would no longer make sense isn't sensible, because you can just delete them. If you're consciously not deleting them, then you're admitting that you're spamming the section as well. - Hemidemisemiquaver

The language you quoted

For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior. [...]

would seem to indicate this is undefined behavior -- you're referring to the base class (std::streambuf) of an object before the constructor has run. What happens in the ostream constructor is irrelevant.

Broadbrim answered 22/5 at 23:19 Comment(0)

So, @JerryCoffin’s answer is correct, but there is an objection to it on the grounds that while the standard clearly specifies what basic_ios::init() does, it doesn’t specify what it doesn’t do. So (the objection goes), while the standard asserts that the only things basic_ios::init() does with the passed pointer are compare it to nullptr and store it… it might also dereference it, which would trigger UB in the situation described.

Okay, let’s assume that logic makes sense.

So, because basic_ios::init() “might” dereference the pointer, and because the basic_ostream constructor calls basic_ios::init(), we can’t pass a pointer to a member. So we can’t do this:

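#include <ostream>
#include <sstream>

class myostream :
    public std::ostream
{
    // std::streambuf's constructor is protected, so a concrete buffer
    // type such as std::stringbuf stands in for it here.
    std::stringbuf _buf;

public:
    // Bases are initialized before members, so _buf is not yet
    // constructed when its address is passed to std::ostream.
    myostream() : std::ostream{&_buf} {}

    // other stuff...
};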

Because although the standard specifies postconditions for the ostream constructor that (indirectly, via basic_ios::init()) amount to comparing the passed pointer to nullptr and keeping a copy… those postconditions are not necessarily exhaustive. So it might dereference the pointer for some unknown reason.

If so, that would be UB. So how would we avoid that?

The solution offered looks like this:

class myostream :
    private std::streambuf,
    public std::ostream
{
public:
    myostream() : std::ostream{this} {}

    // other stuff...
};
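A compilable version of this base-from-member idea needs a concrete buffer, because the default `std::streambuf` virtuals just report failure (for example, `overflow` returns EOF). This sketch assumes `std::stringbuf`. The virtual base `std::basic_ios` is initialized first, then the non-virtual bases in declaration order. As a result, the buffer is complete before `std::ostream`'s constructor runs.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Sketch: the buffer is a private *base* listed before std::ostream,
// so it is fully constructed before std::ostream's constructor runs.
class myostream : private std::stringbuf, public std::ostream {
public:
    // `this` converts to std::streambuf* through the private base;
    // that conversion is accessible inside myostream's own members.
    myostream() : std::stringbuf(), std::ostream{this} {}

    // Expose the buffered text (the stringbuf base itself is private).
    std::string str() const { return std::stringbuf::str(); }
};
```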

So, great! Problem solved, right?

Well, no.

Because, you see, the standard doesn’t say that the ostream constructor or basic_ios::init() don’t delete the pointer passed.

basic_ios::init() might do this:

auto basic_ios::init(streambuf* p_buf)
{
    // do all the stuff init() is specified to do, and then...

    delete p_buf;
}

Why not? The postconditions don’t say explicitly that the stream buffer pointed to by the argument won’t be deleted. And that doesn’t contradict the postconditions.

Or maybe it does this:

auto basic_ios::init(streambuf* p_buf)
{
    // do all the stuff init() is specified to do, and then...

    p_buf->~streambuf();
    
    ::new (static_cast<void*>(p_buf)) streambuf{};
}

Again, why not? That wouldn’t literally contradict the precise wording of the contract of basic_ios::init() as spelled out in the standard. So it could happen, right?

If you suppose that basic_ios::init() is free to do anything with the pointer that it doesn’t explicitly say it won’t, then your clever inheritance strategy won’t work either. In fact… literally nothing will work. If basic_ios::init() is allowed to do LITERALLY ANYTHING with the pointer you pass it—so long as it doesn’t contradict the explicit wording of the contract—then you can’t assume anything about the stream buffer pointer you pass to it. You can’t assume it won’t be destroyed. You can’t assume it will be destroyed. You can’t assume it won’t be overwritten.

So basically, basic_ios::init() is just impossible to use safely. Which means it is impossible to create our own output streams, because we must call basic_ios::init(), directly or indirectly, at some point (before the destructor, or any member functions).

So, there’s your conclusion. It is just impossible to create your own custom streams or stream buffers, because the standard writers didn’t explicitly rule out every asininely imaginable possible contingency for what might happen with that pointer.

Or… maybe… our logic went off the rails somewhere.

Look, the people writing the standard are not doing it for the sake of a group of D&D players who get off on picking apart the micro-semantics of every single rule clause looking for a way to game the system. The committee has neither the time nor the patience to cater to every absurd rule-twisting fanatic’s desire to find loopholes. They will include as much explicit detail as is necessary for reasonable implementers to produce implementations that behave consistently with each other, and with what reasonable readers of the standard will infer from it.

So let’s approach this like reasonable people.

The standard specifies what basic_ios::init() does with the pointer passed. It says nothing about the pointer being dereferenced, not even a non-normative note suggesting that might be the case.

Yes, it does not explicitly state that the pointer won’t be dereferenced (or deleted, or anything else). But consider this: As I pointed out in another comment, Clang’s libc++ does basically what the first code block above does. If there were a reasonable interpretation of the contract of basic_ios::init() that implied the pointer might be dereferenced… wouldn’t somebody have noticed the problem in the decade or so that libc++ has been in widespread use? Don’t you think that, maybe, a sanitizer or two might have noticed?

And, out of curiosity, I also checked the Microsoft standard library source code. Yup, it does the same thing: passes a pointer to a stream buffer data member. That’s two major, widely-used standard libraries. I don’t know how long that particular standard library has been in use, but again… don’t you think somebody would have raised the issue by now if it were a reasonable interpretation of the standard that that stream buffer pointer might be dereferenced before the stream buffer is constructed?

(And I can’t dig up my copy of Langer & Kreft right now, but I’m pretty sure they do the same thing, too.)

Once again: be reasonable. IOstreams has been in the standard since 1998, and it was a widely used library even before that, going back as far as 1984. The wording has been pored over, revised, and studied in dozens and dozens of defect reports. If “it doesn’t say it doesn’t dereference” were a reasonable interpretation of the standard’s definition of basic_ios::init()… don’t you think someone would have done something about that sometime in the last ~30–40 years? Don’t you think someone working on or with the Microsoft standard library OR Clang’s standard library—or one of the many, many people who have made their own custom streams (including the people making new standard custom streams, like in networking proposals)—would have pointed out the issue?

Be reasonable. The standard doesn’t have to explicitly say the pointer won’t be dereferenced, because that would be a pants-crappingly stupid thing to do to a pointer that you haven’t specified must point to a valid stream buffer. Everything else in basic_ios follows that reasoning: the destructor also doesn’t delete the pointer. Indeed, if basic_ios::init() were allowed to dereference the pointer, that would wildly complicate the process of making a custom stream. And for what? For what gain? Why would the IOStreams library be better if it did allow for basic_ios::init() to dereference the stream pointer? How would that compare to the many ways it would be massively worse if you couldn’t assume it was safe to pass a pointer to a member stream buffer?
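You can see the destructor point directly. The standard says `~basic_ostream` performs no operations on `rdbuf()` and `~basic_ios` does not destroy it, so a buffer outlives a stream that wrote to it. This is a sketch with a hypothetical function name.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes through a short-lived stream, then reads
// the buffer after the stream is gone.
std::string write_then_read() {
    std::stringbuf buf;
    {
        std::ostream os(&buf);
        os << "abc";
    }   // os destroyed here; ~basic_ostream leaves buf untouched
    return buf.str();
}
```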

Conclusion: The fact that the standard wording doesn’t explicitly state… that the things it explicitly states it does with the stream buffer pointer are the only things it does with it… does not imply it may do any random thing with the stream buffer pointer. Especially things that might create UB if they were done unexpectedly. If it required a pointer to a valid stream buffer, it would say so. It does not, and instead lists a bunch of things that don’t require a valid stream buffer.

Suggestion: Don’t treat the standard like a riddle and pick through its wording looking for traps.

CLEARLY the intention is for basic_ios::init() to just compare the pointer to nullptr and keep a copy. It makes no damn sense to not have that be the implication and instead require stream implementers to resort to gymnastics like multiple inheritance (or dynamic allocation followed by rdbuf() to retrieve the pointer later, or other wacky, circuitous ideas). I mean… why? Why would you design the library like that? That would be absurd. Why would you so unnecessarily hamstring the obvious and safest way to implement a stream with an underlying stream buffer?

tl;dr: 1) @JerryCoffin is correct that the behaviour is defined, by reasonable implication from the standard wording. 2) The first code block is fine, and you can pass a pointer to an uninitialized stream buffer to basic_ios::init(). 3) Two major standard libraries work that way, and have done so for decades without any concern raised. 4) There are no rhetorical traps in the C++ standard.

Bhakti answered 23/5 at 0:57 Comment(9)
I'm sympathetic to the "please be reasonable!" argument, but the way I see it, that argument fails in the case of memset(nullptr, ..., 0): it is absurd for memset to deref its argument for a zero count (no reason to, and historical library code assumed it didn't) but the language requires a valid pointer anyway. The C++ "intended use" phrase is in the same place where C's language that prohibits memset(nullptr) is, so it plays the same role. If memset can be a jerk, why not ostream? - Burrill
First, you are misinterpreting what “be reasonable” means. It does not mean “have a heart, I can’t answer everything so be nice to me”. It means “if you come with assumptions or scenarios that defy reason, you cannot expect reasonable answers”. In this case, if you bring assumptions that defy everything else the standard says or implies about what basic_ios::init() does and how it is intended to be used… then you can’t expect the standard to entertain you. - Bhakti
In other words: Taken as a whole, the design of IOStreams clearly intends for basic_ios::init() to be called in the constructors of derived stream classes. And it clearly intends to be given a pointer to the stream buffer the stream uses. So why would there be a completely unnecessary trap for the plainly obvious use case? That would be absurd, and it would be even more absurd to let such a dangerous trap go completely unnoted. The standard does not bother to waste word count on absurdity. Thus… be reasonable. - Bhakti
Second, yes, it is a perfectly reasonable assumption that if you call memset() with a zero size, it will never dereference the memory. So why do we know it’s UB? Because the standard explicitly says so (in multiple ways, if I recall). If it didn’t then you would be able to assume it was safe, because, again, it is a reasonable assumption. In other words memset() “can be a jerk” against reasonable assumptions because the standard removes the need to assume, and outright says it is a jerk. It could also easily say basic_ios::init() is a jerk; it does not. - Bhakti
C99 7.1.4p1 explicitly disallows null arguments to library functions. But C++ only has, as far as I know, res.on.arguments p1, which talks about "invalid value" and "intended use". Thus, I think in C++ the only reason memset gets to be a jerk is the license granted by the vaguely worded res.on.arguments. (Or maybe one could claim cstring.syn p1 incorporates C99 7.1.4 by reference, but I think that's a stretch.) - Burrill
res.on.arguments are requirements for the C++ standard library. memset() is part of the C standard library. The C++ standard’s library policies don’t apply to the C standard library. The C++ standard merely imports or “makes available” the C standard library (modulo a few small changes that are specified, like putting the functions in namespace std)… it doesn’t specify it. So what C++’s res.on.arguments says is irrelevant; you need to read the C standard. - Bhakti
In the C standard: There is the library-wide stuff about no nullptr arguments, yes, but the spec for memset() itself speaks of “the object pointed to by {the pointer argument}”. Setting aside C’s loosey-goosey rules about object lifetime, that indisputably means the pointer argument can’t be nullptr. Even if “writes to the first n bytes of the object pointed to” implies no bytes will be written if n is zero, so the pointer need never be dereferenced… there still must be an object being pointed to, because the standard explicitly says so. - Bhakti
OK, you've persuaded me that memset is not reliant on res.on.arguments for its nullptr prohibition, which reduces the implied strength of res.on.arguments. I'm still not convinced that's enough to say that an init that dereferences is non-conforming, but it removes one obstacle to that conclusion. However, in this context, I think Chris's answer has rendered that question moot. - Burrill
I've accepted Chris's answer, but in my mind the issue of whether std::basic_ios::init is permitted to dereference its argument remains open. Your answer and commentary have advanced my understanding of that topic (+1), even if I don't agree with all of the points made; thank you. It's a bit unfortunate that this facet is irrelevant in this context. I might ask a follow-on question if I can think of a good way to do so; if I do, I'll ping you. - Burrill
