Move semantics and virtual methods

In C++11 we are guided in some cases to pass objects by value and in others by const-reference. However, this guideline depends on the implementation of the method, not just on its interface and intended usage by its clients.

When I write an interface, I do not know how it will be implemented. Is there a good rule of thumb for writing method signatures? For example - in the following code fragment, should I use Bar1 or Bar2?

class IFoo
{
public:
    virtual void Bar1(std::string s) = 0;
    virtual void Bar2(const std::string& s) = 0;
};

You can stop reading here if you agree that the correct signature depends on the implementation. Here is an example that shows why I believe so.

In the following example, we should pass the string by value:

class Foo
{
    std::string bar;

public: // the constructor must be public for the calls below to compile
    Foo(std::string byValue)
        : bar(std::move(byValue))
    {
    }
};

Now we can instantiate Foo in an efficient manner in all cases:

Foo foo1("Hello world"); // create once, move once
Foo foo2(s); // the programmer wants to copy s. One copy and one move
Foo foo3(std::move(t)); // the programmer does not need t anymore. No copy at all
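
The copy and move counts claimed in these comments can be checked with a small instrumented type. This is only a sketch: Tracked, Holder, Counts, and measure are hypothetical names introduced here, with Tracked standing in for std::string and Holder for Foo.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for std::string that counts its own copies and moves.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Same shape as Foo: take the argument by value, then move it into the member.
class Holder {
    Tracked value;
public:
    explicit Holder(Tracked byValue) : value(std::move(byValue)) {}
};

struct Counts { int copies; int moves; };

// Reset the counters, run one construction, and report what it cost.
template <typename F>
Counts measure(F construct) {
    Tracked::copies = 0;
    Tracked::moves = 0;
    construct();
    return Counts{Tracked::copies, Tracked::moves};
}

Counts fromTemporary() { return measure([] { Holder h{Tracked{}}; }); }
Counts fromLvalue()    { return measure([] { Tracked s; Holder h(s); }); }
Counts fromMoved()     { return measure([] { Tracked t; Holder h(std::move(t)); }); }
```

Under these assumptions, fromLvalue() reports one copy and one move, and fromMoved() reports no copies and two moves. fromTemporary() reports no copies and one move, provided the temporary is constructed directly in the parameter, which C++17 guarantees and earlier compilers typically do anyway.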

In other cases we prefer to pass objects by const reference. For example, in the following case we never want to copy/store the argument, just use its methods:

void DoStuff(const std::string& byRef)
{
    std::cout << byRef.length() << std::endl;
}

All possible usages of the above method are already as efficient as possible.

Update

I believe I forgot to show the issues with the const-reference alternative. If the above class Foo was implemented this way:

class Foo
{
    std::string bar;

public: // the constructor must be public for the calls below to compile
    Foo(const std::string& byRef)
        : bar(byRef)
    {
    }
};

Then we would have the following results:

Foo foo1("Hello world"); // Here we would have one more copy of the string. It is less efficient.
Foo foo2(s);             // One copy, like before
Foo foo3(std::move(t));  // std::move does not help here: t binds to the const reference and is copied.
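
With the const-reference constructor, std::move at the call site cannot save anything, because the argument binds to const std::string& and the member is copy-constructed from it. A minimal sketch that makes this observable; RefFoo, its get() accessor, and sourceSurvivesMove are hypothetical names added for the demonstration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Same shape as the const-reference Foo; the accessor exists only for the demo.
class RefFoo {
    std::string bar;
public:
    explicit RefFoo(const std::string& byRef) : bar(byRef) {}
    const std::string& get() const { return bar; }
};

// Returns true if the source string still holds its text after being
// passed through std::move to a const-reference parameter.
bool sourceSurvivesMove() {
    std::string t = "a string long enough to defeat any small-string optimization";
    RefFoo foo(std::move(t));   // binds to const std::string&, so bar is copy-constructed
    return t == foo.get();      // t was never moved from, so both hold the same text
}
```

Since t is left intact, the caller who wrote std::move(t) paid for a full copy without being told.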

Alex.

Plagio answered 3/9, 2013 at 14:40 Comment(8)
If you want to change the value inside the method (moving from it, for instance), pass by value; otherwise, pass by reference. If the object is small (smaller than a pointer), it is often more efficient to pass by value. In either case, whoever is using your API will interpret the argument the same way: the function won't change the value of the argument I pass it. - Scansorial
Well, obviously you can never know for sure how it's going to be implemented (unless you are always the one doing the implementation), but normally the function's name should give a clue: if the function is called GetStringLength then you can be pretty sure it is not going to copy the argument :P - Ronda
@MadScienceDreams, you are correct. However, I am asking about the case where the function wants read-only access to its argument from its caller's viewpoint (internally it may move a by-value object, but the caller does not know that). The "correct" way to implement such functions from the efficiency perspective is different, depending on the implementation. What I am asking is whether there is some universal way to write pure virtual functions that are both efficient and do not depend on the implementation. - Plagio
@Ronda You may be correct, but isn't there some trick that I can do? Something with templates? Maybe something else? - Plagio
Assuming I don't want to modify the passed variable and it's not a basic data type, I always pass by const reference. If I need to modify it within the function, I just make a local copy of the referenced object. - Kellby
@NeilKirk Passing by const-reference can cause more than one copy. See the class Foo in my example. The case of Foo foo("Hello world") would cause the string to be copied twice if it was passed by const-reference (once during construction of the string and once inside the constructor of Foo when the string is copied). - Plagio
That's true. Either I don't care about it, or if my program uses lots of string literals, I store them in global constant std::strings to avoid these issues. - Kellby
The string is an example. It matters for objects that are created at the call site. - Plagio

There's no "theory of everything" here. You got it right, there's a problem. I remember confronting it myself a while back.

My conclusions started here:

Application vs. Framework/Library Development

If your clients are developers, this job is much harder. Not only is it harder, but there are no clear guidelines. Great framework designers got their prestige because they happened to take risks that paid off. At the same time, in an alternate universe, their risks could have not paid off. That's because appreciating a framework depends on the direction of its growing usage, and subjective opinions which are much harder to reason about than in the application domain.

So there's no clear cut answer in this case. Fortunately, I think you're interested mainly in Application development here. So let's get on to that.

Starting point: We're developing applications

This makes a huge difference, because we're supposed to have a much better idea of where the system is going and what kind of code could turn out to be useful. We're not prophets, but this assumption allows us to give more credit to our intuition, which is based on our knowledge of the requirements and the needs of our customers (at least as far as we were able to understand them).

At this point, we can still divide this into 2 cases:

Abstraction to Implementation

There are cases where it is beneficial, or even necessary, to define abstraction ahead of the implementation. In cases like this, one has to realize that much more research about the problem is required before defining the abstraction properly. For example, is the domain synchronous or asynchronous? Serial or parallel? High or low level? And other much more concrete questions.

Some extreme agilers will have you believe that you can just write some code and fix it later. However, that claim is falsified very easily once reality hits. If you find hope in it, I encourage you to test it yourself and report back if you make any significant discovery. My personal experience, and the thought I have put into the matter, suggest that in big projects this approach is very problematic.

The conclusion in this case is that, if you indeed need to define abstraction ahead, then you should already have a very good idea of the implementation. The better idea you have about it, the higher the chance it will succeed in actually being a proper abstraction.

Implementation to Abstraction

This is my default choice. It has been said in many ways: "Frameworks should be extracted", "Extract 'til you drop", and even "Convention over Configuration" is similar in concept.

Basically this means that you implement your required components as necessary, but keep a sharp eye on what's going on. The trick here is to look out for chances to abstract in ways that actually benefit you practically in terms of development and maintenance.

This often comes up as a class that does what you want, but more. In which case, you abstract the intersection away into a more general case. You repeat this process as necessary throughout development.

It's important to not get carried away and to keep your feet on the ground. I've seen many abstraction attempts go so wrong that the name of a type says nothing about its intent, and the only way to deduce it is to read the thousands of lines of code that use it. For example, in the code base I'm currently working on, the type which should have been called Image is called BinaryData. All across the code are attempts to treat it as a concrete type (Image) and as an abstract concept at the same time.

To sum up

As I always remind myself, the best best practice you can have is to tame known best practices to fit your problem, rather than the other way around. If you can't do that, well, maybe the problem is interesting enough to require further attention, and a bit of original thought.

Sparing answered 3/9, 2013 at 15:6 Comment(3)
What would you suggest for the case of two different implementations of the same abstraction that differ in what would be the better interface in terms of efficiency? - Plagio
@Alex The abstraction might be the same in terms of real life concepts, but the rules are different here (you know the old rectangle/square example). Your question is actually too general to have a simple answer. The way I see it, it's a matter of context and trade-offs. Try out one idea. Not good? Try another. Repeat until you find something that sounds the best (or least bad). If you're not happy with any of that, try a different conceptualization of the problem altogether. Will there be a way to achieve a clean solution? Probably; but no one said the instructions are written in a book. - Sparing
If you still want a more concrete answer, please provide a more concrete example, including requirements, etc. Like I said, devising a simple general solution here is virtually impossible. - Sparing

You could also provide an overload for Bar2 that takes an rvalue reference:

class IFoo
{
public:
    virtual void Bar2(const std::string& s) = 0;

    virtual void Bar2(std::string&& s)
    {
        Bar2(s);   // calls the const& overload because s is an lvalue
    }
};

By default, the rvalue reference overload simply calls the const lvalue reference overload. But if a specific subclass can take advantage of rvalue references, the rvalue reference overload can be overridden.
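
A sketch of how two implementations might use this interface. StoringFoo and ReadingFoo are hypothetical classes, not part of the original answer: the first stores its argument and overrides both overloads, the second only reads it and keeps the default forwarding. The virtual destructor is also an addition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class IFoo {
public:
    virtual ~IFoo() = default;
    virtual void Bar2(const std::string& s) = 0;
    // Default: forward to the const& overload (s is an lvalue inside this body).
    virtual void Bar2(std::string&& s) { Bar2(s); }
};

// Hypothetical implementation that stores its argument, so it benefits from moving.
class StoringFoo : public IFoo {
    std::vector<std::string> items;
public:
    void Bar2(const std::string& s) override { items.push_back(s); }        // copies
    void Bar2(std::string&& s) override { items.push_back(std::move(s)); }  // moves
    const std::string& last() const { return items.back(); }
};

// Hypothetical implementation that only reads its argument.
class ReadingFoo : public IFoo {
    std::size_t total = 0;
public:
    using IFoo::Bar2;  // keep the inherited rvalue overload visible on ReadingFoo
    void Bar2(const std::string& s) override { total += s.size(); }
    std::size_t seen() const { return total; }
};
```

Callers holding an IFoo& pass lvalues or rvalues as usual, and overload resolution plus virtual dispatch picks the copying or moving path. The using-declaration matters only when calling through the derived type directly, because overriding one Bar2 overload otherwise hides the other.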

Firsthand answered 8/9, 2013 at 20:25 Comment(1)
I'd say providing the rvalue-ref version makes sense only if you're "sinking" the value into owned storage (in "Sean Parent" speak). And even then, taking by value scales a lot better when there are many such parameters (because of the combinatorial explosion of lvalue/rvalue overloads). - Murphree
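
To illustrate the point about combinations: with two sink parameters, reference overloads would need four constructors, while a single by-value constructor covers every case. Person below is a hypothetical example type, not taken from the thread.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type with two "sink" parameters.
class Person {
    std::string first;
    std::string last;
public:
    // One by-value constructor handles all four lvalue/rvalue combinations.
    // Reference overloads would need (const&, const&), (const&, &&),
    // (&&, const&) and (&&, &&), and the count doubles with each parameter.
    Person(std::string f, std::string l)
        : first(std::move(f)), last(std::move(l)) {}

    std::string full() const { return first + " " + last; }
};
```

Each lvalue argument costs one copy plus one move, and each rvalue argument costs moves only, whichever mix the caller chooses.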

I believe it should definitely depend on the implementation. As implied from your question, barring a completely "always-better" signature, the only sensible thing to do is to choose the signature in a manner that optimizes the current implementation. If you write the interface before the code - take an educated guess, and try to maneuver yourself in such a way that you can wait for a first implementation before you commit to the signature.

The operative words here are "first" and "current". What happens if you got it wrong? What happens if at some later stage the signature prevents your code from being optimal? Here's what you can do:

No commitment

If it's soon enough - just change it. It follows from the definition of "no commitment", right?

Committed to API

For a concrete example, assume you chose wrong, and went with this:

virtual void DoStuff(std::string s) = 0;

However, as it turns out, no copying needs to be performed (same as your original DoStuff implementation). Here's what you can do:

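// stuff.h (inside the class declaration; the class name Stuff is assumed here)
virtual void DoStuff_Optimized(const std::string& s);
virtual void DoStuff(std::string s);

// stuff.cc
#include "stuff.h"
#include <iostream>

// 'virtual' belongs only on the in-class declaration, not on the definition
void Stuff::DoStuff_Optimized(const std::string& s)
{
    // Fast implementation of DoStuff, no copying necessary
    std::cout << s.length() << std::endl;
}

void Stuff::DoStuff(std::string s)
{
    // Existing callers still pay for the by-value copy, then forward
    DoStuff_Optimized(s);
}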

Existing clients will get inferior performance. New clients can use the Optimized version.

Committed to ABI

There might be nothing you can do at this point, unfortunately. However, if you're careful, you might be able to follow the "Committed to API" action. (In particular, my example will not preserve ABI compatibility).

Blip answered 8/9, 2013 at 19:59 Comment(3)
The problem happens when you have two implementations of the same interface that have different "optimal" signatures and both are used extensively in the code. For example, in a vector of pointers to base, around half of the elements would like to get arguments by value and the other half by const reference for optimal performance. - Plagio
So? The API documentation (the .h file) should make it clear which is optimal. You, as the API designer, make that decision. There should be no ambiguity about which is the optimal version. - Blip
I will try to explain again. One interface, A. Two implementations, B and C. The interface A's method signatures are good for B and bad for C. So documenting something in B's or C's header files does not help. I still have only one interface, and it does not serve both implementations well in terms of efficiency. If I have a vector of A*, half of which are elements of type B and half of which are elements of type C, the performance might be bad for half of the vector. The question: is there a way to have only one interface and still have good performance? - Plagio
