Confusion about Copy-On-Write and shared_ptr

I have searched the web and read through the Boost documentation about shared_ptr. There is a response on SO that says that shared_ptr for Copy-On-Write (COW) sucks and that TR1 has removed it from the string libraries. Most advice on SO says to use shared_ptr rather than regular pointers.

The documentation also talks about using shared_ptr's unique() member function to make a COW pointer, but I haven't found any examples.

Is the talk about having a smart pointer that performs COW for you, or about having your object use a new shared_ptr to a cloned object and then modifying the cloned object?

Example: Recipes & Ingredients

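#include <string>
#include <vector>
#include <boost/shared_ptr.hpp>

// Placeholder definition: Ingredient holds a Nutrients by value, so the type must be complete.
struct Nutrients { };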

struct Ingredient
{
    Ingredient(const std::string& new_title = std::string(""))
        : m_title(new_title)
        { ; }
    std::string m_title;
    Nutrients   ing_nutrients;
};

struct Milk : public Ingredient
{
    Milk() : Ingredient("milk") { }
};

struct Cream : public Ingredient
{
    Cream() : Ingredient("cream") { }
};

struct Recipe
{
    std::vector< boost::shared_ptr<Ingredient> > m_ingredients;
    void append_ingredient(boost::shared_ptr<Ingredient> new_ingredient)
    {
        m_ingredients.push_back(new_ingredient);
        return;
    }
    void replace_ingredient(const std::string& original_ingredient_title,
                            boost::shared_ptr<Ingredient> new_ingredient)
    {
        // Confusion here
    }
};

int main(void)
{
    // Create an oatmeal recipe that contains milk.
    Recipe  oatmeal;
    boost::shared_ptr<Ingredient> p_milk(new Milk);
    oatmeal.append_ingredient(p_milk);

    // Create a mashed potatoes recipe that contains milk
    Recipe  mashed_potatoes;
    mashed_potatoes.append_ingredient(p_milk);

    // Now replace the Milk in the oatmeal with cream
    // This must not affect the mashed_potatoes recipe.
    boost::shared_ptr<Ingredient> p_cream(new Cream);
    oatmeal.replace_ingredient(p_milk->m_title, p_cream);

    return 0;
}

The confusion is how to replace the 'Milk' in the oatmeal recipe with Cream and not affect the mashed_potatoes recipe.

My algorithm is:

1. Locate the pointer to the `Milk` ingredient in the vector.
2. Erase it.
3. Append the `Cream` ingredient to the vector.

How would a COW pointer come into play here?

Note: I am using MS Visual Studio 2010 on Windows NT, Vista and 7.

Trochal answered 5/6, 2011 at 19:16 Comment(5)
Is this multi-threaded? If so, be aware that COW and multithreading can yield unexpected results: gotw.ca/publications/optimizations.htm - Fascinate
The initial version is not multi-threaded. However, since I am using a database, multi-threading may be used for the database and the GUI. - Trochal
relevant: #2350371 - Fascinate
Are you using C++0x? I have a hunch this can be implemented with one of those nifty rvalue reference thingies. Just a hunch, though. Might be worth investigating. open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2027.html - Fascinate
Just a mention about alternate meanings of Copy On Write. When I read the question title I was thinking about how shared_ptr sucks for copy-on-write memory mappings such as those created by fork() in Unix. Whenever a shared_ptr reference count is bumped its memory page becomes unshared, adding to the system's real memory usage. - Circumscription

There are several questions bundled into one here, so bear with me if I don't address them in the order you would expect.

Most advice on SO says to use shared_ptr rather than regular pointers.

Yes and no. A number of SO users, unfortunately, recommend shared_ptr as if it were a silver bullet for every memory management issue. It is not. Most of the advice is about not using naked pointers, which is substantially different.

The real advice is to use smart managers: whether smart pointers (unique_ptr, scoped_ptr, shared_ptr, auto_ptr), smart containers (ptr_vector, ptr_map) or custom solutions for hard problems (based on Boost.MultiIndex, using intrusive counters, etc...).

You should pick the smart manager to use depending on the need. Most notably, if you do not need to share the ownership of an object, then you should not use a shared_ptr.

What is COW ?

COW (Copy-On-Write) is about sharing data to "save" memory and make copies cheaper... without altering the semantics of the program.

From a user's point of view, whether std::string uses COW or not does not matter. When a string is modified, all other strings are unaffected.

The idea behind COW is that:

  • if you are the sole owner of the data, you may modify it
  • if you are not, then you shall copy it, and then use the copy instead
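
The two rules above can be sketched as a small wrapper. This is only an illustration under stated assumptions: CowPtr, read(), write() and shares_with() are hypothetical names (not part of Boost or the standard library), std::shared_ptr is used in place of boost::shared_ptr so the snippet builds without Boost, and the sole-owner test through use_count() is only meaningful in single-threaded code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper, not part of Boost or the standard library.
template <typename T>
class CowPtr
{
public:
    explicit CowPtr(T value) : m_data(std::make_shared<T>(std::move(value))) { }

    // Reading never copies.
    const T& read() const { return *m_data; }

    // Writing copies first unless this handle is the sole owner.
    T& write()
    {
        if (m_data.use_count() != 1)
            m_data = std::make_shared<T>(*m_data);   // detach from the other owners
        return *m_data;
    }

    // Exposed only so that the sharing can be observed in the demo below.
    bool shares_with(const CowPtr& other) const { return m_data == other.m_data; }

private:
    std::shared_ptr<T> m_data;   // never handed out, so use_count() stays meaningful
};

// Returns true when writing through one handle leaves the other untouched.
inline bool cow_demo()
{
    CowPtr<std::string> a(std::string("milk"));
    CowPtr<std::string> b = a;   // cheap: both handles share one string
    assert(a.shares_with(b));

    b.write() = "cream";         // b is not the sole owner, so it copies first
    return a.read() == "milk" && b.read() == "cream" && !a.shares_with(b);
}
```

Calling cow_demo() exercises both rules: the copy of the handle is cheap, and the first write through b is what pays for the real copy.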

It seems similar to shared_ptr, so why not ?

It is similar, but the two are meant to solve different problems, and as a result they are subtly different.

The trouble is that since shared_ptr is meant to function seamlessly whether or not the ownership is shared, it is difficult for COW to implement the "if sole owner" test. Notably, the interaction of weak_ptr makes it difficult.

It is possible, obviously. The key is not to leak the shared_ptr, at all, and not to use weak_ptr (they are useless for COW anyway).
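
One way to read the weak_ptr remark, sketched with std::shared_ptr (an assumption, chosen so the snippet builds without Boost): a weak_ptr does not count towards use_count(), so a "sole owner" test can pass even though someone else can still reach the data. The function name is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Returns true when a change made under the "sole owner" assumption
// is nevertheless visible to another party holding a weak_ptr.
inline bool weak_ptr_defeats_sole_owner_test()
{
    std::shared_ptr<std::string> data = std::make_shared<std::string>("milk");
    std::weak_ptr<std::string> observer = data;   // leaked non-owning handle

    // A weak_ptr does not contribute to use_count(), so the test passes...
    assert(data.use_count() == 1);
    *data = "cream";                              // ...and the data is modified in place.

    // The observer can still promote itself to an owner and see the change.
    std::shared_ptr<std::string> promoted = observer.lock();
    return promoted && *promoted == "cream" && data.use_count() == 2;
}
```

This is why the shared_ptr has to stay private to the COW wrapper: once a shared_ptr or weak_ptr escapes, the count no longer tells you who can observe a write.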

Does it matter ?

No, not really. COW has turned out not to be that great anyway. Most of the time it's a micro optimization... and a micro pessimization at once. You may spare some memory (though only as long as the large objects are not actually copied), but you are complicating the algorithm, which may slow down the execution (you are introducing tests).

My advice would be not to use COW. And not to use those shared_ptr either.


Personally, I would either:

  • use boost::ptr_vector<Ingredient> rather than std::vector< boost::shared_ptr<Ingredient> > (you do not need sharing)
  • create an IngredientFactory that creates (and manages) the ingredients and returns an Ingredient const&; the factory should outlive any Recipe.

EDIT: following Xeo's comment, it seems the last item (IngredientFactory) is quite laconic...

In the case of the IngredientFactory, the Recipe object will contain a std::vector<Ingredient const*>. Note the raw pointer:

  • Recipe is not responsible for the memory, but is given access to it
  • there is an implicit guarantee that the object pointed to will remain valid longer than the Recipe object

It is fine to use raw (naked) pointers, as long as you treat them like you would a reference. You just have to beware of potential nullity, and you're offered the ability to reseat them if you so wish -- and you trust the provider to take care of the lifetime / memory management aspects.
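
A minimal sketch of the factory idea, with several assumptions: the interface (a single get() that creates on first request), the std::map storage and the use of std::unique_ptr are all illustrative choices, Ingredient is reduced to its title, and Recipe is the question's type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Ingredient
{
    explicit Ingredient(const std::string& title) : m_title(title) { }
    std::string m_title;
};

// Hypothetical factory: owns every Ingredient and hands out non-owning access.
class IngredientFactory
{
public:
    // Creates the ingredient on first request; later requests return the same object.
    Ingredient const& get(const std::string& title)
    {
        std::unique_ptr<Ingredient>& slot = m_ingredients[title];
        if (!slot)
            slot.reset(new Ingredient(title));
        return *slot;
    }

private:
    std::map<std::string, std::unique_ptr<Ingredient> > m_ingredients;
};

struct Recipe
{
    std::vector<Ingredient const*> m_ingredients;   // access only, no ownership

    void append_ingredient(Ingredient const& ingredient)
    {
        m_ingredients.push_back(&ingredient);
    }
};

inline bool factory_demo()
{
    IngredientFactory factory;   // declared first, so it outlives both recipes
    Recipe oatmeal;
    Recipe mashed_potatoes;

    oatmeal.append_ingredient(factory.get("milk"));
    mashed_potatoes.append_ingredient(factory.get("milk"));

    // Reseat the oatmeal's pointer; the other recipe is untouched.
    oatmeal.m_ingredients[0] = &factory.get("cream");

    return oatmeal.m_ingredients[0]->m_title == "cream"
        && mashed_potatoes.m_ingredients[0]->m_title == "milk";
}
```

Because the ingredients are handed out as const, no recipe can alter a shared ingredient, and no copy-on-write logic is needed at all.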

Lorenzalorenzana answered 6/6, 2011 at 8:13 Comment(6)
+1 Thanks for a well-reasoned and accurate assessment of the problem. - Davidson
+1 for thoroughness, though I'll offer that I've found some interesting applications for copy-on-write for GUI apps. I force worker threads doing background tasks to pay for the copy of data structures while the GUI thread proceeds...kinda slick: hostilefork.com/thinker-qt - Crotty
Very nice answer, but you might also want to add that naked pointers are perfectly fine (imho), if you pass them to a function that isn't concerned about ownership at all for example. - Johannessen
@Hostile Fork: There are cases where COW makes sense, however it is an optimization trick, and such tricks should only be used when it has been measured they were beneficial :) Thanks for the example (though I don't use QT so I just skimmed over it). - Lorenzalorenzana
@Xeo: Thanks. It was implied in the IngredientFactory idea, I've made it explicit :) - Lorenzalorenzana
@Xeo: no problem :p I just wanted you to know I had taken your remark into account (since you bothered writing it down), and giving you the opportunity to read it. - Lorenzalorenzana

You have nothing to worry about. Each Recipe object has its own vector, so modifying one won't affect the other, even though both of them happen to contain pointers to the same objects. The mashed-potatoes recipe would only be affected if you changed the contents of the object that p_milk points at, but you're not doing that. You're modifying the oatmeal.m_ingredients object, which has absolutely no relation to mashed_potatoes.m_ingredients. They're two completely independent vector instances.
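
A possible replace_ingredient along those lines, as a sketch only: it uses std::shared_ptr instead of boost::shared_ptr so it builds without Boost, reduces Ingredient to its title, and returns a bool (an addition not present in the question) to report whether anything was replaced.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Ingredient
{
    explicit Ingredient(const std::string& title) : m_title(title) { }
    std::string m_title;
};

struct Recipe
{
    std::vector<std::shared_ptr<Ingredient> > m_ingredients;

    void append_ingredient(std::shared_ptr<Ingredient> new_ingredient)
    {
        m_ingredients.push_back(new_ingredient);
    }

    // Overwrites the first matching entry in this recipe's own vector.
    // Returns false when no ingredient has the given title.
    bool replace_ingredient(const std::string& original_title,
                            std::shared_ptr<Ingredient> new_ingredient)
    {
        auto it = std::find_if(m_ingredients.begin(), m_ingredients.end(),
            [&original_title](const std::shared_ptr<Ingredient>& p)
            { return p && p->m_title == original_title; });
        if (it == m_ingredients.end())
            return false;
        *it = new_ingredient;   // only this vector's element is reseated
        return true;
    }
};

inline bool replace_demo()
{
    Recipe oatmeal, mashed_potatoes;
    std::shared_ptr<Ingredient> p_milk = std::make_shared<Ingredient>("milk");
    oatmeal.append_ingredient(p_milk);
    mashed_potatoes.append_ingredient(p_milk);

    bool replaced = oatmeal.replace_ingredient("milk", std::make_shared<Ingredient>("cream"));

    return replaced
        && oatmeal.m_ingredients[0]->m_title == "cream"
        && mashed_potatoes.m_ingredients[0]->m_title == "milk"
        && mashed_potatoes.m_ingredients[0] == p_milk;
}
```

Overwriting the element in place keeps the ingredient's position in the recipe, whereas the erase-then-append approach from the question moves the replacement to the end. Either way, the Milk object itself is never modified.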

Deprecate answered 5/6, 2011 at 19:28 Comment(0)
