Stealing resources from std::map's keys allowed?
Asked Answered
N

4

15

In C++, is it OK to steal resources from a map that I do not need afterwards anymore? More precisely, assume I have a std::map with std::string keys and I want to construct a vector out of it by stealing the resources of the maps keys using std::move. Note that such a write access to the keys corrupts the internal datastructure (ordering of keys) of the map but I won't use it afterwards.

Question: Can I do this without any problems or will this lead to unexpected bugs for example in the destructor of the map because I accessed it in a way that std::map was not intended for?

Here is an example program:

#include<map>
#include<string>
#include<vector>
#include<iostream>
using namespace std;
int main(int argc, char *argv[])
{
    std::vector<std::pair<std::string,double>> v;
    { // new scope to make clear that m is not needed 
      // after the resources were stolen
        std::map<std::string,double> m;
        m["aLongString"]=1.0;
        m["anotherLongString"]=2.0;
        //
        // now steal resources
        for (auto &p : m) {
            // according to my IDE, p has type 
            // std::pair<const class std::__cxx11::basic_string<char>, double>&
            cout<<"key before stealing: "<<p.first<<endl;
            v.emplace_back(make_pair(std::move(const_cast<string&>(p.first)),p.second));
            cout<<"key after stealing: "<<p.first<<endl;
        }
    }
    // now use v
    return 0;
}

It produces the output:

key before stealing: aLongString
key after stealing: 
key before stealing: anotherLongString
key after stealing: 

EDIT: I would like to do this for the entire content of a large map and save dynamic allocations by this resource stealing.

Neophyte answered 22/2, 2020 at 7:15 Comment(7)
What is the purpose of this "stealing"? To remove the element from the map? Then why not simply do that (erase the element from the map)? Also, modifying a const value is always UB.Iloilo
apparently it will lead to serious bugs!Daffy
@Someprogrammerdude This is for optimization as the strings are moved in the vector and no dynamic allocation is necessary. Maybe I do not need this kind of optimization, but it caught my interest. Actually, I want to return a large vector from a function and I need the temporary map for the construction.Neophyte
Not a direct answer to your question, but: What if you don't return a vector but a range or a pair of iterators? That would avoid copying completely. In any case, you need benchmarks to track progress of the optimization and a profiler to find hotspots.Kaylakayle
Do you know that, statistically speaking, moving a string takes more time than copying it? You'd better just copy the data in this particular case.Marjorie
@Marjorie Do you have any source for this statement. I cant imagine how copying a pointer is more expensive than copying an entire region of memory.Rozanneroze
@SebastianHoffmann it was mentioned on recent CppCon not sure on which talk, tho. The thing is std::string has short-string optimization. Meaning that there is some non-trivial logic on copying and moving and not just pointer exchange and in addition most of the time moving implies copying - lest you deal with rather long strings. The statistical difference was small anyways and in general it surely varies depending on what kind of string processing is performed.Marjorie
I
17

You're doing undefined behavior, using const_cast to modify a const variable. Don't do that. The reason it's const is because maps are sorted by their keys. So modifying a key in-place is breaking the underlying assumption the map is built on.

You should never use const_cast to remove const from a variable and modify that variable.

That being said, C++17 has the solution for your problem: std::map's extract function:

#include <map>
#include <string>
#include <vector>
#include <utility>

int main() {
  std::vector<std::pair<std::string, double>> v;
  std::map<std::string, double> m{{"aLongString", 1.0},
                                  {"anotherLongString", 2.0}};

  auto extracted_value = m.extract("aLongString");
  v.emplace_back(std::make_pair(std::move(extracted_value.key()),
                                std::move(extracted_value.mapped())));

  extracted_value = m.extract("anotherLongString");
  v.emplace_back(std::make_pair(std::move(extracted_value.key()),
                                std::move(extracted_value.mapped())));
}

And don't using namespace std;. :)

Individuate answered 22/2, 2020 at 7:39 Comment(6)
Thanks, I will try this! But are you sure that I cannot do as I did? I mean the map won't complain if I do not call its methods (which I do not do) and maybe the internal ordering is not important in its destructor?Neophyte
The map's keys are created const. Mutating a const object is instant UB, whether or not anything actually accesses them afterwards.Posset
This method has two problems: (1) As I want to extract all elements I do not want to extract by key (inefficient lookup) but by iterator. I have seen that this is also possible, so this is fine. (2) Correct me if I am wrong but for extracting all elements there will be an enormous overhead (for rebalancing the internal tree structure at every extraction)?Neophyte
@Neophyte As you can see on cppreference extract when using iterators as argument has amortized constant complexity. Some overhead is unavoidable, but it will probably not be significant enough to matter. If you have special requirements not covered by this, then you will need to implement your own map satisfying these requirements. The std containers are meant for common general purpose application.They are not optimized for specific use cases.Turgite
@Posset Are you sure that the keys are created const? In this case can you please indicate where my argumentation is wrong.Neophyte
@Neophyte See my answer.Endemic
E
4

Your code attempts to modify const objects, so it has undefined behavior, as druckermanly's answer correctly points out.

Some other answers (phinz's and Deuchie's) argue that the key must not be stored as a const object because the node handle resulted from extracting nodes out of the map allow non-const access to the key. This inference may seem plausible at first, but P0083R3, the paper that introduced the extract functionalities), has a dedicated section on this topic that invalidates this argument:

Concerns

Several concerns have been raised about this design. We will address them here.

Undefined behavior

The most difficult part of this proposal from a theoretical perspective is the fact that the extracted element retains its const key type. This prevents moving out of it or changing it. To solve this, we have provided the key accessor function, which provides non-const access to the key in the element held by the node handle. This function requires implementation "magic" to ensure that it works correctly in the presence of compiler optimizations. One way to do this is with a union of pair<const key_type, mapped_type> and pair<key_type, mapped_type>. The conversion between these can be effected safely using a technique similar to that used by std::launder on extraction and reinsertion.

We do not feel that this poses any technical or philosophical problem. One of the reasons the Standard Library exists is to write non-portable and magical code that the client can’t write in portable C++ (e.g. <atomic>, <typeinfo>, <type_traits>, etc.). This is just another such example. All that is required of compiler vendors to implement this magic is that they not exploit undefined behavior in unions for optimization purposes—and currently compilers already promise this (to the extent that it is being taken advantage of here).

This does impose a restriction on the client that, if these functions are used, std::pair cannot be specialized such that pair<const key_type, mapped_type> has a different layout than pair<key_type, mapped_type>. We feel the likelihood of anyone actually wanting to do this is effectively zero, and in the formal wording we restrict any specialization of these pairs.

Note that the key member function is the only place where such tricks are necessary, and that no changes to the containers or pair are required.

(emphasis mine)

Endemic answered 23/2, 2020 at 13:33 Comment(1)
This is actually a part of the answer to the original question but I can only accept one.Neophyte
N
0

I don't think that const_cast and modification lead to undefined behaviour in this case but please comment whether this argumentation is correct.

This answer claims that

In other words, you get UB if you modify an originally const object, and otherwise not.

So the line v.emplace_back(make_pair(std::move(const_cast<string&>(p.first)),p.second)); in the question does not lead to UB if and only if the string object p.first was not created as a const object. Now note that the reference about extract states

Extracting a node invalidates the iterators to the extracted element. Pointers and references to the extracted element remain valid, but cannot be used while element is owned by a node handle: they become usable if the element is inserted into a container.

So if I extract the node_handle corresponding to p, p continues to live at its storage location. But after extraction, I am allowed to move away the resources of p as in the code of druckermanly's answer. This means that p and hence also the string object p.first was not created as const object originally.

Therefore, I think that the modification of the map's keys does not lead to UB and from Deuchie's answer, it seems that also the now corrupted tree structure (now multiple same empty string keys) of the map does not introduce problems in the destructor. So the code in the question should work fine at least in C++17 where the extract method exists (and the statement about pointers remaining valid holds).

Update

I am now of the view that this answer is wrong. I am not deleting it because it is referenced by other answers.

Neophyte answered 23/2, 2020 at 10:25 Comment(1)
Sorry, phinz, your answer is wrong. I'll write an answer to explain this - it has something to do with unions and std::launder.Endemic
R
-1

EDIT: This answer is wrong. Kind comments have pointed out the mistakes, but I am not deleting it because it has been referenced in other answers.

@druckermanly answered your first question, which said that changing the keys in map by force breaks the orderliness on which map's internal data structure is built (Red-Black tree). But it is safe to use the extract method because it does two thing: move the key out of the map and then delete it, so it does not affect the orderliness of the map at all.

The other question you posed, as to whether it would cause trouble when deconstructing, is not a problem. When a map deconstructs, it will call the deconstructor of each of its elements (mapped_types etc.), and the move method ensures that it is safe to deconstruct a class after it has been moved. So don't worry. In a nutshell, it is the operation of move that ensures that it is safe to delete or reassign some new value to the "moved" class. Specifically for string, the move method may set its char pointer to nullptr, so it does not delete the actual data that was moved when the deconstructor of the original class was called.


A comment reminded me of the point I was overlooking, basically he was right but there is one thing I do not totally agree: const_cast is probably not a UB. const is just a promise between the compiler and us. objects noted as const are still an object, same as those not with const, in terms of their types and representations in the binary form. When the const is cast away, it should behave as if it is a normal mutable class. With respect to move, If you want to use it, you have to pass a & instead of a const &, so as I see it is not a UB, it simply breaks the promise of const and move the data away.

I also did two experiments, using MSVC 14.24.28314 and Clang 9.0.0 respectively, and they yielded the same result.

map<string, int> m;
m.insert({ "test", 2 });
m.insert({ "this should be behind the 'test' string.", 3 });
m.insert({ "and this should be in front of the 'test' string.", 1 });
string let_me_USE_IT = std::move(const_cast<string&>(m.find("test")->first));
cout << let_me_USE_IT << '\n';
for (auto const& i : m) {
    cout << i.first << ' ' << i.second << '\n';
}

output:

test
and this should be in front of the 'test' string. 1
 2
this should be behind the 'test' string. 3

We can see that the string '2' is empty now, but obviously we have broken the orderliness of the map because the empty string should be re-located into the front. If we try to insert, find or delete some specific nodes of the map, it may cause a catastrophe.

Anyway, we may agree on that it is never a good idea to manipulate the inner data of any classes bypassing their public interfaces. The find, insert, remove functions and so forth rely their correctness on the orderliness of the inner data structure, and that is the reason why we should stay away from the thought of peeking inside.

Revulsive answered 22/2, 2020 at 10:30 Comment(5)
"as to whether it would cause trouble when deconstructing, is not a problem." Technically correct, since the undefined behavior (changing of a const value) happened earlier. However, the argument "the move [function] ensures that it is safe to deconstruct [an object of] a class after it has been moved" doesn't hold: You cannot safely move from a const object/reference, since that requires modification, which const prevents. You can try to work around that limitation by using const_cast, but at that point you're at best going into deeply implementation specific behavior, if not UB.Portraiture
@Portraiture Thanks, I overlooked and made a big mistake. If it wasn't you that pointed it out, my answer here may mislead someone else. Actually I should say that the move function takes a & instead of a const&, so if one insists that he moves a key out of a map, he must use const_cast.Revulsive
"objects noted as const are still an object, same as those not with const, in terms of their types and representations in the binary form" No. const objects can be put in read only memory. Also, const enables the compiler to reason about the code and cache the value instead of generating code for multiple reads (which can make a big difference in terms of performance). So the UB caused by const_cast is going to be obnoxious. It may work most of the time, but break your code in subtle ways.Endemic
But here I think that we can be sure that the objects are not put in read only memory because after extract, we are allowed to move from the very same object, right? (see my answer)Neophyte
Sorry, the analysis in your answer is wrong. I'll write an answer to explain this - it has something to do with unions and std::launderEndemic

© 2022 - 2024 — McMap. All rights reserved.