taking over memory from std::vector

I use an external library which operates on large quantities of data. The data is passed in by a raw pointer, plus the length. The library does not claim ownership of the pointer, but invokes a provided callback function (with the same two arguments) when it is done with the data.

The data gets prepared conveniently by using std::vector<T>, and I'd rather not give up this convenience. Copying the data is completely out of the question. Thus, I need a way to "take over" the memory buffer owned by an std::vector<T>, and (later on) deallocate it in the callback.

My current solution looks as follows:

std::vector<T> input = prepare_input();
T * data = input.data();
size_t size = input.size();
// move the vector to "raw" storage, to prevent deallocation
alignas(std::vector<T>) char temp[sizeof(std::vector<T>)];
new (temp) std::vector<T>(std::move(input));
// invoke the library
lib::startProcesing(data, size);

and, in the callback function:

void callback(T * data, size_t size) {
    std::allocator<T>().deallocate(data, size);
}

This solution works in practice because, in the standard library I use, the standard allocator's deallocate function ignores its second argument (the element count) and simply calls ::operator delete(data). If it did not, bad things could happen, as the size of the input vector might be quite a bit smaller than its capacity.
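
For comparison, here is a minimal sketch of what a conforming release of a vector-owned buffer involves, assuming the callback somehow knew the capacity (the library's callback does not provide it). The helper name `release_like_vector` is hypothetical; it mirrors what `std::vector<T>`'s destructor does with `std::allocator<T>`: destroy the live elements, then deallocate with the same count that was originally allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: frees a buffer the way std::vector<T> (with std::allocator<T>)
// would. `size` elements are alive; `capacity` is the count originally allocated.
template <typename T>
void release_like_vector(T* data, std::size_t size, std::size_t capacity) {
    assert(size <= capacity);
    std::allocator<T> alloc;
    using Traits = std::allocator_traits<std::allocator<T>>;
    // Run destructors for the live elements (this matters for non-trivial T).
    for (std::size_t i = 0; i < size; ++i) {
        Traits::destroy(alloc, data + i);
    }
    // deallocate must receive the same n that allocate did, i.e. the capacity.
    Traits::deallocate(alloc, data, capacity);
}
```

In the question's setting, the capacity is exactly the piece of information the callback lacks, which is why passing only `size` is not guaranteed to be correct.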

My question is: is there a reliable (wrt. the C++ standard) way of taking over the buffer of std::vector and releasing it "manually" at some later time?

Transit answered 26/11, 2014 at 23:59 Comment(24)
You'll need to take over the entire vector.Canaliculus
Would be nice if vector had a detach function... but it doesn'tFogbound
@T.C.: but I have nowhere to store it -- the input production and deallocation happen in two separate parts of the programTransit
I don't understand the need for the aligned storage. Why not just unique_ptr<vector<T>> temp(new vector<T>(move(input)));? Also, your solution only works if T is a trivially destructible type, otherwise you'll need to call allocator<T>::destroy on each element. To answer your question, there's no easy way taking over the memory from a vector, you might be able to pull something off using a custom allocator, but I'd just stick to the current solution.Selfemployed
possible duplicate of Wrap raw data in std container like array, with runtime sizePickel
Perhaps instead of preparing the input into vector, you could make your own vector-like class that does have a detach function.Fogbound
Sigh - another case of libraries using bad callback signatures. If the callback signature was void (*callback)(T * data, size_t size, void * user_data) and startProcessing(T* data, size_t size, void * userdata) you'd have an easy path to a solution.Excise
(a) your existing solution seems correct to me, and (b) is there any reason why you have to deallocate in the callback, instead of just using the original vector and clearing it after processing finishes?Fogbound
@Praetorian: I used stack storage just to avoid heap manipulations (a habit of mine), of course it had to be aligned appropriately, otherwise the move constructor could crash; and yes, you are right, I forgot to mention the destructors (though, as I know the number of objects, these do not pose a problem)Transit
@MichaelAnderson this could probably be arranged anyway (e.g. static variable)Fogbound
@MattMcNabb: (a) I would rather avoid having it break in a couple of updates to the STL implementation; (b) it happens in a different thread, at an unspecified moment in time (whenever the library is done with it)Transit
@GrzegorzHerman Are you letting the temp array go out of scope? Even if that works in practice, surely that's undefined behavior? (I'm asking, I don't know the answer)Selfemployed
@GrzegorzHerman AFAICS it doesn't rely on any non-standard behaviourFogbound
In a different thread? Now that is a can of worms.Fogbound
@MattMcNabb: why is letting temp out of scope undefined behaviour (instead of just the destructor of whatever was placed there not being called)? The documentation of std::allocator<T>::deallocate states that the second argument should be exactly the one passed to allocate -- surely violating the specification is non-standard.Transit
OK, fair enough. Perhaps you can look into the option of having the vector be static, and having the callback clear the vector (no placement-new required in this case)Fogbound
@MattMcNabb: threading is required by the processing library (I can start processing possibly multiple data packs in parallel, and do something useful in the meantime) -- but it should not matter hereTransit
@MattMcNabb: static in what? remember, there might be many of these being processed at the same time...Transit
@MattMcNabb Those function signatures avoid the need for global or static variables, both of which will cause you pain in the long run. Especially if you have multiple calls from multiple threads.Excise
in some kind of static container, e.g. maybe a listFogbound
@GrzegorzHerman My opinion is that you need to figure out a place where you can stash the vector in some (synchronized) container that can be accessed from both places. Then there's no need for what you're doing above. And regardless of whether letting temp go out of scope is UB, you're already breaking allocator::deallocate's contract by passing it a size that's not necessarily the same as that passed to allocator::allocateSelfemployed
@Praetorian: that's exactly why I am asking this questionTransit
Have you considered having your function return unique_ptr<T[]> instead of vector<T>?Monkey
I don't understand what the problem is. As long as the vector doesn't go out of scope (something you have to control through the code flow), you can do lib::startProcesing(input.data(), input.size()); as long as that function promises not to reallocate or do strange things with its pointer. Who is calling callback BTW?Terri
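
A rough sketch of the `unique_ptr<T[]>` suggestion from the comments, assuming `int` elements for concreteness: the producer fills a `std::unique_ptr<int[]>`, ownership is handed to the library with `release()`, and the callback re-adopts the pointer so `delete[]` runs. The names `prepare_input_array` and `array_callback` are hypothetical, and the trade-off is giving up `std::vector`'s growth and convenience while the data is being built.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical producer: builds the data directly in an owning array
// instead of a std::vector, so ownership can later be released.
std::unique_ptr<int[]> prepare_input_array(std::size_t n) {
    std::unique_ptr<int[]> buf(new int[n]);
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
    }
    return buf;
}

// Hypothetical callback with the library's (data, size) shape:
// re-adopt the raw pointer so delete[] runs when `owner` goes out of scope.
void array_callback(int* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::unique_ptr<int[]> owner(data);
}
```

The start-up side would then look like `auto buf = prepare_input_array(n); lib::startProcesing(buf.release(), n);`, with `array_callback` registered as the completion callback.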

You can't take ownership of the memory from a vector, but you can solve your underlying problem another way.

Here's how I'd approach it. It's a bit hacky because of the static global variable, and it's not thread-safe, but it can be made so with some simple locking around accesses to the registry object.

#include <map>
#include <vector>

static std::map<T*, std::vector<T>*> registry;

void my_startProcessing(std::vector<T> * data) {
  registry[data->data()] = data;  // remember which vector owns this buffer
  lib::startProcesing(data->data(), data->size());
}

void my_callback(T * data, size_t length) {
  auto it = registry.find(data);
  if (it != registry.end()) {
    delete it->second;  // destroys the vector, which frees its buffer
    registry.erase(it);
  }
}

Now you can just do

std::vector<T> * input = ...  // must be allocated with new, since my_callback deletes it
my_startProcessing(input);

But watch out! Bad things will happen if you add or remove elements from the input after you've called my_startProcessing: the buffer the library holds may be invalidated. (You may be allowed to change values in the vector, as I believe that will write through to the data correctly, but that will depend on what the library allows too.)

Also, this doesn't work if T=bool, since std::vector<bool> has no data() member.

Excise answered 27/11, 2014 at 0:33 Comment(1)
Looks good. If I can find no way to avoid global variables, I will sprinkle this with a bit of std::mutex and std::unique_ptr and it should be fine. Thanks!Transit
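
A minimal sketch of the `std::mutex` plus `std::unique_ptr` refinement described in the comment above. The class name `BufferRegistry` and its member functions are hypothetical. It assumes each registered vector is non-empty, so that `data()` is a unique key, and it keeps the vector itself alive (not just its raw buffer), so the memory is eventually freed by the vector's own destructor with the correct capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical thread-safe registry: keeps each vector alive, keyed by its
// data pointer, until the completion callback hands that pointer back.
template <typename T>
class BufferRegistry {
public:
    // Takes ownership of the vector; returns the pointer to give the library.
    T* adopt(std::vector<T>&& v) {
        assert(!v.empty());  // empty vectors have no unique buffer to key on
        auto owner = std::make_unique<std::vector<T>>(std::move(v));
        T* key = owner->data();
        std::lock_guard<std::mutex> lock(mutex_);
        owners_.emplace(key, std::move(owner));
        return key;
    }

    // Called from the completion callback, possibly on another thread.
    void release(T* data) {
        std::unique_ptr<std::vector<T>> owner;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = owners_.find(data);
            if (it == owners_.end()) return;  // unknown pointer: ignore
            owner = std::move(it->second);
            owners_.erase(it);
        }
        // `owner` is destroyed here, outside the lock, freeing the buffer.
    }

    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return owners_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<T*, std::unique_ptr<std::vector<T>>> owners_;
};
```

With one global `BufferRegistry<T>`, the start-up code would record `input.size()` first, then pass `registry.adopt(std::move(input))` and that size to `lib::startProcesing`; the callback would simply call `registry.release(data)`. Registering before processing starts avoids a race with a callback that fires early.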

You could create a custom class built on top of a vector.

The key point here is to use move semantics in the SomeData constructor:

  • you get the prepared data without copying (note that the source vector is left valid but unspecified, in practice empty)
  • the data will be correctly released by the thisData member's destructor
  • the source vector can be destroyed with no issue

Since a vector's storage is a contiguous array, you can obtain a start pointer and the data size in bytes (see SomeDataImpl.h below):

SomeData.h

#pragma once
#include <vector>

template<typename T>
class SomeData
{
    std::vector<T> thisData;

public:
    SomeData(std::vector<T> && other);

    const T* Start() const;
    size_t Size() const;
};

#include "SomeDataImpl.h"

SomeDataImpl.h

#pragma once

template<typename T>
SomeData<T>::SomeData(std::vector<T> && otherData) : thisData(std::move(otherData)) { }

template<typename T>
const T* SomeData<T>::Start() const {
    return thisData.data();
}

template<typename T>
size_t SomeData<T>::Size() const {
    return sizeof(T) * thisData.size();
}

Usage example:

#include <iostream>
#include "SomeData.h"

template<typename T>
void Print(const T * start, size_t size) {
    size_t toPrint = size / sizeof(T);
    size_t printed = 0;

    while(printed < toPrint) {
        std::cout << *(start + printed) << ", " << start + printed << std::endl;
        ++printed;
    }
}

int main () {
    std::vector<int> ints;
    ints.push_back(1);
    ints.push_back(2);
    ints.push_back(3);

    SomeData<int> someData(std::move(ints));
    Print<int>(someData.Start(), someData.Size());

    return 0;
}
Spam answered 27/11, 2014 at 14:55 Comment(0)

You can't do this in any kind of portable way, but you CAN do it in a way that will probably work on most C++ implementations. This code seems to work after a quick test on VS 2017.

#include <iostream>
#include <memory>
#include <new>
#include <utility>
#include <vector>

using namespace std;

template <typename T>
T* HACK_stealVectorMemory(vector<T>&& toStealFrom)
{
    // Get a pointer to the vector's memory allocation
    // (data() is safe even for an empty vector, unlike &v[0])
    T* vectorMemory = toStealFrom.data();

    // Move the vector into suitably aligned stack memory using placement new.
    // This vector object is never destroyed, so it never frees the buffer.
    alignas(vector<T>) unsigned char buffer[sizeof(vector<T>)];
    new (buffer) vector<T>(std::move(toStealFrom));

    return vectorMemory;
}

int main()
{
    vector<int> someInts = { 1, 2, 3, 4 };
    cout << someInts.size() << endl;

    // The allocator must later be given the capacity, not the size.
    size_t capacity = someInts.capacity();
    int* intsPtr = HACK_stealVectorMemory(std::move(someInts));

    cout << someInts.size() << endl;

    cout << intsPtr[0] << ", " << intsPtr[3] << endl;

    // Release the buffer through the same allocator the vector used
    // (int is trivially destructible; other types need their destructors run first).
    allocator<int>().deallocate(intsPtr, capacity);
}

Output:

4
0
1, 4
Ufo answered 12/9, 2019 at 0:13 Comment(0)
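
Both the question's placement-new trick and the answer above rely on one property: move-constructing a `std::vector` that uses `std::allocator` hands over the existing buffer instead of copying the elements (its move constructor has constant complexity, so it cannot copy them). The hypothetical helper `move_keeps_buffer` below makes that property visible.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical check: does move-constructing a vector reuse the source's buffer?
// With std::allocator the move constructor runs in constant time, so it takes
// ownership of the existing allocation rather than copying elements.
template <typename T>
bool move_keeps_buffer(std::vector<T>& source) {
    assert(!source.empty());  // an empty vector may have no buffer at all
    const T* before = source.data();
    std::vector<T> target(std::move(source));
    return target.data() == before;  // `target` then frees the buffer normally
}
```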

© 2022 - 2024 — McMap. All rights reserved.