I have a super fast M.2 drive. How fast is it? It doesn’t matter because I cannot utilize this speed anyway. That’s why I’m asking this question.
I have an app that needs a real lot of memory. So much that it won’t fit in RAM. Fortunately it is not needed all at once. Instead it is used to save intermediate results from computations.
Unfortunately the application is not able to write and reads this data fast enough. I tried using multiple reader and writer threads but it only made it worse (later I read that it is because of this).
So my question is: Is it possible to have truly asynchronous file IO in C++ to fully exploit those advertised gigabytes per second? If it is than how (in a cross platform way)?
You could also recommend a library that’s good with tasks like that if you know one because I believe that there is no point in reinventing the wheel.
Edit:
Here is code that shows how I do file IO in my program. It isn't from the mentioned program because it wouldn't be that minimal. This one ilustrates the problem nevertheless. Do not mind Windows.h
. It is used only to set thread affinity. In the actual program I also set affinity , so that's why I included it.
#include <fstream>
#include <thread>
#include <memory>
#include <string>
#include <Windows.h> // for SetThreadAffinityMask()
void stress_write(unsigned bytes, int num)
{
std::ofstream out("temp" + std::to_string(num));
for (unsigned i = 0; i < bytes; ++i)
{
out << char(i);
}
}
void lock_thread(unsigned core_idx)
{
SetThreadAffinityMask(GetCurrentThread(), 1LL << core_idx);
}
int main()
{
std::ios_base::sync_with_stdio(false);
lock_thread(0);
auto worker_count = std::thread::hardware_concurrency() - 1;
std::unique_ptr<std::thread[]> threads = std::make_unique<std::thread[]>(worker_count); // faster than std::vector
for (int i = 0; i < worker_count; ++i)
{
threads[i] = std::thread(
[](unsigned idx) {
lock_thread(idx);
stress_write(1'000'000'000, idx);
},
i + 1
);
}
stress_write(1'000'000'000, 0);
for (int i = 0; i < worker_count; ++i)
{
threads[i].join();
}
}
As you can see its just plain old fstream
. On my machine this uses 100% CPU, but only 7-9% disk (around 190MB/s). I am wondering if it could be increased.
fstream
s. I don’t think there is a point in showing it but I will add it to the answers (in a few minutes because it has to be "minimal"). – Tuyettvfstream
really and truly sucks performance-wise. Before jumping to async I/O (which is a good idea) you should first measure withstdio.h
and your OS's platform-specific synchronous I/O functions. (Not saying you need to deploy platform-specific code, but writing some during performance testing can be extraordinarily valuable to determine what layer is adding inefficiencies) – Saylesout << char(i);
may be part of the problem. – Pitarysout << char(i);
being the problem? Its the only IO line in this program, so It will always be the bottleneck, wont it? – Tuyettv