Visual C++ uses the Windows thread pool (Vista's CreateThreadpoolWork if available, and QueueUserWorkItem if not) when calling std::async with std::launch::async.
The number of threads in the pool is limited. If we create several tasks that run for a long time without sleeping (including doing I/O), the tasks further back in the queue never get a chance to run.
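One quick way to see the pooling (a probe of my own; nothing here is MSVC-specific, only the standard library) is to record which thread id each task runs on:

#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<std::future<std::thread::id>> ids;
    for (int i = 0; i < 16; i++) {
        // Each task just reports the id of the thread it was executed on.
        ids.push_back(std::async(std::launch::async, [] {
            return std::this_thread::get_id();
        }));
    }
    for (auto& f : ids) {
        std::cout << f.get() << '\n';   // repeated ids suggest threads are being reused
    }
}

Repeated ids alone aren't proof (a finished thread's id may legally be reused), but combined with the lifetime of thread_local objects the pooling is easy to confirm.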
The standard (I'm using N4140) says that using std::async with std::launch::async
... calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) (20.9.2, 30.3.1.2) as if in a new thread of execution represented by a thread object with the calls to DECAY_COPY() being evaluated in the thread that called async.
(§30.6.8p3, emphasis mine.)
std::thread's constructor creates a new thread etc.
About threads in general it says (§1.10p3):
Implementations should ensure that all unblocked threads eventually make progress. [Note: Standard library functions may silently block on I/O or locks. Factors in the execution environment, including externally-imposed thread priorities, may prevent an implementation from making certain guarantees of forward progress. —end note]
If I create a bunch of OS threads or std::threads, all performing some very long (perhaps infinite) tasks, they'll all be scheduled (at least on Windows; without messing with priorities, affinities, etc.). If we schedule the same tasks on the Windows thread pool (or use std::async(std::launch::async, ...), which does that), the tasks scheduled later won't run until the earlier tasks finish.
Is this compliant with the standard, strictly speaking? And what does "eventually" mean?
The problem is that if the tasks scheduled first are de facto infinite, the rest of the tasks won't run at all. So the other threads (not OS threads, but "C++ threads" according to the as-if rule) won't make progress.
One may argue that if the code has infinite loops the behavior is undefined, and thus the implementation is compliant.
But I argue that we don't need an infinite loop of the problematic kind (the kind the standard says causes UB) to make this happen. Accessing volatile objects and performing atomic operations and synchronization operations are all side effects that "disable" the assumption that loops terminate.
(I have a bunch of async calls executing the following lambda
auto lambda = [&] {
    // Spin until the mutex can be acquired, doing busy work on a volatile
    // so the loop has observable side effects.
    while (m.try_lock() == false) {
        for (size_t i = 0; i < (2 << 24); i++) {
            vi++;
        }
        vi = 0;
    }
};
and the lock is released only upon user input. But there are other legitimate kinds of "infinite" loops.)
If I schedule a couple of such tasks, tasks I schedule after them don't get to run.
A really wicked example would be launching too many tasks that run until a lock is released/a flag is raised, and then scheduling, using std::async(std::launch::async, ...), a task that raises the flag. Unless the word "eventually" means something very surprising, this program has to terminate. But under the VC++ implementation it won't!
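A minimal sketch of that wicked example (my own, using an std::atomic<bool> flag instead of a lock; kWaiters is just an illustrative constant, meant to exceed the pool's size):

#include <atomic>
#include <future>
#include <vector>

int main() {
    std::atomic<bool> flag{false};
    std::vector<std::future<void>> waiters;
    const int kWaiters = 64;   // assumed to be larger than the pool's thread count
    for (int i = 0; i < kWaiters; i++) {
        waiters.push_back(std::async(std::launch::async, [&] {
            // Atomic loads are side effects, so this loop is not the UB kind.
            while (!flag.load()) {
            }
        }));
    }
    // Scheduled last: the only task that can ever raise the flag.
    auto releaser = std::async(std::launch::async, [&] { flag.store(true); });
    // If every unblocked thread must eventually make progress, this returns.
    // On a bounded pool the releaser never gets a thread, and we hang here.
    releaser.get();
}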
To me it seems like a violation of the standard. What makes me wonder is the second sentence of the note: factors may prevent implementations from making certain guarantees of forward progress. So how are these implementations conforming?
It's like saying there may be factors preventing implementations from providing certain aspects of memory ordering, atomicity, or even the existence of multiple threads of execution. Great, but conforming hosted implementations must support multiple threads. Too bad for them and their factors; if they can't provide them, that's not C++.
Is this a relaxation of the requirement? If interpreted that way, it's a complete withdrawal of the requirement, since it doesn't specify what the factors are and, more importantly, which guarantees implementations may fail to supply.
If not, what does that note even mean?
I recall footnotes being non-normative according to the ISO/IEC Directives, but I'm not sure about notes. I did find the following in the ISO/IEC Directives:
24 Notes
24.1 Purpose or rationale
Notes are used for giving additional information intended to assist the understanding or use of the text of the document. The document shall be usable without the notes.
Emphasis mine. If I consider the document without that unclear note, it seems to me that threads must make progress, that std::async(std::launch::async, ...) has the effect as if the functor were executed on a new thread, as if it were created using std::thread, and thus that functors dispatched using std::async(std::launch::async, ...) must make progress. And in the VC++ thread-pool implementation they don't. So VC++ is in violation of the standard in this respect.
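For contrast, the as-if behaviour the standard describes is not much more than the following sketch (my own helper, built on std::packaged_task and a detached std::thread; a real implementation differs, e.g. the future returned by the real std::async blocks in its destructor and this one doesn't):

#include <future>
#include <thread>
#include <type_traits>
#include <utility>

// Run f on a brand-new thread every time, matching the "as if in a new thread
// of execution" wording; the DECAY_COPYs happen in the calling thread
// (inside the packaged_task and std::thread constructors).
template <class F, class... Args>
auto async_new_thread(F&& f, Args&&... args) {
    using R = std::result_of_t<std::decay_t<F>(std::decay_t<Args>...)>;
    std::packaged_task<R(std::decay_t<Args>...)> task(std::forward<F>(f));
    auto fut = task.get_future();
    std::thread(std::move(task), std::forward<Args>(args)...).detach();
    return fut;
}

Every call gets its own OS thread, the OS schedules them all, and the "eventually" in §1.10p3 is satisfied; as far as I can tell that's what libstdc++ and libc++ actually do.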
Full example, tested using VS 2015U3 on Windows 10 Enterprise 1607 on i5-6440HQ:
#include <iostream>
#include <future>
#include <atomic>
#include <mutex>
#include <vector>

int main() {
    volatile int vi{};
    std::mutex m{};
    m.lock();                       // held until the "release" task unlocks it
    auto lambda = [&] {
        // Spin until the mutex becomes available, doing volatile busy work
        // so the loop has observable side effects.
        while (m.try_lock() == false) {
            for (size_t i = 0; i < (2 << 10); i++) {
                vi++;
            }
            vi = 0;
        }
        m.unlock();
    };
    std::vector<decltype(std::async(std::launch::async, lambda))> v;
    int threadCount{};
    std::cin >> threadCount;
    for (int i = 0; i < threadCount; i++) {
        v.emplace_back(std::async(std::launch::async, lambda));
    }
    // Scheduled last: the only task that ever unlocks the mutex.
    auto release = std::async(std::launch::async, [&] {
        __asm int 3;                // break into the debugger (MSVC, x86 builds)
        std::cout << "foo" << std::endl;
        vi = 123;
        m.unlock();
    });
    return 0;                       // the futures' destructors block until the tasks finish
}
With 4 or fewer it terminates; with more than 4 it doesn't.
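For comparison, here is the same experiment on plain std::threads (a sketch of my own; I've swapped the mutex for an std::atomic<bool> so the release can legally happen from another thread), which, per the scheduling behaviour described above, terminates for any thread count:

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::atomic<bool> released{false};
    volatile int vi{};
    auto lambda = [&] {
        // Same busy-waiting as before, but on an atomic flag.
        while (!released.load()) {
            for (size_t i = 0; i < (2 << 10); i++) {
                vi++;
            }
            vi = 0;
        }
    };
    int threadCount{};
    std::cin >> threadCount;
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; i++) {
        threads.emplace_back(lambda);
    }
    std::thread releaser([&] { released = true; });  // started last, still gets to run
    releaser.join();
    for (auto& t : threads) {
        t.join();
    }
}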
Similar questions:
Is there an implementation of std::async which uses thread pool? - But it doesn't ask about compliance, and doesn't have an answer anyway.
std::async - Implementation dependent usage? - Mentions that "thread pools are not really supported", but focuses on thread_local variables (which is solvable even if "not straightforward" or non-trivial, as the answer and comment say) and doesn't address the note near the requirement of making progress.
Comments:
... async|deferred is allowed to use threadpools but the async policy alone is not, and that it did/does is a bug. I just tested @EvgenyPanasyuk's code from this answer (with the launch policy altered to be async only) with VC++ 2017, and it is still most definitely threadpool-based and thus in violation of the standard. :-[ I don't know if this is enough for an answer. – Inshore
async|deferred is allowed to execute on the current thread, so it obviously has no requirements about being executed in a new thread; thus it follows that a threadpool thread should be perfectly acceptable. – Inshore
... thread_local variables, as pointed out in one of the answers you cited. – Undervest
... std::thread (although implementations are encouraged). That means all bets are basically off? – Jaffe