Is the Visual C++ implementation of std::async using a thread pool standard-compliant?

Visual C++ uses the Windows thread pool (Vista's CreateThreadpoolWork if available, QueueUserWorkItem if not) when std::async is called with std::launch::async.

The number of threads in the pool is limited. If we create several tasks that run for a long time without ever blocking (neither sleeping nor waiting on I/O), the tasks queued behind them won't get a chance to run.

The standard (I'm using N4140) says that using std::async with std::launch::async

... calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) (20.9.2, 30.3.1.2) as if in a new thread of execution represented by a thread object with the calls to DECAY_COPY() being evaluated in the thread that called async.

(§30.6.8p3, Emphasis mine.)

std::thread's constructor, in turn, is specified to start a new thread of execution.

About threads in general it says (§1.10p3):

Implementations should ensure that all unblocked threads eventually make progress. [Note: Standard library functions may silently block on I/O or locks. Factors in the execution environment, including externally-imposed thread priorities, may prevent an implementation from making certain guarantees of forward progress. —end note]

If I create a bunch of OS threads or std::threads, all performing some very long (perhaps infinite) tasks, they'll all be scheduled (at least on Windows, without messing with priorities, affinities, etc.). If we schedule the same tasks on the Windows thread pool (or use std::async(std::launch::async, ...), which does that), the tasks scheduled later won't run until the earlier tasks finish.
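For contrast, here is a minimal sketch of the std::thread case described above. It is not part of the original test: run_with_threads and its variables are hypothetical names, and an atomic flag stands in for the mutex. The point it illustrates is only that each task gets its own thread, so the task started last can still run and release the others.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): starts `spinners` busy-waiting
// std::threads, then one more thread that raises the flag they wait on.
// Returns how many spinners observed the flag and exited.
int run_with_threads(int spinners) {
    assert(spinners >= 0);
    std::atomic<bool> go{false};
    std::atomic<int> finished{0};

    std::vector<std::thread> workers;
    for (int i = 0; i < spinners; ++i) {
        workers.emplace_back([&] {
            // Each iteration performs an atomic operation, so this is not a
            // loop the implementation is allowed to assume terminates.
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            finished.fetch_add(1, std::memory_order_relaxed);
        });
    }

    // Started last, but it has a thread of its own and does not wait in a queue.
    std::thread releaser([&] { go.store(true, std::memory_order_release); });

    releaser.join();
    for (auto& t : workers) {
        t.join();
    }
    return finished.load();
}
```

Calling run_with_threads(8) is expected to return, whatever the core count, as long as the OS scheduler eventually runs every thread. Whether the same shape written with std::async(std::launch::async, ...) returns is exactly what the question is about.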

Is this compliant with the standard, strictly speaking? And what does "eventually" mean?


The problem is that if the tasks scheduled first are de-facto infinite, the rest of the tasks won't run. So the other threads (not OS threads, but "C++-threads" according to the as-if rule) won't make progress.

One may argue that if the code has infinite loops the behavior is undefined, and thus it's compliant.

But I argue that we don't need an infinite loop of the problematic kind, the kind the standard says causes UB, to make that happen. Accessing volatile objects, performing atomic operations, and performing synchronization operations are all side effects that "disable" the assumption that loops terminate.

(I have a bunch of async calls executing the following lambda

auto lambda = [&] {
    while (m.try_lock() == false) {
        for (size_t i = 0; i < (2 << 24); i++) {
            vi++;
        }
        vi = 0;
    }
};

and the lock is released only upon user input. But there are other kinds of legitimate infinite loops.)

If I schedule a couple of such tasks, tasks I schedule after them don't get to run.

A really wicked example would be launching too many tasks that run until a lock is released or a flag is raised, and then scheduling, using std::async(std::launch::async, ...), a task that raises the flag. Unless the word "eventually" means something very surprising, this program has to terminate. But under the VC++ implementation it won't!

To me it seems like a violation of the standard. What makes me wonder is the second sentence in the note. Factors may prevent implementations from making certain guarantees of forward progress. So how are these implementations conforming?

It's like saying there may be factors preventing implementations from providing certain aspects of memory ordering, atomicity, or even the existence of multiple threads of execution. Great, but conforming hosted implementations must support multiple threads. Too bad for them and their factors. If they can't provide them, that's not C++.

Is this a relaxation of the requirement? If it is interpreted that way, it's a complete withdrawal of the requirement, since it specifies neither what the factors are nor, more importantly, which guarantees implementations may fail to supply.

If not - what does that note even mean?

I recall footnotes being non-normative according to the ISO/IEC Directives, but I'm not sure about notes. I did find the following in the ISO/IEC Directives:

24 Notes

24.1 Purpose or rationale

Notes are used for giving additional information intended to assist the understanding or use of the text of the document. The document shall be usable without the notes.

Emphasis mine. If I consider the document without that unclear note, it seems to me that threads must make progress, that std::async(std::launch::async, ...) has the effect as if the functor is executed on a new thread, as if one were created using std::thread, and thus that functors dispatched using std::async(std::launch::async, ...) must make progress. In the VC++ implementation with the thread pool they don't. So VC++ is in violation of the standard in this respect.


Full example, tested using VS 2015U3 on Windows 10 Enterprise 1607 on i5-6440HQ:

#include <atomic>
#include <cstddef>
#include <future>
#include <iostream>
#include <mutex>
#include <vector>

int main() {
    volatile int vi{};
    std::mutex m{};
    m.lock();  // held until the "release" task below unlocks it

    // Spins, touching a volatile, until it manages to acquire the mutex.
    auto lambda = [&] {
        while (m.try_lock() == false) {
            for (std::size_t i = 0; i < (2 << 10); i++) {
                vi++;
            }
            vi = 0;
        }
        m.unlock();
    };

    std::vector<decltype(std::async(std::launch::async, lambda))> v;

    int threadCount{};
    std::cin >> threadCount;
    for (int i = 0; i < threadCount; i++) {
        v.emplace_back(std::async(std::launch::async, lambda));
    }

    // Scheduled last: the only task that lets the spinning tasks finish.
    auto release = std::async(std::launch::async, [&] {
        __asm int 3;  // MSVC x86 inline assembly: break into the debugger
        std::cout << "foo" << std::endl;
        vi = 123;
        m.unlock();
    });

    return 0;
}

With a thread count of 4 or fewer it terminates. With more than 4 it doesn't.



Octa answered 12/3, 2017 at 23:1 Comment(14)
The comments on Async() vs threading on performance and control are also informative. - Inshore
According to a comment by STL on this blog post, the default launch policy (async|deferred) is allowed to use threadpools but the async policy alone is not, and that it did/does is a bug. I just tested @EvgenyPanasyuk's code from this answer (with the launch policy altered to be async only) with VC++ 2017, and it is still most definitely threadpool-based and thus in violation of the standard. :-[ I don't know if this is enough for an answer. - Inshore
I believe that a reasonable attempt to satisfy the as-if and forward progress guarantees means MSVC should be looking for starving async calls and starting new threads for them at the very least... - Jaffe
@ildjarn: This is very useful actually. Like the answer I saw, it doesn't address the note about "factors in the execution environment" etc., but STL is reasonably authoritative, so it helps. (On the other hand, I really don't understand the basis for his claim that using the thread pool is correct for std::launch::async | std::launch::deferred, which makes me wonder...) - Octa
@Octa: async|deferred is allowed to execute on the current thread, so it obviously has no requirements about being executed in a new thread; thus it follows that a threadpool thread should be perfectly acceptable. - Inshore
@ildjarn: This is a bit off-topic so I opened a chat room about this comment by STL: chat.stackoverflow.com/rooms/137908/room-for-conio-and-ildjarn - Octa
"Should" is nonbinding. "Implementations should do X" essentially means "implementations are encouraged, but not required, to do X". - Undervest
@T.C.: Thank you. Very good point. I looked it up and you're right. If making progress is a recommendation and not a requirement, why isn't it legal to use a thread pool according to STL and others? Even if it doesn't run on a new OS thread, what requirement of running as-if on a new "C++ thread" does it violate? - Octa
FPG in parallel algorithms is tangentially related. Is there an equivalent C++17 FPG for threads? - Jaffe
@Yakk timsong-cpp.github.io/cppwp/intro.progress - Undervest
@Octa thread_local variables, as pointed out in one of the answers you cited. - Undervest
So no mandated guarantee of forward progress even from std::thread (although implementations are encouraged). That means all bets are basically off? - Jaffe
I think the combination of the comment by STL cited by ildjarn (no, using a thread pool the way Microsoft does is not conforming) and T.C.'s observation that "should" is not normative (so it's not nonconforming for the reason I thought) answers my question, and if someone writes it as an answer I will accept. @T.C.'s comment made me realize something else, though: if there is more than one non-blocked thread executing concurrently, even if they only perform lock-free actions, it is not required that they make progress. (§1.10p4 in N4140 - "at least one should complete", not "shall".) - Octa
This looks like a question which is very hard to answer. This is cutting-edge C++; we're trying to standardize incredibly complex behavior across implementations. P0072R1 suggests "An unbounded thread pool that eventually adds a new OS thread to the pool if some EAs did not run yet is also a valid implementation." (emphasis mine). So thread pools themselves do not seem to be the problem. - Coddle
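The thread_local point raised in the comments above can be made concrete with a small sketch. It is an illustration only: calls_on_this_thread and observe_thread_locals are hypothetical names, and the sketch assumes that thread-local state is the observable difference being referred to. If every std::launch::async task runs as if on a new thread, each task starts with a fresh copy of the variable.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical per-thread counter, used only for this illustration.
thread_local int calls_on_this_thread = 0;

// Runs `tasks` async calls one after another and records what each one saw.
std::vector<int> observe_thread_locals(int tasks) {
    assert(tasks >= 0);
    std::vector<int> seen;
    for (int i = 0; i < tasks; ++i) {
        auto f = std::async(std::launch::async, [] {
            // On a genuinely new thread the counter starts at 0 every time.
            return ++calls_on_this_thread;
        });
        // Wait for the result before launching the next task, so that a
        // pooled implementation would be free to reuse the same thread.
        seen.push_back(f.get());
    }
    return seen;
}
```

With a new thread per task, every recorded value should be 1. An implementation that reuses pooled threads without resetting their thread-local storage could report larger values, which would be a way for a program to detect that it is not running "as if in a new thread of execution".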

The situation has been clarified somewhat in C++17 by P0296R2. Unless the Visual C++ implementation documents that its threads do not provide concurrent forward progress guarantees (which would be generally undesirable), the bounded thread pool is not conforming (in C++17).

The note about "externally imposed thread priorities" has been removed, perhaps because it is already always possible for the environment to prevent the progress of a C++ program (if not by priority, then by being suspended, and if not that, then by power or hardware failure).

There is one remaining normative "should" in that section, but it pertains (as conio mentioned) only to lock-free operations, which can be delayed indefinitely by frequent concurrent access by other threads to the same cache line (not merely the same atomic variable). (I think that in some implementations this can happen even if the other threads are only reading.)
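As a rough illustration of how a lock-free operation can be delayed by other threads, here is a sketch of a compare-exchange increment loop that counts its own retries. The names contended_increment and ContentionResult are made up for this example, and it only shows contention on a single atomic variable, not the cache-line effect mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct ContentionResult {
    long total;    // final counter value
    long retries;  // failed compare-exchange attempts across all threads
};

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `iterations` times, using a compare-exchange loop instead of fetch_add.
ContentionResult contended_increment(int threads, int iterations) {
    assert(threads >= 0 && iterations >= 0);
    std::atomic<long> counter{0};
    std::atomic<long> retries{0};

    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                long expected = counter.load(std::memory_order_relaxed);
                // Lock-free but not wait-free: a failure (caused by another
                // thread's update, or spurious) forces this thread to retry,
                // and nothing bounds how often that can happen.
                while (!counter.compare_exchange_weak(
                           expected, expected + 1, std::memory_order_relaxed)) {
                    retries.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return {counter.load(), retries.load()};
}
```

The total is always threads * iterations, since some thread succeeds whenever another one fails. The retries count varies from run to run and tends to grow with contention; no individual thread is promised a bound on it.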

Mendes answered 18/12, 2017 at 1:6 Comment(0)
