GetQueuedCompletionStatus can't dequeue an I/O from the IOCP while the thread that originally issued it is blocked in ReadFile on Windows 8

My app stopped working after I switched to Windows 8. After hours of debugging, I found that IOCP behaves differently on Windows 8 than on previous versions. I extracted the minimum code needed to demonstrate and reproduce the problem.

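#include <winsock2.h>
#include <mswsock.h>   // AcceptEx
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "mswsock.lib")

SOCKET sListen;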

DWORD WINAPI WorkerProc(LPVOID lpParam)
{
    ULONG_PTR dwKey;
    DWORD dwTrans;
    LPOVERLAPPED lpol;
    while(true)
    {
        GetQueuedCompletionStatus((HANDLE)lpParam, &dwTrans, &dwKey, (LPOVERLAPPED*)&lpol, WSA_INFINITE);
        printf("dequeued an IO\n");
    }
}
DWORD WINAPI StartProc(LPVOID lpParam)
{
    WSADATA WsaData;
    if (WSAStartup(0x202,&WsaData)!=0) return 1;
    sListen = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0, WSA_FLAG_OVERLAPPED);
    SOCKADDR_IN si;
    ZeroMemory(&si,sizeof(si));
    si.sin_family = AF_INET;
    si.sin_port = htons(1999);   // host-to-network byte order
    si.sin_addr.S_un.S_addr = INADDR_ANY;
    if(bind(sListen, (sockaddr*)&si, sizeof(si)) == SOCKET_ERROR) return 1;
    listen(sListen, SOMAXCONN);
    HANDLE hCompletion = CreateIoCompletionPort(INVALID_HANDLE_VALUE, 0, 0, 0);
    CreateIoCompletionPort((HANDLE)sListen, hCompletion, (DWORD)0, 0);
    CreateThread(NULL, 0, WorkerProc, hCompletion, 0, NULL);
    return 0;
}
DWORD WINAPI AcceptProc(LPVOID lpParam)
{
    DWORD dwBytes;
    LPOVERLAPPED pol=(LPOVERLAPPED)malloc(sizeof(OVERLAPPED));
    ZeroMemory(pol,sizeof(OVERLAPPED));
    SOCKET sClient = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0, WSA_FLAG_OVERLAPPED);
    BOOL b = AcceptEx(sListen, 
        sClient,
        malloc ((sizeof(sockaddr_in) + 16) * 2), 
        0,
        sizeof(sockaddr_in) + 16, 
        sizeof(sockaddr_in) + 16, 
        &dwBytes, 
        pol);
    if(!b && WSAGetLastError() != WSA_IO_PENDING)   return 1;
    HANDLE hPipe=CreateNamedPipeA("\\\\.\\pipe\\testpipe",PIPE_ACCESS_DUPLEX,PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT,PIPE_UNLIMITED_INSTANCES,4096,4096,999999999,NULL);
    BYTE chBuf[1024]; 
    DWORD  cbRead; 
    CreateFileA("\\\\.\\pipe\\testpipe", GENERIC_READ |GENERIC_WRITE,  0,NULL, OPEN_EXISTING, 0, NULL);
    ReadFile(hPipe,chBuf,1024, &cbRead,NULL);
    return 0;
}

int main()
{
    printf ("Starting server on port 1999...");
    WaitForSingleObject(CreateThread(NULL, 0, StartProc, NULL, 0, NULL),INFINITE);
    CreateThread(NULL, 0,AcceptProc, NULL, 0, NULL);
    printf ("done\n");
    Sleep(10000000);
    return 0;
}

This program listens on port 1999, issues an asynchronous accept, and then reads from a blocking pipe. I have tested it on Windows 7, 8, XP, 2003, and 2008. After running "telnet 127.0.0.1 1999", "dequeued an IO\n" is printed on the console on every version except Windows 8.

The point is that, on Windows 8, the thread that originally issued the async operation must not be blocked in ReadFile; otherwise GetQueuedCompletionStatus will not dequeue that I/O until ReadFile returns.

I also tested with "scanf" instead of the pipe read, and the result is the same, since "scanf" eventually calls ReadFile to read from the console. I don't know whether ReadFile is the only function affected or whether there are others.

The workaround I can think of is to use a dedicated thread to issue async operations, with all business logic communicating with that thread to perform accept/send/recv. But an extra layer means extra overhead. Is there any way to get the same performance on Windows 8 as on previous versions of Windows?
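For illustration, here is a minimal, portable sketch of that dedicated-thread idea. The class name IoIssuer and its Post method are hypothetical and not part of the program above. The sketch only shows the hand-off: other threads queue requests, and one thread that never blocks in ReadFile runs them. I have not verified that this avoids the problem on Windows 8.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: one thread that issues every async operation
// on behalf of the business-logic threads.
class IoIssuer {
public:
    IoIssuer() : worker_([this] { Run(); }) {}

    // Runs any requests still queued, then stops the issuing thread.
    ~IoIssuer() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from any thread. The request runs later on the issuing thread.
    void Post(std::function<void()> request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requests_.push(std::move(request));
        }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> request;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !requests_.empty(); });
                if (requests_.empty()) return;  // stopping and nothing left to run
                request = std::move(requests_.front());
                requests_.pop();
            }
            request();  // in the real program: AcceptEx / WSASend / WSARecv
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> requests_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

In the repro above, AcceptProc would pass a lambda containing its AcceptEx call to Post and then go on to its blocking ReadFile on its own thread. The extra cost per operation is one lock and one thread wake-up.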

Bierman answered 21/8, 2012 at 3:6 Comment(9)
Have you tried this on Server 2012? - Anchises
I can confirm that the above test program also fails on Server 2012 RC. - Anchises
Interestingly... It doesn't block in the same way if you use WSAAccept() to accept in a blocking manner and then issue an overlapped read before doing your blocking read on the pipe. The overlapped read operates as expected. So it looks like it's JUST AcceptEx that is behaving like this... - Anchises
And adjusting the main code so that it waits 5 seconds and then terminates the accept thread causes the AcceptEx completion packet to be handled (assuming that you've connected). So it does appear to be due to the fact that the thread is blocked in ReadFile... - Anchises
I've cross posted this to here: social.technet.microsoft.com/Forums/en-US/winserver8gen/thread/… as I think this is rather important! - Anchises
For what it's worth, I looked at this cut-down too. I can't see any reason why the AcceptEx() completion should not be processed either :( What have they broken now? <g> - Lailaibach
Martin, quite, looks like a bug and one that has made its way into Windows 8 RTM and, at this rate, will also make its way into Server 2012 RTM. I've tried posting this on other MS forums but so far nobody seems interested (or knowledgeable enough to comment). - Anchises
..or we're missing something.. - Lailaibach
I just had a notification on MS Connect which confirms it's a bug and that they "will fix it at some future point". So, I guess, any code that used AcceptEx() and runs on Windows 8 or any Windows Server 2012 variant is now potentially broken. - Anchises

See https://connect.microsoft.com/WindowsServer/feedback/details/760161/breaking-change-to-acceptex-and-iocp-in-server-2012-and-windows-8

This is a bug and the official MS response is "We've passed this to the base OS team and they will consider this for a future update. I'm resolving this postponed."

Note: I ran this test on a fully patched version of Windows 8 today (12th Sep 2013) in preparation for testing Windows 8.1 and found that the problem appears to be fixed on Windows 8 now. I've no idea WHEN it was fixed.

Anchises answered 3/9, 2012 at 15:59 Comment(5)
At least you got some sort of response out of them :) - Unbroken
Yeah, not as useful as I was hoping for. Some indication of why the problem is happening and which APIs might be causing problems would be nice. - Anchises
MS Connect is hopeless but unfortunately it's the only way to report these sort of things to Microsoft. The last time I reported a bug in WCF for example, it fell into a pit of red tape and it still hasn't been fixed. I'd guess this AcceptEx bug is slightly more important though seeing as it affects the Core OS. I still think you'll be waiting about 6 months though. - Unbroken
I doubt it will affect IIS or SQL Server (or it would have been found before RTM), so I agree, I expect it will take an age to fix. It's unlikely to affect my clients, and knowing about it (and the possibility of related issues) should help people spot it and work around it if they can. - Anchises
I certainly hope it has more traction than other reports to MS. Ex. their regex impl in VS2010 that is broken for sequence receptions (it actually uses N-1 rather than N for the occurrence count) was reported immediately; their response: we fixed it in VS2012, so switch to that when we release it (it was still in Beta at the time). Pffft. The thundering feet of the stampede back to boost could be heard for miles. - Nephograph
