WinAPI - Why does ICMPSendEcho2Ex report false timeouts when Timeout is set below 1000ms?

Edit: I started asking this as a PowerShell / .Net question and couldn't find any reference to it on the internet. With feedback, it appears to be a WINAPI issue so this is an edit/rewrite/retag, but much of the tests and background reference .Net for that reason.

Summary

WINAPI ping function IcmpSendEcho2 appears to have a timing bug if the ping timeout parameter is set below 1000ms (1 second). This causes it to return intermittent false timeout errors. Rather than a proportional "lower timeout = more fails" behaviour, it appears to be a cutoff of >=1000ms+ is expected behaviour, <=999ms triggers false timeouts, often in an alternating success/fail/success/fail pattern.

I call them false timeouts because I have WireShark packet capture showing a reply packet coming back well within the timeout, and partly because the 1ms change shouldn't be a significant amount of time when the replies normally have 500-800ms of headroom, and partly because I can run two concurrent sets of pings with different timeouts and see different behavior between the two.

In the comments of my original .Net question, @wOxxOm has:

located the Open Source .Net code where System.Net.NetworkInformation.Ping() wraps the WinAPI and there appears to be no specific handling of timeouts there, it's passed down to the lower layer directly - possibly line 675 with a call to UnsafeNetInfoNativeMethods.IcmpSendEcho2()

and @Lieven Keersmaekers has investigated and found things beyond my skill level to interpret:

"I can second this being an underlying WINAPI problem. Following a success and timedout call into IPHLPAPI!IcmpSendEcho2Ex: the 000003e7 parameter is on the stack, both set up an event and both return into IPHLPAPI!IcmpEchoRequestComplete with the difference of the success call's eax register containing 00000000 and the timedout call's eax register containing 00000102 (WAIT_TIMEOUT)
"Compiling a 64bit C# version, there's no more calls into IPHLPAPI. A consistent thing that shows up is clr.dll GetLastError() returning WSA_QOS_ADMISSION_FAILURE for timeouts. Also consistent in my sample is the order of thread executions between a success and a timeout call being slightly different."
- This StackOverflow question hints that the WSA_QOS_ADMISSION_FAILURE might be a misleading error, and is actually IP_REQ_TIMED_OUT.

Testing steps:

Pick a distant host and set some continuous pings running. (The IP in my examples belongs to Baidu.cn (China) and has ping replies around ~310ms to me). Expected behaviour for all of them: almost entirely ping replies, with occasional drops due to network conditions.

PowerShell / .Net, with 999ms timeout, actual result is bizarre reply/drop/reply/drop patterns, far more drops than expected:

$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while($true) { 
    $Pinger.Send('111.13.101.208', 999)
    start-sleep -Seconds 1
}

command prompt ping.exe with 999ms timeout, actual result is more reliable (edit: but later findings call this into question as well):

ping 111.13.101.208 -t -w 999

PowerShell / .Net, with 1000ms timeout, actual result is as expected:

$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while($true) { 
    $Pinger.Send('111.13.101.208', 1000)
    start-sleep -Seconds 1
}

It's repeatable with C# as well, but I've edited that code out now it seems to be a WinAPI problem.

Example screenshot of these running side by side

On the left, .Net with 999ms timeout and 50% failure.
Center, command prompt, almost all replies.
On the right, .Net with 1000ms timeout, almost all replies.

Earlier investigations / Background

I started with a 500ms timeout, and the quantity of fake replies seems to vary depending on the ping reply time of the remote host:

pinging something 30ms away reports TimedOut around 1 in 10 pings
pinging something 100ms away reports TimedOut around 1 in 4 pings
pinging something 300ms away reports TimedOut around 1 in 2 pings

From the same computer, on the same internet connection, sending the same amount of data (32 byte buffer) with the same 500ms timeout setting, with no other heavy bandwidth use. I run no antivirus networking filter outside Windows 10 defaults, two other people I know have confirmed this frequent TimedOut behaviour on their computers (now two more have in the comments), with more or fewer timeouts, so it ought not to be my network card or drivers or ISP.

WireShark packet capture of ping reply which is falsely reported as a timeout

I ran the ping by hand four times to a ~100ms away host, with a 500ms timeout, with WireShark capturing network traffic. PowerShell screenshot:

WireShark screenshot:

Note that the WireShark log records 4 requests leaving, 4 replies coming back, each with a time difference of around 0.11s (110 ms) - all well inside the timeout, but PowerShell wrongly reports the last one as a timeout.

Summary

Testing steps:

Example screenshot of these running side by side

Earlier investigations / Background

WireShark packet capture of ping reply which is falsely reported as a timeout

Related questions

Recommended topics

Hot tags