WinAPI - Why does ICMPSendEcho2Ex report false timeouts when Timeout is set below 1000ms?
Asked Answered
A

0

6

Edit: I started asking this as a PowerShell / .Net question and couldn't find any reference to it on the internet. With feedback, it appears to be a WINAPI issue so this is an edit/rewrite/retag, but much of the tests and background reference .Net for that reason.

Summary

WINAPI ping function IcmpSendEcho2 appears to have a timing bug if the ping timeout parameter is set below 1000ms (1 second). This causes it to return intermittent false timeout errors. Rather than a proportional "lower timeout = more fails" behaviour, it appears to be a cutoff of >=1000ms+ is expected behaviour, <=999ms triggers false timeouts, often in an alternating success/fail/success/fail pattern.

I call them false timeouts because I have WireShark packet capture showing a reply packet coming back well within the timeout, and partly because the 1ms change shouldn't be a significant amount of time when the replies normally have 500-800ms of headroom, and partly because I can run two concurrent sets of pings with different timeouts and see different behavior between the two.

In the comments of my original .Net question, @wOxxOm has:

and @Lieven Keersmaekers has investigated and found things beyond my skill level to interpret:

  • "I can second this being an underlying WINAPI problem. Following a success and timedout call into IPHLPAPI!IcmpSendEcho2Ex: the 000003e7 parameter is on the stack, both set up an event and both return into IPHLPAPI!IcmpEchoRequestComplete with the difference of the success call's eax register containing 00000000 and the timedout call's eax register containing 00000102 (WAIT_TIMEOUT)

  • "Compiling a 64bit C# version, there's no more calls into IPHLPAPI. A consistent thing that shows up is clr.dll GetLastError() returning WSA_QOS_ADMISSION_FAILURE for timeouts. Also consistent in my sample is the order of thread executions between a success and a timeout call being slightly different."

Testing steps:

Pick a distant host and set some continuous pings running. (The IP in my examples belongs to Baidu.cn (China) and has ping replies around ~310ms to me). Expected behaviour for all of them: almost entirely ping replies, with occasional drops due to network conditions.

PowerShell / .Net, with 999ms timeout, actual result is bizarre reply/drop/reply/drop patterns, far more drops than expected:

$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while($true) { 
    $Pinger.Send('111.13.101.208', 999)
    start-sleep -Seconds 1
}

command prompt ping.exe with 999ms timeout, actual result is more reliable (edit: but later findings call this into question as well):

ping 111.13.101.208 -t -w 999

PowerShell / .Net, with 1000ms timeout, actual result is as expected:

$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while($true) { 
    $Pinger.Send('111.13.101.208', 1000)
    start-sleep -Seconds 1
}

It's repeatable with C# as well, but I've edited that code out now it seems to be a WinAPI problem.

Example screenshot of these running side by side

  • On the left, .Net with 999ms timeout and 50% failure.
  • Center, command prompt, almost all replies.
  • On the right, .Net with 1000ms timeout, almost all replies.

Comparing command prompt ping and two powershell pings


Earlier investigations / Background

I started with a 500ms timeout, and the quantity of fake replies seems to vary depending on the ping reply time of the remote host:

  • pinging something 30ms away reports TimedOut around 1 in 10 pings
  • pinging something 100ms away reports TimedOut around 1 in 4 pings
  • pinging something 300ms away reports TimedOut around 1 in 2 pings

From the same computer, on the same internet connection, sending the same amount of data (32 byte buffer) with the same 500ms timeout setting, with no other heavy bandwidth use. I run no antivirus networking filter outside Windows 10 defaults, two other people I know have confirmed this frequent TimedOut behaviour on their computers (now two more have in the comments), with more or fewer timeouts, so it ought not to be my network card or drivers or ISP.

WireShark packet capture of ping reply which is falsely reported as a timeout

I ran the ping by hand four times to a ~100ms away host, with a 500ms timeout, with WireShark capturing network traffic. PowerShell screenshot:

Pings from PowerShell - 4 sent, 3 replies

WireShark screenshot:

WireShark log of the pings showing 4 sent, 4 replies

Note that the WireShark log records 4 requests leaving, 4 replies coming back, each with a time difference of around 0.11s (110 ms) - all well inside the timeout, but PowerShell wrongly reports the last one as a timeout.

Related questions

Googling shows me heaps of issues with System.Net.NetworkInformation.Ping but none which look the same, e.g.

The documentation for Ping() warns about the low-timeout-might-still-say-success but I can't see it warns that timeout might falsely report failure if set below 1 second.

Edit: Now that I'm searching for ICMPSendEcho2, I have found exactly this problem documented before in a blog post in May 2015 here: https://www.frameflow.com/ping-utility-flaw-in-windows-api-creating-false-timeouts/ - finding the same behavior, but having no further explanation. They say that ping.exe is affected, when I originally thought it wasn't. They also say:

  • "In our tests we could not reproduce it on Windows Server 2003 R2 nor on Windows Server 2008 R2. However it was seen consistently in Windows Server 2012 R2 and in the latest builds of Windows 10."

Why? What's wrong with the timeout handling that makes it ignore ping repsonses coming into the network stack completely? Why is 1000ms a significant cutoff?

Atticism answered 6/8, 2017 at 3:42 Comment(17)
Ping is an application that uses ICMP echo. ICMP is s low-priority protocol. A host will perform other, higher-priority things before it gets around to responding to ICMP. The performance of ping (ICMP) has no real relation to how other protocols (e.g. TCP or UDP) perform.Ellene
@RonMaupin true, but unrelated to my question, isn't it? Note that these are not delays - otherwise both ping.exe and timeout 1000 would show similar drops and high latency returns - there would be big variations. The occasional timeout shown in cmd is what I expect, 50% fails - notably in the pattern "every other packet" - isn't. These are also apparently not dropped packets because Wireshark shows the ping reply packet arriving, these are - as much as I can tell - false reports somewhere in the Windows / .Net software stack.Atticism
It not just the target of the ICMP echo request, but your host receiving the reply can decide that the ICMP reply is low priority, and it ends up being timed out. Ping is pretty problematic.Ellene
@RonMaupin then why does +499ms going from 500 to 999 in the timeout make no appreciable difference, but +1ms to 1000 makes a significant step-change difference? Why does running something big like Visual Studio make no difference, yet this behaviour appears to be completely repeatable? I simply don't believe from the testing I've done that every other packet is being marked 'low priority' so consistently because my computer is too busy, especially while it handles other ping replies without problem at the same time in other threads.Atticism
Its not that the packets are marked in any priority, its how a system processes things differently. You can believe what you want, or you can believe the evidence. It matters not to me.Ellene
Definitely sounds like a bug. Traversing ping invocation chain in the source code doesn't reveal anything suspicious so the bug is probably in the underlying WINAPI Icmp6SendEcho2.Reliquary
@RonMaupin Just to be clear about this, you think everything in my question is working as expected, you see no problems at all, and nothing surprising?Atticism
That's not at all what I wrote or intended. I simply pointed out how ICMP works.Ellene
Not a solution but at least I can second this being an underlying WINAPI problem. Following a success and timedout call into IPHLPAPI!IcmpSendEcho2Ex: the 000003e7 parameter is on the stack, both set up an event and both return into IPHLPAPI!IcmpEchoRequestComplete with the difference of the success call's eax register containing 00000000 and the timedout call's eax register containing 00000102 (WAIT_TIMEOUT)Harcourt
@wOxxOm good idea with the open source.Atticism
@LievenKeersmaekers - I'm not sure what to make of that, I have little knowledge down at that level, but I'm guessing the consequences are: Microsoft has no motivation to prioritize fixing it, has decided not to for backwards compatibility (and it will be broken forever), or it's mis-documented and not intended to work as I expect. I wonder how many ping-tool authors have stumbled over it, and had to give up if the only option seems to be a paid support call to report it?Atticism
@Atticism - other than concurring with wOxxOm's observation, there's little to make of it. If I find time, I'd like to look at ping and see what API's its using. This looks like a good case to ask Raymond Chen to get involvedHarcourt
Compiling a 64bit C# version, there's no more calls into IPHLPAPI. A consistent thing that shows up is clr.dll GetLastError() returning WSA_QOS_ADMISSION_FAILURE for timeouts. Also consistent in my sample is the order of thread executions between a success and a timeout call being slightly different. All nice and well but it brings us nothing closer to a solution or a reason why.Harcourt
@LievenKeersmaekers interesting. I googled that error and got to this question - is there any chance that answer applies to your findings? ("It turns out that the error code 11010 is actually not WSA_QOS_ADMISSION_FAILURE from WinSock (which is not involved here), but a completely different value from the IP stack's ICMP_ECHO_REPLY structure - IP_REQ_TIMED_OUT") ?Atticism
I've run into that question too (not sure anymore why I discarded it... to tired most likely) and it makes sense that the actual error is IP_REQ_TIMED_OUT but it doesn't offer a solution other than "I'm afraid that only some hardcore kernel debugging could reveal the real cause.". I now how to start a usermode debugging session and sometimes find some nuggets but kernelmode debugging (other than livekd) is beyond me. Ideally, some real WinDbg experts chime in (Thomas, Blabb, Hans, Raymond, ...) and get interested enough trying to find a cause (at least it's reproducable!).Harcourt
As this seems to be more kernel related then .net related, perhaps adding a Windbg and Kernel tag attrackts the right people to this question.Harcourt
@LievenKeersmaekers Now that I'm searching for the WinAPI function instead of the .Net one, it has been noticed before - frameflow.com/… . I've rewritten my question quite a lot, included your findings and wOxxOm's, and retagged with your suggested tags.Atticism

© 2022 - 2024 — McMap. All rights reserved.