Ping Tasks will not complete
Asked Answered
N

1

22

I am working on a "heartbeat" application that pings hundreds of IP addresses every minute via a loop. The IP addresses are stored in a list of a class Machines. I have a loop that creates a Task<MachinePingResults> (where MachinePingResults is basically a Tuple of an IP and online status) for each IP and calls a ping function using System.Net.NetworkInformation.

The issue I'm having is that after hours (or days) of running, one of the loops of the main program fails to finish the Tasks which is leading to a memory leak. I cannot determine why my Tasks are not finishing (if I look in the Task list during runtime after a few days of running, there are hundreds of tasks that appear as "awaiting"). Most of the time all the tasks finish and are disposed; it is just randomly that they don't finish. For example, the past 24 hours had one issue at about 12 hours in with 148 awaiting tasks that never finished. Due to the nature of not being able to see why the Ping is hanging (since it's internal to .NET), I haven't been able to replicate the issue to debug.

(It appears that the Ping call in .NET can hang and the built-in timeout fail if there is a DNS issue, which is why I built an additional timeout in)

I have a way to cancel the main loop if the pings don't return within 15 seconds using Task.Delay and a CancellationToken. Then in each Ping function I have a Delay in case the Ping call itself hangs that forces the function to complete. Also note I am only pinging IPv4; there is no IPv6 or URL.

Main Loop

pingcancel = new CancellationTokenSource();

List<Task<MachinePingResults>> results = new List<Task<MachinePingResults>>();

try
{
    foreach (var m in localMachines.FindAll(m => !m.Online))
        results.Add(Task.Run(() =>
            PingMachine(m.ipAddress, 8000), pingcancel.Token
        ));

    await Task.WhenAny(Task.WhenAll(results.ToArray()), Task.Delay(15000));

    pingcancel.Cancel();
}
catch (Exception ex) { Console.WriteLine(ex); }
finally
{
    results.Where(r => r.IsCompleted).ToList()
        .ForEach(r =>
        //modify the online machines);
        results.Where(r => r.IsCompleted).ToList().ForEach(r => r.Dispose());
        results.Clear();
 }

The Ping Function

static async Task<MachinePingResults> PingMachine(string ipaddress, int timeout)
{
    try
    {
        using (Ping ping = new Ping())
        {
            var reply = ping.SendPingAsync(ipaddress, timeout);

            await Task.WhenAny(Task.Delay(timeout), reply);

            if (reply.IsCompleted && reply.Result.Status == IPStatus.Success)
            {
                return new MachinePingResults(ipaddress, true);
            }
        }
    }
    catch (Exception ex)
    {
        Debug.WriteLine("Error: " + ex.Message);
    }
    return new MachinePingResults(ipaddress, false);
}

With every Task having a Delay to let it continue if the Ping hangs, I don't know what would be the issue that is causing some of the Task<MachinePingResults> to never finish.

How can I ensure a Task using .NET Ping ends?

Using .NET 5.0 and the issues occurs on machines running windows 10 and windows server 2012

image of list of tasks awaiting in VS Tasks window

image of list of tasks awaiting in VS Tasks window

Natie answered 24/11, 2021 at 13:35 Comment(16)
I'm not familiar with Ping but it might hang in the synchronous part, try putting the call to SendPingAsync in a Task.Run. (var reply = Task.Run(async () => await ping.SendPingAsync(ipaddress, timeout));). This won't fix the memory leak but will at least allow PingMachine to always return.Guanine
"if I look in the Task list during runtime after a few days of running, there are hundreds of tasks that appear as awaiting": What is the Task list? Are you running the program for days with the debugger attached?Sedentary
@TheodorZoulias yes, I am running with the debugger attached. The Task List I'm referring to is the one in Visual Studio. When I was running without the debugger I was having a memory leak, so I left the debugger attached for a few days. There are multiple Tasks that are never executed or disposed ofNatie
Have you tried to associate the hanging tasks with the IP addresses? Are there any specific IP addresses that cause the tasks to hang, or it's completely random?Sedentary
@TheodorZoulias it is random; sometimes it is all 1391 that I'm pinging, sometimes it's a few hundred. Since it will hang on all of them, I'm assuming it's not related to the IP itself. I'm thinking if the internal .NET Ping throws an exception, it hangs the function and task.Natie
You have a tough problem to solve. What I would try in your case is to limit the amount of concurrent SendPingAsync operations, by using the Parallel.ForEachAsync or something. I don't expect this to solve the problem though.Sedentary
Task.WhenAny is already an attempt to fix it? Given that ping has its own timeout, much of the PingMachine method seems redundant..Krell
@Krell the ping method in .NET can hang indefinitely. See the linked SO question.Natie
This question is being discussed on metaLeitman
IMO it appears the possible hang is DNS, because Ping.SendAsync use the synchronous Dns.GetAddresses even though it should use the async. So it might be worth splitting those two up yourself and calling Dns.GetHostAddressesAsync manually, and passing Ping just the IP addresses. You can see this bug in referencesource for Framework 4 and the fix in Runtime 5.0+Urias
@charlieface the IPs I’m passing are all numeric (xxx.xxx.xxx.xxx) . There’s no URLsNatie
it goes over DNS if you use string. use IPAddress class as parameter for SendPingAsyncRunesmith
@Runesmith are you sure? The source code suggest otherwise. I see an if (!IPAddress.TryParse(hostNameOrAddress... there.Sedentary
While not an answer I have hit something similar where I at least came up with a way to narrow the problem: The first thing a task does is put itself on a list, the last thing it does is take itself back off. There's also a status field that gets updated with what's going on. Examining the list shows where the task is hanging, but not why. (Obviously, lots of care must be taken with locks.)Wallie
If all attempts to find the root cause of the problem fail, you could admit defeat and offload the heartbeat detection to a separate process, that could be recycled every one hour or so. This would require to introduce inter-process communication machinery in your main app, which is always an undesirable thing to do.Sedentary
@LorenPechtel the tasks hang at either awaiting or unwrapping, on the call to the System Ping. Since it seems to be a .NET problem, that's why I am just trying to create a way to kill those tasks cleanly.Natie
S
7

There are quite a few gaps in the code posted, but I attempted to replicate and in doing so ended up refactoring a bit.

This version seems pretty robust, with the actual call to SendAsync wrapped in an adapter class.

I accept this doesn't necessarily answer the question directly, but in the absence of being able to replicate your problem exactly, offers an alternative way of structuring the code that may eliminate the problem.

    async Task Main()
    {
        var masterCts = new CancellationTokenSource(TimeSpan.FromSeconds(15)); // 15s overall timeout
        
        var localMachines = new List<LocalMachine>
        {       
            new LocalMachine("192.0.0.1", false), // Should be not known - TimedOut
            new LocalMachine("192.168.86.88", false), // Should be not known - DestinationHostUnreachable (when timeout is 8000)
            new LocalMachine("www.dfdfsdfdfdsgrdf.cdcc", false), // Should be not known - status Unknown because of PingException
            new LocalMachine("192.168.86.87", false) // Known - my local IP
        };

        var results = new List<PingerResult>();

        try
        {
            // Create the "hot" tasks
            var tasks = localMachines.Where(m => !m.Online)
                                    .Select(m => new Pinger().SendPingAsync(m.HostOrAddress, 8000, masterCts.Token))
                                    .ToArray();

            await Task.WhenAll(tasks);
            
            results.AddRange(tasks.Select(t => t.Result));
        }
        finally
        {
            results.ForEach(r => localMachines.Single(m => m.HostOrAddress.Equals(r.HostOrAddress)).Online = r.Status == IPStatus.Success);

            results.Dump();  // For LINQPad
            localMachines.Dump(); // For LINQPad

            results.Clear();
        }
    }

    public class LocalMachine
    {
        public LocalMachine(string hostOrAddress, bool online)
        {
            HostOrAddress = hostOrAddress;
            Online = online;
        }

        public string HostOrAddress { get; }

        public bool Online { get; set; }
    }

    public class PingerResult
    {
        public string HostOrAddress {get;set;}
        
        public IPStatus Status {get;set;}
    }

    public class Pinger 
    {
        public async Task<PingerResult> SendPingAsync(string hostOrAddress, int timeout, CancellationToken token)
        {
            // Check if timeout has occurred
            token.ThrowIfCancellationRequested();

            IPStatus status = default;

            try
            {
                var reply = await SendPingInternal(hostOrAddress, timeout, token);
                status = reply.Status;
            }
            catch (PingException)
            {               
                status = IPStatus.Unknown;
            }
            
            return new PingerResult
            {
                HostOrAddress = hostOrAddress,
                Status = status
            };
        }

        // Wrap the legacy EAP pattern offered by Ping.
        private Task<PingReply> SendPingInternal(string hostOrAddress, int timeout, CancellationToken cancelToken)
        {
            var tcs = new TaskCompletionSource<PingReply>();

            if (cancelToken.IsCancellationRequested)
            {
                tcs.TrySetCanceled();
            }   
            else
            {
                using (var ping = new Ping())
                {
                    ping.PingCompleted += (object sender, PingCompletedEventArgs e) =>
                    {
                        if (!cancelToken.IsCancellationRequested)
                        {
                            if (e.Cancelled)
                            {
                                tcs.TrySetCanceled();
                            }
                            else if (e.Error != null)
                            {
                                tcs.TrySetException(e.Error);
                            }
                            else
                            {
                                tcs.TrySetResult(e.Reply);
                            }
                        }
                    };
                    
                    cancelToken.Register(() => { tcs.TrySetCanceled(); });

                    ping.SendAsync(hostOrAddress, timeout, new object());
                }
            };

            return tcs.Task;
        }
    }

EDIT:

I've just noticed in the comments that you mention pinging "all 1391". At this point I would look to throttle the number of pings that are sent concurrently using a SemaphoreSlim. See this blog post (from a long time ago!) that outlines the approach: https://devblogs.microsoft.com/pfxteam/implementing-a-simple-foreachasync/

Scalf answered 25/11, 2021 at 16:23 Comment(8)
Thank you for the effort you put in. I will try this out, but I wanted to mention, since it takes a little bit to have the issue, it will be a few days before I know if this is the answer.Natie
@Natie - Your question makes no mention of that fact; it shouldHosbein
@security hound the second paragraph…Natie
@Natie no problem. I added an edit about using a semaphore to throttle if you need to ping over a thousand machines.Scalf
OK. Difficult to say without seeing your code. TranslateTasktoEAP doesn't exist in my code.Scalf
The TranslateTasktoEAP is internal to Ping. I implemented a semaphore (max 1000) like you suggested in addition to what you posted, and it seems to be working. No tasks have hung (yet...~10 hours).Natie
Looks like adding the semaphore did itNatie
Lovely stuff 👌🏼Scalf

© 2022 - 2024 — McMap. All rights reserved.