Detecting/Diagnosing Thread Starvation

I am doing some performance/scalability testing of an IIS application that occasionally seems to slow down to a crawl in production. I'm able to reproduce the slowness consistently using NUnit.

CPU and memory do not spike during the testing or when the slowness occurs in production. My strong suspicion is that the application is suffering from thread starvation, since it does not appear to be CPU, memory, I/O, or database access that is causing the bottleneck. I do see signs of what appears to be thread starvation; for example, NLog's async log-file writes tend to have long periods of silence followed by bursts of activity with older timestamps (i.e. a lower-priority thread is waiting for threads to free up before it can write).

What steps can I take to definitively determine that the application is indeed thread starved, and (assuming that is the case) pinpoint the exact areas of the system that are causing the problem?

Edit

I neglected to mention that almost all the code is synchronous (it's a legacy system).

Reconnoiter answered 11/7, 2017 at 13:57 Comment(5)
Do you start many long-running tasks at once without specifying TaskCreationOptions.LongRunning? If you do, then any new task that is supposed to be short-lived will see a huge delay at startup, because by default the scheduler waits something like 500 ms before allocating a new thread for it. This can be worked around with ThreadPool.SetMinThreads, but the long-running option is preferable (otherwise you may run into performance degradation from context switching).Biographer
Interesting. Would this also apply to synchronous code?Reconnoiter
@Biographer Having tasks that aren't long running is how you ensure you don't have thread starvation. Creating lots of tasks using LongRunning is how you create thread starvation.Corset
@Servy, I wasn't talking about thread starvation, just offering some thoughts about the "slow down".Biographer
@Sinatr: your comments got me researching SetMinThreads, which led me to this question and its answers (one of which was yours). Setting minThreads to a higher number at startup made a tremendous difference in performance.Reconnoiter
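
To illustrate the distinction the first comment above is drawing, here is a minimal sketch (not from the original thread; `SimulateBlockingWork` is a hypothetical placeholder for work that blocks a thread for a long time):

    // Illustration only: work started with TaskCreationOptions.LongRunning gets a
    // dedicated thread, so it does not tie up ThreadPool threads that short,
    // supposed-to-be-quick tasks are waiting for.
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class LongRunningExample
    {
        static void Main()
        {
            // Dedicated thread for work that blocks for a long time.
            Task longRunner = Task.Factory.StartNew(
                SimulateBlockingWork,                 // hypothetical long-blocking method
                TaskCreationOptions.LongRunning);

            // A short task stays on the ThreadPool and is not queued behind the blocker.
            Task quick = Task.Run(() => Console.WriteLine("quick work done"));

            Task.WaitAll(longRunner, quick);
        }

        static void SimulateBlockingWork() => Thread.Sleep(5000);
    }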

Based on Sinatr's comment, I did some reading on ThreadPool.SetMinThreads and TaskCreationOptions.LongRunning, including the answers to "When to use TaskCreationOptions.LongRunning?"

Setting MinThreads to a higher value at startup made a huge difference in my case. I created a simple background process to watch how the number of available threads in the ThreadPool (and therefore the number of active threads) changed during a test run, and whether the active thread count exceeded the MinThreads value (it did).
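
The answer doesn't show the call itself; raising the minimum at startup would look roughly like this (a sketch only; the value 200 is an arbitrary placeholder that needs tuning for your workload):

    // Run once at application startup (e.g. in Application_Start for an IIS/ASP.NET app).
    // Raising the worker-thread minimum lets the pool create threads immediately instead
    // of throttling thread injection once the current minimum is exceeded.
    ThreadPool.GetMinThreads(out _, out int minIo);
    ThreadPool.SetMinThreads(200, minIo);   // 200 is an arbitrary example value, not a recommendation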

Here's some code I used to diagnose the problem. It is not intended for production use, and the thread-usage reports it produces are only interesting while usage is initially ramping up. Also note that the Timer needs a ThreadPool thread when it elapses, so it too has to wait for an available thread.

Static vars:

    // Timer here is System.Timers.Timer (requires 'using System.Timers;'), and
    // Logger stands in for the application's own logger.
    private static Timer _timer;
    private static int _lastActiveThreads;
    private static int _lastAvailableThreads;
    private static int _maxThreads;
    private static int _minThreads;

Run at startup:

    int completionPortThreads;

    ThreadPool.GetMaxThreads(out _maxThreads, out completionPortThreads);
    ThreadPool.GetMinThreads(out _minThreads, out completionPortThreads);

    _timer = new Timer
    {
        AutoReset = true,
        Interval = 500,
    };

    _timer.Elapsed += TimerElapsed;
    _timer.Start();

Elapsed method:

    private static void TimerElapsed(object sender, ElapsedEventArgs e)
    {
        int minWorkerThreads;
        int availWorkerThreads;
        int completionPortThreads;

        ThreadPool.GetMinThreads(out minWorkerThreads, out completionPortThreads);
        ThreadPool.GetAvailableThreads(out availWorkerThreads, out completionPortThreads);

        var activeThreads = _maxThreads - availWorkerThreads;

        if (availWorkerThreads != _lastAvailableThreads)
        {
            _lastAvailableThreads = availWorkerThreads;
            if (activeThreads > _lastActiveThreads)
            {
                _lastActiveThreads = activeThreads;
                Logger.Log($"+++++ Active Threads is now: {activeThreads}");

                if (activeThreads > _minThreads)
                {
                    var diff = activeThreads - _minThreads;
                    Logger.Log($"+++++ Active threads is now {activeThreads}, which is {diff} more than minThread value of {_minThreads}.  This may be causing delays.");
                }
            }
        }
    }
Reconnoiter answered 12/7, 2017 at 16:21 Comment(0)

I came up with this based on the above:

using System;
using System.Threading;
using System.Timers;
using log4net;

using Timer = System.Timers.Timer;

namespace somewhere
{
    public class ThreadStatsLogger : IDisposable
    {
        private const int DEPLETION_WARN_LEVEL = 10;
        private const int HYSTERESIS_LEVEL = 10;
        private const double SAMPLE_RATE_MILLISECONDS = 500;

        private bool _workerThreadWarned = false;
        private bool _ioThreadWarned = false;
        private bool _minWorkerThreadLevelWarned = false;
        private bool _minIoThreadLevelWarned = false;

        private readonly int _maxWorkerThreadLevel;
        private readonly int _maxIoThreadLevel;
        private readonly int _minWorkerThreadLevel;
        private readonly int _minWorkerThreadLevelRecovery;
        private readonly int _minIoThreadLevel;
        private readonly int _minIoThreadLevelRecovery;
        private Timer _timer;

        private static readonly ILog _logger = LogManager.GetLogger(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType);

        public ThreadStatsLogger()
        {
            ThreadPool.GetMinThreads(out _minWorkerThreadLevel, out _minIoThreadLevel);
            ThreadPool.GetMaxThreads(out _maxWorkerThreadLevel, out _maxIoThreadLevel);
            ThreadPool.GetAvailableThreads(out int workerAvailable, out int ioAvailable);

            _logger.InfoFormat("Thread statistics at startup: minimum worker:{0} io:{1}", _minWorkerThreadLevel, _minIoThreadLevel);
            _logger.InfoFormat("Thread statistics at startup: maximum worker:{0} io:{1}", _maxWorkerThreadLevel, _maxIoThreadLevel);
            _logger.InfoFormat("Thread statistics at startup: available worker:{0} io:{1}", workerAvailable, ioAvailable);

            // Recovery thresholds sit below the warning thresholds so the log
            // does not flap when usage hovers around the minimum.
            _minWorkerThreadLevelRecovery = (_minWorkerThreadLevel * 3) / 4;
            _minIoThreadLevelRecovery = (_minIoThreadLevel * 3) / 4;
            if (_minWorkerThreadLevelRecovery == _minWorkerThreadLevel) _minWorkerThreadLevelRecovery = _minWorkerThreadLevel - 1;
            if (_minIoThreadLevelRecovery == _minIoThreadLevel) _minIoThreadLevelRecovery = _minIoThreadLevel - 1;

            // Start sampling only after the baseline numbers have been captured.
            _timer = new Timer
            {
                AutoReset = true,
                Interval = SAMPLE_RATE_MILLISECONDS,
            };
            _timer.Elapsed += TimerElapsed;
            _timer.Start();
        }

        private void TimerElapsed(object sender, ElapsedEventArgs e)
        {
            ThreadPool.GetAvailableThreads(out int availableWorkerThreads, out int availableIoThreads);

            var activeWorkerThreads = _maxWorkerThreadLevel - availableWorkerThreads;
            var activeIoThreads = _maxIoThreadLevel - availableIoThreads;

            _logger.InfoFormat("Thread statistics: active worker:{0} io:{1}", activeWorkerThreads, activeIoThreads);

            if (activeWorkerThreads > _minWorkerThreadLevel && !_minWorkerThreadLevelWarned)
            {
                _logger.InfoFormat("Thread statistics WARN active worker threads above minimum {0}:{1}", activeWorkerThreads, _minWorkerThreadLevel);
                _minWorkerThreadLevelWarned = !_minWorkerThreadLevelWarned;
            }
            if (activeWorkerThreads < _minWorkerThreadLevelRecovery && _minWorkerThreadLevelWarned)
            {
                _logger.InfoFormat("Thread statistics RECOVERY active worker threads below minimum {0}:{1}", activeWorkerThreads, _minWorkerThreadLevel);
                _minWorkerThreadLevelWarned = !_minWorkerThreadLevelWarned;
            }

            if (activeIoThreads > _minIoThreadLevel && !_minIoThreadLevelWarned)
            {
                _logger.InfoFormat("Thread statistics WARN active io threads above minimum {0}:{1}", activeIoThreads, _minIoThreadLevel);
                _minIoThreadLevelWarned = !_minIoThreadLevelWarned;
            }
            if (activeIoThreads < _minIoThreadLevelRecovery && _minIoThreadLevelWarned)
            {
                _logger.InfoFormat("Thread statistics RECOVERY active io threads below minimum {0}:{1}", activeIoThreads, _minIoThreadLevel);
                _minIoThreadLevelWarned = !_minIoThreadLevelWarned;
            }

            if (availableWorkerThreads < DEPLETION_WARN_LEVEL && !_workerThreadWarned)
            {
                _logger.InfoFormat("Thread statistics WARN available worker threads below warning level {0}:{1}", availableWorkerThreads, DEPLETION_WARN_LEVEL);
                _workerThreadWarned = !_workerThreadWarned;
            }
            if (availableWorkerThreads > (DEPLETION_WARN_LEVEL + HYSTERESIS_LEVEL) && _workerThreadWarned)
            {
                _logger.InfoFormat("Thread statistics RECOVERY available worker thread recovery {0}:{1}", availableWorkerThreads, DEPLETION_WARN_LEVEL);
                _workerThreadWarned = !_workerThreadWarned;
            }

            if (availableIoThreads < DEPLETION_WARN_LEVEL && !_ioThreadWarned)
            {
                _logger.InfoFormat("Thread statistics WARN available io threads below warning level {0}:{1}", availableIoThreads, DEPLETION_WARN_LEVEL);
                _ioThreadWarned = !_ioThreadWarned;
            }
            if (availableIoThreads > (DEPLETION_WARN_LEVEL + HYSTERESIS_LEVEL) && _ioThreadWarned)
            {
                _logger.InfoFormat("Thread statistics RECOVERY available io thread recovery {0}:{1}", availableIoThreads, DEPLETION_WARN_LEVEL);
                _ioThreadWarned = !_ioThreadWarned;
            }
        }

        public void Dispose()
        {
            if (_timer == null) return;
            _timer.Close();
            _timer.Dispose();
            _timer = null;
        }
    }

}
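
For completeness, here is one way the class above could be wired into an IIS application. This is a sketch rather than part of the original answer: it assumes standard Global.asax hooks, and where you create and dispose the logger depends on your host.

using System;
using somewhere;

// Hypothetical wiring: create the logger once at startup and dispose it at shutdown.
public class Global : System.Web.HttpApplication
{
    private static ThreadStatsLogger _threadStatsLogger;

    protected void Application_Start(object sender, EventArgs e)
    {
        _threadStatsLogger = new ThreadStatsLogger();   // starts sampling every 500 ms
    }

    protected void Application_End(object sender, EventArgs e)
    {
        _threadStatsLogger?.Dispose();                  // stop the sampling timer on shutdown
    }
}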

Aperture answered 24/5, 2018 at 10:19 Comment(0)
