Timer Efficiency

Asked 15/10, 2010 at 18:32 Answered 15/10, 2010 at 23:20

Solved .net timer performance theory concept

I am planning to develop a system with tens of thousands of objects in it, which will each have up to 42(but more likely down around 4 or 5) separate actions they will potentially be performing at regular intervals. I also plan to write code that will deactivate the timers until the object comes into use. When idle, the objects will only need 1 timer each, but when active, the other timers will all start at once. At first the number of objects will be small, maybe a few hundred, but I expect it to grow exponentially, and within a few months, start to reach up in the tens of thousands.

So, I am very worried about efficiency of the code I will be writing for the timers and for these objects. There are three levels in which I could write this application on that would all successfully perform the tasks required. Also, I plan to run this system on a Quad Core server, so I would like to make use of multi-threading wherever possible.

To this end, I've decided to use the System.Timers.Timer class which fires a new thread for each elapse event.

These are the 3 levels I am considering:

One single timer operates the entire application, it iterates through each object, checks to see if any other actions need to be fired, and if so, runs them, then moves on to the next.
Multi-tier timer where each object has a master timer that checks all of the functions the object could need to perform, runs any that are ready, and then sets the next timer interval to the next required action time.
Recursive-tier timer where each action in each object has it's own timer that will be triggered, and then set to run the next time it will be available.

The problem with option 1 is that with so many objects and actions, one singular timer elapse in this manner could run for maybe 20+ seconds (while it executed a few million lines of looped code), where this should probably be ticking every 1 second. If the objects aren't kept in synch, the system would likely not work well.

The problem with option 2 is that it would be a little harder to write than option 3, but not by much, it would also mean perhaps 10,000+ maybe timers running on the system (one for each object), creating and destroying threads with each elapse like its nobody's business (which I'm not sure if this is a problem or not). Each timer would have to fire at least once per second in this situation, with perhaps a few hundred lines of code running (up to perhaps a thousand in an extreme case).

The problem with option 3 is the sheer amount of timers that could potentially be introduced into the system. I'm talking about an average of 10,000+ timers with the potential for near 100,000+ timers to be run at the same time. Each elapse event may only have to run 50 or less lines of code though, making them very short. The elapse events would have delays between a hundredth of a second on one extreme, and five minutes on the other, with the average likely being around 1 second.

I am proficient in Visual Basic .NET, and was planning to write it in that, but I could also revert to my high-school days and try to write this in C++ for efficiency if it would make that much of a difference (please let me know if you have any sources on code efficiency between languages). Also toying with the notion of running this on a clustered Linux server instead of my Quad Core Windows server, but I'm not sure if I could get any of my .NET apps to run on a linux cluster like that (would love any info on that as well).

The main question to answer for this topic is:

Do I use option 1, 2, or 3, and why?

~Edit after considering comments~

So the 4th option involving the timer wheel with a spinlock. Here is a job class:

Public Class Job
Private dFireTime As DateTime
Private objF As CrossAppDomainDelegate
Private objParams() As Object

Public Sub New(ByVal Func As CrossAppDomainDelegate, ByVal Params() As Object, ByVal FireTime As DateTime)
    objF = Func
    dFireTime = FireTime
    objParams = Params
End Sub

Public ReadOnly Property FireTime()
    Get
        Return dFireTime
    End Get
End Property

Public ReadOnly Property Func() As CrossAppDomainDelegate
    Get
        Return objF
    End Get
End Property

Public ReadOnly Property Params() As Object()
    Get
        Return objParams
    End Get
End Property
End Class

And then the main loop implementation:

Private Tasks As LinkedList(Of Job)

Private Sub RunTasks()
    While True
        Dim CurrentTime as DateTime = Datetime.Now            

        If Not Tasks.Count = 0 AndAlso Tasks(0).FireTime > CurrentTime Then
            Dim T As Job = Tasks(0)
            Tasks.RemoveFirst()
            T.Func.Invoke()
        Else
            Dim MillisecondDif As Double

            MillisecondDif = Tasks(0).FireTime.Subtract(CurrentTime).Milliseconds
            If MillisecondDif > 30 Then
                Threading.Thread.Sleep(MillisecondDif)
            End If
        End If

    End While
End Sub

Do I have it right?

EpicClanWars.com

~Edit 2~

Switched the word "Task" out for "Job" so ppl could stop complaining about it ;)

~Edit 3~

Added variables for tracking time & ensuring spinloops happen when needed

Marleah answered 15/10, 2010 at 18:32 Comment(8)

I feel compelled to ask, what is this thing going to do? – Contagious 15/10, 2010 at 18:42

The use of the project is just as interesting to me as the problem it presents, I'd like to solve the problem first, then I'll tell you the use. It may be less important to some if I state the use outright, or other answers may be suggested to use other people's existing projects where truely, other projects do not fit this. To prevent this, I don't want to state what the use of this code does until I've figured out how to do it. – Marleah 15/10, 2010 at 19:30

Not sure if I catch the drift without more details, but it has the feel of Quartz.net quartznet.sourceforge.net/faq.html – Rizika 15/10, 2010 at 19:31

@Kenny: That's interesting, but it looks more like a cron port, which is to say that it's for a small number of events that are typically minutes apart. – Biceps 15/10, 2010 at 20:11

Please don't call it a Task. There is already a Task class in .NET 4.0, so this will only cause confusion. – Biceps 15/10, 2010 at 22:29

It's right in the generalities, but a bit incomplete in the specifics. First, when it invokes the handler, it needs to do it through the thread pool. The current code will potentially cross AppDomains but it'll still be in the same thread. Second, you need to check whether the next sleep duration would be under 15ms (or, to be safe, 30ms), and if it is then spin. Third, I would recommend capturing the value of DateTime.Now at the top of the while loop in a local variable, just so you can use it unambiguously. Fourth and finally, you might need to support repeating events. – Biceps 15/10, 2010 at 22:33

"System.Timers.Timer class which fires a new thread" it doesn't, it uses the thread pool. – Aimo 16/10, 2010 at 8:28

Still not quite there. 1) The property is called Millsecond, not Milliseconds, and it's an int, not a double. Please don't use a double to hold an int. 2) The invoke on the delegate happens in the same thread. You need to instead enqueue the delegate in the thread pool. – Biceps 17/10, 2010 at 7:17

EDIT: I remember interesting interview definetely worth to view: Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock

As @Steven Sudit stated I warn again: use it only as demo on how timer wheel works and some tasks you have to care about while implement it. Not as reference implementation. In real world you have to write far more complex logic to take into account available resources, scheduling logic and etc.

Here good points stated by Steven Sudit (read post comments for details):

1) Choose right structure to keep your jobs list (it depends as usually):

SortedList<> (or SortedDictionary<>) good on memory consumption and indexing but have to implement synchronized access
ConcurrentQueue<> will help you avoid locking but you have to implement ordering. It also very memory efficient
LinkedList<> is good on insert and retrieve (anyway we need head only) but requires synchronized access (thru it easily implemented via lock-free) and not so memory efficient as it stores two references (prev/next). But it become an issue when you have millions of jobs so all of them take significant amount of memory.

But:

I totally agree with @Steven:

It doesn't matter: neither one of these is a good fit. The right answer would be to use a regular queue and maintain its order yourself, since we most often need to access it only from the head or tail.

Generally, I would recommend using the most feature-full collection from the library, but that doesn't apply here because this is system-level code. We'd need to roll our own, either from scratch or on top of a less feature-rich collection

2) To simplify processing logic of simultaneous jobs you can add delegate list (e.g. via ConcurrentQueue to make it lock-free) into original Job class so when you need another job at same time you just add another delegate to start.

@Steven:

If two tasks are actually scheduled for the same time (whether actually or effectively), this is a normal case that does not require complicating our data structure. In other words, we don't need to group up simultaneous jobs so that we have to traverse two different collections; we can just make them adjacent

3) Start/stoping dispatcher not so straightful as it can be and so can lead to errors. Instead you can wait on an event while using a timeout.

@Steven:

This way, it will either wake up when the next job is ready or when a new job is inserted before the head. In the latter case, it could need to run it now or set a different wait. If presented with, say, 100 jobs all scheduled for the same instant, the best thing we can do is queue them all up.

If we needed to provide prioritization, that's a job for a priority dispatch queue and multiple pools in a producer/consumer relationship, but it still doesn't justify a start/stop dispatcher. The dispatcher should always be on, running in a single loop that sometimes cedes the core

4) About using ticks:

@Steven:

Sticking to one type of tick is fine, but mixing and matching gets ugly, particularly since it's hardware-dependent. I'm sure that ticks would be slightly faster than milliseconds, because it stores the former and has to divide by a constant to get the latter. Whether this operation winds up being costly is another matter, but I'm fine with using ticks to avoid the risk.

My thoughts:

Another good point, I agree with you. But sometimes division by constant becomes costly and it not so fast as it maybe seems to be. But when we talk about 100 000 of DateTimes it does not matter, you are right, thank you to point.

5) "Managing resources":

@Steven:

The problem I'm trying to highlight is that the call to GetAvailableThreads is expensive and naive; the answer is obsolete before you can even use it. If we really wanted to keep track, we could get initial values and keep a running count by calling the job from a wrapper that uses Interlocked.Increment/Decrement. Even then, it presumes that the rest of the program is not using the thread pool. If we really wanted fine control, then the right answer here is to roll our own thread pool

I absolutely agree that calling to GetAvailableThreads is naive method to monitor available resources thru CorGetAvailableThreads not so expensive. I want to demontrate there are needs to manage resources and seems to choose bad example.

By any means provided in source code example is must not be treated as right way to monitor available resources. I just want to demonstrate you have to think about it. Thru maybe coded no so good piece of code as example.

6) Using Interlocked.CompareExchange:

@Steven:

No, it's not a common pattern. The most common pattern is to briefly lock. Less common is to flag the variable as volatile. Much less common would be to use VolatileRead or MemoryBarrier. Using Interlocked.CompareExchange this way is obscure, even if Richter does it. using it without an explanatory comment is absolutely guaranteed to confuse, as the word "Compare" implies that we're doing a comparison, when in fact we're not.

You are right I have to point about its usage.

using System;
using System.Threading;

// Job.cs

// WARNING! Your jobs (tasks) have to be ASYNCHRONOUS or at least really short-living
// else it will ruin whole design and ThreadPool usage due to potentially run out of available worker threads in heavy concurrency

// BTW, amount of worker threads != amount of jobs scheduled via ThreadPool
// job may waits for any IO (via async call to Begin/End) at some point 
// and so free its worker thread to another waiting runner

// If you can't achieve this requirements then just use usual Thread class
// but you will lose all ThreadPool's advantages and will get noticeable overhead

// Read http://msdn.microsoft.com/en-us/magazine/cc164139.aspx for some details

// I named class "Job" instead of "Task" to avoid confusion with .NET 4 Task 
public class Job
{
    public DateTime FireTime { get; private set; }

    public WaitCallback DoAction { get; private set; }
    public object Param { get; private set; }

    // Please use UTC datetimes to avoid different timezones problem
    // Also consider to _never_ use DateTime.Now in repeat tasks because it significantly slower 
    // than DateTime.UtcNow (due to using TimeZone and converting time according to it)

    // Here we always work with with UTC
    // It will save you a lot of time when your project will get jobs (tasks) posted from different timezones
    public static Job At(DateTime fireTime, WaitCallback doAction, object param = null)
    {
        return new Job {FireTime = fireTime.ToUniversalTime(), DoAction = doAction, Param = param};
    }

    public override string ToString()
    {
        return string.Format("{0}({1}) at {2}", DoAction != null ? DoAction.Method.Name : string.Empty, Param,
                             FireTime.ToLocalTime().ToString("o"));
    }
}

 using System;
 using System.Collections.Generic;
 using System.Diagnostics;
 using System.Linq;
 using System.Threading;

// Dispatcher.cs

// Take a look at System.Runtime IOThreadTimer.cs and IOThreadScheduler.cs
// in Microsoft Reference Source, its interesting reading

public class Dispatcher
{
    // You need sorted tasks by fire time. I use Ticks as a key to gain some speed improvements during checks
    // There are maybe more than one task in same time
    private readonly SortedList<long, List<Job>> _jobs;

    // Synchronization object to access _jobs (and _timer) and make it thread-safe
    // See comment in ScheduleJob about locking
    private readonly object _syncRoot;

    // Queue (RunJobs method) is running flag
    private int _queueRun;

    // Flag to prevent pollute ThreadPool with many times scheduled JobsRun
    private int _jobsRunQueuedInThreadPool;

    // I'll use Stopwatch to measure elapsed interval. It is wrapper around QueryPerformanceCounter
    // It does not consume any additional resources from OS to count

    // Used to check how many OS ticks (not DateTime.Ticks!) elapsed already
    private readonly Stopwatch _curTime;

    // Scheduler start time. It used to build time delta for job
    private readonly long _startTime;

    // System.Threading.Timer to schedule next active time
    // You have to implement syncronized access as it not thread-safe
    // http://msdn.microsoft.com/en-us/magazine/cc164015.aspx
    private readonly Timer _timer;

    // Minimum timer increment to schedule next call via timer instead ThreadPool
    // Read http://www.microsoft.com/whdc/system/pnppwr/powermgmt/Timer-Resolution.mspx
    // By default it around 15 ms
    // If you want to know it exactly use GetSystemTimeAdjustment via Interop ( http://msdn.microsoft.com/en-us/library/ms724394(VS.85).aspx )
    // You want TimeIncrement parameter from there
    private const long MinIncrement = 15 * TimeSpan.TicksPerMillisecond;

    // Maximum scheduled jobs allowed per queue run (specify your own suitable value!)
    // Scheduler will add to ThreadPool queue (and hence count them as processed) no more than this constant

    // This is balance between how quick job will be scheduled after it time elapsed in one side, and 
    // how long JobsList will be blocked and RunJobs owns its thread from ThreadPool
    private const int MaxJobsToSchedulePerCheck = 10;

    // Queue length
    public int Length
    {
        get
        {
            lock (_syncRoot)
            {
                return _jobs.Count;
            }
        }
    }

    public Dispatcher()
    {
        _syncRoot = new object();

        _timer = new Timer(RunJobs);

        _startTime = DateTime.UtcNow.Ticks;
        _curTime = Stopwatch.StartNew();

        _jobs = new SortedList<long, List<Job>>();
    }


    // Is dispatcher still working
    // Warning! Queue ends its work when no more jobs to schedule but started jobs can be still working
    public bool IsWorking()
    {
        return Interlocked.CompareExchange(ref _queueRun, 0, 0) == 1;
    }

    // Just handy method to get current jobs list
    public IEnumerable<Job> GetJobs()
    {
        lock (_syncRoot)
        {
            // We copy original values and return as read-only collection (thread-safety reasons)
            return _jobs.Values.SelectMany(list => list).ToList().AsReadOnly();
        }
    }

    // Add job to scheduler queue (schedule it)
    public void ScheduleJob(Job job)
    {
        // WARNING! This will introduce bottleneck if you have heavy concurrency. 
        // You have to implement lock-free solution to avoid botleneck but this is another complex topic.
        // Also you can avoid lock by using Jeffrey Richter's ReaderWriterGateLock (http://msdn.microsoft.com/en-us/magazine/cc163532.aspx)
        // But it can introduce significant delay under heavy load (due to nature of ThreadPool)
        // I recommend to implement or reuse suitable lock-free algorithm. 
        // It will be best solution in heavy concurrency (if you have to schedule large enough job count per second)
        // otherwise lock or maybe ReaderWriterLockSlim is cheap enough
        lock (_syncRoot)
        {
            // We'll shift start time to quick check when it pasts our _curTime
            var shiftedTime = job.FireTime.Ticks - _startTime;

            List<Job> jobs;
            if (!_jobs.TryGetValue(shiftedTime, out jobs))
            {
                jobs = new List<Job> {job};
                _jobs.Add(shiftedTime, jobs);
            }
            else jobs.Add(job);


            if (Interlocked.CompareExchange(ref _queueRun, 1, 0) == 0)
            {
                // Queue not run, schedule start
                Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 1, 0);
                ThreadPool.QueueUserWorkItem(RunJobs);
            }
            else 
            {
                // else queue already up and running but maybe we need to ajust start time
                // See detailed comment in RunJobs

                long firetime = _jobs.Keys[0];
                long delta = firetime - _curTime.Elapsed.Ticks;

                if (delta < MinIncrement)
                {
                    if (Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 1, 0) == 0)
                    {
                        _timer.Change(Timeout.Infinite, Timeout.Infinite);
                        ThreadPool.QueueUserWorkItem(RunJobs);
                    }
                }
                else 
                {
                    Console.WriteLine("DEBUG: Wake up time changed. Next event in {0}", TimeSpan.FromTicks(delta));
                    _timer.Change(delta/TimeSpan.TicksPerMillisecond, Timeout.Infinite);
                }
            }

        }
    }


    // Job runner
    private void RunJobs(object state)
    {
        // Warning! Here I block list until entire process done, 
        // maybe better will use ReadWriterLockSlim or somewhat (e.g. lock-free)
        // as usually "it depends..."

        // Here processing is really fast (a few operation only) so until you have to schedule many jobs per seconds it does not matter
        lock (_syncRoot)
        {
            // We ready to rerun RunJobs if needed
            Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 0, 1);

            int availWorkerThreads;
            int availCompletionPortThreads;

            // Current thread stats
            ThreadPool.GetAvailableThreads(out availWorkerThreads, out availCompletionPortThreads);

            // You can check max thread limits by
            // ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxCompletionPortThreads);

            int jobsAdded = 0;

            while (jobsAdded < MaxJobsToSchedulePerCheck && availWorkerThreads > MaxJobsToSchedulePerCheck + 1 && _jobs.Count > 0)
            {
                // SortedList<> implemented as two arrays for keys and values so indexing on key/value will be fast
                // First element
                List<Job> curJobs = _jobs.Values[0];
                long firetime = _jobs.Keys[0];

                // WARNING! Stopwatch ticks are different from DateTime.Ticks
                // so we use _curTime.Elapsed.Ticks instead of _curTime.ElapsedTicks

                // Each tick in the DateTime.Ticks value represents one 100-nanosecond interval. 
                // Each tick in the ElapsedTicks value represents the time interval equal to 1 second divided by the Frequency.
                if (_curTime.Elapsed.Ticks <= firetime) break;

                while (curJobs.Count > 0 &&  jobsAdded < MaxJobsToSchedulePerCheck && availWorkerThreads > MaxJobsToSchedulePerCheck + 1)
                {
                    var job = curJobs[0];

                    // Time elapsed and we ready to start job
                    if (job.DoAction != null)
                    {
                        // Schedule new run

                        // I strongly recommend to look at new .NET 4 Task class because it give superior solution for managing Tasks
                        // e.g. cancel run, exception handling, continuation, etc
                        ThreadPool.QueueUserWorkItem(job.DoAction, job);
                        ++jobsAdded;

                        // It may seems that we can just decrease availWorkerThreads by 1 
                        // but don't forget about started jobs they can also consume ThreadPool's threads
                        ThreadPool.GetAvailableThreads(out availWorkerThreads, out availCompletionPortThreads);
                    }

                    // Remove job from list of simultaneous jobs
                    curJobs.Remove(job);
                }

                // Remove whole list if its empty
                if (curJobs.Count < 1) _jobs.RemoveAt(0);
            }

            if (_jobs.Count > 0)
            {
                long firetime = _jobs.Keys[0];

                // Time to next event
                long delta = firetime - _curTime.Elapsed.Ticks; 

                if (delta < MinIncrement) 
                {
                    // Schedule next queue check via ThreadPool (immediately)
                    // It may seems we start to consume all resouces when we run out of available threads (due to "infinite" reschdule)
                    // because we pass thru our while loop and just reschedule RunJobs
                    // but this is not right because before RunJobs will be started again
                    // all other thread will advance a bit and maybe even complete its task
                    // so it safe just reschedule RunJobs and hence wait when we get some resources
                    if (Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 1, 0) == 0)
                    {
                        _timer.Change(Timeout.Infinite, Timeout.Infinite);
                        ThreadPool.QueueUserWorkItem(RunJobs);
                    }
                }
                else // Schedule next check via timer callback
                {
                    Console.WriteLine("DEBUG: Next event in {0}", TimeSpan.FromTicks(delta)); // just some debug output
                    _timer.Change(delta / TimeSpan.TicksPerMillisecond, Timeout.Infinite);
                }
            }
            else // Shutdown the queue, no more jobs
            {
                Console.WriteLine("DEBUG: Queue ends");
                Interlocked.CompareExchange(ref _queueRun, 0, 1); 
            }
        }
    }
}

Quick example of usage:

    // Test job worker
    static void SomeJob(object param)
    {
        var job = param as Job;
        if (job == null) return;

        Console.WriteLine("Job started: {0}, [scheduled to: {1}, param: {2}]", DateTime.Now.ToString("o"),
                          job.FireTime.ToLocalTime().ToString("o"), job.Param);
    }

    static void Main(string[] args)
    {
        var curTime = DateTime.UtcNow;
        Console.WriteLine("Current time: {0}", curTime.ToLocalTime().ToString("o"));
        Console.WriteLine();

        var dispatcher = new Dispatcher();

        // Schedule +10 seconds to future
        dispatcher.ScheduleJob(Job.At(curTime + TimeSpan.FromSeconds(10), SomeJob, "+10 sec:1"));
        dispatcher.ScheduleJob(Job.At(curTime + TimeSpan.FromSeconds(10), SomeJob, "+10 sec:2"));

        // Starts almost immediately
        dispatcher.ScheduleJob(Job.At(curTime - TimeSpan.FromMinutes(1), SomeJob, "past"));

        // And last job to test
        dispatcher.ScheduleJob(Job.At(curTime + TimeSpan.FromSeconds(25), SomeJob, "+25 sec"));

        Console.WriteLine("Queue length: {0}, {1}", dispatcher.Length, dispatcher.IsWorking()? "working": "done");
        Console.WriteLine();

        foreach (var job in dispatcher.GetJobs()) Console.WriteLine(job);
        Console.WriteLine();

        Console.ReadLine();

        Console.WriteLine(dispatcher.IsWorking()?"Dispatcher still working": "No more jobs in queue");

        Console.WriteLine();
        foreach (var job in dispatcher.GetJobs()) Console.WriteLine(job);

        Console.ReadLine();
    }

Hope it will be helpful.

@Steven Sudit point me some issues, so here I try to give my vision.

1) I wouldn't recommend using SortedList here, or anywhere else, as it's an obsolete .NET 1.1 class

SortedList<> by any means is not obsolete. It still exists in .NET 4.0 and introduced in .NET 2.0 when generics was introduced into language. I can't see any point to remove it from .NET.

But real question here I trying to answer: What data structure can store values in sorted order and will be efficient in storing and indexing. There are two suitable ready to use data structures: SortedDictionary<> and SortedList<>. Here some info about how to choose. I just don't want waste implementation with my own code and hide main algorithm. Here I can implement priority-array or something other but it takes more lines to code. I don't see any reason do not use SortedList<> here...

BTW, I can't understand why you not recommend it? What are reasons?

2) In general, there's no need to complicate the code with special cases for simultaneous events.

When @Jrud says he probably will have numerous task to schedule I think it they may have heavy concurrency, so I demonstrate how to solve it. But my point: even if you have low concurrency you stil have chance to get events in same time. Also this is easy possible in multithreaded evironment or when there are many sources want to schedule jobs.

Interlocked functions not so complicated, cheap and since .NET 4.0 inlined so there are no problem to add guard in such situation.

3) The IsWorking method should just use a memory barrier and then read the value directly.

Im not so sure here that you are right. I would recommend to read two nice articles: Part 4: Advanced Threading of Threading in C# by Joseph Albahari and How Do Locks Lock? by Jeff Moser. And of cause Chapter 28 (Primitive Thread Synchronization Constructs) of CLR via C# (3rd edition) by Jeffrey Richter.

Here some qoute:

The MemoryBarrier method doesn’t access memory but it forces any earlier programorder loads and stores to be completed before the call to MemoryBarrier. And it also forces any later program-order loads and stores to be completed after the call to MemoryBarrier. MemoryBarrier is much less useful than the other two methods

Important I know that this can be very confusing, so let me summarize it as a simple rule: When threads are communicating with each other via shared memory, write the last value by calling VolatileWrite and read the first value by calling VolatileRead.

I would also recommend: Intel® 64 and IA-32 Architectures Software Developer's Manuals if you care about it seriously.

So I don't use VolatileRead/VolatileWrite in my code neither volatile keyword, I don't think Thread.MemoryBarrier will be better here. Maybe you can point me what I miss? Some articles or in-depth discussion?

4) The GetJobs method looks like it could lock for an extended period. Is it necessary?

First of all its just handy method, sometime it is necessary to get all tasks in queue at least for debugging.

But you are not right. As I mentioned in code comments SortedList<> implemented as two arrays you can check this by Reference Source or just by viewing in Reflector. Here some comments from reference source:

// A sorted list internally maintains two arrays that store the keys and
// values of the entries.

I got from .NET 4.0 but it not changed much since 2-3.5

So my code:

_jobs.Values.SelectMany(list => list).ToList().AsReadOnly();

involve following:

iterate thru values in array of references to List. Indexing array is very fast.
iterate thru each List (which is implemented internally as array too). It very fast too.
build new List of references (via ToList()) which is very fast too (just dynamic array) (.NET has very solid and fast implementation)
build read-only wrapper (no copy, just iterator wrapper)

so consequently we have just flatten read-only list of references to Job's objects. It very fast even you have millions of task. Try to measure yourself.

Any way I added it to show what happens during execution cycle (for debug purposes) but I think it can be useful.

5) A lock-free queue is available in .NET 4.0.

I would recommend to read Patterns of parallel programming by Stephen Toub and Thread-safe Collections in .NET Framework 4 and Their Performance Characteristics, also here many interesting articles.

So I quote:

ConcurrentQueue(T) is a data structure in .NET Framework 4 that provides thread-safe access to FIFO (First-In First-Out) ordered elements. Under the hood, ConcurrentQueue(T) is implemented using a list of small arrays and lock-free operations on the head and tail arrays, hence it is quite different than Queue(T) which is backed by an array and relies on the external use of monitors to provide synchronization. ConcurrentQueue(T) is certainly more safe and convenient than manual locking of a Queue(T) but some experimentation is required to determine the relative performance of the two schemes. In the remainder of this section, we will refer to a manually locked Queue(T) as a self-contained type called SynchronizedQueue(T).

It don't have any methods to maintain ordered queue. Neither any of new thread-safe collection, they all maintain unordered collection. But reading original @Jrud description I think we have to maintain ordered list of time when task need to be fired. Am I wrong?

6) I wouldn't bother starting and stopping the dispatcher; just let it sleep until the next job

Do you know good way to make sleep ThreadPool's thread? How you will implement it?

I think dispatcher goes "sleep" when he does not process any task and schedule job wake-up it. Anyway there are no special processing to put it sleep or wake up so in my thoughts this process equals "sleep".

If you told that I should just reschedule RunJobs via ThreadPool when no jobs available when you are wrong it will eat too many resources and can impact started jobs. Try yourself. Why to do unnecessary job when we can easily avoid it.

7) Rather than worrying about different kinds of ticks, you could just stick to milliseconds.

You are not right. Either you stick to ticks or you don't care about it entirely. Check DateTime implementation, each access to milliseconds property involve converting internal representaion (in ticks) to ms including division. This can hurt performance on old (Pentium class) compulters (I measure it myself and you can too).

In general I will agree with you. We don't care about representation here because it does not give us noticeable performance boost.

It just my habbit. I process billions of DateTime in recent project so coded accordingly to it. In my project there are noticeable difference between processing by ticks and by other components of DateTime.

8) The attempt to keep track of available threads does not seem likely to be effective

I just want to demonstrate you have to care about it. In real world you have to implement far from my straightful logic of scheduling and monitoring resources.

I want to demonstrate timer wheel algorithm and point to some problem that author have to think when implement it.

You are absolutely right I have to warn about it. I thought "quickly ptototype" would be enough. My solution in any means can't be used in production.

Kame answered 15/10, 2010 at 18:33 Comment(39)

Wow, you REALLY care about my question lol +1, and I do understand C# code... I learned it a few semesters after VB in college. It uses same API as VB so every things pretty much same, just different syntax. +1 for taking serious interest. – Marleah 17/10, 2010 at 5:43

There are some good ideas in here, but also some issues, including many over-complications. 1) I wouldn't recommend using SortedList here, or anywhere else, as it's an obsolete .NET 1.1 class. 2) In general, there's no need to complicate the code with special cases for simultaneous events. 3) The IsWorking method should just use a memory barrier and then read the value directly. 4) The GetJobs method looks like it could lock for an extended period. Is it necessary? 5) A lock-free queue is available in .NET 4.0. – Biceps 17/10, 2010 at 7:48

5) I wouldn't bother starting and stopping the dispatcher; just let it sleep until the next job. 6) Rather than worrying about different kinds of ticks, you could just stick to milliseconds. 7) The attempt to keep track of available threads does not seem likely to be effective. – Biceps 17/10, 2010 at 7:49

@Steven Sudit, thank you for comments. I try to answer in my post rather than here. – Kame 17/10, 2010 at 12:13

Let me say in advance that I'm very familiar with the resources you reference, so I'm going to try to keep the scope narrow in my replies. As I said before, there are many good things about your solution, but I think we agree that there is much that would need to be changed before we're ready for production. My comments are suggestions for how that could be done. Ultimately, this would need to be benchmarked with real-world data, but first we need something that is simple and correct. Otherwise, when we tweak it for speed, we are likely to either uncover latent breaks or insert new ones. – Biceps 17/10, 2010 at 15:7

1) You're right that the type of SortedList you used is 2.0, not 1.1; I was mistaken. The reason I would not recommend a non-generic collection for this sort of work is because the need to box and unbox introduces complexity and impacts performance. From what I understand, SortedDictionary<> would be better for this task than SortedList, but neither one is quite right. In particular, we want a list that doesn't need reallocation (linked list, not array), that is kept sorted by key but retrieved only sequentially, and that allows multiple values for each key. – Biceps 17/10, 2010 at 15:7

Generally, I would recommend using the most feature-full collection from the library, but that doesn't apply here because this is system-level code. We'd need to roll our own, either from scratch or on top of a less feature-rich collection. – Biceps 17/10, 2010 at 15:7

2) In practice, you are unlikely to get simultaneous events unless they're being scheduled on round numbers (such as to the nearest second). That's why I'm suggesting a queue that does not try to combine multiple events, instead leaving them adjacent. This simplifies the dispatch logic, as it removes the special case.3) As written, the IsWorking method calls Interlocked.Exchange without exchanging anything. The only thing you could possibly get out of this is volatility, which is better provided by any of the methods you mention. This is as much a matter of code readability as performance. – Biceps 17/10, 2010 at 15:8

4) Let me put this more concretely: no matter how you slice it, it's O(n). While I would recommend keeping the queue small by inserting only one instance of a recurring event at a time, it's still possible for it to grow to non-trivial size, which means the copy operation can be expensive. If we reuse job instances when they recur, then the copy would need to be deep, adding to the cost. Ultimately, there does not appear to be any actual use case for this method, but calling it could cause jobs to be delayed. On this basis, I would recommend removing it. – Biceps 17/10, 2010 at 15:8

5) You are correct that ConcurrentQueue does not provide sorting, but we don't actually need that. The most common operation is removing the head of the queue, which can be done without locks. Probably the next most common operation is adding a job to the tail. There remains the case where we need to iterate through the queue to find an insertion point, but even this can be done without locks; it's cheaper to retry on losing a race than to lock. It would probably help to check whether the new job was closer to the head or tail, to decide which direction to search from. – Biceps 17/10, 2010 at 15:9

6) I make this suggestion because the logic to start/stop the dispatcher is very complicated and therefore prone to error. To make the dispatcher sleep, I would recommend waiting on an event while using a timeout. This way, it will either wake up when the next job is ready or when a new job is inserted before the head. In the latter case, it could need to run it now or set a different wait. If presented with, say, 100 jobs all scheduled for the same instant, the best thing we can do is queue them all up. – Biceps 17/10, 2010 at 15:10

If we needed to provide prioritization, that's a job for a priority dispatch queue and multiple pools in a producer/consumer relationship, but it still doesn't justify a start/stop dispatcher. The dispatcher should always be on, running in a single loop that sometimes cedes the core. – Biceps 17/10, 2010 at 15:10

7) Sticking to one type of tick is fine, but mixing and matching gets ugly, particularly since it's hardware-dependent. I'm sure that ticks would be slightly faster than milliseconds, because it stores the former and has to divide by a constant to get the latter. Whether this operation winds up being costly is another matter, but I'm fine with using ticks to avoid the risk. – Biceps 17/10, 2010 at 15:11

8) The problem I'm trying to highlight is that the call to GetAvailableThreads is expensive and naive; the answer is obsolete before you can even use it. If we really wanted to keep track, we could get initial values and keep a running count by calling the job from a wrapper that uses Interlocked.Increment/Decrement. Even then, it presumes that the rest of the program is not using the thread pool. If we really wanted fine control, then the right answer here is to roll our own thread pool. – Biceps 17/10, 2010 at 15:11

@Steven Sudit lets continue in comments :) 1) I declare my variable as private readonly SortedList<long, List<Job>> _jobs; so its generic SortedList variant why you talk about its non-generic counterpart? SortedList efficient in memory consumtion and iteration, SortedDictionary better when we need faster insert. I measure them myself and use in different project and found SortedList<> often effiently than SortedDictionary<> but as usually "it depends". – Kame 17/10, 2010 at 15:11

@2) I misundertand. You are right if we really can't get simultaneous event. But what if we can add similar tasks at same time as I do in demo and don't want to combine them into one? – Kame 17/10, 2010 at 15:17

@3 This is common pattern to get value in thread-safe way by using Interlocked methods, check Jeffrey Richter for example. It rely on fact Interlocked.CompareExchange (not Interlocked.Exchange as you mention) not only change value but also always return old value – Kame 17/10, 2010 at 15:19

@4 You are right of cause. I take as start @Marleah original class. Logic you are suggest have to be implemented but I think its far from aim of my original code. – Kame 17/10, 2010 at 15:25

@5 I don't understand how it helps. You need to maintain order, so you have to find right position to insert element (SortedList make it via ordinary BinarySearch). Also "ConcurrentQueue(T) is implemented using a list of small arrays and lock-free operations on the head and tail arrays, hence it is quite different than Queue(T) which is backed by an array and relies on the external use of monitors to provide synchronization". Take a look at TryDequeue implementation and Thread-safe Collections in .NET Framework 4 and Their Performance Characteristics paper it not always best choice. – Kame 17/10, 2010 at 15:33

@5 But it helps to avoid lock when get head you are right of cause but cost needs to be verify. – Kame 17/10, 2010 at 15:34

@7 Another good point, I agree with you. But sometimes division by constant becomes costly and it not so fast as it maybe seems to be. But when we talk about 100 000 of DateTimes it does not matter, you are right, thank you to point. – Kame 17/10, 2010 at 15:39

@8 I absolutely agree that calling to GetAvailableThreads is naive method to monitor available resources thru CorGetAvailableThreads not so expensive. I want to demontrate there are needs to manage resources and seems to choose bad example – Kame 17/10, 2010 at 15:48

@Steven Sudit thank you very much for commenting and point me to issues – Kame 17/10, 2010 at 15:49

I looks like we're closer to being on the same page, so I'm only going to reply on the items where there seems to be more that needs to be said. 1) It doesn't matter: neither one of these is a good fit. 2) If two tasks are actually scheduled for the same time (whether actually or effectively), this is a normal case that does not require complicating our data structure. In other words, we don't need to group up simultaneous jobs so that we have to traverse two different collections; we can just make them adjacent. – Biceps 17/10, 2010 at 16:37

3) No, it's not a common pattern. The most common pattern is to briefly lock. Less common is to flag the variable as volatile. Much less common would be to use VolatileRead or MemoryBarrier. Using Interlocked.CompareExchange this way is obscure, even if Richter does it. using it without an explanatory comment is absolutely guaranteed to confuse, as the word "Compare" implies that we're doing a comparison, when in fact we're not. – Biceps 17/10, 2010 at 16:38

4) It's true that sometimes a quick lock is cheaper than complicated non-locking mechanisms. Even if this were the case here, which I'm not convinced, the right answer would be to use a regular queue and maintain its order yourself, since we most often need to access it only from the head or tail. Contrary to what you said, SortedList is O(n) for insertions, which means it's not using a binary search. – Biceps 17/10, 2010 at 16:39

Yes, this is an interesting problem, and it's led to an interesting discussion. – Biceps 17/10, 2010 at 16:39

@Steven (4): I agree, I usually implemented in similar situation data structure you describes but I think it go beyond the scope example. But SortedList<> use BinarySearch. Check youself in source or via Reflector in Add it use int i = Array.BinarySearch<TKey>(keys, 0, _size, key, comparer); to determine position when it needs to insert to. – Kame 17/10, 2010 at 16:55

I recheck: msdn.microsoft.com/en-us/library/ms132330.aspx It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n). So you are not right – Kame 17/10, 2010 at 16:59

It says "This method is an O(n) operation for unsorted data, where n is Count. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n)." Now, we can ignore the last case, because any realloc of a fixed-sized block has to be O(n). Normally, a linked list has a tail pointer, so inserting at the end costs O(1), while inserting in sorted order is O(n), since it has to traverse the list. (continued) – Biceps 17/10, 2010 at 17:19

The way I figure, since it's using an array instead of a linked list, it can find the point in O(log n) by using a binary search, but still has to pay O(n) to move the current elements over to make room for the new one. As such, insertion becomes O(n) because the slower part dominates. This explains why inserting to the end is so slow: it's a binary search to the last element, but not followed by a need to make room. So you're right that it's using a binary search, but would be wrong to suggest that its performance is akin to a linked binary tree (which is to say, O(log n) for most ops). – Biceps 17/10, 2010 at 17:23

Frankly, this is not a great set of characteristics, unless the constants involved are tiny in comparison to the linked-list alternative. – Biceps 17/10, 2010 at 17:25

@Steven, we started to discuss SortedList<>, it always do BinarySearch before insert. Check Reflector or refence source. But you switched in discussion (when?) to LinkedList<> of cause its behaviour is different. Just look how Microsoft team implemented it and your questions will get answers – Kame 17/10, 2010 at 17:46

@Steven and you are right of cause, linked list structures outperformance array based on insert/remove since they dont have to resize (or at least move) source array. But as usually here we have to choice we want to be fast on insert (SortedDictionary<> implemented as binary search tree) or we want to be memory efficient. – Kame 17/10, 2010 at 17:53

I switched when I said that SortedList and SortedDictionary are both inappropriate, so we'd do better with LinkedList or ConcurrentQueue. I'm not convinced that an array-based collection would be significantly more memory-efficient, either in principle or practice. First, GC makes allocation cheap. Second, if you implement your own single-linked list, it actually takes up less memory. And, finally, even if there were such a trade-off, we can afford to burn some bytes but not to be slow. Having said that, .NET is definitely not a suitable environment for critical timing. – Biceps 17/10, 2010 at 18:11

I see. But you are not right assuming array based solutions are not memory efficient. Each class reference costs 24 byte overhead in 32-bit. So when you have 100 000 000 class refence, you get 2.4 GB overhead to keep them in memory. Array of structures does'nt have such overhead. But in my code since Job is class there are almost no difference (you are right) since they both contains class reference. I feel we go wrong direction start discussing abstract optimization ideas without real technical requirements on system behaviour. We both right. "It depends..." rule :) – Kame 17/10, 2010 at 18:26

Yes, it always depends on the details. The biggest detail is that .NET is not at all ideal for this task. GC can kick in at any time, timer precision is low and there's no guarantee whatsoever that the timer will be on time. Doing this in C++ would help, but the deeper issue is Windows itself, as it's just not designed for this sort of thing. – Biceps 17/10, 2010 at 19:56

Thru .NET head to right direction :) Windows itself was'nt hard-realtime OS in any time. GC becomes more controllable in .NET it starts to announce about full GC and there are some options to almost avoid full collection if app designed for that, there are option to create own CLR Host but all this things far from our original topic. @Steven thank you for great comments and discussion, I read and learn from them with great pleasure. – Kame 17/10, 2010 at 20:13

If you want we can continue :) I will commment your comments as usually before that :) I just thought we take away our readers from right direction and they lost in details :) – Kame 17/10, 2010 at 20:44

None of the above. The standard solution is to keep a list of events, such that each one points to the next one to occur. You then use a single timer and have it wake up only in time for the next event.

edit

Looks like this is called a timer wheel.

edit

As Sentinel pointed out, events should be dispatched to a thread pool. The handler for these events should do a small bit of work as quickly as possible, and without blocking. If it needs to do I/O, it should fire off an async task and terminate. Otherwise, a thread pool would overflow.

The .NET 4.0 Task class might be helpful here, particularly for its continuation methods.

Biceps answered 15/10, 2010 at 18:44 Comment(19)

This does sound like a standard way this problem might be solved, do you have an example of a design pattern which shows this? – Marleah 15/10, 2010 at 18:49

It's my standard way too. No design pattern, it follows from simple logic. The only thing you have to time is the duration to the next event. That takes one timer. And a list of events sorted by time. The list can be arbitrarily large. – Acidulent 15/10, 2010 at 18:59

Well, how do you guys think system timers work? They do exactly this. Determine when the next quickest event is due to expire, sleep for that time, wakeup and service the event, and then repeat. – Collodion 15/10, 2010 at 19:8

So what happens when over 1000 events have to be performed within the second? The timer class has a minimum elapse time of 1 millisecond. I can't do multiple queues easily either because events may have to be stuck in the order dynamically. – Marleah 15/10, 2010 at 19:18

@Hans: You're right that it wouldn't be called a design pattern. It's simply an algorithm, and a rather old and obvious one at that. – Biceps 15/10, 2010 at 19:19

@Jrud: It's not even accurate to 1ms. On most systems, the resolution is to the nearest 16ms. There is such a thing as a high-resolution timer, but I believe it's in some way system-specific. See msdn.microsoft.com/en-us/library/… – Biceps 15/10, 2010 at 19:19

Avoid to create many System.Timers.Timer / System.Threading.Timer, they are wrappers around Win32 waitable timer objects (look at msdn.microsoft.com/en-us/magazine/cc164015.aspx). Timer wheel is best in this situation. Also I would recommend to avoid span thread each time you need processing, instead use ThreadPool. Check msdn.microsoft.com/en-us/magazine/cc164139.aspx for details. – Kame 15/10, 2010 at 19:20

@Sentinel: Yes, you definitely want to dispatch events to a thread pool, both to avoid the overhead of thread creation/teardown, and to allow simultaneous events. – Biceps 15/10, 2010 at 19:21

@Ziff: As Sentinel suggests, OS timers are a bit heavy-weight. Not good for thousands. – Biceps 15/10, 2010 at 19:22

@Steven Sudit The 16 ms resolution further lowers my confidence in using this "algorithm" for my project. The timer wheel sounds great for multiple timers that you can either queue sequentially (where multiple timers can be used easily), or in situations where the events do not have to fire several thousand times per second. I am actually a little sad that it looks like I cannot easily implement it, it looks interesting. – Marleah 15/10, 2010 at 19:36

@Jrud: That doesn't follow. You can use a timer wheel with a spinlock instead of using an OS timer object. – Biceps 15/10, 2010 at 19:39

@Steven & Jrud: another point: Task processing logic have to be asynchronous (with long task split into parts or divided by async IO) when dispatching to ThreadPool or you can run out of available "logical threads" (concurrent worker threads) if many events (500 by default, depends on ThreadPool limits configuration per system) needs to be processed simultaneously. Check channel9.msdn.com/blogs/charles/… / msdn.microsoft.com/en-us/magazine/cc163323.aspx Also don't miss new .NET 4 Task class. – Kame 15/10, 2010 at 19:43

@Sentinel: More good advice. Yes, the whole point of being able to schedule so many pieces is to break up the task into chunks that do their business and pass the flow to something else, as with async I/O. If you have many events and they last a long time, a thread pool would be swamped. As for .NET 4.0 Tasks, they're great, but they don't seem directly relevant here. – Biceps 15/10, 2010 at 20:0

@Sentinel: If you break out your advice into its own answer, I'd be happy to upvote it. – Biceps 15/10, 2010 at 20:0

I am not thrilled about the extra CPU cycles a spinlock would present at first, but it may be a solution to the long-term for this. As there are more items in the queue, there shouldn't be as many cycles wasted on the spin lock, they will all be used. – Marleah 15/10, 2010 at 20:3

@Steven: Thank you :) I think better you will make final edit of your post when we gather different opinions from comments. I started to describe timer wheel when notice your answer and decided do not post my own. – Kame 15/10, 2010 at 20:16

@Jrud: If the next event is in 10 minutes, you would sleep, not spin. But you'd wake up a little early and make sure to be exactly on time. – Biceps 15/10, 2010 at 20:31

@Sentinel: It's up to you. I'll certainly integrate the best answers into a single one, but that doesn't stop you from getting full credit by having your own answer. – Biceps 15/10, 2010 at 20:33

@Sentinel & Steven Sudit I edited the origional post, is this what you had in mind for the timer wheel? – Marleah 15/10, 2010 at 21:15

The tradeoff in your three options is between memory and CPU. More timers mean more timer nodes (memory), and aggregating these timers into fewer timers means more CPU, as you check for events that need servicing at run time. The CPU overhead in starting too many timers (and expiring them) is not too great an issue with a decent timer implementation.

SO, in my opinion, if you have a good timer implementation, choose to start as many timers as you need (be as granular as possible). But if any of these timers per object are mutually exclusive, consider reusing a timer node.

Collodion answered 15/10, 2010 at 19:16 Comment(1)

He has the Windows OS timer, not a "good implementation". :-) – Biceps 15/10, 2010 at 19:23

This reminds me of the old airline ticketing systems, where you had queues. Ticketing requests were put in different queues depending on what kind of attention they needed.

So maybe you could have the queue of objects requiring frequent attention, and the queue of objects requiring infrequent attention. When necessary, you move them from one to the other.

You could have a timer for the frequent queue, and a timer for the infrequent queue. For the frequent queue, you could split it into multiple queues, one for each thread.

For crunching the frequent queue(s), you should not have more threads than you have cores. If you have two cores, what you want to do is get both of them cranking. Any more threads than that will not make things any faster. In fact, if processing the objects requires disk I/O or getting in line for some other shared hardware, it may not even help to get both cores running.

Fullgrown answered 15/10, 2010 at 23:20 Comment(2)

You can make this as complicated as you like, but I don't see much benefit in doing so. – Biceps 17/10, 2010 at 7:49

I'm referring to the suggestion that jobs could be split up into multiple queues. While this approach is certainly applicable in the real-world example of airline, I'm confident that the solution to the OP's problem is best served by a simple, provably correct solution with just one queue. Having a per-thread queue, for example, could easily lead to more affinity than we'd want, allowing jobs to be delayed even though another thread is available. – Biceps 17/10, 2010 at 13:49

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags