How to keep a certain number of threads running all the time

OK, here is my question. I want to start threads until a certain number is reached, let's say 100. So it will keep starting threads and continuously check the number of running threads. When the maximum is reached, it will stop starting new threads. Then, either at a regular checking interval or when a completed thread signals, it will start a new thread.

This way, I will always have a certain number of running threads.

I managed this by using sleep and a permanent while loop. I keep checking the total running thread count at a given interval, and if a thread has completed, I dispose of it and start a new one.

But my solution does not feel like the proper way. I suppose it would be better if the completed thread signaled, and the checker then started a new one whenever we are below the maximum thread threshold.

I have seen many thread pool examples, but most of them do not include any queued pooling with a maximum number of running threads. What I mean is, they just keep starting threads until they are done. But let's say I have 500k URLs to harvest. I cannot just start all of them in a for loop with the thread pool.

The platform is a C# 4.5 WPF application.

Below is my solution. I am actually looking for a better one, not for improvements to this one.

private void Button_Click_4(object sender, RoutedEventArgs e)
{
    Task.Factory.StartNew(() =>
    {
        startCrawler();
    });
}

void startCrawler()
{
    int irMaximumThreadcount = 100;
    List<Task> lstStartedThreads = new List<Task>();
    while (true)
    {
        for (int i = 0; i < lstStartedThreads.Count; i++)
        {
            if (lstStartedThreads[i].IsCompleted == true)
            {
                lstStartedThreads[i].Dispose();
                lstStartedThreads.RemoveAt(i);
            }
        }

        if (lstStartedThreads.Count < irMaximumThreadcount)
        {
            var vrTask = Task.Factory.StartNew(() =>
            {
                func_myTask();
            });
            lstStartedThreads.Add(vrTask);
        }

        System.Threading.Thread.Sleep(50);
    }
}

void func_myTask()
{

}
Methedrine answered 3/3, 2013 at 2:3 Comment(1)
"I can not just start all of them in a for loop with thread pool." - Have you actually tried? Starting multiple threads on the assumption that it will make your overall internet connection faster does not sound like "a proper way". Also consider using asynchronous operations; you will not need that many threads... unless you have something like a 32-core machine... - Jurist

Personally I would use PLINQ for this, and specifically the WithDegreeOfParallelism method which limits the number of concurrent executions to the passed in value.

private IEnumerable<Action> InfiniteFunctions()
{
    while(true)
    {
        yield return func_myTask;
    }
}

private void Button_Click_4(object sender, RoutedEventArgs e)
{
    int irMaximumThreadcount = 100;
    InfiniteFunctions()
        .AsParallel()
        .WithDegreeOfParallelism(irMaximumThreadcount)
        .ForAll(f => f());
}

EDIT: Actually, reading the documentation, it seems that irMaximumThreadcount can only be a maximum of 64, so watch out for that.

EDIT 2: OK, I had a better look, and it seems Parallel.ForEach takes a ParallelOptions parameter with a MaxDegreeOfParallelism property that doesn't have that limit. So your code might look like:

private void CrawlWebsite(string url)
{
    //Implementation here
}

private void Button_Click_4(object sender, RoutedEventArgs e)
{
    var options = new ParallelOptions() 
    { 
        MaxDegreeOfParallelism = 2000 
    };

    Parallel.ForEach(massiveListOfUrls, options, CrawlWebsite);
}
Luncheonette answered 3/3, 2013 at 2:19 Comment(9)
Now this is interesting. So you are saying this method can be used, for example, for crawling 500k pages. Let me try :) - Kitchenette
Oh, then it is useless for me :) I am starting 2000 threads to check for live proxies, for example, even though Task Manager shows 490 threads. I don't know why it is not 2000 :) - Kitchenette
Ah, watch out for my edit: the max is only 64 in parallel. And yes, you could loop over your list of 500,000 items and execute the func on each item. - Luncheonette
Thanks for the answer, but 64 is a low limit for me. It should be able to run more threads at the same time. - Kitchenette
Yeah... the problem is that having 2000 threads waiting for a request is a bit inefficient. Let me try answering the question again considering this extra information. - Luncheonette
Don't consider the details yet. The issue is being able to run any number of threads at the same time in the most efficient way. The above solution seems nice, but it has a 64-thread limit. - Kitchenette
Have a look at my edit. I think this one is not limited, and it also takes into account that you have a giant list of URLs or whatever... - Luncheonette
Yes, once again a very good solution, except it would not work in this scenario. Let's assume a crawl fails. I want it retried until it completes. Since this is a foreach, I won't be able to get it re-crawled. - Kitchenette
You could put error catching/retry logic in the CrawlWebsite function? This method would ensure a maximum of 2000 threads running even with that logic. - Luncheonette

You are mixing up tasks with threads. A task is not a thread. There is no guarantee that each task will have its own thread.

The TPL (Task Parallel Library) actually works like a queue. This means you can just create and start a task for each Func or Action object you have. There is no easy way to control the number of threads that are actually created.

However, you can create many tasks with little overhead because the TPL will enqueue them and apply further logic to balance the work over the threads of the thread pool.

If some tasks need to be executed one after the other you can use Task.ContinueWith to enqueue them. It is also possible to start new tasks with Task.Factory.ContinueWhenAny or Task.Factory.ContinueWhenAll.

This is also the clue to how you can control the number of parallel tasks you want to create: Just create the desired number of tasks and enqueue the remaining tasks with ContinueWhenAny. Each time a task ends the next will be started.
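A minimal sketch of that idea, with one swap: instead of chaining with ContinueWhenAny, it blocks on Task.WaitAny and refills the free slot, which I find easier to read. The ThrottledRunner class and RunAll method are names made up for this illustration, and the sketch assumes the work items are plain Action delegates.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Hypothetical helper: runs every action, keeping at most maxParallel tasks in flight.
    public static void RunAll(IEnumerable<Action> work, int maxParallel)
    {
        var pending = new Queue<Action>(work);
        var running = new List<Task>();

        while (pending.Count > 0 || running.Count > 0)
        {
            // Top up the running set until the limit is reached or no work is left.
            while (running.Count < maxParallel && pending.Count > 0)
            {
                running.Add(Task.Factory.StartNew(pending.Dequeue()));
            }

            // Block until any running task ends, then free its slot.
            // WaitAny does not rethrow a task's exception; inspect task.Exception if you need it.
            int finished = Task.WaitAny(running.ToArray());
            running.RemoveAt(finished);
        }
    }
}
```

Note that this limits the number of tasks in flight, not the number of threads; the thread pool still decides how many threads actually run them.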

Again: the TPL will balance the work among the threads in the thread pool. What you need to consider anyway is the use of other resources like disk I/O or internet connection. Having a lot of tasks that try to use the same resources concurrently can drastically slow down your program.

Hundredweight answered 3/3, 2013 at 3:14 Comment(1)
I have a lot of resources: 850 MB per second I/O read/write speed and a 50 Mbit fiber connection. Anyway, this is useful info, upvoted :) - Kitchenette

.NET 4.0 introduced several collections with built-in concurrency management which should be ideal for this situation. A blocking collection will be more efficient than sleeping in a while loop. You then just spawn x threads that read from the blocking queue.

// The constructor takes an IProducerConsumerCollection<T>, so wrap the URL list in a ConcurrentQueue<T>.
BlockingCollection<string> queue = new BlockingCollection<string>(new ConcurrentQueue<string>(listOfUrls));

for (int x=0; x < MaxThreads; x++)
{
    Task.Factory.StartNew(() => 
    {
        while (true)
        {
            string url = queue.Take(); // blocks until url is available
            // process url;
        }
    }, TaskCreationOptions.LongRunning);
}

You mark the task as long running to hint that it should get its own thread instead of using the thread pool. A BlockingCollection<T> is backed by a ConcurrentQueue<T> by default, so items are taken first in, first out; you can pass a different collection to the constructor if you need another ordering. http://msdn.microsoft.com/en-us/library/dd287085.aspx
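If the workers should stop once the list is exhausted, rather than block forever on Take, one option is GetConsumingEnumerable together with CompleteAdding. A rough sketch follows; QueueWorkers, ProcessAll and the processUrl callback are hypothetical names for this example, and it assumes all URLs are known up front.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class QueueWorkers
{
    // Hypothetical helper: processes every url with a fixed number of long-running consumers.
    public static void ProcessAll(IEnumerable<string> urls, int maxThreads, Action<string> processUrl)
    {
        using (var queue = new BlockingCollection<string>())
        {
            var workers = new Task[maxThreads];
            for (int x = 0; x < maxThreads; x++)
            {
                workers[x] = Task.Factory.StartNew(() =>
                {
                    // Blocks while the queue is empty; the loop ends once the queue
                    // is drained and CompleteAdding has been called.
                    foreach (string url in queue.GetConsumingEnumerable())
                    {
                        processUrl(url);
                    }
                }, TaskCreationOptions.LongRunning);
            }

            foreach (string url in urls)
            {
                queue.Add(url);
            }
            queue.CompleteAdding(); // signal that no more urls will arrive

            // Rethrows (as AggregateException) if any worker failed.
            Task.WaitAll(workers);
        }
    }
}
```

A failed crawl could be retried by having processUrl add the URL back to a queue it owns, but that only works if CompleteAdding is delayed until the retries are done.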

Cletus answered 3/3, 2013 at 3:9 Comment(0)

Not an exact answer, but I think this may guide you in the right direction.

First, take a look at Thread.Join, especially the simple example given at the bottom of this page. This approach is superior to Thread.Sleep() and more suitable for your purpose. I'm thinking along the lines of *Join*ing the "manager" thread instead of *Sleep*ing.

The second option, which may or may not suit your purpose, is the new Tasks library. Since you're using the latest version of the framework, this option is available, but I guess you cannot control the actual number of threads created by the Tasks library. It chooses that value automatically based on the underlying scheduler. However, there's an option named ParallelOptions.MaxDegreeOfParallelism that sounds interesting.

Sparge answered 3/3, 2013 at 2:20 Comment(3)
As far as I know, Thread.Join is used to wait until all tasks finish. Am I incorrect? If so, how can I use it? I don't need to wait for all tasks. When one task finishes, another will start immediately, so there will always be a certain number of tasks running. - Kitchenette
Mmmm... not 100% sure about it, but I think Join only halts the calling thread. Another idea could be to Join the newly created worker threads so that they start working immediately once one of the currently running threads signals that it is complete, so the manager doesn't have to check over and over. - Sparge
Nope, that would not work, because threads finish independently. The first one started may finish last, or the last one started may finish first. - Kitchenette

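You can manage the Task/Thread pool yourself: wait for any task to finish and start a new one right away.

const int MAX_THREAD_ALLOWED = 100;
List<Task> tasks = new List<Task>();
for (int i = 0; i < 1000; i++)
{
    int item = i; // copy the loop variable; the lambda would otherwise see later values of i
    tasks.Add(Task.Run(() => { Foo(item); }));
    if (tasks.Count == MAX_THREAD_ALLOWED)
    {
        // block until one task finishes, then remove it to free a slot
        int finished = Task.WaitAny(tasks.ToArray());
        tasks.RemoveAt(finished);
    }
}
Task.WaitAll(tasks.ToArray()); // wait for the remaining tasks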
Pallet answered 15/5, 2019 at 9:40 Comment(0)