C# Downloader: should I use Threads, BackgroundWorker or ThreadPool?

I'm writing a downloader in C# and got stuck on the following problem: what method should I use to parallelize my downloads and update my GUI?

In my first attempt, I used 4 threads and, whenever one of them completed, I started another one. The main problem was that my CPU went to 100% every time a new thread started.

Googling around, I found BackgroundWorker and the ThreadPool. Given that I want to update my GUI with the progress of each link I'm downloading, which is the best solution?

1) Create 4 different BackgroundWorkers and attach to each one's ProgressChanged event a delegate to a function in my GUI that updates the progress?

2) Use the ThreadPool and set the max and min number of threads to the same value?

If I choose #2, when there are no more items in the queue, does it stop the 4 worker threads or suspend them? Since I have to download several lists of links (20 links each) and move from one list to the next when one is completed, does the ThreadPool start and stop threads between lists?

If I decide to use the ThreadPool and want to change the number of worker threads at runtime, say from 10 to 6, does it throw an exception and stop 4 random threads?

This is the only part that is giving me a headache. Thanks in advance for your answers.

Annapolis answered 2/8, 2011 at 14:13 Comment(1)
Why don't you use threads from the Threadpool? msdn.microsoft.com/en-us/library/3dasc8as%28v=vs.80%29.aspx#Y23Property
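For comparison, handing each link to the pool looks roughly like this. A minimal sketch, assuming .NET 4.0 (for `CountdownEvent`) and a synchronous per-link download; `PoolDownloads` and the `downloadOne` delegate are hypothetical names. The pool reuses its threads across work items instead of creating one per link, but nothing here caps concurrency at 4, and `ThreadPool.SetMinThreads`/`SetMaxThreads` affect the whole process, so they are a blunt way to limit only the downloads.

```csharp
using System;
using System.Threading;

static class PoolDownloads
{
    // Queues one work item per link and blocks until all of them have finished.
    // downloadOne is a hypothetical placeholder for the real per-link download.
    public static void DownloadAll(string[] links, Action<string> downloadOne)
    {
        using (var remaining = new CountdownEvent(links.Length))
        {
            foreach (string link in links)
            {
                string current = link; // copy for the closure (needed before C# 5)
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { downloadOne(current); }
                    // an unhandled exception on a pool thread would terminate the process
                    catch (Exception ex) { Console.Error.WriteLine(current + ": " + ex.Message); }
                    finally { remaining.Signal(); } // always count down so Wait() cannot hang
                });
            }
            remaining.Wait(); // blocks the caller: do not call this on the UI thread
        }
    }
}
```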

I would suggest using WebClient.DownloadFileAsync for this. You can have multiple downloads going, each raising the DownloadProgressChanged event as it goes along, and DownloadFileCompleted when done.

You can control the concurrency by using a queue with a semaphore or, if you're using .NET 4.0, a BlockingCollection. For example:

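using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Net;

// Information used in callbacks.
class DownloadArgs
{
    public readonly string Url;
    public readonly string Filename;
    public readonly WebClient Client;
    public DownloadArgs(string u, string f, WebClient c)
    {
        Url = u;
        Filename = f;
        Client = c;
    }
}

class Downloader
{
    const int MaxClients = 4;

    // bounded queue of idle clients: holds at most MaxClients items
    readonly BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>(MaxClients);

    // queue of urls to be downloaded (unbounded); fill it before calling Run()
    public readonly Queue<string> UrlQueue = new Queue<string>();

    public Downloader()
    {
        // create four WebClient instances and put them into the queue
        for (int i = 0; i < MaxClients; ++i)
        {
            var cli = new WebClient();
            cli.DownloadProgressChanged += DownloadProgressChanged;
            cli.DownloadFileCompleted += DownloadFileCompleted;
            ClientQueue.Add(cli);
        }
    }

    // Now go until the UrlQueue is empty.
    // Call this from a background thread, not the UI thread: Take() blocks, and
    // WebClient raises its events through the SynchronizationContext of the thread
    // that started the download, so a blocked UI thread would never receive
    // DownloadFileCompleted and no client would ever come back (deadlock).
    public void Run()
    {
        while (UrlQueue.Count > 0)
        {
            WebClient cli = ClientQueue.Take(); // blocks if there is no client available
            string url = UrlQueue.Dequeue();
            string fname = CreateOutputFilename(url);  // or however you get the output file name
            cli.DownloadFileAsync(new Uri(url), fname,
                new DownloadArgs(url, fname, cli));
        }
    }

    // Hypothetical helper: use the last segment of the URL as the file name.
    static string CreateOutputFilename(string url)
    {
        return Path.GetFileName(new Uri(url).LocalPath);
    }

    void DownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
    {
        DownloadArgs args = (DownloadArgs)e.UserState;
        // Do status updates for this download (e.ProgressPercentage, e.BytesReceived).
        // With Run() on a background thread this handler runs on a pool thread,
        // so marshal UI updates with Control.Invoke or Dispatcher.Invoke.
    }

    void DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
    {
        DownloadArgs args = (DownloadArgs)e.UserState;
        // e.Error is set if the download failed, e.Cancelled if it was cancelled.
        // do whatever UI updates (marshaled as above)

        // now put this client back into the queue
        ClientQueue.Add(args.Client);
    }
}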

There's no need to manage threads explicitly or to use the TPL.
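The answer also mentions a semaphore as an alternative to BlockingCollection. A minimal sketch of that idea, assuming .NET 4.5 or later (`SemaphoreSlim.WaitAsync`, `Task.WhenAll` and async/await postdate this answer): the hypothetical `RunThrottledAsync` starts a task per item but lets at most `maxConcurrent` of them run `work` at once. For real downloads, `work` could create a WebClient and await its `DownloadFileTaskAsync`; here it is just a delegate so the throttling logic stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class Throttle
{
    // Runs work(item) for every item, with at most maxConcurrent calls in flight.
    public static async Task RunThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrent, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent, maxConcurrent))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();       // wait for a free slot without blocking a thread
                try { await work(item); }
                finally { gate.Release(); }   // hand the slot to the next waiting item
            }).ToList();                      // materializing the query starts every task

            await Task.WhenAll(tasks);        // completes when all items are done; rethrows the first failure
        }
    }
}
```

Because waiting items sit on the semaphore rather than on threads, no thread is blocked while all slots are busy.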

Puree answered 2/8, 2011 at 15:4 Comment(9)
I think the line ClientQueue.Add(new WebClient()); is wrong and should be ClientQueue.Add(cli). Anyway, I see 2 problems with this method: 1) I have to specify the file name before downloading, but I don't know it beforehand; I usually take the name from the link or from the "Content-Disposition" header of the HTTP response. 2) The first time I wrote my app, WebClient was among the available choices, but if I remember correctly it opened Internet Explorer in the background for each link! I still remember that pop-up that came from nowhere...Annapolis
I fixed the bug you identified. WebClient does not open IE. You might be thinking of the WebBrowser control. WebClient is a wrapper around HttpWebRequest and HttpWebResponse. If you need information from the headers, you can get them from the ResponseHeaders property. The above is just an example. Your requirements are easily met by making some minor changes.Puree
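Reading the name from the headers could look roughly like this. A minimal sketch, assuming you download to a temporary name first and rename once DownloadFileCompleted fires, when `ResponseHeaders` should be populated; `HeaderNames.GetFileName` is a hypothetical helper that parses the header with `System.Net.Mime.ContentDisposition` and falls back to a name you supply.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Mime;

static class HeaderNames
{
    // Returns the file name from a Content-Disposition header, or fallback if
    // the header is missing or cannot be parsed.
    public static string GetFileName(WebHeaderCollection headers, string fallback)
    {
        string value = headers == null ? null : headers["Content-Disposition"];
        if (string.IsNullOrEmpty(value))
            return fallback;
        try
        {
            string name = new ContentDisposition(value).FileName;
            // Keep only the file part so a hostile header cannot point outside the download folder.
            return string.IsNullOrEmpty(name) ? fallback : Path.GetFileName(name);
        }
        catch (FormatException)
        {
            return fallback; // malformed header value
        }
    }
}
```

Read `ResponseHeaders` inside the completed handler before the client goes back into the queue, since the next download on that client replaces them, then `File.Move` the temporary file to the returned name.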
Although using explicit threads would work, it's incredibly wasteful to dedicate one thread to each download, and the TPL might not do a very good job either. On a good connection you can have a dozen or more concurrent downloads running, each of which would be an individual thread that spends most of its time waiting for data. Contrast that with DownloadFileAsync, which will allocate only as many threads as are needed to handle the data as it's downloaded.Puree
If I wanted resume capability, how could I do it? Also, the ResponseHeaders property is only available after I run DownloadFileAsync, so I can't specify the name up front (maybe I can rename the file on completion). What worries me more is DownloadDataAsync: if I read correctly, it keeps the data in an internal array, so what happens if the file to download is 1GB? (What I have to download rarely exceeds 1MB; most of the time it's about 200KB.)Annapolis
If you want resume capability, then you'll have to create a derived class from WebClient and override the GetWebRequest method so that you can modify the request. Yes, you'd want to rename the file after completion. As far as DownloadDataAsync is concerned, it's going to fail if the downloaded file is too large to fit in memory.Puree
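A rough sketch of that idea, assuming .NET 4.0+ and a server that honours HTTP Range requests; `ResumingWebClient`, `ResumeFrom` and `ResumeDownload` are hypothetical names. One catch: `DownloadFile`/`DownloadFileAsync` create the target file from scratch, so this sketch uses `OpenRead` and appends to the partial file itself, falling back to overwriting when the response carries no `Content-Range` header (meaning the server ignored the range and sent the whole file).

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical WebClient subclass that requests only the missing tail of a file.
class ResumingWebClient : WebClient
{
    // Byte offset to resume from; 0 means a normal, full download.
    public long ResumeFrom { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null && ResumeFrom > 0)
            http.AddRange(ResumeFrom); // sends "Range: bytes=<ResumeFrom>-"
        return request;
    }

    // Synchronously appends the rest of the resource to the partial file at path.
    public void ResumeDownload(Uri address, string path)
    {
        ResumeFrom = File.Exists(path) ? new FileInfo(path).Length : 0;
        using (Stream remote = OpenRead(address))
        {
            // No Content-Range header means the server sent the full body: start over.
            WebHeaderCollection headers = ResponseHeaders;
            bool partial = ResumeFrom > 0 && headers != null && headers["Content-Range"] != null;
            FileMode mode = partial ? FileMode.Append : FileMode.Create;
            using (var local = new FileStream(path, mode, FileAccess.Write))
            {
                remote.CopyTo(local);
            }
        }
    }
}
```

If the local file is already complete, the server will typically answer 416 (Range Not Satisfiable), which surfaces as a WebException from `OpenRead`.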
Last question: since DownloadFileAsync is asynchronous, if it throws an exception, how and where should I catch it? It can throw WebException and InvalidOperationException, but I have no idea how to handle them. Thanks. Regards.Annapolis
Exceptions that happen before the actual download starts can be caught by putting a try/catch around the call to DownloadFileAsync. I don't know what happens to exceptions that occur in the middle of downloading the file (a dropped connection, for example). I think the error is reported in the Error property of the AsyncCompletedEventArgs object that is passed to the DownloadFileCompleted event handler.Puree
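Failures during the transfer do surface through `AsyncCompletedEventArgs.Error` (and cancellations through `Cancelled`), so the completed handler is the place to inspect them. A minimal sketch: `DownloadErrors.Describe` is a hypothetical helper you could call from the `DownloadFileCompleted` handler, for example to show a status string, before putting the client back into the queue.

```csharp
using System;
using System.ComponentModel;
using System.Net;

static class DownloadErrors
{
    // Hypothetical helper: summarizes how a download ended.
    // Cancelled is checked first so a cancellation is not reported as a failure.
    public static string Describe(AsyncCompletedEventArgs e)
    {
        if (e.Cancelled) return "cancelled";
        if (e.Error == null) return "ok";
        var web = e.Error as WebException;
        if (web != null) return "network error: " + web.Status; // e.g. Timeout, ConnectFailure
        return "failed: " + e.Error.Message;
    }
}
```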
I was also wondering (a bit unrelated): if I use an HttpWebRequest, set some cookies, and the HttpWebResponse contains a Set-Cookie header, does the CookieCollection I get from the response contain both sets of cookies? Anyway, your suggestion about WebClient is the best one and I will accept your answer, but not yet: I don't know whether other answers can be added once I accept one.Annapolis
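One way to avoid tracking the two lists separately is to share a single `CookieContainer`: assign it to `HttpWebRequest.CookieContainer` on every request and it accumulates both the cookies you add and the ones the server sets. (As far as I know, `HttpWebResponse.Cookies` is only populated when a container is attached, and it holds just the cookies from that response.) The sketch below simulates the server's Set-Cookie header with `CookieContainer.SetCookies`, so it runs without a network; `CookieJar.Build`, the URL and the cookie values are made up.

```csharp
using System;
using System.Net;

static class CookieJar
{
    // Hypothetical helper: one container holding a client-set cookie and a
    // server-set cookie (SetCookies parses the value of a Set-Cookie header).
    // In real code: request.CookieContainer = jar; on every HttpWebRequest.
    public static CookieContainer Build(Uri site)
    {
        var jar = new CookieContainer();
        jar.Add(site, new Cookie("user", "ddb"));   // cookie you set yourself
        jar.SetCookies(site, "session=abc123");     // as if received via Set-Cookie
        return jar;
    }
}
```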
@DDB: Other answers can be added after you select one. And if you decide that another answer is better, you can change the selected answer.Puree

I think you should look into using the Task Parallel Library, which is new in .NET 4 and is designed for solving these types of problems

Emsmus answered 2/8, 2011 at 14:16 Comment(2)
Might I amend this with another solution along the same route? A BackgroundWorker with a Parallel.ForEach(urls, url => { /* do action */ }); inside it. It's easier to read (like a foreach) and lets the rest of the logic continue while the BGW is running.Venusberg
It seems that MaxDegreeOfParallelism lets me set the maximum number of threads/tasks but not the minimum. Moreover, it seems that I can't change this value at runtime. Good suggestion, though.Annapolis
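The `Parallel.ForEach` variant suggested in the comments might look like this. A minimal sketch, assuming .NET 4.0; `ParallelDownloads` and `downloadOne` are hypothetical names. `MaxDegreeOfParallelism` only sets an upper bound on concurrent iterations, and `Parallel.ForEach` blocks the calling thread until every iteration has finished, which is why it would go inside a BackgroundWorker's DoWork rather than run on the UI thread. Each iteration also occupies a pool thread while it waits on the network.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelDownloads
{
    // Runs downloadOne for every link, at most maxParallel at a time.
    // Blocks the caller until all iterations finish; exceptions from the
    // iterations are rethrown together as an AggregateException.
    public static void DownloadAll(IEnumerable<string> links, int maxParallel, Action<string> downloadOne)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(links, options, link => downloadOne(link));
    }
}
```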

A 100% CPU load has nothing to do with the download itself, since the network is practically always the bottleneck. I would check the logic you use to wait for the download to complete.

Can you post the code of the thread you start multiple times?
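One common cause of this symptom is a polling loop such as `while (!done) { }`, which keeps a CPU core busy for the entire download, whereas blocking on a wait handle uses no CPU until it is signaled. A minimal sketch, assuming completion is reported through a callback; `StartDownload` is a hypothetical stand-in that just sleeps on another thread instead of downloading.

```csharp
using System;
using System.Threading;

static class WaitDemo
{
    // Hypothetical stand-in for an asynchronous download: calls onDone from
    // another thread after the given delay.
    public static void StartDownload(int delayMs, Action onDone)
    {
        new Thread(() => { Thread.Sleep(delayMs); onDone(); }) { IsBackground = true }.Start();
    }

    // Blocks until the "download" completes or the timeout expires, without spinning.
    public static bool DownloadAndWait(int delayMs, int timeoutMs)
    {
        // Deliberately not disposed: after a timeout the callback may still call Set().
        var finished = new ManualResetEventSlim(false);
        StartDownload(delayMs, () => finished.Set());
        return finished.Wait(timeoutMs); // true if signaled, false on timeout
    }
}
```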

Rochdale answered 2/8, 2011 at 14:16 Comment(1)
It has nothing to do with the code per se, but with the fact that creating a new thread does use CPU (I exaggerated the 100%: it's more like 40-50%, and only for the instant it takes to create the thread, then it goes back to normal; I'm on an old single-core Turion 64-bit 1.8GHz, so I notice this kind of CPU abuse). Creating and destroying threads wastes CPU and RAM when they could be reused: I'd like to know what the "best" solution would be.Annapolis

By creating 4 different BackgroundWorkers, you will be creating separate threads that will no longer interfere with your GUI. BackgroundWorkers are simple to implement and, from what I understand, will do exactly what you need.

Personally, I would do this and simply not start the next one until the previous one has finished (or maybe use just one, and let it execute one method at a time in the correct order).

FYI - Backgroundworker
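A minimal sketch of the single-worker variant described above, processing one list of links in order and reporting progress. `WorkerFactory.CreateWorker` and the `download` delegate are hypothetical; when `RunWorkerAsync` is called from the UI thread, `ProgressChanged` and `RunWorkerCompleted` are raised back on that thread, so their handlers can update controls directly.

```csharp
using System;
using System.ComponentModel;

static class WorkerFactory
{
    // Hypothetical helper: a worker that downloads the links one after another
    // and reports the overall percentage (plus the finished link) after each one.
    public static BackgroundWorker CreateWorker(string[] links, Action<string> download)
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };
        worker.DoWork += (sender, e) =>
        {
            for (int i = 0; i < links.Length; i++)
            {
                download(links[i]); // runs on a thread-pool thread, not the UI thread
                worker.ReportProgress((i + 1) * 100 / links.Length, links[i]);
            }
        };
        return worker;
    }
}
```

In a form you would attach something like `worker.ProgressChanged += (s, e) => progressBar.Value = e.ProgressPercentage;` (with `progressBar` a hypothetical control) before calling `worker.RunWorkerAsync()`.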

Soutache answered 2/8, 2011 at 14:24 Comment(1)
Using BackgroundWorker does not create a new process, but rather executes on a thread pool thread. He will not be creating separate processes, but rather separate threads.Puree
