Can I directly stream from HttpResponseMessage to file without going through memory?

Asked 19/8, 2016 at 7:58 Answered 11/8, 2022 at 7:38

My program uses HttpClient to send a GET request to a Web API, and this returns a file.

I currently use this (simplified) code to store the file to disk:

public async Task<bool> DownloadFile()
{
    var client = new HttpClient();
    var uri = new Uri("http://somedomain.com/path");
    var response = await client.GetAsync(uri);

    if (response.IsSuccessStatusCode)
    {
        var fileName = response.Content.Headers.ContentDisposition.FileName;
        using (var fs = new FileStream(@"C:\test\" + fileName, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            await response.Content.CopyToAsync(fs);
            return true;
        }
    }

    return false;
}

When this code runs, the process loads the whole file into memory. I would have expected the content to be streamed from HttpResponseMessage.Content to the FileStream, so that only a small portion of it is held in memory at any time.

We are planning to use that on large files (> 1GB), so is there a way to achieve that without having all of the file in memory?

Ideally without manually looping through reading a portion to a byte[] and writing that portion to the file stream until all of the content is written?

Giddens answered 19/8, 2016 at 7:58 Comment(6)

look at - msdn.microsoft.com/en-us/library/… – Scurry 19/8, 2016 at 8:3

CopyToAsync already does what you describe: internally it repeatedly reads a chunk of data from the response and writes it to the file until all data is transferred. It should not buffer the entire file in memory at once. – Endophyte 19/8, 2016 at 8:3

That's what I thought too. However, looking at the memory consumption, it definitely does load the whole file into memory, which I want to avoid. – Giddens 19/8, 2016 at 8:6

Have you tried response.Content.ReadAsStreamAsync() and use Stream.CopyToAsync? – Perceptive 19/8, 2016 at 8:6

Then it may be that your measurement of "memory consumption" is inaccurate. The runtime will not necessarily release memory that is "free" unless it needs to, so the memory usage values in e.g. Task Manager may not reflect the memory actually "in use" by the application. – Endophyte 19/8, 2016 at 8:10

That is also likely not the case: when I debug, it's terminated by an out-of-memory exception very soon. – Giddens 19/8, 2016 at 8:32

It looks like this is by design: if you check the documentation for HttpClient.GetAsync(), you'll see it says:

The returned task object will complete after the whole response (including content) is read

You can instead use HttpClient.GetStreamAsync() which specifically states:

This method does not buffer the stream.

However, as far as I can see, you then don't get access to the response headers. Since that's presumably a requirement (you're getting the file name from the headers), you may want to use HttpWebRequest instead, which lets you read the response details (headers etc.) without reading the whole response into memory. Something like:

public async Task<bool> DownloadFile()
{
    var uri = new Uri("http://somedomain.com/path");
    var request = WebRequest.CreateHttp(uri);

    // Dispose the response so the underlying connection is released.
    using (var response = await request.GetResponseAsync())
    {
        ContentDispositionHeaderValue contentDisposition;
        var fileName = ContentDispositionHeaderValue.TryParse(response.Headers["Content-Disposition"], out contentDisposition)
            ? contentDisposition.FileName
            : "noname.dat";

        using (var responseStream = response.GetResponseStream())
        using (var fs = new FileStream(@"C:\test\" + fileName, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            // Stream-to-stream copy: data is moved in chunks rather than loaded as a whole.
            await responseStream.CopyToAsync(fs);
        }
    }

    return true;
}

Note that if the request returns an unsuccessful response code an exception will be thrown, so you may wish to wrap in a try..catch and return false in this case as in your original example.

Endophyte answered 19/8, 2016 at 11:38 Comment(1)

Thanks for the clarification. Indeed, using GetStreamAsync on the client allows saving the stream directly to disk and not to memory. Since we have control over the server too, I can simply fetch the metadata (file name, size) in a separate request first. Then I don't need to access the headers. – Giddens 19/8, 2016 at 11:47

Instead of GetAsync(Uri), use the GetAsync(Uri, HttpCompletionOption) overload with the HttpCompletionOption.ResponseHeadersRead value.

The same applies to SendAsync and other methods of HttpClient.
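
A minimal sketch of how that overload could be applied to the code from the question. The class name StreamingDownloader, the method name DownloadFileAsync, the targetDirectory parameter, and the "noname.dat" fallback are illustrative assumptions rather than part of the original code, and the quote trimming and Path.GetFileName call are extra precautions, not something the question asked for:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class StreamingDownloader
{
    // Hypothetical helper class; a single shared HttpClient is reused across calls.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<bool> DownloadFileAsync(Uri uri, string targetDirectory)
    {
        // ResponseHeadersRead: the task completes once the headers are available,
        // so the body is not buffered before we start reading it.
        using (var response = await Client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
        {
            if (!response.IsSuccessStatusCode)
            {
                return false;
            }

            // Content-Disposition is optional, and FileName may be wrapped in quotes.
            var rawName = response.Content.Headers.ContentDisposition?.FileName;
            var fileName = string.IsNullOrWhiteSpace(rawName)
                ? "noname.dat" // assumed fallback name
                : Path.GetFileName(rawName.Trim('"'));

            var targetPath = Path.Combine(targetDirectory, fileName);

            using (var source = await response.Content.ReadAsStreamAsync())
            using (var target = new FileStream(targetPath, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                // Copies chunk by chunk from the network stream to the file.
                await source.CopyToAsync(target);
            }

            return true;
        }
    }
}
```

It could be called as `await StreamingDownloader.DownloadFileAsync(new Uri("http://somedomain.com/path"), @"C:\test");`. Unlike GetStreamAsync, this keeps the HttpResponseMessage around, so the status code and headers remain accessible while the body is streamed.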

Sources:

docs (see remarks) https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient.getasync?view=netcore-1.1#System_Net_Http_HttpClient_GetAsync_System_Uri_System_Net_Http_HttpCompletionOption_

The returned Task object will complete based on the completionOption parameter after the part or all of the response (including content) is read.

.NET Core implementation of GetStreamAsync that uses HttpCompletionOption.ResponseHeadersRead https://github.com/dotnet/corefx/blob/release/1.1.0/src/System.Net.Http/src/System/Net/Http/HttpClient.cs#L163-L168
HttpClient spike in memory usage with large response
HttpClient.GetStreamAsync() with custom request? (don't mind the comment on response, the ResponseHeadersRead is what does the trick)

Lublin answered 25/12, 2019 at 23:58 Comment(0)


Another simple and quick way to do it is:

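public async Task<bool> DownloadFile(string url)
{
    using (var client = new HttpClient())
    using (var ms = new MemoryStream())
    using (var stream = await client.GetStreamAsync(url))
    {
        // Note: the MemoryStream ends up holding the whole download in memory.
        await stream.CopyToAsync(ms);

        // ... use ms as needed

        return true;
    }
}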

Now you have the downloaded file as a stream in ms.

Restrict answered 11/8, 2022 at 7:38 Comment(1)

OP specifically asked for a way to circumvent streaming the entire file to memory. – Dundalk 15/2, 2023 at 9:9
