How can I get a web page's content and save it into a string variable?

How can I get the content of a web page using ASP.NET? I need to write a program that gets the HTML of a web page and stores it in a string variable.

Renshaw asked 22/12, 2010 at 14:32 Comment(0)

You can use the WebClient class:

using System.Net;

using (WebClient client = new WebClient()) {
    string downloadString = client.DownloadString("http://www.google.com");
}
Longeron answered 22/12, 2010 at 14:37 Comment(5)
Unfortunately DownloadString (as of .NET 3.5) is not smart enough to work with BOMs. I have included an alternative in my answer.Dialectics
No up vote because no using(WebClient client = new WebClient()){} :)Fredra
This is equivalent to Steven Spielberg's answer, posted 3 minutes before, so no +1.Cartierbresson
Take note: as of .Net-7.0: SYSLIB0014: WebRequest.Create(string) is obsolete: WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'Ancestor
Indeed, WebClient is obsolete now and the accepted answer must be updated with HttpClient and its GetStringAsync methodBark

I've run into issues with WebClient.DownloadString before. If you do, you can try this:

using System.Net;   // WebRequest, WebResponse
using System.IO;    // Stream, StreamReader

WebRequest request = WebRequest.Create("http://www.google.com");
string html = String.Empty;
using (WebResponse response = request.GetResponse())
using (Stream data = response.GetResponseStream())
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
Confident answered 22/12, 2010 at 14:42 Comment(7)
Can you elaborate on the problem you had?Spoilsport
@Greg, it was a performance-related issue. I never really resolved it, but WebClient.DownloadString would take 5-10 seconds to pull down the HTML, whereas WebRequest/WebResponse was almost immediate. Just wanted to propose an alternative solution in case the OP had similar issues or wanted a little more control over the request/response.Confident
@Confident - +1 for finding this. Just run some tests. DownloadString took much longer on first use (5299ms downloadstring vs 200ms WebRequest). Tested it in a loop over 50 x BBC, 50 x CNN and 50 x Another RSS feed Urls, using different Urls to avoid caching. After initial load, DownloadString came out 20ms quicker for BBC, 300ms quicker on CNN. For the other RSS feed, WebRequest was 3ms quicker. Generally, I think I'll use WebRequest for singles and DownloadString for looping through URLs.Jestinejesting
This worked perfectly for me, thanks! Just to maybe save others a little searching, WebRequest is in System.Net and Stream is in System.IoGlomerulus
Scott, @HockeyJ - I don't know what changed since you used WebClient, but when I tested it (using .NET 4.5.2) it was fast enough - 950ms (still a bit slower than a single WebRequest which took 450 ms but not 5-10 seconds for sure).Constriction
@Constriction That's still 2x slower. Not sure I'd double the time it takes to execute a web request just to save a couple of lines of boilerplate code.Confident
Take note: as of .Net-7.0: SYSLIB0014: WebRequest.Create(string) is obsolete: WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'Ancestor
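
One commonly reported cause of the slow first DownloadString call discussed in these comments is automatic proxy detection; whether that was the cause here is not confirmed by the commenters, but disabling the proxy is a frequently suggested workaround. A minimal sketch:

using System.Net;

using (WebClient client = new WebClient())
{
    client.Proxy = null; // skip automatic proxy detection for this client (assumption: no proxy is needed)
    string html = client.DownloadString("http://www.google.com");
}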

I recommend not using WebClient.DownloadString. This is because (at least in .NET 3.5) DownloadString is not smart enough to use/remove the BOM, should it be present. This can result in the BOM () incorrectly appearing as part of the string when UTF-8 data is returned (at least without a charset) - ick!

Instead, this slight variation will work correctly with BOMs:

// requires: using System.Net; using System.IO; using System.Text;
string ReadTextFromUrl(string url) {
    // WebClient is still convenient
    // Assume UTF-8, but detect BOM - could also honor the response charset, I suppose
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url))
    using (var textReader = new StreamReader(stream, Encoding.UTF8, true)) {
        return textReader.ReadToEnd();
    }
}
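
The comment in the code above leaves honoring the response charset as an option. A rough sketch of that idea (an assumption, not part of the original answer): read the Content-Type header once OpenRead has received the response, pick the encoding from its charset parameter, and fall back to UTF-8 with BOM detection otherwise. The method name is made up for illustration.

// Hedged sketch: choose the StreamReader encoding from the response's Content-Type charset.
// requires: using System.Net; using System.IO; using System.Text; using System.Text.RegularExpressions;
string ReadTextFromUrlHonoringCharset(string url) {
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url)) {
        // ResponseHeaders is available once OpenRead has received the response headers
        string contentType = client.ResponseHeaders[HttpResponseHeader.ContentType] ?? "";
        Encoding encoding = Encoding.UTF8; // default assumption when no charset is declared
        Match m = Regex.Match(contentType, @"charset=([\w\-]+)", RegexOptions.IgnoreCase);
        if (m.Success) {
            try { encoding = Encoding.GetEncoding(m.Groups[1].Value); }
            catch (ArgumentException) { /* unknown charset name - keep the UTF-8 default */ }
        }
        using (var textReader = new StreamReader(stream, encoding, true)) {
            return textReader.ReadToEnd();
        }
    }
}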
Dialectics answered 4/5, 2013 at 0:12 Comment(2)
file a bug reportReitareiter
Take note: as of .Net-7.0: SYSLIB0014: WebRequest.Create(string) is obsolete: WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'Ancestor
WebClient client = new WebClient();
string content = client.DownloadString(url);

Pass the URL of the page you want to get. You can parse the result using HtmlAgilityPack, for example as sketched below.
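
A minimal HtmlAgilityPack parsing sketch (assuming the HtmlAgilityPack NuGet package is installed; the XPath query is just an illustration):

// Hedged sketch: list all link targets found in the downloaded HTML.
// requires: using System; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(content);

// SelectNodes returns null when nothing matches, so guard against that
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null) {
    foreach (var link in links) {
        Console.WriteLine(link.GetAttributeValue("href", ""));
    }
}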

Harpsichord answered 22/12, 2010 at 14:34 Comment(1)
FYI: WebClient, and HttpGetRequest (amongst others) are obsolete. You have to use HttpClient now, and deal with all of the async overhead/baggage that it's encumbered with... 🙄🤦‍♀️😝Ancestor

I have always used WebClient, but at the time this post was made (.NET 6 is available), WebClient is being deprecated.

The preferred way is:

// requires: using System.Net.Http; (call from an async method, and reuse the HttpClient instance where possible)
HttpClient client = new HttpClient();
string content = await client.GetStringAsync(url);
Northerly answered 20/7, 2022 at 3:58 Comment(4)
@Ancestor I'd suggest running the code block above in a C# Interactive window before declaring it useless or incomplete. I believe the OP is capable of using async code, hence the code snippet is not wrapped in an async function returning Task<string>. I have updated the 2nd line with a variable assignment to make it complete.Northerly
An example of an actually usable HttpClient implementation can be found here: #1048699Ancestor
@Ancestor here you go, just those 2 lines of code in action youtu.be/iZLJLK0HxgINortherly
My previous comment can be generalized to any context where one needs to run async code in a sync function: wrap the async code with Task.Run and wait for the task to complete. I hope this is sufficient to conclude this discussion.Northerly
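
A minimal sketch of what that last comment describes, i.e. calling the async HttpClient code from a synchronous method (the method name and structure are illustrative, not from the answer):

// Hedged sketch: run the async download inside Task.Run and block until it completes.
// requires: using System.Net.Http; using System.Threading.Tasks;
static string GetPageSync(string url) {
    return Task.Run(async () => {
        using (var client = new HttpClient()) {
            return await client.GetStringAsync(url);
        }
    }).GetAwaiter().GetResult();
}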
