How can I get a web page's content and save it into a string variable?

How can I get the content of a web page using ASP.NET? I need to write a program that gets the HTML of a web page and stores it in a string variable.

Renshaw asked 22/12, 2010 at 14:32 Comment(0)

You can use the WebClient class:

using System.Net;

using (WebClient client = new WebClient()) {
    string downloadString = client.DownloadString("http://www.google.com");
}
Longeron answered 22/12, 2010 at 14:37 Comment(5)
Unfortunately DownloadString (as of .NET 3.5) is not smart enough to work with BOMs. I have included an alternative in my answer.Dialectics
No up vote because no using(WebClient client = new WebClient()){} :)Fredra
This is equivalent to Steven Spielberg's answer, posted 3 minutes before, so no +1.Cartierbresson
Take note: as of .Net-7.0: SYSLIB0014: WebRequest.Create(string) is obsolete: WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'Ancestor
Indeed, WebClient is obsolete now and the accepted answer must be updated with HttpClient and its GetStringAsync methodBark

I've run into issues with WebClient.DownloadString before. If you do, you can try this:

using System.Net;   // WebRequest, WebResponse
using System.IO;    // Stream, StreamReader

WebRequest request = WebRequest.Create("http://www.google.com");
string html = String.Empty;
using (WebResponse response = request.GetResponse())
using (Stream data = response.GetResponseStream())
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
Confident answered 22/12, 2010 at 14:42 Comment(7)
Can you elaborate on the problem you had?Spoilsport
@Greg, it was a performance-related issue. I never really resolved it, but WebClient.DownloadString would take 5-10 seconds to pull down the HTML, whereas WebRequest/WebResponse was almost immediate. Just wanted to propose an alternative solution in case the OP had similar issues or wanted a little more control over the request/response.Confident
@Confident - +1 for finding this. Just run some tests. DownloadString took much longer on first use (5299ms downloadstring vs 200ms WebRequest). Tested it in a loop over 50 x BBC, 50 x CNN and 50 x Another RSS feed Urls, using different Urls to avoid caching. After initial load, DownloadString came out 20ms quicker for BBC, 300ms quicker on CNN. For the other RSS feed, WebRequest was 3ms quicker. Generally, I think I'll use WebRequest for singles and DownloadString for looping through URLs.Jestinejesting
This worked perfectly for me, thanks! Just to maybe save others a little searching, WebRequest is in System.Net and Stream is in System.IoGlomerulus
Scott, @HockeyJ - I don't know what changed since you used WebClient, but when I tested it (using .NET 4.5.2) it was fast enough - 950ms (still a bit slower than a single WebRequest which took 450 ms but not 5-10 seconds for sure).Constriction
@Constriction That's still 2x slower. Not sure I'd double the time it takes to execute a web request just to save a couple of lines of boilerplate code.Confident
Take note: as of .Net-7.0: SYSLIB0014: WebRequest.Create(string) is obsolete: WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'Ancestor
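
One commonly reported cause of the slow first DownloadString call discussed in these comments is automatic proxy detection; whether that was the cause here is not confirmed by the commenters, but disabling the proxy is a frequently suggested workaround. A minimal sketch:

using System.Net;

using (WebClient client = new WebClient())
{
    client.Proxy = null; // skip automatic proxy detection for this client (assumption: no proxy is needed)
    string html = client.DownloadString("http://www.google.com");
}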

I recommend not using WebClient.DownloadString. This is because (at least in .NET 3.5) DownloadString is not smart enough to use/remove the BOM, should it be present. This can result in the BOM () incorrectly appearing as part of the string when UTF-8 data is returned (at least without a charset) - ick!

Instead, this slight variation will work correctly with BOMs:

// requires: using System.Net; using System.IO; using System.Text;
string ReadTextFromUrl(string url) {
    // WebClient is still convenient
    // Assume UTF-8, but detect BOM - could also honor the response charset, I suppose
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url))
    using (var textReader = new StreamReader(stream, Encoding.UTF8, true)) {
        return textReader.ReadToEnd();
    }
}
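
The comment in the code above leaves honoring the response charset as an option. A rough sketch of that idea (an assumption, not part of the original answer): read the Content-Type header once OpenRead has received the response, pick the encoding from its charset parameter, and fall back to UTF-8 with BOM detection otherwise. The method name is made up for illustration.

// Hedged sketch: choose the StreamReader encoding from the response's Content-Type charset.
// requires: using System.Net; using System.IO; using System.Text; using System.Text.RegularExpressions;
string ReadTextFromUrlHonoringCharset(string url) {
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url)) {
        // ResponseHeaders is available once OpenRead has received the response headers
        string contentType = client.ResponseHeaders[HttpResponseHeader.ContentType] ?? "";
        Encoding encoding = Encoding.UTF8; // default assumption when no charset is declared
        Match m = Regex.Match(contentType, @"charset=([\w\-]+)", RegexOptions.IgnoreCase);
        if (m.Success) {
            try { encoding = Encoding.GetEncoding(m.Groups[1].Value); }
            catch (ArgumentException) { /* unknown charset name - keep the UTF-8 default */ }
        }
        using (var textReader = new StreamReader(stream, encoding, true)) {
            return textReader.ReadToEnd();
        }
    }
}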
Dialectics answered 4/5, 2013 at 0:12 Comment(2)
file a bug reportReitareiter
Take note: as of .Net-7.0: SYSLIB0014: WebRequest.Create(string) is obsolete: WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'Ancestor
WebClient client = new WebClient();
string content = client.DownloadString(url);

Pass the URL of the page you want to get. You can parse the result using HtmlAgilityPack, for example as sketched below.
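
A minimal HtmlAgilityPack parsing sketch (assuming the HtmlAgilityPack NuGet package is installed; the XPath query is just an illustration):

// Hedged sketch: list all link targets found in the downloaded HTML.
// requires: using System; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(content);

// SelectNodes returns null when nothing matches, so guard against that
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null) {
    foreach (var link in links) {
        Console.WriteLine(link.GetAttributeValue("href", ""));
    }
}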

Harpsichord answered 22/12, 2010 at 14:34 Comment(1)
FYI: WebClient, and HttpGetRequest (amongst others) are obsolete. You have to use HttpClient now, and deal with all of the async overhead/baggage that it's encumbered with... 🙄🤦‍♀️😝Ancestor

I have always used WebClient, but at the time this post was made (.NET 6 is available), WebClient is being deprecated.

The preferred way is:

// requires: using System.Net.Http; (call from an async method, and reuse the HttpClient instance where possible)
HttpClient client = new HttpClient();
string content = await client.GetStringAsync(url);
Northerly answered 20/7, 2022 at 3:58 Comment(4)
@Ancestor I'd suggest running the code block above in a C# Interactive window before declaring it useless or incomplete. I believe the OP is capable of using async code, hence the code snippet is not wrapped in an async function returning Task<string>. I have updated the 2nd line with a variable assignment to make it complete.Northerly
An example of an actually usable HttpClient implementation can be found here: #1048699Ancestor
@Ancestor here you go, just those 2 lines of code in action youtu.be/iZLJLK0HxgINortherly
My previous comment can be generalized to any context where one needs to run async code in a sync function: wrap the async code with Task.Run and wait for the task to complete. I hope this is sufficient to conclude this discussion.Northerly
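
A minimal sketch of what that last comment describes, i.e. calling the async HttpClient code from a synchronous method (the method name and structure are illustrative, not from the answer):

// Hedged sketch: run the async download inside Task.Run and block until it completes.
// requires: using System.Net.Http; using System.Threading.Tasks;
static string GetPageSync(string url) {
    return Task.Run(async () => {
        using (var client = new HttpClient()) {
            return await client.GetStringAsync(url);
        }
    }).GetAwaiter().GetResult();
}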
