Download js generated html with C#
W

4

8

There is a reports website whose content I want to parse in C#. I tried downloading the HTML with WebClient, but I don't get the complete source, since most of it is generated by JavaScript when the page is loaded in a browser.

I tried using WebBrowser but couldn't get it to work in a console app, even after using Application.Run() and SetApartmentState(ApartmentState.STA).

Is there another way to access this generated html? I also took a look into mshtml but couldn't figure it out.

Thanks

Wallow answered 23/1, 2012 at 22:26 Comment(0)
R
3

The JavaScript is executed by the browser, not by the server. If your console app receives the JS source, then the download is working as expected; what you really need is for your console app to execute the JS code that was downloaded.
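One way a console app can get the scripts executed without external libraries is to host the built-in WebBrowser control on its own STA thread with a message loop. The following is only a minimal sketch under that assumption, not necessarily what anyone in this thread implemented: the class name `RenderedHtml`, the method `Fetch`, and the timeout parameter are hypothetical, and the project must reference System.Windows.Forms. DocumentCompleted can fire before late-running scripts finish, so a real page may need an extra wait.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class RenderedHtml
{
    // Hypothetical helper: loads the page in a WebBrowser control and
    // returns the DOM as HTML after the main document has completed.
    public static string Fetch(string url, TimeSpan timeout)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (sender, e) =>
                {
                    // Frames raise this event too; wait for the top-level document.
                    if (e.Url != browser.Url) return;

                    // OuterHtml of the root element serializes the current DOM.
                    var root = browser.Document.GetElementsByTagName("html");
                    html = root.Count > 0 ? root[0].OuterHtml : browser.DocumentText;

                    // Stop the message loop started by Application.Run() below.
                    Application.ExitThread();
                };

                browser.Navigate(url);
                Application.Run(); // message loop required by the control
            }
        });

        // The WebBrowser control is an ActiveX wrapper and needs an STA thread.
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true; // do not keep the process alive on timeout
        thread.Start();

        // Returns null if the page did not complete within the timeout.
        return thread.Join(timeout) ? html : null;
    }
}
```

A call might look like `string html = RenderedHtml.Fetch("http://example.com/report", TimeSpan.FromSeconds(30));` where the URL is a placeholder.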

Renaldorenard answered 23/1, 2012 at 23:41 Comment(1)
I ended up with this, but it was a hassle to implement. Thanks - Wallow
R
3

You can use a headless browser - XBrowser may serve.

If not, try HtmlUnit as described in this blog post.

Richburg answered 23/1, 2012 at 22:29 Comment(2)
Forgot to mention, I can't use any external libraries. Otherwise this would have been great. Thanks - Wallow
@Wallow - Then WebBrowser is your only choice. #5519794 - Richburg
W
0

Just a comment here. There shouldn't be any difference between performing an HTTP request with some C# code and the request generated by a browser. If the target web page is getting confused and not generating the correct markup because it can't make heads or tails of the type of browser it thinks it's serving, then maybe all you have to do is set the user agent like so:

((HttpWebRequest)myWebClientRequest).UserAgent = "<a valid user agent>";

For example, my current user agent is:

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1

Maybe once you do that the page will work correctly. There may be other factors at work here, such as the referrer and so on, but I would try this first and see if it works.
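As a slightly fuller sketch of the same idea, the request can be built with HttpWebRequest directly, setting both the user agent and the referrer before sending it. The class name `BrowserLikeRequest` and its methods are hypothetical names used only for illustration. This changes what the server sees; it does not run any scripts in the response.

```csharp
using System;
using System.IO;
using System.Net;

static class BrowserLikeRequest
{
    // Builds the request without sending it, so the headers can be inspected.
    public static HttpWebRequest Create(string url, string userAgent, string referer = null)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;      // sent as the User-Agent header
        if (referer != null)
        {
            request.Referer = referer;      // sent as the Referer header
        }
        return request;
    }

    // Sends the request and returns the response body as text.
    public static string Download(HttpWebRequest request)
    {
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, `BrowserLikeRequest.Download(BrowserLikeRequest.Create(url, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"))` would fetch the page with the user agent shown above.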

Warthog answered 23/1, 2012 at 22:37 Comment(1)
The reason he isn't getting what he expects is the JavaScript that executes on the site. An HttpWebRequest won't execute JavaScript. He is on the right track with WebBrowser. - Boabdil
B
0

Your best bet is to abandon the console app route and build a Windows Forms application. There the WebBrowser control works out of the box, because the application already runs on an STA thread with a message loop.

Boabdil answered 23/1, 2012 at 23:46 Comment(0)
