Get HTML source code from CefSharp web browser

I am using a CefSharp.Wpf.ChromiumWebBrowser (version 47.0.3.0) to load a web page. At some point after the page has loaded, I want to get its source code.

I have called:

wb.GetBrowser().MainFrame.GetSourceAsync()

however it does not appear to be returning all the source code (I believe this is because there are child frames).

If I call:

wb.GetBrowser().MainFrame.ViewSource() 

I can see it lists all the source code (including the inner frames).

I would like to get the same result as ViewSource(). Could someone point me in the right direction, please?

Update – Added Code example

Note: The address the web browser is pointing to will only work up to and including 10/03/2016. After that it may display different data, which is not what I am looking at.

In the frmSelection.xaml file

<cefSharp:ChromiumWebBrowser Name="wb" Grid.Column="1" Grid.Row="0" />

In the frmSelection.xaml.cs file

public partial class frmSelection : UserControl
{
    private System.Windows.Threading.DispatcherTimer wbTimer = new System.Windows.Threading.DispatcherTimer();

    public frmSelection()
    {
        InitializeComponent();

        // This timer will start when a web page has been loaded.
        // It will wait 4 seconds and then call wbTimer_Tick which
        // will then see if data can be extracted from the web page.
        wbTimer.Interval = new TimeSpan(0, 0, 4);
        wbTimer.Tick += new EventHandler(wbTimer_Tick);

        wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";

        wb.FrameLoadEnd += new EventHandler<CefSharp.FrameLoadEndEventArgs>(wb_FrameLoadEnd);
    }

    void wb_FrameLoadEnd(object sender, CefSharp.FrameLoadEndEventArgs e)
    {
        if (wbTimer.IsEnabled)
            wbTimer.Stop();

        wbTimer.Start();
    }

    void wbTimer_Tick(object sender, EventArgs e)
    {
        wbTimer.Stop();
        string html = GetHTMLFromWebBrowser();
    }

    private string GetHTMLFromWebBrowser()
    {
        // Call the ViewSource method, which will open up Notepad and display the HTML.
        // This is just so I can compare it to the HTML returned by GetSourceAsync().
        // This displays all the HTML code (including child frames).
        wb.GetBrowser().MainFrame.ViewSource();

        // Get the HTML source code from the main frame.
        // This displays only the code in the main frame and not any of its child frames.
        Task<String> taskHtml = wb.GetBrowser().MainFrame.GetSourceAsync();

        string response = taskHtml.Result;
        return response;
    }
}
Ballata answered 9/3, 2016 at 11:29 Comment(2)
Can you share some more code? I can't reproduce your problem; I get the same text with GetSourceAsync as with ViewSource. I tried it with Address set to http://stackoverflow.com (it has two frames: one iframe and the main frame). - Beora
Thanks for taking a look. I have added example source to the original post. - Ballata

I don't think I quite get this DispatcherTimer solution. I would do it like this:

public frmSelection()
{
    InitializeComponent();

    wb.FrameLoadEnd += WebBrowserFrameLoadEnded;
    wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
}

private void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
{
    if (e.Frame.IsMain)
    {
        wb.ViewSource();
        wb.GetSourceAsync().ContinueWith(taskHtml =>
        {
            var html = taskHtml.Result;
        });
    }
}

I did a diff on the output of ViewSource and the text in the html variable and they are the same, so I can't reproduce your problem here.

That said, I noticed that the main frame finishes loading quite late, so it can take a while before Notepad pops up with the source.
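Since FrameLoadEnd is raised once per frame rather than only for the main frame, the same handler could also collect the source of each child frame as it finishes loading. The following is only a rough sketch under that assumption: the `frameSources` dictionary is a hypothetical name introduced here for illustration, and the handler is meant as an alternative body for the `WebBrowserFrameLoadEnded` method above.

```csharp
using System.Collections.Concurrent;
using System.Windows.Controls;
using CefSharp;

public partial class frmSelection : UserControl
{
    // Hypothetical store (not part of CefSharp): frame URL -> HTML source.
    // A concurrent dictionary is used because FrameLoadEnd is not raised
    // on the WPF UI thread.
    private readonly ConcurrentDictionary<string, string> frameSources =
        new ConcurrentDictionary<string, string>();

    private async void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
    {
        // Read what we need from the event args before the first await.
        string url = e.Url ?? string.Empty;

        // Request the source of whichever frame just finished loading
        // (main frame or child frame) without blocking the calling thread.
        string html = await e.Frame.GetSourceAsync();

        frameSources[url] = html;
    }
}
```

Frames that share the same URL would overwrite each other in this sketch, so a different key may be needed for pages that embed the same document more than once.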

Lum answered 10/3, 2016 at 19:28 Comment(1)
Thank you for the feedback on my code; I have since updated it to reflect your example. I have run the code on another computer since posting the example and I get the same results as you (both return the full source code). I can only conclude there is something weird going on with my machine, and I will consider doing a format. - Ballata

I was having the same issue when trying to click on an item located in a child frame rather than in the main frame. Using the example in your answer, I wrote the following extension method:

    // Extension method: must be declared inside a static class.
    public static IFrame GetFrame(this ChromiumWebBrowser browser, string FrameName)
    {
        IFrame frame = null;

        var identifiers = browser.GetBrowser().GetFrameIdentifiers();

        foreach (var i in identifiers)
        {
            frame = browser.GetBrowser().GetFrame(i);

            // GetFrame can return null if the frame no longer exists.
            if (frame != null && frame.Name == FrameName)
                return frame;
        }

        return null;
    }

If your form has a "using" directive for the namespace that contains this method, you can do something like:

var frame = browser.GetFrame("nameofframe");
if (frame != null)
{
    string HTML = await frame.GetSourceAsync();
}

Of course you need to make sure the page load is complete before using this, but I plan to use it a lot. Hope it helps!
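Along the same lines, here is a minimal sketch of how the source of every frame might be gathered in one call, which is closer to what the question asks for. It only uses the calls already shown above (GetFrameIdentifiers, GetFrame, GetSourceAsync) plus IFrame.IsMain; the class name `FrameSourceExtensions`, the method name `GetAllFrameSourcesAsync` and the key format are made up for this example, and the identifier type may differ in later CefSharp releases (hence `var`).

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.Wpf;

public static class FrameSourceExtensions
{
    // Hypothetical helper: returns the HTML source of each frame,
    // keyed by a label built from the frame name and identifier.
    public static async Task<Dictionary<string, string>> GetAllFrameSourcesAsync(
        this ChromiumWebBrowser browser)
    {
        var sources = new Dictionary<string, string>();
        var cefBrowser = browser.GetBrowser();

        foreach (var id in cefBrowser.GetFrameIdentifiers())
        {
            var frame = cefBrowser.GetFrame(id);

            // The frame may have gone away since the identifiers were listed.
            if (frame == null)
                continue;

            string key = frame.IsMain ? "(main)" : frame.Name + "#" + id;
            sources[key] = await frame.GetSourceAsync();
        }

        return sources;
    }
}
```

It would be called from an async method as `var sources = await browser.GetAllFrameSourcesAsync();`, again only once the page has finished loading.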

Molloy answered 28/3, 2016 at 14:12 Comment(0)
private void button1_Click(object sender, EventArgs e)
{
    Task<String> taskHtml = CW.GetBrowser().MainFrame.GetSourceAsync();
    textBox1.Text = taskHtml.Result;
}
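Reading `taskHtml.Result` inside a click handler blocks the UI thread until the source arrives. A possible variation, sketched here on the assumption that `CW` is the browser control and `textBox1` a text box declared elsewhere in the form (the class name `Form1` is a placeholder), is to make the handler async and await the task instead:

```csharp
using System;
using System.Windows.Forms;
using CefSharp;

public partial class Form1 : Form
{
    // CW and textBox1 are assumed to be declared in the designer file.
    private async void button1_Click(object sender, EventArgs e)
    {
        // Await the source instead of blocking on .Result, so the form
        // stays responsive while the browser process returns the HTML.
        string html = await CW.GetBrowser().MainFrame.GetSourceAsync();

        // Execution resumes on the UI thread here, so updating the
        // text box directly is fine.
        textBox1.Text = html;
    }
}
```

As with the other answers, this only returns the main frame's source and assumes the page has already finished loading when the button is clicked.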
Marola answered 11/10, 2023 at 21:52 Comment(1)
While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. - Rackley
