Get ReadyState from WebBrowser control without DoEvents
Asked Answered
V

3

4

This has been awnsered many times here and at other sites and its working, but I would like ideas to other ways to:

get the ReadyState = Complete after using a navigate or post, without using DoEvents because of all of its cons.

I would also note that using the DocumentComplete event woud not help here as I wont be navigating on only one page, but one after another like this.

wb.navigate("www.microsoft.com")
//dont use DoEvents loop here
wb.Document.Body.SetAttribute(textbox1, "login")
//dont use DoEvents loop here
if (wb.documenttext.contais("text"))
//do something

The way it is today its working by using DoEvents. I would like to know if anyone have a proper way to wait the async call of the browser methods to only then proceed with the rest of the logic. Just for the sake of it.

Thanks in advance.

Vidette answered 15/1, 2014 at 14:40 Comment(1)
You must use the DocumentCompleted event. All you need to do is keep track of what was completed. The event already tells you, you get the e.Url property back. If you need to know more then just use a variable that keeps track of the state. A simple integer or enum will do fine.Clinkerbuilt
S
2

Below is a basic WinForms app code, illustrating how to wait for the DocumentCompleted event asynchronously, using async/await. It navigates to multiple pages, one after another. Everything is taking place on the main UI thread.

Instead of calling this.webBrowser.Navigate(url), it might be simulating a form button click, to trigger a POST-style navigation.

The webBrowser.IsBusy async loop logic is optional, its purpose is to account (non-deterministically) for the page's dynamic AJAX code which may take place after window.onload event.

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WebBrowserApp
{
    public partial class MainForm : Form
    {
        WebBrowser webBrowser;

        public MainForm()
        {
            InitializeComponent();

            // create a WebBrowser
            this.webBrowser = new WebBrowser();
            this.webBrowser.Dock = DockStyle.Fill;
            this.Controls.Add(this.webBrowser);

            this.Load += MainForm_Load;
        }

        // Form Load event handler
        async void MainForm_Load(object sender, EventArgs e)
        {
            // cancel the whole operation in 30 sec
            var cts = new CancellationTokenSource(30000);

            var urls = new String[] { 
                    "http://www.example.com", 
                    "http://www.gnu.org", 
                    "http://www.debian.org" };

            await NavigateInLoopAsync(urls, cts.Token);
        }

        // navigate to each URL in a loop
        async Task NavigateInLoopAsync(string[] urls, CancellationToken ct)
        {
            foreach (var url in urls)
            {
                ct.ThrowIfCancellationRequested();
                var html = await NavigateAsync(ct, () => 
                    this.webBrowser.Navigate(url));
                Debug.Print("url: {0}, html: \n{1}", url, html);
            }
        }

        // asynchronous navigation
        async Task<string> NavigateAsync(CancellationToken ct, Action startNavigation)
        {
            var onloadTcs = new TaskCompletionSource<bool>();
            EventHandler onloadEventHandler = null;

            WebBrowserDocumentCompletedEventHandler documentCompletedHandler = delegate
            {
                // DocumentCompleted may be called several time for the same page,
                // if the page has frames
                if (onloadEventHandler != null)
                    return;

                // so, observe DOM onload event to make sure the document is fully loaded
                onloadEventHandler = (s, e) =>
                    onloadTcs.TrySetResult(true);
                this.webBrowser.Document.Window.AttachEventHandler("onload", onloadEventHandler);
            };

            this.webBrowser.DocumentCompleted += documentCompletedHandler;
            try
            {
                using (ct.Register(() => onloadTcs.TrySetCanceled(), useSynchronizationContext: true))
                {
                    startNavigation();
                    // wait for DOM onload event, throw if cancelled
                    await onloadTcs.Task;
                }
            }
            finally
            {
                this.webBrowser.DocumentCompleted -= documentCompletedHandler;
                if (onloadEventHandler != null)
                    this.webBrowser.Document.Window.DetachEventHandler("onload", onloadEventHandler);
            }

            // the page has fully loaded by now

            // optional: let the page run its dynamic AJAX code,
            // we might add another timeout for this loop
            do { await Task.Delay(500, ct); }
            while (this.webBrowser.IsBusy);

            // return the page's HTML content
            return this.webBrowser.Document.GetElementsByTagName("html")[0].OuterHtml;
        }
    }
}

If you're looking to do something similar from a console app, here is an example of that.

Speed answered 16/1, 2014 at 3:45 Comment(1)
Why did you write Visual Basic .NET in C syntax?Aden
B
2

The solution is simple:

    // MAKE SURE ReadyState = Complete
            while (WebBrowser1.ReadyState.ToString() != "Complete") {
                Application.DoEvents();         
            }

// Move on to your sub-sequence code...


Dirty and quick.. I am a VBA guys, this logic has been working forever, just took me days and found none for C# but I just figured this out myself.

Following is my complete function, the objective is to obtain a segment of info from a webpage:

private int maxReloadAttempt = 3;
    private int currentAttempt = 1;

    private string GetCarrier(string webAddress)
    {
        WebBrowser WebBrowser_4MobileCarrier = new WebBrowser();
        string innerHtml;
        string strStartSearchFor = "subtitle block pull-left\">";
        string strEndSearchFor = "<";

        try
        {
            WebBrowser_4MobileCarrier.ScriptErrorsSuppressed = true;
            WebBrowser_4MobileCarrier.Navigate(webAddress); 

            // MAKE SURE ReadyState = Complete
            while (WebBrowser_4MobileCarrier.ReadyState.ToString() != "Complete") {
                Application.DoEvents();         
            }

            // LOAD HTML
            innerHtml = WebBrowser_4MobileCarrier.Document.Body.InnerHtml;  

            // ATTEMPT (x3) TO EXTRACT CARRIER STRING
            while (currentAttempt <=  maxReloadAttempt) {
                if (innerHtml.IndexOf(strStartSearchFor) >= 0)
                {
                    currentAttempt = 1; // Reset attempt counter
                    return Sub_String(innerHtml, strStartSearchFor, strEndSearchFor, "0"); // Method: "Sub_String" is my custom function
                }
                else
                {
                    currentAttempt += 1;    // Increment attempt counter
                    GetCarrier(webAddress); // Recursive method call
                } // End if
            } // End while
        }   // End Try

        catch //(Exception ex)
        {
        }
        return "Unavailable";
    }
Bowlds answered 23/7, 2014 at 13:38 Comment(0)
D
1

Here is a "quick & dirty" solution. It's not 100% foolproof but it doesn't block UI thread and it should be satisfactory to prototype WebBrowser control Automation procedures:

    private async void testButton_Click(object sender, EventArgs e)
    {
        await Task.Factory.StartNew(
            () =>
            {
                stepTheWeb(() => wb.Navigate("www.yahoo.com"));
                stepTheWeb(() => wb.Navigate("www.microsoft.com"));
                stepTheWeb(() => wb.Navigate("asp.net"));
                stepTheWeb(() => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }));
                bool testFlag = false;
                stepTheWeb(() => testFlag = wb.DocumentText.Contains("Get Started"));
                if (testFlag) {    /* TODO */ }
                // ... 
            }
        );
    }

    private void stepTheWeb(Action task)
    {
        this.Invoke(new Action(task));

        WebBrowserReadyState rs = WebBrowserReadyState.Interactive;
        while (rs != WebBrowserReadyState.Complete)
        {
            this.Invoke(new Action(() => rs = wb.ReadyState));
            System.Threading.Thread.Sleep(300);
        }
   }

Here is a bit more generic version of testButton_Click method:

    private async void testButton_Click(object sender, EventArgs e)
    {
        var actions = new List<Action>()
            {
                () => wb.Navigate("www.yahoo.com"),
                () => wb.Navigate("www.microsoft.com"),
                () => wb.Navigate("asp.net"),
                () => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }),
                () => {
                         bool testFlag = false;
                         testFlag  = wb.DocumentText.Contains("Get Started"); 
                         if (testFlag)  {   /*  TODO */  }
                       }
                //... 
            };

        await Task.Factory.StartNew(() => actions.ForEach((x)=> stepTheWeb (x)));  
    }

[Update]

I have adapted my "quick & dirty" sample by borrowing and sligthly refactoring @Noseratio's NavigateAsync method from this topic. New code version would automate/execute asynchronously in UI thread context not only navigation operations but also Javascript/AJAX calls - any "lamdas"/one automation step task implementation methods.

All and every code reviews/comments are very welcome. Especially, from @Noseratio. Together, we will make this world better ;)

    public enum ActionTypeEnumeration
    {
        Navigation = 1,
        Javascript = 2,
        UIThreadDependent = 3,
        UNDEFINED = 99
    }

    public class ActionDescriptor
    {
        public Action Action { get; set; }
        public ActionTypeEnumeration ActionType { get; set; }
    }

    /// <summary>
    /// Executes a set of WebBrowser control's Automation actions
    /// </summary>
    /// <remarks>
    ///  Test form shoudl ahve the following controls:
    ///    webBrowser1 - WebBrowser,
    ///    testbutton - Button,
    ///    testCheckBox - CheckBox,
    ///    totalHtmlLengthTextBox - TextBox
    /// </remarks> 
    private async void testButton_Click(object sender, EventArgs e)
    {
        try
        {
            var cts = new CancellationTokenSource(60000);

            var actions = new List<ActionDescriptor>()
            {
                new ActionDescriptor() { Action = ()=>  wb.Navigate("www.yahoo.com"), ActionType = ActionTypeEnumeration.Navigation}  ,
                new ActionDescriptor() { Action = () => wb.Navigate("www.microsoft.com"), ActionType = ActionTypeEnumeration.Navigation}  ,
                new ActionDescriptor() { Action = () => wb.Navigate("asp.net"), ActionType = ActionTypeEnumeration.Navigation}  ,
                new ActionDescriptor() { Action = () => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }), ActionType = ActionTypeEnumeration.Javascript}, 
                new ActionDescriptor() { Action =
                () => {
                         testCheckBox.Checked = wb.DocumentText.Contains("Get Started"); 
                       },
                       ActionType = ActionTypeEnumeration.UIThreadDependent} 
                //... 
            };

            foreach (var action in actions)
            {
               string html = await ExecuteWebBrowserAutomationAction(cts.Token, action.Action, action.ActionType);
               // count HTML web page stats - just for fun
               int totalLength = 0;
               Int32.TryParse(totalHtmlLengthTextBox.Text, out totalLength);
               totalLength += !string.IsNullOrWhiteSpace(html) ? html.Length : 0;
               totalHtmlLengthTextBox.Text = totalLength.ToString();   
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message, "Error");   
        }
    }

    // asynchronous WebBroswer control Automation
    async Task<string> ExecuteWebBrowserAutomationAction(
                            CancellationToken ct, 
                            Action runWebBrowserAutomationAction, 
                            ActionTypeEnumeration actionType = ActionTypeEnumeration.UNDEFINED)
    {
        var onloadTcs = new TaskCompletionSource<bool>();
        EventHandler onloadEventHandler = null;

        WebBrowserDocumentCompletedEventHandler documentCompletedHandler = delegate
        {
            // DocumentCompleted may be called several times for the same page,
            // if the page has frames
            if (onloadEventHandler != null)
                return;

            // so, observe DOM onload event to make sure the document is fully loaded
            onloadEventHandler = (s, e) =>
                onloadTcs.TrySetResult(true);
            this.wb.Document.Window.AttachEventHandler("onload", onloadEventHandler);
        };


        this.wb.DocumentCompleted += documentCompletedHandler;
        try
        {
            using (ct.Register(() => onloadTcs.TrySetCanceled(), useSynchronizationContext: true))
            {
                runWebBrowserAutomationAction();

                if (actionType == ActionTypeEnumeration.Navigation)
                {
                    // wait for DOM onload event, throw if cancelled
                    await onloadTcs.Task;
                }
            }
        }
        finally
        {
            this.wb.DocumentCompleted -= documentCompletedHandler;
            if (onloadEventHandler != null)
                this.wb.Document.Window.DetachEventHandler("onload", onloadEventHandler);
        }

        // the page has fully loaded by now

        // optional: let the page run its dynamic AJAX code,
        // we might add another timeout for this loop
        do { await Task.Delay(500, ct); }
        while (this.wb.IsBusy);

        // return the page's HTML content
        return this.wb.Document.GetElementsByTagName("html")[0].OuterHtml;
    }
Distinguishing answered 16/1, 2014 at 0:7 Comment(8)
No offence, but this is a bad design. It uses a background thread only to manipulate the WebBrowser object on the UI thread via Control.Invoke. This task doesn't need an extra thread. And the Thread.Sleep(300) loop ... there's DocumentCompleted event for that.Speed
@Noseratio, Thank you, I know :) And so I have noted it's a "quick & dirty" solution to prototype WebBrowser control's Automation procedures. I do use DocumentCompleted for real life projects. Obviously var actions ... "control structure" can be generalized and my private void stepTheWeb(Action task) can be refactored to use DocumentCompleted and other tricks to handle not only WebBrowser control navigation but also Javascript/AJAX actions/calls. As for extra thread - this "quick & dirty" solution would hang UI thread without extra thread, wouldn't it?Distinguishing
@Noseratio, I have just posted here a new code version by borrowing and adapting part of your code sample from this topic, would that be OK with you?Distinguishing
I can't spot any difference between your ExecuteWebBrowserAutomationAction and my NavigateAsync. While this code itself is nothing special, borrowing the key part from a neighboring answer isn't something I remember seeing often on SO.Speed
@Noseratio, This topic's original question was not only about WebBrowser control Automation of URLs navigation but also has a request to execute code lines as wb.Document.Body.SetAttribute(textbox1, "login"). Your current version of NavigateAsync throws run-time error for that code line. I have made it working with a few corrections just to show the difference. Feel free to "borrow back" corrected code, to make it even more solid - and I will drop the part of my answer, which has your NavigateAsync borrowed if you're seeing any problems with that.Distinguishing
@Noseratio, AFAIS, SO's "user contributions are licensed under cc by-sa 3.0 with attribution required". My borrowing of your code, its correction and republishing here in corrected form with all the credits to your original code doesn't violate any SO copyright rules I suppose. Thank you.Distinguishing
My version NavigateAsync doesn't call b.Document.Body.SetAttribute(textbox1, "login") at all. The code I posted is a small but working standalone app. The purpose of it is to illustrate the concept without nitpicking.Speed
@Noseratio, Yes, I believe it makes sense, this "offline pair-programming ping-pong", as the current code version has a working solution for all the requirements of original question, not only demonstrates concept.Distinguishing

© 2022 - 2024 — McMap. All rights reserved.