How can I get HTML source code from TWebBrowser

4

14

How can I get the HTML source code from a TWebBrowser component?

I want to get the source code of the page currently loaded in the TWebBrowser component and write it to a TMemo component.

Thanks.

Ehr answered 10/4, 2012 at 15:27 Comment(0)
22

You can use the IPersistStreamInit interface and its Save method to store the content of the TWebBrowser in a stream.

    uses
      Classes, ActiveX, SHDocVw;

    function GetWebBrowserHTML(const WebBrowser: TWebBrowser): String;
    var
      LStream: TStringStream;
      Stream: IStream;
      LPersistStreamInit: IPersistStreamInit;
    begin
      Result := '';  // defined result when no document is loaded yet
      if not Assigned(WebBrowser.Document) then Exit;
      LStream := TStringStream.Create('');
      try
        LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
        // soReference: the adapter does not take ownership of LStream
        Stream := TStreamAdapter.Create(LStream, soReference);
        LPersistStreamInit.Save(Stream, True);
        Result := LStream.DataString;
      finally
        LStream.Free;
      end;
    end;
Kingship answered 10/4, 2012 at 15:40 Comment(3)
How can we make it work the REVERSE way: SetWebBrowserHTML, thus re-injecting the previously extracted code back into the WebBrowser (or TEmbeddedWebBrowser)? I imagine the following situation: a memo component gets the HTML source code with GetWebBrowserHTML, then the user makes some changes to the source code, then the changed source code is re-injected back into the WebBrowser. This would make a nice HTML editor with real-time preview in the browser! - Sentinel
Better: LStream := TStringStream.Create('', TEncoding.UTF8); - Sentinel
@user1580348 If you wanted to "reverse" it, all you need to change is LPersistStreamInit.Save to LPersistStreamInit.Load and initialize the TStringStream with something (or pass in a different stream). - Grantgranta
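A minimal sketch of the reverse direction described in the last comment, mirroring GetWebBrowserHTML from the answer. It assumes the browser already holds a document (for example after navigating to about:blank and waiting for it to finish loading). The InitNew call before Load and the OleCheck error handling are additions that are not in the comment, and the unit names follow the unscoped style used in the answer.

```pascal
uses
  Classes, ActiveX, ComObj, SHDocVw;

// Sketch only: replaces the current document with the given HTML text.
procedure SetWebBrowserHTML(const WebBrowser: TWebBrowser; const HTML: String);
var
  LStream: TStringStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
begin
  // Assumption: a document must already exist, e.g. about:blank.
  if not Assigned(WebBrowser.Document) then Exit;
  LStream := TStringStream.Create(HTML);
  try
    LStream.Position := 0;  // make sure Load reads from the start
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    // soReference: the adapter does not take ownership of LStream
    Stream := TStreamAdapter.Create(LStream, soReference);
    OleCheck(LPersistStreamInit.InitNew);       // reset the document (assumed step)
    OleCheck(LPersistStreamInit.Load(Stream));  // the Save -> Load swap from the comment
  finally
    LStream.Free;
  end;
end;
```

For the editor idea above, a call such as SetWebBrowserHTML(WebBrowser1, Memo1.Text) would push the edited text back; WebBrowser1 and Memo1 are placeholder component names.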
7

That works well too:

    uses MSHTML;

    function GetHTML(w: TWebBrowser): String;
    var
      e: IHTMLElement;
    begin
      Result := '';
      if Assigned(w.Document) then
      begin
        e := (w.Document as IHTMLDocument2).body;
        // body can be nil, e.g. while the page is still loading
        if e = nil then Exit;

        // walk up to the root element (normally <html>)
        while e.parentElement <> nil do
        begin
          e := e.parentElement;
        end;

        Result := e.outerHTML;
      end;
    end;
Chaisson answered 18/3, 2013 at 19:25 Comment(4)
Wrong. This will get you the DOM representation of the document element, not the HTML source code. - Fates
Yes, you are right. I was using it just to parse some data available in the HTML source, and the DOM representation was fine for that. - Chaisson
I'll upvote your answer; it's useful in any case. I also use a similar method in our spider to manipulate and parse HTML from a foreign web site. - Fates
I had to upvote this because the page I was trying to get the source code from had its content changed by JavaScript, so @rruz's suggestion didn't work, as it returned the original HTML instead of the changed one. Thank you. - Viscus
3

This has been asked and answered many times in the Embarcadero forums, with plenty of code examples posted. Search the archives.

The gist of it is that you Navigate() to the desired URL and wait for the OnDocumentComplete event to fire, then QueryInterface() the Document property for the IPersistStreamInit interface and call its Save() method. To do that, create a TStream object instance, such as a TMemoryStream, wrap it in a TStreamAdapter object, and pass the adapter to Save(). You can then load the TStream into the TMemo as needed.
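The steps above can be sketched roughly as follows. LoadBrowserSourceIntoMemo is a hypothetical helper name, meant to be called from the form's OnDocumentComplete handler; Supports() stands in for the explicit QueryInterface() call, and OleCheck is added error handling. How the memo decodes the bytes depends on the page encoding and the Delphi version, so treat that part as an assumption.

```pascal
uses
  SysUtils, Classes, ActiveX, ComObj, SHDocVw, StdCtrls;

// Hypothetical helper: call it from the OnDocumentComplete handler,
// e.g. LoadBrowserSourceIntoMemo(WebBrowser1, Memo1);
procedure LoadBrowserSourceIntoMemo(WebBrowser: TWebBrowser; Memo: TMemo);
var
  MemStream: TMemoryStream;
  Persist: IPersistStreamInit;
  Adapter: IStream;
begin
  if not Assigned(WebBrowser.Document) then
    Exit;
  // Supports() performs the QueryInterface and returns False on failure.
  if not Supports(WebBrowser.Document, IPersistStreamInit, Persist) then
    Exit;
  MemStream := TMemoryStream.Create;
  try
    // soReference: the adapter does not own (and will not free) MemStream.
    Adapter := TStreamAdapter.Create(MemStream, soReference);
    OleCheck(Persist.Save(Adapter, True));
    Adapter := nil;           // release the adapter before freeing the stream
    MemStream.Position := 0;  // rewind so the memo reads from the start
    Memo.Lines.LoadFromStream(MemStream);
  finally
    MemStream.Free;
  end;
end;
```

Keeping the logic in a separate procedure avoids depending on the exact OnDocumentComplete signature, which differs between Delphi versions.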

Treble answered 10/4, 2012 at 15:40 Comment(0)
2

Why not quick and dirty? In the OnNavigateComplete2 event handler:

    Form1.RichEdit1.Text := WebBrowser1.OleObject.Document.documentElement.outerHTML;
Reddin answered 6/10, 2020 at 14:48 Comment(1)
This simple version works much better on UTF-8 encoded pages with non-ASCII text. - Heliogabalus