Trouble getting source code from a webpage


Asked 18/9, 2018 at 18:55 Answered 18/9, 2018 at 18:55

I've written a PHP script to fetch the HTML content (the source code) of a webpage, but it doesn't work as I expect. When I run the script, the page itself is displayed instead of its source. How can I get the HTML elements or source code?

This is the script:

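<?php
include "simple_html_dom.php";

// Fetch $url with cURL and parse the response with simple_html_dom.
function get_source($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $htmlContent = curl_exec($ch);
    if ($htmlContent === false) {
        // Report transport errors instead of parsing an empty document.
        $error = curl_error($ch);
        curl_close($ch);
        die("cURL request failed: " . $error);
    }
    curl_close($ch);
    $dom = new simple_html_dom();
    $dom->load($htmlContent);
    return $dom;
}
$scraped_page = get_source("https://stackoverflow.com/questions/tagged/web-scraping");
// echo converts the object to a string via its __toString() method,
// so the browser receives the page's HTML and renders it.
echo $scraped_page;
?>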

Currently I'm getting this (image 1):

My expected output is something like this (image 2):

By the way, echoing $htmlContent also gives me what you can see in image 1.

Fixate answered 18/9, 2018 at 18:55 Comment(15)

The last line of code echo $scraped_page; displays the document you've loaded, so you should be able to use this to extract the data instead. – Enterostomy 18/9, 2018 at 18:59
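To illustrate extracting data from the loaded document, here is a minimal sketch that swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath (an assumption; this is not the library the question uses). It pulls the text and href of every link out of an HTML string. The function name extract_links and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: return [text, href] pairs for every <a href> in $html,
// using PHP's built-in DOM extension instead of simple_html_dom.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [trim($a->textContent), $a->getAttribute('href')];
    }
    return $links;
}

$html = '<ul><li><a href="/q/1">First question</a></li>'
      . '<li><a href="/q/2">Second question</a></li></ul>';
print_r(extract_links($html));
```

The XPath expression is where you would target the elements you actually care about; the selector here is only an example.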

Yes, I know, but how can I get the source code then? Thanks for your comment, @Nigel Ren. – Fixate 18/9, 2018 at 19:00

That is the source code; I'm not sure what you are expecting to get. If you want to display the source, either put echo '<pre>'; before the echo and echo '</pre>'; after it, or view the source in your browser. – Enterostomy 18/9, 2018 at 19:02

Read the docs on the library you're using. The reason that you're getting what you're getting is because the object you're echoing has a __toString() function that just returns the bare source. If you want to do something else, you need to do something else. – Essieessinger 18/9, 2018 at 19:02
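To make the __toString() point concrete: PHP calls an object's __toString() method whenever the object is used as a string, including in echo. The sketch below uses a hypothetical stand-in class, FakeDom, rather than simple_html_dom itself, and shows that echoing the object yields exactly the string that __toString() returns.

```php
<?php
// Hypothetical stand-in for a parsed-document object such as simple_html_dom.
class FakeDom
{
    private string $html;

    public function __construct(string $html)
    {
        $this->html = $html;
    }

    // Called automatically when the object is echoed or cast to string.
    public function __toString(): string
    {
        return $this->html;
    }
}

$dom = new FakeDom('<p>Hello</p>');

// Capture what echo actually sends to the output.
ob_start();
echo $dom;
$echoed = ob_get_clean();

var_dump($echoed === (string) $dom); // bool(true)
```

In a browser, that string is then parsed as HTML like any other output, which is why the page appears rendered rather than shown as source.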

I never asked why I don't get the source code with the script above; rather, I asked how I can get it, meaning which approach to use. The script above is just there to show that I tried myself before posting. Thanks. – Fixate 18/9, 2018 at 19:07

Please give an example of the desired output. – Ahmed 18/9, 2018 at 19:09

Possible duplicate of PHP Parse HTML code – Enterostomy 18/9, 2018 at 19:10

What we see when we inspect an element or click the View page source button. – Fixate 18/9, 2018 at 19:16

This is one of the most basic things other languages provide out of the box. Also, the possible-duplicate flag is misapplied: the question there is entirely different from what I've asked here. Thanks anyway. – Fixate 18/9, 2018 at 19:23

Is echo $scraped_page not showing what you expected? What is it showing? What did you expect? If the curl request succeeded, it should be showing you some HTML. If it isn't, you probably need to find out why the request failed, or what else went wrong with your script. "Didn't succeed" as a description of your problem doesn't really give us much to go on. What do you mean by "opens the page itself"? Which page? Opens how, exactly? You're just echoing the result of the curl request, that's all. We would really like to help, but we need you to be more specific about your problem. Thank you. – Khalilahkhalin 18/9, 2018 at 19:26

If you want the raw HTML returned by the curl request, I would suggest echoing $htmlContent rather than $dom, which seems likely to be an object. – Khalilahkhalin 18/9, 2018 at 19:30
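Following that suggestion, here is a minimal sketch that returns the raw response body as a plain string with no parsing step, and raises an error when cURL fails instead of silently returning false. The function name get_raw_source is hypothetical; the cURL calls are the standard PHP ones.

```php
<?php
// Hypothetical helper: fetch $url and return the raw HTML as a string.
function get_raw_source(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $body = curl_exec($ch);

    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $body;
}

// Usage (requires network access):
// $htmlContent = get_raw_source('https://stackoverflow.com/questions/tagged/web-scraping');
```

Keeping fetching separate from parsing lets you inspect or print the raw string first and hand it to a parser only when you need to query it.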

Please check out the edit @ADyson. – Fixate 18/9, 2018 at 19:39

OK, thanks. I guess it's because you are echoing it into an existing HTML document, so the browser treats it like any other HTML that forms part of the page, i.e. it parses and renders it. I didn't know whether you were executing this from the command line, echoing it into a textbox, or something else. Now we have some context. If you want to see the raw HTML in this context, you need to HTML-encode it so the browser treats it as text rather than as HTML to be interpreted and rendered. – Khalilahkhalin 18/9, 2018 at 19:43
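A minimal sketch of that HTML-encoding approach, using PHP's built-in htmlspecialchars(): it turns <, >, & and quotes into entities so the browser displays the tags as text, and wrapping the result in <pre> preserves line breaks and indentation. The sample string is a hypothetical stand-in for $htmlContent.

```php
<?php
// Hypothetical stand-in for the raw HTML returned by the cURL request.
$htmlContent = "<html>\n  <body><h1>Title</h1></body>\n</html>";

// Encode markup characters so the browser shows the tags instead of rendering them.
// ENT_SUBSTITUTE replaces invalid UTF-8 sequences rather than returning an empty string.
$escaped = htmlspecialchars($htmlContent, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// <pre> keeps the source's whitespace and line breaks visible.
echo '<pre>' . $escaped . '</pre>';
```

If the fetched page uses a charset other than UTF-8, the third argument would need to match it.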

There are potentially a couple of different ways to do that. See google.co.uk/… – Khalilahkhalin 18/9, 2018 at 19:43
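Another common option, sketched here as an alternative rather than as the content of the link above: send a Content-Type: text/plain header before any output, so the browser displays the whole response as plain text instead of parsing it as HTML. This only works if nothing else is printed on the page. The sample string is a hypothetical stand-in for $htmlContent.

```php
<?php
// Hypothetical stand-in for the raw HTML returned by the cURL request.
$htmlContent = '<html><body><h1>Title</h1></body></html>';

// Must be called before any output is sent; tells the browser not to parse HTML.
// header() has no visible effect when the script runs from the command line.
header('Content-Type: text/plain; charset=UTF-8');

echo $htmlContent;
```

Compared with htmlspecialchars(), this leaves the string untouched, but the whole response becomes plain text, so it cannot be mixed with other HTML on the same page.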

Could you be clearer about the expected output, for example by providing the desired output as text rather than an image? – Ahmed 18/9, 2018 at 19:49
