How Can I Log Into a Web Forms Website Before Parsing with Simple HTML Dom Parser?

Here is what I am trying to accomplish:

I need to scrape product data from a website, but the pricing is different when you are logged in. So I need to submit the site's login form (via PHP) and then use Simple HTML DOM Parser to scrape the product data.

I have found similar posts, but none of the answers have allowed me to log in and then scrape while logged in to the following site: https://www.bestlinknetware.com/Account/LogOn

What I Have Tried

Attempt #1

$data = http_build_query(array(
          "UserName" => "ourValidUsername",
          "Password" => "ourValidPassword"
        ));

send_message("<p>" . $data . "</p>");

$request = array(
  "http" => array(
    "header" => "Content-Type: application/x-www-form-urlencoded\r\n".
                "Content-Length: " . strlen($data) . "\r\n".
                "User-Agent:MyAgent/1.0\r\n",
    "method" => "POST",
    "content" => $data
  )
);

$context = stream_context_create($request);

$html = file_get_contents( $crawl["url"] . "/Account/LogOn", false, $context, -1, 40000 );

echo $html;

Attempt #2

$url = "https://www.bestlinknetware.com/Account/LogOn"; 
$cookie="cookie.txt"; 

$data = array(
  "UserName" => "ourValidUsername",
  "Password" => "ourValidPassword"
);

$postData = http_build_query($data);

$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/4");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url);

curl_setopt ($ch, CURLOPT_POST, true);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postData);
$result = curl_exec ($ch);

echo $result;  
curl_close($ch);

Both of the above attempts appeared to do nothing: the page just shows the login form. I can't tell whether the login attempt is failing, or whether I need to do something after the cURL POST, such as adding $html = file_get_html("http://www.bestlinknetware.com/"); and then begin parsing.

Note: When I simply add the $html = file_get_html("...") call, I can scrape the site, but I get the regular (not logged in) prices.

Can anyone with experience using Simple HTML DOM Parser shed some light on the proper way to submit POST data to a login form like this, and then load the post-login HTML into the Simple HTML DOM Parser object so that I can scrape it?
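One possible reason a scripted login returns the login form again is that the form includes hidden inputs (for instance an anti-forgery token) that the POST requests above do not send. Whether this particular form has such fields is an assumption, not something verified here. A minimal sketch of collecting every named input from the fetched login page, using PHP's built-in DOMDocument and a hypothetical helper name collectInputFields (the sample HTML and token value are made up for illustration):

```php
<?php
// Hypothetical helper: returns name => value for every named <input> in the HTML,
// so that hidden fields can be sent back together with the credentials.
function collectInputFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely well-formed; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Made-up login form used only to demonstrate the helper.
$loginPage = '<form><input type="hidden" name="__RequestVerificationToken" value="abc123">'
           . '<input name="UserName"><input type="password" name="Password"></form>';

// Start from whatever the form contains, then overwrite the credential fields.
$post = array_merge(collectInputFields($loginPage), [
    'UserName' => 'ourValidUsername',
    'Password' => 'ourValidPassword',
]);
```

The resulting $post array can be passed through http_build_query() exactly as in the attempts above, so any hidden values travel with the username and password.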

Wineglass answered 24/12, 2015 at 0:41

Try this:

include('simple_html_dom.php');

// Send a browser-like User-Agent header with the request.
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17')));

// Fetch the page with that context and pass the HTML string to Simple HTML DOM.
$html = str_get_html( file_get_contents('http://page.com/user1', false, $context) );
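Setting a User-Agent alone does not log in or carry a session. A rough sketch of the other half, assuming the site tracks the login with cookies: use a single cURL handle with its cookie engine enabled for both the login POST and the later page request. The helper names (openSession, sessionRequest, fetchLoggedIn) are hypothetical, and the field names are simply the ones from the question.

```php
<?php
// Sketch only: helper names are hypothetical, not part of cURL or Simple HTML DOM.

// Create one cURL handle whose cookie engine is switched on.
function openSession()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow a post-login redirect, if any
        CURLOPT_COOKIEFILE     => '',    // empty string: keep cookies in memory on this handle
        CURLOPT_USERAGENT      => 'MyAgent/1.0',
        CURLOPT_TIMEOUT        => 60,
    ]);
    return $ch;
}

// Send a GET, or a form-encoded POST when $post is given, on the shared handle.
function sessionRequest($ch, string $url, ?array $post = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($post === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true);
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Log in, then fetch $targetUrl with the cookies the login response set.
function fetchLoggedIn(string $loginUrl, array $credentials, string $targetUrl): string
{
    $ch = openSession();
    sessionRequest($ch, $loginUrl, $credentials);
    return sessionRequest($ch, $targetUrl);
}
```

Usage might look like $html = str_get_html(fetchLoggedIn('https://www.bestlinknetware.com/Account/LogOn', array('UserName' => 'ourValidUsername', 'Password' => 'ourValidPassword'), 'https://www.bestlinknetware.com/'));. Passing the fetched string to str_get_html() matters here, because file_get_html() makes its own separate request that does not carry the cURL handle's cookies, which would explain seeing the non-logged-in prices.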
Compact answered 26/5, 2020 at 22:50
