Here is what I am trying to accomplish:
I need to scrape product data from this website, but the pricing is different when you are logged in. So I need to submit the login form (via PHP) and then use Simple HTML DOM Parser to scrape the product data.
I have found the following similar posts:
- Simple HTML DOM Parser - Send post variables
- Authorize with curl and parse using simple html dom not working
- Login to ASP website before using PHP Simple HTML DOM Parser
- Processing HTTP Post with Array (no cURL)
- Using PHP & Curl to login to my websites form
- php curl script to get an aspx page's content
- https://davidwalsh.name/curl-post
However, none of the answers have allowed me to log in and then scrape, while still logged in, the following site: https://www.bestlinknetware.com/Account/LogOn
What I Have Tried
Attempt #1
    // send_message() and $crawl are defined elsewhere in my script (not shown).
    $data = http_build_query(array(
        "UserName" => "ourValidUsername",
        "Password" => "ourValidPassword"
    ));
    send_message("<p>" . $data . "</p>");
    $request = array(
        "http" => array(
            "header"  => "Content-Type: application/x-www-form-urlencoded\r\n" .
                         "Content-Length: " . strlen($data) . "\r\n" .
                         "User-Agent: MyAgent/1.0\r\n",
            "method"  => "POST",
            "content" => $data
        )
    );
    $context = stream_context_create($request);
    // Offset 0 rather than -1: PHP 7.1+ treats a negative offset as a seek
    // from the end, which remote streams do not support.
    $html = file_get_contents($crawl["url"] . "/Account/LogOn", false, $context, 0, 40000);
    echo $html;
Attempt #2
    $url = "https://www.bestlinknetware.com/Account/LogOn";
    $cookie = "cookie.txt";
    $data = array(
        "UserName" => "ourValidUsername",
        "Password" => "ourValidPassword"
    );
    $postData = http_build_query($data);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/4");
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
Both of the above attempts resulted in (seemingly) nothing happening: the page just shows the login form again. I can't tell whether the login attempt is failing, or whether I need to do something more after the cURL POST, such as adding $html = file_get_html("http://www.bestlinknetware.com/"); and then begin the parsing.
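To tell whether the login is actually failing, one check I could run on the response is whether it still contains the login form. This is only a minimal sketch: contains_login_form is a hypothetical helper name, and it assumes the logged-in page has no password input, which I have not confirmed for this site.

```php
<?php
// Hypothetical helper: returns true if the HTML still contains a password
// field, which I am treating as a sign that the login form was served again.
function contains_login_form(string $html): bool
{
    if (trim($html) === '') {
        return false; // nothing to inspect
    }

    // Collect libxml warnings internally so sloppy markup does not spam output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: the login form is the only place a password input appears.
    return $xpath->query('//input[@type="password"]')->length > 0;
}
```

I would call it as contains_login_form($result) right after curl_exec() in Attempt #2; a true result would suggest the credentials were rejected or the POST never reached the login handler.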
Note: When I simply add the $html = file_get_html("...") call, I can scrape the site, but I get the regular (non-logged-in) prices.
Can anyone with experience using Simple HTML DOM Parser shed some light on the proper way to submit POST data to a login form like this, and then load the post-login HTML into a Simple HTML DOM Parser object so that I can scrape it?
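For reference, the flow I think I need looks roughly like the sketch below: log in with cURL while saving cookies to a file, request the product page with the same cookie file, and hand the returned string to Simple HTML DOM. The helper name session_request and the temporary cookie file are my own assumptions, and I have not verified this against the site (it may, for example, require extra hidden form fields that I am not sending).

```php
<?php
// Hypothetical helper: performs one request that shares a cookie file with
// earlier requests, so a session cookie set at login is sent again later.
function session_request(string $url, string $cookieFile, ?array $postFields = null): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are loaded from here before the request
        CURLOPT_TIMEOUT        => 60,
    ]);

    if ($postFields !== null) {
        // Send the fields as an application/x-www-form-urlencoded POST body.
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("Request to $url failed: $error");
    }

    curl_close($ch);
    return $body;
}

// Intended usage (not run here; needs network access and simple_html_dom.php):
// $cookieFile = tempnam(sys_get_temp_dir(), 'cookie');
// session_request('https://www.bestlinknetware.com/Account/LogOn', $cookieFile, [
//     'UserName' => 'ourValidUsername',
//     'Password' => 'ourValidPassword',
// ]);
// $html = session_request('https://www.bestlinknetware.com/', $cookieFile);
// $dom  = str_get_html($html); // parse the string instead of calling file_get_html()
```

The reason for str_get_html() rather than file_get_html() in this sketch is that file_get_html() would make its own request without the cookies cURL collected, which might explain why I only see the non-logged-in prices.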