Simple HTML DOM returning false [duplicate]

I've run into something strange when using Simple HTML DOM to parse a page with a certain query string. I'm parsing the used car listings on a dealership's website; some query strings work and others do not. The pattern seems to be that whenever there are more vehicles still to be shown, file_get_html does not return the HTML content: the last page of the pagination parses fine, while earlier pages fail. I've tried viewing the page with JavaScript disabled to see whether the markup differs, but the page behaves the same way. The code is below in case anyone has ideas, or better yet, solutions. Thanks all!

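require 'simple_html_dom.php';
error_reporting(E_ALL);
$startingURL = 'http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=2';
// file_get_html() returns a simple_html_dom object on success and false on failure,
// so compare strictly against false instead of loosely against true.
$getHTML = file_get_html($startingURL);
if ($getHTML !== false) {
    echo '<h1>TRUE</h1>';
} else {
    echo '<h1>FALSE</h1>';
}
// Dump the result in either case to see exactly what came back.
var_dump($getHTML);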

With the URL above, var_dump shows a boolean false. With the following URL, I can parse the data without issue: http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=5

Thanks.

Rhaetic answered 9/2, 2016 at 15:50 Comment(4)
DOM is VERY picky about malformed HTML. Your browser being able to display it means nothing. Browsers are extremely lenient about bad HTML. DOM isn't. - Mayle
@MarcB any idea why the second URL parses without a problem but the first returns nothing? The markup seems to be the same in both cases. - Rhaetic
You could use DOMDocument::loadHTML; it emits lots of warnings, but seems to work. - Greer
Does this answer your question? PHP Simple HTML DOM Parser returning false on valid url - Enthuse
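
The DOMDocument::loadHTML suggestion from the comments can be sketched roughly as follows. This is a minimal illustration, not a drop-in replacement for the code in the question: the helper name extractLinkTexts and the sample markup are made up for the example, and the `//a` XPath query is only a stand-in for whatever elements the real page needs. Calling libxml_use_internal_errors(true) collects the parser warnings internally instead of printing them.

```php
<?php
// Hypothetical helper: parse possibly malformed HTML with PHP's built-in DOM
// extension and return the text of every <a> element.
function extractLinkTexts(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ((new DOMXPath($doc))->query('//a') as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Deliberately malformed sample markup (unclosed <p> and <b> tags), invented for this example.
$sample = '<html><body><p>Used cars<a href="/car/1">First listing</a>'
        . '<b><a href="/car/2">Second listing</a></body></html>';
print_r(extractLinkTexts($sample));
```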

You should not use the default file_get_html function to fetch remote content. It uses file_get_contents to download the page, and the target website will sometimes block the request based on its user agent or referer. Try downloading the page with PHP cURL first, then parse it with simple_html_dom.
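
A rough sketch of that approach, assuming the cURL extension is enabled. The helper names curlOptionsFor and fetchPage, the user-agent string, and the timeout are illustrative choices, not anything the target site is known to require.

```php
<?php
// Hypothetical helper: cURL options that make the request look like it comes
// from a browser. The user-agent value and timeout are assumptions.
function curlOptionsFor(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
        CURLOPT_REFERER        => $url,
        CURLOPT_TIMEOUT        => 30,    // seconds
    ];
}

// Hypothetical helper: download a page and return its body, or false on failure.
function fetchPage(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, curlOptionsFor($url));
    $body = curl_exec($ch);
    if ($body === false) {
        // Surface the transport error instead of failing silently.
        trigger_error('cURL failed: ' . curl_error($ch), E_USER_WARNING);
    }
    return $body;
}

// Usage (needs network access and simple_html_dom.php, so it is left commented out):
// require 'simple_html_dom.php';
// $body = fetchPage($startingURL);
// $dom  = ($body !== false) ? str_get_html($body) : false;
```

Checking the result of curl_exec separately from the result of str_get_html makes it possible to tell a blocked or failed download apart from a parser failure.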

Afforest answered 9/2, 2016 at 16:17 Comment(1)
Even using cURL to retrieve the remote content, I can still only parse certain URLs on this website. I can get the information I need with this URL link, but can't with this URL link. - Rhaetic
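
One cause worth ruling out when some pages parse and others return false even with cURL is the size of the downloaded page. This is not confirmed anywhere in this thread. It rests on the assumption that the copy of simple_html_dom.php in use defines a MAX_FILE_SIZE constant (600000 bytes in the releases I am aware of) and that both file_get_html and str_get_html return false for input longer than that. A page listing more vehicles would be larger, which would fit the pattern described in the question. The helper below is hypothetical and only compares a string's length with a limit.

```php
<?php
// Hypothetical helper: report whether a downloaded page is longer than the
// parser's size limit. The 600000-byte default is an assumption that mirrors
// the MAX_FILE_SIZE constant in common simple_html_dom.php releases; check
// the value in your own copy of the library.
function exceedsParserLimit(string $html, int $limit = 600000): bool
{
    return strlen($html) > $limit;
}

// Stand-in strings for a short and a long page body.
$shortPage = str_repeat('a', 1000);
$longPage  = str_repeat('a', 700000);

var_dump(exceedsParserLimit($shortPage));
var_dump(exceedsParserLimit($longPage));
```

If the failing page turns out to be over the limit and the working page under it, raising MAX_FILE_SIZE in the library, or parsing with DOMDocument instead, would be the next thing to try.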
