I'm trying to parse an HTML file that has terrible structure (believe me, it does). Because of that and my lack of knowledge, I couldn't write my own parser, so I tried Simple HTML DOM Parser, which a lot of people (on SO as well) recommend.
I required simple_html_dom.php and then created the object. Both steps seem to work: require() returns "1" and var_dump()-ing the object shows an object.
After this I tried to load a URL as shown in the manual, but I got a fatal error no matter which URL I tried. The error was the following:
Fatal error: Call to undefined function mb_detect_encoding() in /home/fema/web/subdomain/devel/www_root/parser/simplehtmldom_1_5/simple_html_dom.php on line 988
I checked what's on line 988 and it is the following:
// Have php try to detect the encoding from the text given to us.
$charset = mb_detect_encoding($this->root->plaintext . "ascii",
$encoding_list = array( "UTF-8", "CP1252" ) );
I understand that this has something to do with character encoding, but that's all. I haven't found anything about this error either with Google or on SO.
My whole code is (placeholder URL):
<?php
require('simplehtmldom_1_5/simple_html_dom.php');
// Create a DOM object
$dom = new simple_html_dom();
$dom->load_file('http://www.google.com/');
?>
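In case it helps narrow this down, here is a small check I can run on the same server. It rests on an assumption I haven't confirmed as the cause: that the error means the mbstring extension (the one that provides mb_detect_encoding()) is not loaded in this PHP installation. The helper name mbstring_status() is just something I made up for this test.

```php
<?php
// Hypothetical helper: reports whether the mbstring extension and the
// mb_detect_encoding() function are available in this PHP installation.
function mbstring_status()
{
    return array(
        'extension_loaded' => extension_loaded('mbstring'),
        'function_exists'  => function_exists('mb_detect_encoding'),
    );
}

// Both values being false would suggest mbstring is missing or disabled.
var_dump(mbstring_status());
```

If both values come back false, the problem would be in the server's PHP setup rather than in my script or in the library.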
Could anyone please tell me what to do, or give me some advice on how to approach an error like this?
Thanks in advance.