Simple HTML Dom - Fatal error when using load_file

I'm trying to parse an HTML file with a terrible HTML structure (believe me, it is). Because of that, and my lack of knowledge, I couldn't write my own parser. I then tried the Simple HTML DOM parser, because a lot of people (on SO as well) recommend it.

I required simple_html_dom.php and then created the object. Both steps seem to work: require() returns 1, and var_dump()-ing the object prints an object.

After this I tried to load the URL as shown in the manual, but I got a fatal error no matter what URL I tried. The error was the following:

Fatal error: Call to undefined function mb_detect_encoding() in 
             /home/fema/web/subdomain/devel/www_root/parser/
             simplehtmldom_1_5/simple_html_dom.php on line 988

I checked what's on line 988 and it is the following:

// Have php try to detect the encoding from the text given to us.
        $charset = mb_detect_encoding($this->root->plaintext . "ascii", 
                   $encoding_list = array( "UTF-8", "CP1252" ) );

I understand that this is about character encoding, but that's all. I haven't found anything about it, either with Google or on SO.

My whole code is (placeholder URL):

<?php

require('simplehtmldom_1_5/simple_html_dom.php');

// Create a DOM object
$dom = new simple_html_dom();

$dom->load_file('http://www.google.com/');

?>

Could anyone please tell me what to do, or give some advice on how to approach an error like this?

Thanks in advance.

Bubbler answered 14/7, 2012 at 12:16 Comment(0)
E
9

Your build of PHP is missing the multibyte string (mbstring) extension. That is quite unusual unless you're using a really old build of PHP or one compiled with unusual options: although mbstring isn't enabled by default, it is usually considered one of the essential extensions that more or less every PHP build includes these days.

If you're running an old version of PHP, I'd strongly recommend upgrading. If you have a fairly recent build, check with phpinfo() that mbstring is installed. If it isn't, you might need to reinstall PHP or rebuild it from source.

If it's installed, --enable-mbstring should be in the list of compile options. See the PHP manual on the multibyte extension, especially the chapter on installation, for more details.
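
As a lighter alternative to reading the full phpinfo() output, the check can be sketched in a few lines. This is a minimal sketch: the helper name mbstring_available() is hypothetical and not part of PHP or Simple HTML DOM, and the syntax is kept compatible with PHP 5.3.

```php
<?php
// Hypothetical helper (not part of PHP or Simple HTML DOM): reports whether
// the mbstring extension is available to the PHP build running this script.
function mbstring_available()
{
    // extension_loaded() checks the extension itself; function_exists()
    // confirms that the function named in the fatal error is defined.
    return extension_loaded('mbstring') && function_exists('mb_detect_encoding');
}

if (mbstring_available()) {
    echo "mbstring is loaded\n";
} else {
    echo "mbstring is missing; install or enable it before calling load_file()\n";
}
```

Run it through the web server rather than only from the command line, since the two can load different PHP configurations.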

Enisle answered 14/7, 2012 at 12:27 Comment(2)
Thank you for your answer. It is PHP 5.3, but I'll ask my friend (I'm using his server). - Bubbler
It seems you were right, but he says there is not enough RAM to compile a new PHP. Thank you for your answer. - Bubbler
M
6

I had the same issue on Amazon EC2 with a standard install of PHP. The following (found at http://php.net/manual/en/mbstring.installation.php) solved the problem:

yum install php-mbstring
httpd -k restart
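
After the restart, one way to confirm the fix without involving the library is to repeat the call from line 988 in isolation. This is only a sketch: the function name detect_like_line_988() and the sample text are assumptions made for illustration, while the arguments to mb_detect_encoding() are copied from the line quoted in the question.

```php
<?php
// Hypothetical smoke test: repeats the mb_detect_encoding() call quoted from
// line 988 of simple_html_dom.php, outside of the library.
function detect_like_line_988($plaintext)
{
    if (!function_exists('mb_detect_encoding')) {
        // Same condition that caused the fatal error in the question.
        return null;
    }
    // Same arguments as the quoted library line.
    return mb_detect_encoding($plaintext . "ascii", array("UTF-8", "CP1252"));
}

// The sample text stands in for $this->root->plaintext (an assumption).
var_dump(detect_like_line_988("Hello, world"));
```

A NULL result means the extension is still not visible to PHP. Any other result means the call now runs.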
Moisture answered 7/12, 2012 at 17:8 Comment(1)
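Without yum (for example on Debian or Ubuntu), you can use sudo apt-get install php7.0-mbstring, adjusting the version in the package name to your PHP version. This comes from the same page as in the answer above. - Policewoman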
S
0

Remove the trailing forward slash (/) from the URL string passed to load_file() and it works.
Apparently, the load_file() method of the Simple HTML DOM library has an issue with a forward slash at the end of a URL string.

Selfregard answered 14/11, 2017 at 9:44 Comment(0)
