Convert HTML to CSV in php?
Asked Answered
P

3

6

I have a html table structure like this;

            <tr style="font-weight: bold">
                <td>ID</td>
                <td>Navn</td>

                <td>Adresse</td>
                <td>By</td>
                <td>Post nr</td>
                <td>E-mail</td>
                <td>Telefon</td>
                <td>Status og dato</td>
                <td>Dropdown info</td>
                <td>Produkt info</td>
                <td>Buydate</td>
                <td>Ref nr. (3 første cifre)</td>
            </tr>
                    <tr>
                <td>40563</td>
                <td>Firstname Lastname</td>

                <td>Address</td>
                <td>Copen</td>
                <td>2100</td>
                <td>[email protected]</td>
                <td>123123</td>
                <td>Ikke indløst</td>
                <td>EEE-BBB</td>
</tr>

I would like to convert this into a csv/excel file by php.

So each is a row in excel, and each is a cell in the row,

Please how can this be done?

I have researched and found Converting HTML Table to a CSV automatically using PHP? but the answer does not work properly for me, Im getting all the cell results in one 'cell', so each row only have one cell.

This is what i have tried;

        $html = str_get_html($table);



        header('Content-type: application/ms-excel');
        header('Content-Disposition: attachment; filename=sample.csv');

        $fp = fopen("php://output", "w");

        foreach($html->find('tr') as $element)
        {
            $td = array();
            foreach( $element->find('td') as $row)  
            {
                $td [] = $row->plaintext;
            }
            fputcsv($fp, $td);
        }


        fclose($fp);
        exit;

Where $table is the html above. Using simple html dom plugin

Parted answered 31/5, 2012 at 15:30 Comment(0)
U
4

it seems that produced CVS has problems with some MS excel version. according to this page:

However, certain Microsoft programs (I'm looking at you, Access 97), 
will fail to recognize the CSV properly unless each line ends with \r\n.

so i modified the code as:

$td = array();
foreach( $element->find('td') as $row) {
   $td[] = $row->plaintext;
}
fwrite($fp,implode(";",$td)."\r\n");

but says also this:

Secondly, if the first column heading / value of the CSV file begins with 
`uppercase `ID, certain Microsoft programs (ahem, Excel 2007) will interpret 
the file `as` being in the` SYLK format rather than CSV`

So i changed the ID,... to id,... All in all, with lower case 'id' and ';' as delimiter this loaded as expected in MS excel 2003.

UPDATED:

i found a way to properly load a UTF8 .csv into excel by adding the BOM signature in the file. In PHP this can be done:

fwrite($fp,"\xEF\xBB\xBF");
...start writing

these 3 characters (1 unicode actually) forces excel and the likes to understand the .csv file AS utf8 and therefore decoding it internally.

There is another solution without using the BOM but its a kind of hack and not well tested; just create your file as file.txt (notice the .txt, not .csv), forcing excel to ask you about the encoding you want; you choose utf8 and done.

Uropod answered 31/5, 2012 at 16:11 Comment(3)
Excel 2003 also identified it as an SYLK file.Simonsen
a SYLK file is one that starts with ID or ID_xxx as one can read here. So, changing from 'ID' to 'id' should fix it. Perhaps some excel versions doesnt like that ID, upper/lower case..Uropod
Fantastic, this is working too! But still it makes my ÆØÅ chars weird, what can i do??Parted
E
4

You can load them in an array using the PHP DOM classes

$data = array();
$doc = new DOMDocument();
$doc->loadHTML($html);
$rows = $doc->getElementsByTagName('tr');
foreach($rows as $row) {
    $values = array();
    foreach($row->childNodes as $cell) {
        $values[] = $cell->textContent;
    }
    $data[] = $values;
}

You can then convert that array to CSV data like in your example, or just simply build the CSV string directly in the loops.

Live example

Escapee answered 31/5, 2012 at 15:59 Comment(0)
U
4

it seems that produced CVS has problems with some MS excel version. according to this page:

However, certain Microsoft programs (I'm looking at you, Access 97), 
will fail to recognize the CSV properly unless each line ends with \r\n.

so i modified the code as:

$td = array();
foreach( $element->find('td') as $row) {
   $td[] = $row->plaintext;
}
fwrite($fp,implode(";",$td)."\r\n");

but says also this:

Secondly, if the first column heading / value of the CSV file begins with 
`uppercase `ID, certain Microsoft programs (ahem, Excel 2007) will interpret 
the file `as` being in the` SYLK format rather than CSV`

So i changed the ID,... to id,... All in all, with lower case 'id' and ';' as delimiter this loaded as expected in MS excel 2003.

UPDATED:

i found a way to properly load a UTF8 .csv into excel by adding the BOM signature in the file. In PHP this can be done:

fwrite($fp,"\xEF\xBB\xBF");
...start writing

these 3 characters (1 unicode actually) forces excel and the likes to understand the .csv file AS utf8 and therefore decoding it internally.

There is another solution without using the BOM but its a kind of hack and not well tested; just create your file as file.txt (notice the .txt, not .csv), forcing excel to ask you about the encoding you want; you choose utf8 and done.

Uropod answered 31/5, 2012 at 16:11 Comment(3)
Excel 2003 also identified it as an SYLK file.Simonsen
a SYLK file is one that starts with ID or ID_xxx as one can read here. So, changing from 'ID' to 'id' should fix it. Perhaps some excel versions doesnt like that ID, upper/lower case..Uropod
Fantastic, this is working too! But still it makes my ÆØÅ chars weird, what can i do??Parted
S
2

I hate to say that it worked for me, but...it worked for me. This is the script I used.

<?php
    include('simple_html_dom.php');

    $table = '<tr style="font-weight: bold">
                <td>ID</td>
                <td>Navn</td>
                <td>Adresse</td>
                <td>By</td>
                <td>Post nr</td>
                <td>E-mail</td>
                <td>Telefon</td>
                <td>Status og dato</td>
                <td>Dropdown info</td>
                <td>Produkt info</td>
                <td>Buydate</td>
                <td>Ref nr. (3 første cifre)</td>
            </tr>
                    <tr>
                <td>40563</td>
                <td>Firstname Lastname</td>

                <td>Address</td>
                <td>Copen</td>
                <td>2100</td>
                <td>[email protected]</td>
                <td>123123</td>
                <td>Ikke indløst</td>
                <td>EEE-BBB</td>
</tr>
';
        $html = str_get_html($table);

        header('Content-type: application/ms-excel');
        header('Content-Disposition: attachment; filename=sample.csv');

        $fp = fopen("php://output", "w");

        foreach($html->find('tr') as $element)
        {
            $td = array();
            foreach( $element->find('td') as $row)  
            {
                $td [] = $row->plaintext;
            }
            fputcsv($fp, $td);
        }

        fclose($fp);
?>

I did get a note about the file being a SYLK file and not being able to load it in Excel. Clicking OK to this message opened the file normally. If this is your error, it is caused by this line: <td>ID</td> The SYLK file type is identified by a capital ID in the first cell of a text (CSV) file. You can prevent this message by changing it to lowercase or changing the label all together.

This is the output I get once I have completely opened the file: Excel output

Simonsen answered 31/5, 2012 at 15:58 Comment(2)
It works for me too now thanks!! But why does my ÆØÅ not appear right like yours in the screenshot?Parted
@Karem, possibly different language defaults or even language packs that are installed in Office.Simonsen

© 2022 - 2024 — McMap. All rights reserved.