Converting HTML Table to a CSV automatically using PHP?
Asked Answered
P

9

23

I am just in need to convert a this html table automatically in csv using PHP. Can someone provide any idea how to do this? Thanks.

$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

Guys, I just need $table to convert in only .csv file, which could be automatically generated using some PHP function. We can define path for that csv file to /test/home/path_to_csv

Planoconvex answered 8/5, 2012 at 12:28 Comment(7)
I searched the Google, but didn't find relevantPlanoconvex
I realize this isn't quite what you are looking for but you can save it as an xls. and excel will open it as expected.Demagogue
What exactly are you trying to achieve? Are you looking to parse the HTML and extract data or is it one-off analysis? For one-off, I would recommend information found here. For more universal and automatic work, HTML needs to be parsed - try simple parse. Or, if the HTML is of a fixed format, you may have luck with regular expression to scrape the tags out and leave only the data. (e.g. replace the td with some field separators) Then, CSV output is a trivial matter.Resolvent
I am not trying to parse anything. I do have my own variable $table, which has the html table data in it. Now I want to convert that data to .csv file, using PHPPlanoconvex
in order to get the structure from HTML string, you do need to parse the data (stored in the variable in this case). Then, you can extract the information you are interested in and store it as a CSVResolvent
Have you tried this? davidvielmetter.com/tricks/…Gazetteer
I am getting an error that my csv is blank ..... i have a file which there is title tag and 2 3 more tags are there with the table now how i will read it and parse it into it. $file= file_get_contents($file_txt."/D_LOTMAN.HTM"); print_r($file); include './simple_html_dom.php'; $html = str_get_html($file); $fpc = fopen("downloads/me.csv", "w"); foreach($html->find('tr') as $element) { $td = array(); foreach( $element->find('td') as $row) { $td [] = $row->plaintext; } fputcsv($fpc, $td); } fclose($fpc);Terisateriyaki
C
26

You can use str_get_html http://simplehtmldom.sourceforge.net/

include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);



header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element)
{
        $th = array();
        foreach( $element->find('th') as $row)  
        {
            $th [] = $row->plaintext;
        }

        $td = array();
        foreach( $element->find('td') as $row)  
        {
            $td [] = $row->plaintext;
        }
        !empty($th) ? fputcsv($fp, $th) : fputcsv($fp, $td);
}


fclose($fp);
Christenachristendom answered 8/5, 2012 at 12:48 Comment(8)
Baba, how can we define some path for that csv file to get automatically saved to e.g. /test/home/path_to_csvPlanoconvex
Yes you can ..... all you need to do is remove the headers .... and change "php://output" to your pathChristenachristendom
Thanks for your method. I will try it on my PC (at present I'm on mobile), and let you know.Planoconvex
Oops! Baba, I changed the php://output to /mohan/path_to_csv, and when I am loading the page it's giving several warnings! like Warning: fopen(/mohan/path_to_csv) [function.fopen]: failed to open stream: No such file or directory in C:\wamp\www\mohan\TableToCsv.php on line 29. Line 29 is $fp = fopen("/mohan/path_to_csv", "w");Planoconvex
Hi, i am getting the CSV output as in this photo i.imgur.com/x2mWb.png, which inserts extra blank lines in between the data. Is Can it be fixed? I tried trim(). But it didn't work.Planoconvex
fputcsv has additional parameters that you can experiment with and make sure you filter out empty arrayChristenachristendom
In the lib code has MAX_FILE_SIZE which set to 600000 and if function file_get_html and str_get_html get the string size over than 600000 won't allow to progress and return false instead. I wish to work with a large table which contains string length up to 1 million and over, how should I do, can you suggest me that how important of the MAX_FILE_SIZE number.Salina
I tried this method, The csv is getting created but the table column content is not present in the csv. Instead, the content is written twice. Any idea why? Im using 1.8.1 versionGood
H
17

You can use this function in separate js file:

function exportTableToCSV($table, filename) {

        var $rows = $table.find('tr:has(td)'),

            // Temporary delimiter characters unlikely to be typed by keyboard
            // This is to avoid accidentally splitting the actual contents
            tmpColDelim = String.fromCharCode(11), // vertical tab character
            tmpRowDelim = String.fromCharCode(0), // null character

            // actual delimiter characters for CSV format
            colDelim = '","',
            rowDelim = '"\r\n"',

            // Grab text from table into CSV formatted string
            csv = '"' + $rows.map(function (i, row) {
                var $row = $(row),
                    $cols = $row.find('td');

                return $cols.map(function (j, col) {
                    var $col = $(col),
                        text = $col.text();

                    return text.replace('"', '""'); // escape double quotes

                }).get().join(tmpColDelim);

            }).get().join(tmpRowDelim)
                .split(tmpRowDelim).join(rowDelim)
                .split(tmpColDelim).join(colDelim) + '"',

            // Data URI
            csvData = 'data:application/csv;charset=utf-8,' + encodeURIComponent(csv);

        $(this)
            .attr({
            'download': filename,
                'href': csvData,
                'target': '_blank'
        });
    }

Now, to initiate this function, you can use:

$('.getfile').click(
            function() { 
    exportTableToCSV.apply(this, [$('#thetable'), 'filename.csv']);
             });

where 'getfile' should be the class assigned to button, where you want to add call to action. (On clicking this button, the download popup will appear) and "thetable" should be the ID assigned to table you want to download.

You can also change to the custom file name to download in code.

Hale answered 20/11, 2013 at 9:55 Comment(2)
This work brilliantly. I shall sacrifice 100 bulls in your honour.Casillas
Please update code to include the HTML <a> element as the clickable "button". I was using <button> element and it took me a few minutes to realize it must be <a>, or at least mention that it should be <a> element. Thank youOrchestrion
A
7

You can do this with arrays and regular expressions... See below

$csv = array();
preg_match('/<table(>| [^>]*>)(.*?)<\/table( |>)/is',$table,$b);
$table = $b[2];
preg_match_all('/<tr(>| [^>]*>)(.*?)<\/tr( |>)/is',$table,$b);
$rows = $b[2];
foreach ($rows as $row) {
    //cycle through each row
    if(preg_match('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row)) {
        //match for table headers
        preg_match_all('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row,$b);
        $csv[] = strip_tags(implode(',',$b[2]));
    } elseif(preg_match('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row)) {
        //match for table cells
        preg_match_all('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row,$b);
        $csv[] = strip_tags(implode(',',$b[2]));
    }
}
$csv = implode("\n", $csv);
var_dump($csv);

Then you can use file_put_contents() to write the csv string to file..

Aegrotat answered 19/4, 2013 at 17:40 Comment(3)
Classical answer. How to handle HTML tags to be put in CSV?Paddlefish
I was only working with the scope provided at the time which was 5 or more years ago. If you escape your strings properly they should maintain their column structures.Aegrotat
Great! And efficient answer in present context too. :-)Paddlefish
R
5

To expand on the accepted answer I did this which allows me to ignore columns by class name and also deals with blank rows/columns.

You can use str_get_html http://simplehtmldom.sourceforge.net/. Just include it and away you go! :)

$html = str_get_html($html); // give this your HTML string

header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element) {
  $td = array();
  foreach( $element->find('th') as $row) {
    if (strpos(trim($row->class), 'actions') === false && strpos(trim($row->class), 'checker') === false) {
      $td [] = $row->plaintext;
    }
  }
  if (!empty($td)) {
    fputcsv($fp, $td);
  }

  $td = array();
  foreach( $element->find('td') as $row) {
    if (strpos(trim($row->class), 'actions') === false && strpos(trim($row->class), 'checker') === false) {
      $td [] = $row->plaintext;
    }
  }
  if (!empty($td)) {
    fputcsv($fp, $td);
  }
}

fclose($fp);
exit;
Ralline answered 4/4, 2013 at 12:36 Comment(0)
T
1

If anyone is using Baba's answer but scratching their head over extra white spaces being added, this will work:

include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);   

$fileName="export.csv";
header('Content-type: application/ms-excel');
header("Content-Disposition: attachment; filename=$fileName");

$fp = fopen("php://output", "w");
$csvString="";

$html = str_get_html(trim($table));
foreach($html->find('tr') as $element)
{

    $td = array();
    foreach( $element->find('th') as $row)
    {
        $row->plaintext="\"$row->plaintext\"";
        $td [] = $row->plaintext;
    }
    $td=array_filter($td);
    $csvString.=implode(",", $td);

    $td = array();
    foreach( $element->find('td') as $row)
    {
        $row->plaintext="\"$row->plaintext\"";
        $td [] = $row->plaintext;
    }
    $td=array_filter($td);
    $csvString.=implode(",", $td)."\n";
}
echo $csvString;
fclose($fp);
exit;

}

Talanian answered 17/7, 2013 at 17:31 Comment(0)
M
1

I've adapted a simple class based on the code found on this thread that now handles colspan and rowspan. Not heavily tested and I'm sure it could be optimized.

Usage:

require_once('table2csv.php');

$table = '<table border="1">
    <tr>
    <th colspan=2>Header 1</th>
    </tr>
    <tr>
    <td>row 1, cell 1</td>
    <td>row 1, cell 2</td>
    </tr>
    <tr>
    <td>row 2, cell 1</td>
    <td>row 2, cell 2</td>
    </tr>
    <tr>
    <td rowspan=2>top left row</td>
    <td>top right row</td>
    </tr>
    <tr>
    <td>bottom right</td>
    </tr>
    </table>';

table2csv($table,"sample.csv",true);

table2csv.php

<?php

    //download @ http://simplehtmldom.sourceforge.net/
    require_once('simple_html_dom.php');
    $repeatContentIntoSpannedCells = false;


    //--------------------------------------------------------------------------------------------------------------------

    function table2csv($rawHTML,$filename,$repeatContent) {

        //get rid of sups - they mess up the wmus
        for ($i=1; $i <= 20; $i++) { 
            $rawHTML = str_replace("<sup>".$i."</sup>", "", $rawHTML);
        }

        global $repeatContentIntoSpannedCells;

        $html = str_get_html(trim($rawHTML));
        $repeatContentIntoSpannedCells = $repeatContent;

        //we need to pre-initialize the array based on the size of the table (how many rows vs how many columns)

        //counting rows is easy
        $rowCount = count($html->find('tr'));

        //column counting is a bit trickier, we have to iterate through the rows and basically pull out the max found
        $colCount = 0;
        foreach ($html->find('tr') as $element) {

            $tempColCount = 0;

            foreach ($element->find('th') as $cell) {
                $tempColCount++;
            }

            if ($tempColCount == 0) {
                foreach ($element->find('td') as $cell) {
                    $tempColCount++;
                }
            }

            if ($tempColCount > $colCount) $colCount = $tempColCount;
        }

        $mdTable = array();

        for ($i=0; $i < $rowCount; $i++) { 
            array_push($mdTable, array_fill(0, $colCount, NULL));
        }

        //////////done predefining array

        $rowPos = 0;
        $fp = fopen($filename, "w");

        foreach ($html->find('tr') as $element) {

            $colPos = 0;

            foreach ($element->find('th') as $cell) {
                if (strpos(trim($cell->class), 'actions') === false && strpos(trim($cell->class), 'checker') === false) {
                    parseCell($cell,$mdTable,$rowPos,$colPos);
                }
                $colPos++;
            }

            foreach ($element->find('td') as $cell) {
                if (strpos(trim($cell->class), 'actions') === false && strpos(trim($cell->class), 'checker') === false) {
                    parseCell($cell,$mdTable,$rowPos,$colPos);
                }
                $colPos++;
            }   

            $rowPos++;
        }


        foreach ($mdTable as $key => $row) {

            //clean the data
            array_walk($row, "cleanCell");
            fputcsv($fp, $row);
        }
    }


    function cleanCell(&$contents,$key) {

        $contents = trim($contents);

        //get rid of pesky &nbsp's (aka: non-breaking spaces)
        $contents = trim($contents,chr(0xC2).chr(0xA0));
        $contents = str_replace("&nbsp;", "", $contents);
    }


    function parseCell(&$cell,&$mdTable,&$rowPos,&$colPos) {

        global $repeatContentIntoSpannedCells;

        //if data has already been set into the cell, skip it
        while (isset($mdTable[$rowPos][$colPos])) {
            $colPos++;
        }

        $mdTable[$rowPos][$colPos] = $cell->plaintext;

        if (isset($cell->rowspan)) {

            for ($i=1; $i <= ($cell->rowspan)-1; $i++) {
                $mdTable[$rowPos+$i][$colPos] = ($repeatContentIntoSpannedCells ? $cell->plaintext : "");
            }
        }

        if (isset($cell->colspan)) {

            for ($i=1; $i <= ($cell->colspan)-1; $i++) {

                $colPos++;
                $mdTable[$rowPos][$colPos] = ($repeatContentIntoSpannedCells ? $cell->plaintext : "");
            }
        }
    }

?>
Monochloride answered 23/7, 2013 at 17:48 Comment(0)
B
1

Baba's answer contains extra space. So, I updated the code to this:

include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);



header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element)
{
    $td = array();
foreach( $element->find('th') as $row)
{
    $td [] = $row->plaintext;
}

foreach( $element->find('td') as $row)
{
    $td [] = $row->plaintext;
}
fputcsv($fp, $td);
}


fclose($fp);
Beaston answered 8/9, 2016 at 11:49 Comment(0)
C
0

Assuming that out_str has your html table data

$csv = $out_str;
        $csv = str_replace("<table class='gradienttable'>","",$csv);
        $csv = str_replace("</table>","",$csv);
        $csv = str_replace("</td><td>",",",$csv);
        $csv = str_replace("<td>","",$csv);
        $csv = str_replace("</td>","",$csv);
        $csv = str_replace("</font>","",$csv);
        $csv = str_replace("</tr>","",$csv);
        $csv = str_replace("<tr>","\n",$csv);

Remove any CSS from the file

        $csv = str_replace("<font color='yellow'>","",$csv);
        $csv = str_replace("<font color='red'>","",$csv);
        $csv = str_replace("<font color='green'>","",$csv);
        $csv = str_replace("</th><th>",",",$csv);
        $csv = str_replace("<th>","",$csv);
        $csv = str_replace("</th>","",$csv);

        file_put_contents('currentFile.csv',$csv);

Output the file currentFile.csv to the user

Hope it helps!

Crimmer answered 12/3, 2020 at 11:20 Comment(0)
C
0

I had to elaborate on Jacob Cruz answer:

$csv = array();
preg_match('/<table(>| [^>]*>)(.*?)<\/table( |>)/is',$table,$b);
$table = $b[2];
preg_match_all('/<tr(>| [^>]*>)(.*?)<\/tr( |>)/is',$table,$b);
$rows = $b[2];
foreach ($rows as $row) 
{
    //cycle through each row
    if(preg_match('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row))
    {
        //match for table headers
        preg_match_all('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row,$b);
        $csv[] = array(strip_tags(implode(',',$b[2])));
    } elseif(preg_match('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row)) 
    {
        //match for table cells
        preg_match_all('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row,$b);
        $csv[] = array(strip_tags(implode(',',$b[2])));
    }
}
$file = "file.csv";
$fp = fopen($file, 'w'); // open in write only mode (write at the start of the file)
fprintf($fp, chr(0xEF).chr(0xBB).chr(0xBF)); // writes file header for correct encoding
foreach ($csv as $row) {
    fputcsv($fp, $row);
}
fclose($fp);
Christa answered 27/2 at 10:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.