Read large data from csv file in php [duplicate]

I am reading a CSV file in PHP and checking against MySQL whether each record is present in my table or not.

The CSV has about 25,000 records, and when I run my code it displays a "Service Unavailable" error after 2m 10s (onload: 2m 10s).

Here is the code I have:

// set memory limit & execution time
ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');

// function to read the CSV file
function readCSV($csvFile)
{
    $line_of_text = array();
    $file_handle = fopen($csvFile, 'r');
    while (!feof($file_handle)) {

        set_time_limit(60); // you can enable this if you have a lot of data

        $line_of_text[] = fgetcsv($file_handle, 1024);
    }
    fclose($file_handle);
    return $line_of_text;
}

// Set path to CSV file
$csvFile = 'my_records.csv';

$csv = readCSV($csvFile);

for($i=1;$i<count($csv);$i++)
{
   $user_email= $csv[$i][1];

   $qry = "SELECT u.user_id, u.user_email_id FROM tbl_user as u WHERE u.user_email_id = '".$user_email."'";

   $result = @mysql_query($qry) or die("Couldn't execute query:".mysql_error().''.mysql_errno());

   $rec = @mysql_fetch_row($result);

   if($rec)
   {
      echo "Record exist";
   }
   else
   {
      echo "Record not exist"; 
   }
}

Note: I just want to list the records that do not exist in my table.

Please suggest a solution for this.

Schmidt answered 8/7, 2013 at 6:8 Comment(7)
What about the memory limit? Have you checked your logs?Croteau
@Croteau let me check logs...Schmidt
Here is the timing info from the logs: time - 5m 44s (onload: 5m 44s), total memory - 2.8 MBSchmidt
Increase the execution time.Humism
Where does it say "Service Unavailable"?Ines
It shows this message when I run the code: "Service Unavailable. The service is temporarily unavailable. Please try later."Schmidt
Another suggestion is to use a prepared statement for the SQL query; it's faster and also validates the input from the CSV. Call $pdo->prepare() outside the loop and $pdo->bindValue() inside.Lexicography
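
A minimal sketch of that last suggestion, assuming a PDO connection (the DSN and credentials below are placeholders) and the table/column names from the question; the statement is prepared once outside the loop and only the value is bound inside:

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'db_user', 'db_pass');
$stmt = $pdo->prepare('SELECT user_id, user_email_id FROM tbl_user WHERE user_email_id = ?');

// $csv comes from readCSV(); row 0 is assumed to be the header
for ($i = 1; $i < count($csv); $i++) {
    $stmt->bindValue(1, $csv[$i][1]);   // bind the email from the CSV row
    $stmt->execute();
    if ($stmt->fetch(PDO::FETCH_ASSOC) === false) {
        echo $csv[$i][1] . " does not exist<br>";   // only list missing records
    }
}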

An excellent method for dealing with large files is described at: https://mcmap.net/q/433523/-file_get_contents-gt-php-fatal-error-allowed-memory-exhausted

This method is used at http://www.cuddlycactus.com/knownpasswords/ (page has been taken down) to search through 170+ million passwords in just a few milliseconds.
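
For reference, the core idea behind that approach is to stream the file instead of loading it all into memory at once. A minimal sketch of that general technique (not necessarily the exact code at the link):

// read the file line by line so memory use stays flat regardless of file size
$handle = fopen('my_records.csv', 'r');
if ($handle === false) {
    die('Could not open file');
}
while (($line = fgets($handle)) !== false) {
    // process one line at a time here instead of collecting all rows in an array
}
fclose($handle);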

Beebread answered 8/7, 2013 at 8:11 Comment(1)
The second link is broken, so we cannot see how the method is being used. That makes the second paragraph pointless, and the answer becomes very low quality as it is a link-only answer.Bedstraw

After struggling a lot, I finally found a good solution; maybe it will help others too. When I tried a 2,367 KB CSV file containing 18,226 rows, the fastest of the PHP scripts I tested were (1) the CsvImporter from the php.net fgetcsv documentation, and (2) the method from "file_get_contents => PHP Fatal error: Allowed memory exhausted".

(1) took 0.92574405670166 s; (2) took 0.12543702125549 s (as a string) and 0.52903485298157 s (split into an array). Note: these timings do not include inserting into MySQL.

The best solution I found takes 3.0644409656525 s in total, including inserting into the database and some conditional checks as well. It took 11 seconds to process an 8 MB file. The solution is:

$csvInfo = analyse_file($file, 5);
$lineSeperator = $csvInfo['line_ending']['value'];
$fieldSeperator = $csvInfo['delimiter']['value'];
$columns = getColumns($file);
echo '<br>========Details========<br>';
echo 'Line Sep: \t '.$lineSeperator;
echo '<br>Field Sep:\t '.$fieldSeperator;
echo '<br>Columns: '; print_r($columns);
echo '<br>========Details========<br>';
$ext = pathinfo($file, PATHINFO_EXTENSION);
$table = str_replace(' ', '_', basename($file, "." . $ext));
$rslt = table_insert($table, $columns);
if($rslt){
    $query = "LOAD DATA LOCAL INFILE '".$file."' INTO TABLE $table FIELDS TERMINATED BY '$fieldSeperator' ";
    var_dump(addToDb($query, false));
}


function addToDb($query, $getRec = true){
    //echo '<br>Query : '.$query;
    $con = @mysql_connect('localhost', 'root', '');
    @mysql_select_db('rtest', $con);
    $result = mysql_query($query, $con);
    if($result){
        if($getRec){
            $data = array();
            while ($row = mysql_fetch_assoc($result)) {
                $data[] = $row;
            }
            return $data;
        }else return true;
    }else{
        var_dump(mysql_error());
        return false;
    }
}


function table_insert($table_name, $table_columns) {
    $queryString = "CREATE TABLE " . $table_name . " (";
    $columns = '';
    $values = '';

    foreach ($table_columns as $column) {
        $values .= (strtolower(str_replace(' ', '_', $column))) . " VARCHAR(2048), ";
    }
    $values = substr($values, 0, strlen($values) - 2);

    $queryString .= $values . ") ";

    //// echo $queryString;

    return addToDb($queryString, false);
}


function getColumns($file){
    $cols = array();
    if (($handle = fopen($file, 'r')) !== FALSE)
    {
        while (($row = fgetcsv($handle)) !== FALSE) 
        {
           $cols = $row;
           if(count($cols)>0){
                break;
           }
        }
        return $cols;
    }else return false;
}

function analyse_file($file, $capture_limit_in_kb = 10) {
// capture starting memory usage
$output['peak_mem']['start']    = memory_get_peak_usage(true);

// log the limit how much of the file was sampled (in Kb)
$output['read_kb']                 = $capture_limit_in_kb;

// read in file
$fh = fopen($file, 'r');
    $contents = fread($fh, ($capture_limit_in_kb * 1024)); // in KB
fclose($fh);

// specify allowed field delimiters
$delimiters = array(
    'comma'     => ',',
    'semicolon' => ';',
    'tab'         => "\t",
    'pipe'         => '|',
    'colon'     => ':'
);

// specify allowed line endings
$line_endings = array(
    'rn'         => "\r\n",
    'n'         => "\n",
    'r'         => "\r",
    'nr'         => "\n\r"
);

// loop and count each line ending instance
foreach ($line_endings as $key => $value) {
    $line_result[$key] = substr_count($contents, $value);
}

// sort by largest array value
asort($line_result);

// log to output array
$output['line_ending']['results']     = $line_result;
$output['line_ending']['count']     = end($line_result);
$output['line_ending']['key']         = key($line_result);
$output['line_ending']['value']     = $line_endings[$output['line_ending']['key']];
$lines = explode($output['line_ending']['value'], $contents);

// remove the last line of the array, as it may be incomplete
array_pop($lines);

// create a string from the legal lines
$complete_lines = implode(' ', $lines);

// log statistics to output array
$output['lines']['count']     = count($lines);
$output['lines']['length']     = strlen($complete_lines);

// loop and count each delimiter instance
foreach ($delimiters as $delimiter_key => $delimiter) {
    $delimiter_result[$delimiter_key] = substr_count($complete_lines, $delimiter);
}

// sort by largest array value
asort($delimiter_result);

// log statistics to output array with largest counts as the value
$output['delimiter']['results']     = $delimiter_result;
$output['delimiter']['count']         = end($delimiter_result);
$output['delimiter']['key']         = key($delimiter_result);
$output['delimiter']['value']         = $delimiters[$output['delimiter']['key']];

// capture ending memory usage
$output['peak_mem']['end'] = memory_get_peak_usage(true);
return $output;
}
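
To get the list the question actually asks for (records not present in tbl_user), a possible follow-up after the LOAD DATA step above is a LEFT JOIN against the freshly imported table. The column name csv_email below is only illustrative, since the real name depends on the CSV header; $table is the table created by the snippet above:

// hypothetical follow-up query: list CSV rows whose email has no match in tbl_user
$missing = addToDb("SELECT c.csv_email FROM $table AS c
                    LEFT JOIN tbl_user AS u ON u.user_email_id = c.csv_email
                    WHERE u.user_id IS NULL");
print_r($missing);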
Emelyemelyne answered 30/3, 2014 at 13:2 Comment(2)
The analyse_file function is taken from php.net/manual/fr/function.fgetcsv.php#101238Emelyemelyne
I managed to employ the MySQL LOAD DATA method for CSV processing in my locator app. 100,00 locations load in 75 seconds versus 10 hours with fgetcsv() or file chunking. Definitely worth investigating if you are loading CSV data directly into persistent data stores. Most SQL engines have a similar function. Thanks Raza!Petrinapetrine

Normally, "Service Unavailable" error will come when 500 error occurs. I think this is coming because of insufficient execution time. Please check your log/browser console, may be you can see 500 error.

First of all, Keep set_time_limit(60) out of loop.

Do some changes like,

  1. Apply an INDEX on the user_email_id column, so your SELECT query can fetch rows faster (see the sketch after this list).
  2. Do not echo a message for every row; keep the output buffer free.
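
For the first point, a one-off statement along these lines would add the index (the index name idx_user_email_id is only illustrative; the table and column names are taken from the question):

// run once; speeds up the WHERE user_email_id = ... lookup
mysql_query("ALTER TABLE tbl_user ADD INDEX idx_user_email_id (user_email_id)");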

Also, I have done this kind of task using an open source program. You can get it here: http://sourceforge.net/projects/phpexcelreader/

Try this.

Kindhearted answered 8/7, 2013 at 6:33 Comment(2)
Yes, I have seen the 500 internal server error in the console, and I have already added code to increase the limits: ini_set('memory_limit', '512M'); ini_set('max_execution_time', '180');Schmidt
set_time_limit(60) is inside the while loop; keep it outside of readCSV().Kindhearted
