file_get_contents => PHP Fatal error: Allowed memory exhausted

I have no experience dealing with large files, so I am not sure what to do about this. I have attempted to read several large files using file_get_contents; the task is to clean and munge them using preg_replace().

My code runs fine on small files; however, the large files (40 MB) trigger a memory-exhausted error:

PHP Fatal error:  Allowed memory size of 16777216 bytes exhausted (tried to allocate 41390283 bytes)

I was thinking of using fread() instead, but I am not sure that'll work either. Is there a workaround for this problem?

Thanks for your input.

This is my code:

<?php
error_reporting(E_ALL);

##get find() results and remove DOS carriage returns.
##The error is thrown on the next line for large files!
$myData = file_get_contents("tmp11");
$newData = str_replace("^M", "", $myData);

##cleanup Model-Manufacturer field.
$pattern = '/(Model-Manufacturer:)(\n)(\w+)/i';
$replacement = '$1$3';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup Test_Version field and create comma delimited layout.
$pattern = '/(Test_Version=)(\d).(\d).(\d)(\n+)/';
$replacement = '$1$2.$3.$4      ';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup occasional empty Model-Manufacturer field.
$pattern = '/(Test_Version=)(\d).(\d).(\d)      (Test_Version=)/';
$replacement = '$1$2.$3.$4      Model-Manufacturer:N/A--$5';
$newData = preg_replace($pattern, $replacement, $newData);

##fix occasional Model-Manufacturer being incorrectly wrapped.
$newData = str_replace("--","\n",$newData);

##fix 'Binary file' message when find() utility cannot id file.
$pattern = '/(Binary file).*/';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);
$newData = removeEmptyLines($newData);

##replace colon with equal sign
$newData = str_replace("Model-Manufacturer:","Model-Manufacturer=",$newData);

##file stuff
$fh2 = fopen("tmp2","w");
fwrite($fh2, $newData);
fclose($fh2);

### Functions.

##Data cleanup
function removeEmptyLines($string)
{
        return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>
Jp answered 9/3, 2011 at 16:56 Comment(6)
You may be able to fetch it in chunks using fread() but whether that will help you depends on what kind of operations you do on it, and what you do with the result.Simonsen
Hey Chris. There is a setting in the php.ini file that controls the memory limit. If I recall correctly, you can increase that number, which would/should allow you to handle larger files.Slay
@tom smith: it is not my server, sadly, so my hands are tied.Jp
@peka: preg_replace() and str_replace() operations on the file. Then, save the changes to a new file.Jp
You can boost the memory limit at run-time with ini_set('memory_limit', $new_limit_here);, though some servers do not allow this override. But slurping an entire 40 MB file into memory is generally not a good idea. Processing it line by line would be easier for the most part.Gman
Chris, while I can assume what the file format is from your code above, could you post a couple of lines of it so that I can examine it side by side instead?Steddman

Firstly, you should understand that when you use file_get_contents you fetch the entire file into a single string, and that string is stored in the host's memory.

If that string is larger than the memory limit dedicated to the PHP process, PHP will halt and display the error message above.

The way around this is to open the file as a pointer and take a chunk at a time. That way, if you had a 500 MB file you could read the first 1 MB of data, do what you will with it, drop that 1 MB from memory and replace it with the next MB. This lets you manage how much data you put into memory at once.

An example of this can be seen below; I will create a function that works a bit like a Node.js streaming read, passing each chunk to a callback:

function file_get_contents_chunked($file,$chunk_size,$callback)
{
    try
    {
        $handle = fopen($file, "r");
        // fopen() does not throw on failure, it returns false.
        if ($handle === false)
        {
            trigger_error("file_get_contents_chunked::could not open {$file}",E_USER_NOTICE);
            return false;
        }

        $i = 0;
        while (!feof($handle))
        {
            // Pass the current chunk, the handle and the iteration index to the callback.
            call_user_func_array($callback,array(fread($handle,$chunk_size),&$handle,$i));
            $i++;
        }

        fclose($handle);

    }
    catch(Exception $e)
    {
         trigger_error("file_get_contents_chunked::" . $e->getMessage(),E_USER_NOTICE);
         return false;
    }

    return true;
}

and then use it like so:

$success = file_get_contents_chunked("my/large/file",4096,function($chunk,&$handle,$iteration){
    /*
     * Do what you will with the {$chunk} here.
     * {$handle} is passed in case you want to seek
     * to different parts of the file.
     * {$iteration} is the number of chunks read so far, so
     * ($iteration * 4096) is your current offset within the file.
     */

});

if(!$success)
{
    //It Failed
}

One of the problems you will find is that you're trying to run several regexes over an extremely large block of data. Not only that, but your regexes are written to match across the entire file.

With the above method your regex could become useless, as a chunk may end in the middle of a line and you would only be matching half a set of data. What you should do is revert to the native string functions such as

  • strpos
  • substr
  • trim
  • explode

to match the strings. I have added support in the callback so that the handle and the current iteration are passed in. This allows you to work with the file directly within your callback, using functions like fseek, ftruncate and fwrite, for instance.

The way you're building your string manipulation is not efficient at all, and the method proposed above is by far a better way.
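For example, here is a minimal sketch (not part of the original answer; the output file name and the cleanup shown are only illustrative) of how the callback might carry the incomplete last line of each chunk over to the next one, so that plain string functions always see whole lines:

$carry = '';
$out   = fopen("my/large/file.cleaned", "w");

$success = file_get_contents_chunked("my/large/file", 4096, function($chunk, &$handle, $iteration) use (&$carry, $out) {
    $buffer = $carry . $chunk;
    $lines  = explode("\n", $buffer);
    // The last element may be a partial line; keep it for the next chunk.
    $carry  = array_pop($lines);

    foreach ($lines as $line) {
        $line = rtrim($line, "\r"); // strip DOS carriage returns
        // Plain string functions instead of whole-file regex, e.g.:
        $line = str_replace("Model-Manufacturer:", "Model-Manufacturer=", $line);
        fwrite($out, $line . "\n");
    }
});

// After the loop, $carry still holds the final (unterminated) line, if any.
if ($carry !== '') {
    fwrite($out, $carry . "\n");
}
fclose($out);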

Tenure answered 9/3, 2011 at 17:48 Comment(8)
Thank you so much for such a detailed answer! I am a beginner and answers like yours motivate me to work harder. Thanks again.Jp
The file searched appears to be a multi-line file. Wouldn't it be easier to use fgets to process it line by line?Darmstadt
Just replace fread with fgets and 4096 with 1024 for line-by-line processing.Untried
It is returning the whole row as a string; what should I do to return an array instead?Oleate
This method has been working for files that are less than 3 MB in size. I tested it with a CSV file that contains 30,000 rows. Even after using ini_set to change the allotted memory, I can't use fopen to open a file that large. This may be an issue with the file not being posted from my form, however. I get a "filename must exist" error message. Do you have any suggestions?Quadruplex
How can I use this for a JSON file? I want to decode the JSON data into an array, but I can only decode the JSON after it is complete. Any idea?Simpson
For testing, you can add a limit: file_get_contents_chunked($file, $chunk_size, $callback, $limit = null){ ... } then, inside the while loop: if ($limit !== null && $i > $limit) break;Grosvenor
What do you mean by 4096?Rothberg

A pretty ugly solution is to adjust your memory limit depending on the file size:

$filename = "yourfile.txt";
ini_set('memory_limit', filesize($filename) + 4000000);
$contents = file_get_contents($filename);

The right solution would be to think about whether you can process the file in smaller chunks, or use command line tools from PHP.

If your file is line-based, you can also use fgets to process it line by line.
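For example, a minimal line-by-line sketch (the temp-file names are taken from the question and are only an assumption; put your own cleanup in the loop):

$in  = fopen("tmp11", "r");
$out = fopen("tmp2", "w");
while (($line = fgets($in)) !== false) {
    // run the str_replace()/preg_replace() cleanup on this single line here
    fwrite($out, $line);
}
fclose($in);
fclose($out);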

Kopeck answered 9/3, 2011 at 17:0 Comment(11)
Poor answer; if you do this in your applications then you need to go back to basics!Tenure
@Tenure I said it was pretty ugly, but it was the only immediate solution. The OP obviously didn't give any indication that the file could be processed in smaller chunks. And you willfully ignore the sentence starting with "The right solution" just to bash someone. Great job.Kopeck
But it's not a solution; a solution is where you solve something. This would produce the exact same result as he had in the first place.Tenure
@Tenure "this would produce the exact same result as he had in the first place" - Certainly not. His problem was that a 16 MB limit was in the way of working with a 40 MB file. Would dynamically increasing the memory limit above the file size produce "the exact same result"? No way. Was my recommended solution chunked processing, the very same thing you recommended? Yes it was. If this was not verbose enough, the OP could have asked for any of the details. But not writing 6 paragraphs will not make an answer false.Kopeck
I'm half asleep here; that comment was meant for the other post, and it was wrong of me to post it under your solution. What if you had 100 MB and 20 users hit your site at the same time? That's 20 threads on the CPU and 100 MB of RAM each, so that's 1 GB of RAM used already; then the later requests will fail because there's not enough RAM. It's a solution, yes, but a very poor one.Tenure
We simply don't know what the purpose of the script is. You pull numbers out of thin air and base conclusions on them. Did I recommend that solution for a high-traffic environment? Did I really recommend it for any environment at all? No. I recommended chunked processing as the "real solution". Please read my second paragraph out of the three.Kopeck
So why post something you do not recommend?Tenure
Because it is also part of the spectrum. It is also a possible solution, if the OP needs something that works NOW instead of taking the time to rewrite the current processing model. The script is (as I'm sure is the case) used to process a batch of data on the backend. I have written that it is an ugly solution, and I have written what I recommend instead. I let the poster decide whether to use it or not. I don't think it was misleading in any respect. It also has a point the other answers lack, which is line-by-line processing with fgets. But hey, let's vote it down... why not?Kopeck
@PapaDeBeau Good to see it helps someone after my tarring and feathering by the SO mob. :)Kopeck
@Tenure I think vbence is right: you don't know the context, so you don't know if it's a bad solution. In my case, I'm writing a quick command-line script to manipulate some data. That data is JSON in a very, very large file. This solution is exactly what I needed, and there's no concern as long as my machine doesn't run out of memory running the script once.Beitnes
My solution allows you to control the amount of memory used for the operation, which in turn gives you greater confidence running the same code in any environment where you do not know what resources you have. The other approach will work, but only on machines where you know what memory resources are available.Tenure

For processing just n rows at a time, we can use generators in PHP.

(Here n = 1000.)

This is how it works: read n lines, process them, come back for line n+1, read the next n lines, process them, and so on.

Here's the code for doing so.

<?php
class readLargeCSV{

    private $file;
    private $delimiter;
    private $iterator;
    private $header;

    public function __construct($filename, $delimiter = "\t"){
        $this->file = fopen($filename, 'r');
        $this->delimiter = $delimiter;
        $this->iterator = 0;
        $this->header = null;
    }

    public function csvToArray()
    {
        $data = array();
        $is_mul_1000 = false;
        while (($row = fgetcsv($this->file, 1000, $this->delimiter)) !== false)
        {
            $is_mul_1000 = false;
            if(!$this->header){
                // The first row is the header.
                $this->header = $row;
            }
            else{
                $this->iterator++;
                $data[] = array_combine($this->header, $row);
                if($this->iterator != 0 && $this->iterator % 1000 == 0){
                    // Yield a batch of 1000 rows and start a fresh buffer.
                    $is_mul_1000 = true;
                    $chunk = $data;
                    $data = array();
                    yield $chunk;
                }
            }
        }
        fclose($this->file);
        // Yield whatever is left over (the final n % 1000 rows).
        if(!$is_mul_1000){
            yield $data;
        }
        return;
    }
}

And for reading it, you can use this.

    $file = database_path('path/to/csvfile/XYZ.csv');
    $csv_reader = new readLargeCSV($file, ",");


    foreach($csv_reader->csvToArray() as $data){
     // you can do whatever you want with the $data.
    }

Here $data contains 1000 entries from the CSV, or n % 1000 entries for the last batch.

A detailed explanation of this can be found here: https://medium.com/@aashish.gaba097/database-seeding-with-large-files-in-laravel-be5b2aceaa0b

Dagmar answered 23/7, 2020 at 16:18 Comment(0)

My advice would be to use fread. It may be a little slower, but you won't have to use all your memory... For instance:

// This uses filesize($oldFile) bytes of memory:
file_put_contents($newFile, file_get_contents($oldFile));
// And this only 8192 bytes at a time:
$pNew = fopen($newFile, 'w');
$pOld = fopen($oldFile, 'r');
while(!feof($pOld)){
    fwrite($pNew, fread($pOld, 8192));
}
fclose($pOld);
fclose($pNew);
Sharronsharyl answered 9/3, 2011 at 17:10 Comment(5)
My understanding is that the OP does not want to copy the file, he wants to process it with preg_replace.Kopeck
Ok, then I guess he can still do this between the fread & the fwrite ;)Sharronsharyl
@Kopeck & @haltabush: preg_replace() and str_replace() operations on the file do work fine on small files. Please see my updated post for my code. fread() seems the way to go.Jp
file_put_contents($newFile, file_get_contents($oldFile)); makes NO difference whatsoever; it's still reading the entire file into memory! -1Tenure
@RoberPitt: true, it was just what I thought Chris did.Sharronsharyl
