Are file_get_contents and file_put_contents reliable, or can they lead to loss of data? Benchmark results

I was wondering what happens when multiple scripts share the same file. I uploaded the tests to a remote server that stores its data on an HDD. There were 7 tests in total, but only a family of 6 (T2 - T7) are comparable.

I uploaded 7 files of different sizes to the server together with the test. The test is a loop which reads data from the files and writes it back.

There is a 50 microsecond delay in the loop. The loop repeats 50 times.

I measure the time needed to perform every cycle.

The differences in the tests (T):

Using file_get_contents/file_put_contents

T2 - SOURCE <> TARGET - reads data from the original file, writes data to a different (new) file

T3 - SOURCE = TARGET - 1. copies data from the original file to the target; 2. reads the data -> writes the data; 3. step 2 is repeated, i.e. I read the data which I have just written. This test reads and writes the same file.

T4 - SOURCE = TARGET - I repeated the same test as in T3 and got shorter times.
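The T2 - T4 scripts themselves are not posted, so the following is only a minimal sketch of what a T3-style loop (source = target) might look like. The function name runSameFileTest and the temporary file are illustrative assumptions; the 50 cycles and the 50 microsecond delay follow the description above.

```php
<?php
// Hypothetical T3-style test: read a file and write the same data back to it.
function runSameFileTest(string $path, int $iterations = 50, int $delayUs = 50): array
{
    $times = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = microtime(true);

        $data = file_get_contents($path);            // read the whole file
        if ($data === false) {
            throw new RuntimeException("read failed: $path");
        }
        $written = file_put_contents($path, $data);  // write it back to the same file
        if ($written !== strlen($data)) {
            throw new RuntimeException("short write: $path");
        }

        $times[] = microtime(true) - $start;         // seconds needed for this cycle
        usleep($delayUs);
    }
    return $times;
}

// Usage: time 50 cycles on a temporary 523 kB file.
$testPath = tempnam(sys_get_temp_dir(), 't3_');
file_put_contents($testPath, str_repeat('x', 523 * 1024));
$times = runSameFileTest($testPath);
printf("min %.4f s, max %.4f s, avg %.4f s\n",
    min($times), max($times), array_sum($times) / count($times));
$finalSize = filesize($testPath);
unlink($testPath);
```

file_put_contents also accepts the LOCK_EX flag as its third argument, which makes it hold an exclusive advisory lock while writing; whether the original tests used it is not stated.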

Using fopen, flock, fread, flock, fclose for reading and fopen, flock, fwrite, fflush, flock, fclose for writing. This is complicated code, but here I wanted to test fflush. To check validity I also use clearstatcache, stat, touch and filesize. The tests T5 - T7 were less reliable than T2 - T4 because the write operation sometimes failed. I checked the file size, and when it was not correct I restored the file by copying it back from the original file.

T5: (fflush) SOURCE = TARGET

T6: (fflush) SOURCE <> TARGET

T7: (fflush) SOURCE <> TARGET + I removed the 50 microsecond delay from the loop (it seems that the validity/reliability is worse when there is a delay).

I made 4 requests from 4 different browsers, so every test has 4 sets of data (7*50*4 values in total).

Now I have collected all the data and created tables and diagrams. This is one diagram of many, showing the minimal and maximal values of the average.

T4 (yellow) and T3 (green) show very short times, so they are suspicious. For example, the T4 average times are these:

0.001 0.001 0.002 0.003 0.002 0.004 0.003 0.004 0.001 0.004 0.001 0.004 0.001 0.004

And T3 times:

0.002 0.003 0.001 0.001 0.003 0.003 0.006 0.007 0.002 0.003 0.004 0.004 0.019 0.019

The values of T2 seem normal, but this can be explained by the fact that the data was read from a different file than it was written to.

T5 - T7 just show normal times as expected: the bigger the file, the more time is needed to process it. They are fairly slow, as expected from an HDD and 4 scripts running at the same time.

So my question here is:

Do the results of T3 - T4 mean that file_get_contents and file_put_contents are not reliable for this type of job? To me it looks like they simply do not read the data from the file but copy it from a buffer, which means that old data is saved instead of the current data changed by a concurrent script. I would welcome more information. I spent a lot of time searching for answers but did not find a clear one. I ran these tests because I need proof. You may want to use my scripts, but I am not sure whether I can paste all 6 scripts here. For now I will add just the fflush test number 7, which is the most useful.

<?PHP 
clearstatcache();
$_DEBUG_ = false;

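echo "Lock and flush tester.".time()."<br>";
die; // stops the script here; remove this line to actually run the test

// Wait until a fixed Unix timestamp (a common start time for all requests).
while ( time()<1570787996 )
 {
 usleep(500);
 }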


function test($n, $p, $_DEBUG_){
  $sname = "$n";    // source
  $tname = "$n.txt";// target
  echo "<h4>$n at ".time()."</h4>";
  for ($i = 0; $i<50; $i++ ){
    $start = microtime(true);
    clearstatcache(); // needed for filesize and touch    
    $st = stat("$sname");
    $original_size = $st['size'];
    if ( $_DEBUG_ )
      echo "; 1) prevAccess by ".$st['mtime']." fsize ".$st['size']."; ";
    $fsize = filesize($sname);
    if ( $original_size <> $fsize )
      die("; fsize total FAILURE; ");
    if ($fsize === 0)
     echo "! <b>The fsize is 0</b>: stat(): ".$st['size']." ;";    
    else
      {
      // READ OPERATION AND LOCK FOR SHARE
       $locked = false;     
       for ($c = 0; !$locked; $c++):      
         if ( $c > 400)
           break;
         $fp = fopen($sname, "r");
         $locked = flock($fp, LOCK_SH);
         if ($locked)
           break;
         else
           {
           echo "failed to get LOCK_SH;<br>";
           usleep(5000);
           }
       endfor;
       $s = fread($fp, $fsize );
       $success = flock($fp, LOCK_UN);
       if ( $success === false  )
         die("; r flock release failed; ");
       $success = fclose($fp);
       if ( $success === false  )
         die("; fclose failed; ");
       // 10 - data loaded , $p - browser
       if ( $success )
         { 
         $result = touch("$sname",strlen($s),$p);
         if ( $_DEBUG_ )
            echo "; TOUCH: $result;";
         }
       else
         die("fclose FAIL.");
       if ( strlen($s)<60 ) 
          echo "*$s LENGTH:".strlen($s)."<br>";
      }
    clearstatcache();
    $st = stat("$tname");                               
    if ( $_DEBUG_ )
      echo "; 2) prevAccess by ".$st['mtime']." fsize is ".$fsize."; ";

    // WRITE OPERATION WITH LOCK_EX
    $fp = fopen($tname, "w");
    $locked = false; 
    $locked = flock($fp, LOCK_EX);
    if ( $locked ) {  // acquire an exclusive lock
        $success = fwrite($fp, $s);
        if ( $success === false)
          echo "; w FAILED;";
        else
          if ( $_DEBUG_ )
                echo " $success B written; ";
        $success = fflush($fp);// flush output before releasing the lock
        if ( $success === false ) 
          echo "; flush FAILED; ";
        $success = flock($fp, LOCK_UN);    // release the lock
        if ( $success === false ) 
          echo "; release FAILED; ";
        $success = fclose($fp);
        if ( $success === false ) 
          echo "; fclose FAILED; ";
        clearstatcache(); // needed for filesize and touch
        $fsize = filesize($tname);
        if ($original_size>$fsize)
            {
            echo "; <b>WRITE FAILED, restoring</b>;";
            $original_fname = "$n";
            $result = copy($original_fname, $tname);
            if ($result == false )
              die(" <b>TOTAL FAILURE: copy failed.</b>");
            else
              echo " <b>RESTORED</b>;";
            }
        else
        {
          if ($fsize === 0)
           echo "! THE FILE WAS NOT WRITTEN: data length: ".strlen($s)." fsize: $fsize RESOURCE: $fp<br>";    
          if ( $success ) 
              touch("$tname",$fsize,$p);
        }
    } else {
        echo "Couldn't get the lock!";
    }
     $time_elapsed_secs = microtime(true) - $start;
     if ( $time_elapsed_secs == 0 ) // loose comparison: the value is a float, so === 0 can never match
       echo " FAILED ";
    echo "time: $time_elapsed_secs s<br>"; 
  }
}

switch ( $_SERVER['HTTP_USER_AGENT'] ):
  // FF 1:
  case "Mozilla/5.0 (Windows NT 5.1; rv:49.0) Gecko/20100101 Firefox/49.0": 
    $p = 1; break;
  // Chrome:
  case "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36":
    $p = 2; break;
  // OPERA:
  case "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36 OPR/36.0.2130.80":  
    $p = 3; break;
  // Any other browser (assumption: the second Firefox instance), so that $p is always defined:
  default:
    $p = 4; break;
endswitch;

copy("523","523.txt");
copy("948","948.txt");
copy("1371","1371.txt");
copy("1913","1913.txt");
copy("2701","2701.txt");
copy("4495","4495.txt");
copy("6758","6758.txt");

test("523",$p,$_DEBUG_);
test("948",$p,$_DEBUG_);
test("1371",$p,$_DEBUG_);
test("1913",$p,$_DEBUG_);
test("2701",$p,$_DEBUG_);
test("4495",$p,$_DEBUG_);
test("6758",$p,$_DEBUG_);
die;
echo "php: " . phpversion();
?>
<?PHP echo "php: " . phpinfo();
?>

You may want to enable the $_DEBUG_ option to monitor each process. Note: touch may not always work correctly.

Note: This is not a request for testing, it is just a request for review.
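One detail of the script above may be related to the failed writes in T5 - T7: fopen() with mode "w" truncates the file at open time, before flock() is called, so another process can see an empty file in the meantime. Below is a minimal sketch of a write that takes the lock first and truncates afterwards, using mode "c" (open for writing without truncating). The helper names lockedWrite and lockedRead are hypothetical and not part of the original test.

```php
<?php
// Hypothetical helper: replace the contents of $path while holding an exclusive lock.
function lockedWrite(string $path, string $data): bool
{
    $fp = fopen($path, 'c');            // "c": create if missing, do NOT truncate on open
    if ($fp === false) {
        return false;
    }
    $ok = false;
    if (flock($fp, LOCK_EX)) {          // blocks until the exclusive lock is granted
        $ok = ftruncate($fp, 0)         // truncate only after the lock is held
            && rewind($fp)
            && fwrite($fp, $data) === strlen($data)
            && fflush($fp);             // push PHP's buffer to the OS before unlocking
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $ok;
}

// Hypothetical helper: read the whole file under a shared lock.
function lockedRead(string $path)
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return false;
    }
    $data = false;
    if (flock($fp, LOCK_SH)) {
        $data = stream_get_contents($fp);   // reads to EOF, no filesize() needed
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $data;
}

$lwPath = tempnam(sys_get_temp_dir(), 'lw_');
$firstOk = lockedWrite($lwPath, str_repeat('a', 4096));
$secondOk = lockedWrite($lwPath, 'short');   // shorter data must not leave old bytes behind
$lwResult = lockedRead($lwPath);
unlink($lwPath);
echo $lwResult, "\n";
```

flock() is advisory, so this only protects against other scripts that also call flock() on the same file.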

[Diagram: file_put_contents vs fflush]

Also: Please do not be confused by the yellow curve. There are two yellow curves. The T4 yellow one is almost invisible on the diagram because it has very low values.

Tideland answered 12/10, 2019 at 7:19

I don't know what you're trying to do, but I'm afraid you've gone the wrong way. If you are concerned about collisions, you should use a database, which takes care of such problems and offers you convenient access methods. PHP comes with 5 different databases that you can choose from.

Note that there is no collision between these two functions; both are atomic and reliable. The problem arises when you read, modify, and save the file. These three actions are not in one transaction, so you may lose data when two scripts overlap. If you need such a use case, use a database.
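If a database is not an option, the read, modify and save steps can be kept together by holding one exclusive lock across all three. The following is a minimal sketch under the assumption that every script touching the file uses the same helper; incrementCounter is a hypothetical name and the counter is only an illustration.

```php
<?php
// Hypothetical helper: increment a numeric counter stored in a file.
function incrementCounter(string $path): int
{
    $fp = fopen($path, 'c+');           // read/write, create if missing, no truncation
    if ($fp === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        if (!flock($fp, LOCK_EX)) {
            throw new RuntimeException("cannot lock $path");
        }
        $current = (int) stream_get_contents($fp);   // read ...
        $next = $current + 1;                        // ... modify ...
        ftruncate($fp, 0);                           // ... and save, all under the same lock
        rewind($fp);
        fwrite($fp, (string) $next);
        fflush($fp);
        flock($fp, LOCK_UN);
        return $next;
    } finally {
        fclose($fp);
    }
}

$counterPath = tempnam(sys_get_temp_dir(), 'cnt_');
$counterValue = 0;
for ($i = 0; $i < 5; $i++) {
    $counterValue = incrementCounter($counterPath);
}
unlink($counterPath);
echo $counterValue, "\n"; // 5
```

Two separate calls, file_get_contents followed by file_put_contents, cannot give this guarantee even with LOCK_EX on the write, because another script can write between the two calls.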

Buffering is a basic file system feature that every programmer should know. This applies to all programming languages, not just PHP.

Realize that you are actually trying to create a database engine, that is, reinventing the wheel. Many databases look like a plain text file, but the engine on top of them is ready and tested. Why don't you use one of the five?
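As an illustration of the database route, here is a minimal sketch using SQLite through PDO. It assumes the pdo_sqlite extension is enabled; the table name counters, its columns and the function bumpCounter are made up for this example.

```php
<?php
// Assumption: pdo_sqlite is available. Table and column names are illustrative.
$dbFile = tempnam(sys_get_temp_dir(), 'db_');
$pdo = new PDO('sqlite:' . $dbFile);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)');

// Read, modify and save inside one transaction.
function bumpCounter(PDO $pdo, string $name): int
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('INSERT OR IGNORE INTO counters (name, value) VALUES (?, 0)')
            ->execute([$name]);
        $pdo->prepare('UPDATE counters SET value = value + 1 WHERE name = ?')
            ->execute([$name]);
        $stmt = $pdo->prepare('SELECT value FROM counters WHERE name = ?');
        $stmt->execute([$name]);
        $value = (int) $stmt->fetchColumn();
        $pdo->commit();
        return $value;
    } catch (Throwable $e) {
        $pdo->rollBack();               // undo partial work if anything failed
        throw $e;
    }
}

bumpCounter($pdo, 'hits');
$hits = bumpCounter($pdo, 'hits');
$pdo = null;                            // close the connection before deleting the file
unlink($dbFile);
echo $hits, "\n"; // 2
```

Here the locking is done by the database engine, so the script needs no flock() calls of its own.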

Corrie answered 12/10, 2019 at 18:53

I would like to add one more test. This one was made using a "directory lock". Instead of using flock, it creates a directory. If the directory does not exist, the script attempts to create it and then continues to read and write data. Notice: this is not a perfect solution. The loop has 50 cycles and no delay, but the function atomicFuse has a delay. I post this not as a real solution, but just as a test and its result for comparison.

/*
n is the file size in kB
c is a counter used for optimization (it scales the delay)
the first call must have c = 0;
*/
function atomicFuse($n, $c, $disableDelay = false){
  $start = false;
  if ( !file_exists("$n.t") ) 
   $start = mkdir("$n.t");
  if ( !$disableDelay ){
    if ( $start == false )
     {
     $n = $n*30;
     switch($c):      // Delay example increase:
       case 0: break; // 0.01569 total
       case 1: break; // 0.03138 total
       case 2: $n = $n*2; break; // 0.06276 total
       case 3: $n = $n*4; break; // 0.12552 total
       // case 4: You need at least *6 or *8 to avoid problems with extreme times
       case 4: $n = $n*8; break; // 0.25104 total (upper limit)
       // In case of heavy traffic:
       case 5: $n = $n*8; break; // 0.36087 total, extreme
       case 6: $n = $n*10; break; // 0.51777 total, extreme
       case 7: $n = $n*20; break; // 1.03554 total, extreme
       default: $n = $n*8; break;
     endswitch;
     usleep($n);
     echo ($n)."<br>";
     }
    }
  return $start;
}

Usage of atomicFuse:

  for ($i = 0; $i<50; $i++ ){
    $start_time = microtime(true);
      {
      $start = atomicFuse($n,0);
      if (!$start) $start = atomicFuse($n,1);
      if (!$start) $start = atomicFuse($n,2);
      if (!$start) $start = atomicFuse($n,3);
      if (!$start) $start = atomicFuse($n,4);
      if (!$start) $start = atomicFuse($n,5);
      if (!$start) $start = atomicFuse($n,6);
      if (!$start) $start = atomicFuse($n,7);
      if (!$start) $start = atomicFuse($n, false);
      if (!$start) echo "<b>Atomicity failed.</b> ";
      if ( $start )
         {
         // do some action
         $success = rmdir("$n.t"); // remove atomic fuse
         }
      } 
    }
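In atomicFuse the file_exists() check and the mkdir() call are two separate steps, so another script can create the directory in between. mkdir() itself fails when the directory already exists, so it can serve as the whole test. A minimal sketch of that variant with a doubling delay follows; the names acquireDirLock and releaseDirLock and the delay values are assumptions, not the code used for T8.

```php
<?php
// Hypothetical helper: try to take a directory lock, backing off between attempts.
function acquireDirLock(string $lockDir, int $maxAttempts = 8, int $baseDelayUs = 1000): bool
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        // mkdir() either creates the directory or fails; there is no separate
        // file_exists() check that another process could slip in between.
        if (@mkdir($lockDir)) {
            return true;
        }
        usleep($baseDelayUs * (2 ** $attempt));   // 1 ms, 2 ms, 4 ms, ...
    }
    return false;
}

// Hypothetical helper: remove the lock directory again.
function releaseDirLock(string $lockDir): bool
{
    return @rmdir($lockDir);
}

$lockDir = sys_get_temp_dir() . '/fuse_' . uniqid('', true) . '.t';
$firstLock = acquireDirLock($lockDir);
$secondLock = acquireDirLock($lockDir, 2);   // already held, so this gives up after two short waits
$lockReleased = releaseDirLock($lockDir);
var_dump($firstLock, $secondLock, $lockReleased);
```

Unlike flock(), a directory lock is not released automatically when a script dies while holding it, so a stale directory has to be cleaned up by other means.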

The T8 results (min and max of the average):

0.006 0.083 0.018 0.156 0.072 0.182 0.100 0.255 0.168 0.276 0.224 0.383 0.224 0.406

Important notice: This test is very specific. It has some atomicity failures, so there are big delays at the beginning of some sections.

Each request made by a specific browser on my PC led to these errors:

request from Chrome: 6 failed (4x 523kB and 2x 948kB)

request from FF1: 5 failed (the first 5, all 523kB)

request from Opera: 0 failed (100% OK)

request from FF2: 0 failed (100% OK)

[Diagram: T8 test - black line]

I will add one more diagram, without the values where the test failed. That will look completely different.

Here is another diagram with T8b, where I removed the very high numbers from the beginning. This changes the average only very slightly.

[Diagram: the high numbers went down]

Tideland answered 13/10, 2019 at 14:16