file_get_contents getting wrong results
Update

I solved the problem and posted an answer. However, my solution isn't 100% ideal. I would much rather only remove the symlink from the cache with clearstatcache(true, $target) or clearstatcache(true, $link) but that doesn't work.

I would also much rather prevent the caching of symlinks in the first place, or remove the symlink from the cache immediately after generating it. Unfortunately, I had no luck with that. For some reason, calling clearstatcache(true) after creating a symlink does not work; the symlink still gets cached.

I will happily award the bounty to anyone that can improve my answer and solve those issues.

Edit

I've attempted to optimize my code by generating a file every time clearstatcache is run, so that I only need to clear the cache once for each symlink. For some reason, this does not work. clearstatcache needs to be called every time a symlink is included in the path, but why? There must be a way to optimize the solution I have.
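One possible explanation, offered as an assumption rather than a confirmed diagnosis: PHP keeps its realpath cache per process, so under PHP-FPM each worker has its own copy, and clearstatcache() only affects the worker that runs it. A request served by a different worker can still see the old entry until it expires. A minimal sketch for checking this logs the worker's PID next to its cached entries. realpath_cache_entries_under() is a hypothetical helper name; realpath_cache_get() is the built-in it relies on.

```php
<?php
// Hypothetical debugging helper: list the realpath cache entries of the
// current PHP process whose key or resolved path starts with $prefix.
function realpath_cache_entries_under(string $prefix): array
{
    $matches = [];
    foreach (realpath_cache_get() as $path => $entry) {
        $path = (string) $path;
        if (strpos($path, $prefix) === 0 || strpos($entry['realpath'], $prefix) === 0) {
            $matches[$path] = $entry['realpath'];
        }
    }
    return $matches;
}

// Log the process id together with what this process currently has cached.
error_log(sprintf(
    'pid=%d realpath cache: %s',
    getmypid(),
    json_encode(realpath_cache_entries_under(__DIR__))
));
```

If the log shows stale entries only for some PIDs, the "random" results would simply reflect which worker handled the request.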


I am using PHP 7.3.5 with nginx/1.16.0. Sometimes file_get_contents returns the wrong value when using a symlink. The problem is after deleting and recreating a symlink, its old value remains in the cache. Sometimes the correct value is returned, sometimes the old value. It appears random.

I've tried to clear the cache or prevent caching with:

function symlink1($target, $link)
{
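    realpath_cache_size(0); // takes no arguments and only reports memory in use; this call does not resize or disable the cache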
    symlink($target, $link);
    //clearstatcache(true);
}

I don't really want to disable caching but I still need 100% accuracy with file_get_contents.

Edit

I am unable to post my source code, as it is way too long and complex, so I have created a minimal, reproducible example (index.php) that recreates the problem:

<h1>Symlink Problem</h1>
<?php
    $dir = getcwd();
    if (isset($_POST['clear-all']))
    {
        $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
        foreach ($nos as $no)
        {
            unlink($dir.'/nos/'.$no.'/id.txt');
            rmdir($dir.'/nos/'.$no);
        }
        foreach (array_values(array_diff(scandir($dir.'/ids'), array('..', '.'))) as $id)
            unlink($dir.'/ids/'.$id);
    }
    if (!is_dir($dir.'/nos'))
        mkdir($dir.'/nos');
    if (!is_dir($dir.'/ids'))
        mkdir($dir.'/ids');
    if (isset($_POST['submit']) && !empty($_POST['id']) && ctype_digit($_POST['insert-after']) && ctype_alnum($_POST['id']))
    {
        $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
        $total = count($nos);
        if ($total <= 100)
        {
            for ($i = $total; $i >= $_POST['insert-after']; $i--)
            {
                $id = file_get_contents($dir.'/nos/'.$i.'/id.txt');
                unlink($dir.'/ids/'.$id);
                symlink($dir.'/nos/'.($i + 1), $dir.'/ids/'.$id);
                rename($dir.'/nos/'.$i, $dir.'/nos/'.($i + 1));
            }
            echo '<br>';
            mkdir($dir.'/nos/'.$_POST['insert-after']);
            file_put_contents($dir.'/nos/'.$_POST['insert-after'].'/id.txt', $_POST['id']);
            symlink($dir.'/nos/'.$_POST['insert-after'], $dir.'/ids/'.$_POST['id']);
        }
    }
    $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
    $total = count($nos) + 1;
    echo '<h2>Ids from nos directory</h2>';
    foreach ($nos as $no)
    {
        echo ($no + 1).':'.file_get_contents("$dir/nos/$no/id.txt").'<br>';
    }
    echo '<h2>Ids from using symlinks</h2>';
    $ids = array_values(array_diff(scandir($dir.'/ids'), array('..', '.')));
    if (count($ids) > 0)
    {
        $success = true;
        foreach ($ids as $id)
        {
            $id1 = file_get_contents("$dir/ids/$id/id.txt");
            echo $id.':'.$id1.'<br>';
            if ($id !== $id1)
                $success = false;
        }
        if ($success)
            echo '<b><font color="blue">Success!</font></b><br>';
        else
            echo '<b><font color="red">Failure!</font></b><br>';
    }
?>
<br>
<h2>Insert ID after</h2>
<form method="post" action="/">
    <select name="insert-after">
        <?php
            for ($i = 0; $i < $total; $i++)
                echo '<option value="'.$i.'">'.$i.'</option>';
        ?>
    </select>
    <input type="text" placeholder="ID" name="id"><br>
    <input type="submit" name="submit" value="Insert"><br>
</form>
<h2>Clear all</h2>
<form method="post" action="/">
    <input type="submit" name="clear-all" value="Clear All"><br>
</form>
<script>
    if (window.history.replaceState)
    {
        window.history.replaceState( null, null, window.location.href );
    }
</script>

It seemed very likely to be a problem with Nginx configuration. Not having these lines can cause the problem:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;

Here is my Nginx configuration (you can see I have included the above lines):

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name www.websemantica.co.uk;
    root "/path/to/site/root";
    index index.php;

    location / {
        try_files $uri $uri/ $uri.php$is_args$query_string;
    }

    location ~* \.php$ {
        try_files $uri =404;
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_param   QUERY_STRING            $query_string;
        fastcgi_param   REQUEST_METHOD          $request_method;
        fastcgi_param   CONTENT_TYPE            $content_type;
        fastcgi_param   CONTENT_LENGTH          $content_length;

        fastcgi_param   SCRIPT_FILENAME         $realpath_root$fastcgi_script_name;
        fastcgi_param   SCRIPT_NAME             $fastcgi_script_name;
        fastcgi_param   PATH_INFO               $fastcgi_path_info;
        fastcgi_param   PATH_TRANSLATED         $realpath_root$fastcgi_path_info;
        fastcgi_param   REQUEST_URI             $request_uri;
        fastcgi_param   DOCUMENT_URI            $document_uri;
        fastcgi_param   DOCUMENT_ROOT           $realpath_root;
        fastcgi_param   SERVER_PROTOCOL         $server_protocol;

        fastcgi_param   GATEWAY_INTERFACE       CGI/1.1;
        fastcgi_param   SERVER_SOFTWARE         nginx/$nginx_version;

        fastcgi_param   REMOTE_ADDR             $remote_addr;
        fastcgi_param   REMOTE_PORT             $remote_port;
        fastcgi_param   SERVER_ADDR             $server_addr;
        fastcgi_param   SERVER_PORT             $server_port;
        fastcgi_param   SERVER_NAME             $server_name;

        fastcgi_param   HTTPS                   $https;

        # PHP only, required if PHP was built with --enable-force-cgi-redirect
        fastcgi_param   REDIRECT_STATUS         200;

        fastcgi_index index.php;
        fastcgi_read_timeout 3000;
    }

    if ($request_uri ~ (?i)^/([^?]*)\.php($|\?)) {
        return 301 /$1$is_args$args;
    }
    rewrite ^/index$ / permanent;
    rewrite ^/(.*)/$ /$1 permanent;
}

Currently I have the above example live at https://www.websemantica.co.uk.

Try adding a few values in the form. It should display Success! in blue every time. Sometimes it shows Failure! in red. It may take quite a few page refreshes to change from Success! to Failure! or vice versa. Eventually, it will show Success! every time, so there must be some sort of caching problem.

Friedrick answered 5/11, 2019 at 13:15 Comment(4)
I was looking into the same case and found a very useful comment on the realpath function page. Maybe it could help you. - Viquelia
@Viquelia I tried using realpath with file_get_contents and had no luck. It still sometimes loads from the cache. - Friedrick
I mean not only realpath, but something like clearstatcache(true); file_get_contents(realpath($fileName)); - Viquelia
Try linux.die.net/man/8/updatedb and run the command between consecutive calls. Although I am not sure how to solve the issue in PHP if this is the case. - Mysterious

There were two issues that caused the problem.

First issue

I already posted this as an edit to the question. It's a problem with the Nginx configuration.

These lines:

fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $document_root;

needed to be replaced with:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;

Second issue

The second issue was that I needed to call clearstatcache before calling file_get_contents. I only want to call clearstatcache when it's absolutely necessary, so I wrote a function that only clears the cache when the path includes a symlink.

function file_get_contents1($dir)
{
    $realPath = realpath($dir);
    if ($realPath === false)
        return '';
    if ($dir !== $realPath)
    {
        clearstatcache(true);
    }
    return file_get_contents($dir);
}
Friedrick answered 27/11, 2019 at 14:15 Comment(0)

This depends too much on OS-level behavior, so how about thinking outside the box? Try reading the real location of the file with readlink and using that path instead:

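// shell_exec() returns the output with a trailing newline, so trim it;
// escapeshellarg() keeps unusual characters in the path from breaking the command
$realPath = trim((string) shell_exec('readlink ' . escapeshellarg($yourSymlink)));
$fileContent = file_get_contents($realPath);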
Postnatal answered 22/11, 2019 at 8:28 Comment(2)
I don't think that's far enough outside the box. After all, readlink also depends on OS-level calls and is affected by the cache. - Campagna
Just tested it in PHP: $link = readlink('info.json'); The link stayed the same (cached). However, the content read on the next line changed: $data = json_decode(file_get_contents('info.json')); Requested via Ajax every 3s, around 15 times (45s total), the content switched between the old and the new file. The returned $link was (wrongly) the same. - Sartre

This is the expected behavior of PHP: it uses the realpath cache to store resolved file paths as a performance optimization, which reduces disk operations.

To avoid this behavior, you can try clearing the realpath cache before calling file_get_contents.

You can try something like this:


// Pass true so that the realpath cache is cleared as well as the stat cache
clearstatcache(true);
$data = file_get_contents("Your File");

You can read more about clearstatcache in the PHP documentation.

Baptista answered 26/11, 2019 at 9:15 Comment(0)

There are two caches.

First the OS cache and then the PHP cache.

In most cases, calling clearstatcache(true) before file_get_contents(...) does the job.

But sometimes you also need to clear the OS cache. On Linux, I can think of two places to clear: the page cache (1) and dentries/inodes (2).

This clears both:

shell_exec('echo 3 > /proc/sys/vm/drop_caches'); // requires root privileges

Note: This is good for troubleshooting but not for frequent calls in production as it clears the whole OS cache and costs the system a few moments of cache re-population.

Campagna answered 8/11, 2019 at 3:53 Comment(3)
This doesn't work; it still sometimes loads the cached value, and I need a solution that is suitable for frequent calls in production. - Friedrick
@DanBray, could you log things to find out more about the nature of "sometimes"? - Campagna
@DanBray, how do you detect the appearance of the old value? Could it be that your test returns the old value due to other test conditions while the value has actually changed? - Campagna

"The problem is after deleting and recreating a symlink"

How do you delete the symlink? Deleting a file (or a symlink) should automatically clear the cache.

Otherwise, you could see what happens if you do:

// This has "race condition" written all around it
unlink($link);
touch($link);
unlink($link); // Remove the empty file
symlink($target, $link);

If this does not solve the problem, could it perhaps be a problem with nginx as in this issue?

Try logging all operations to a log file, to see what actually happens.
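
The unlink/touch/unlink/symlink sequence above leaves moments in which the link does not exist. A race-free alternative, sketched here under the assumption of a POSIX filesystem where rename() replaces its destination atomically, is to create the new symlink under a temporary name and rename it over the old one. replace_symlink() and the .tmp suffix are hypothetical names, not part of the question's code, and this does not by itself clear caches held by other PHP processes:

```php
<?php
// Hypothetical helper: re-point $link at $target without a window in which
// $link does not exist. Assumes POSIX rename() semantics.
function replace_symlink(string $target, string $link): bool
{
    $tmp = $link . '.tmp.' . getmypid();

    // Remove a leftover temporary link from an earlier, interrupted run.
    if (is_link($tmp)) {
        unlink($tmp);
    }
    if (!symlink($target, $tmp)) {
        return false;
    }
    // rename() acts on the symlink itself and overwrites the old link.
    if (!rename($tmp, $link)) {
        unlink($tmp);
        return false;
    }
    // Drop this process's cached entries so later reads resolve the new target.
    clearstatcache(true);
    return true;
}
```

In the question's script this would stand in for the unlink() and symlink() pair, for example replace_symlink($dir.'/nos/'.($i + 1), $dir.'/ids/'.$id).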

or maybe...

...could you do without symlinks altogether? For example, store in a database, memcache, SQLite file, or even a JSON file the mapping between "filename" and "actual symlink target". Using e.g. redis or other keystores, you could associate the "filename" with the real symlink target and bypass the OS resolution completely.

Depending on the use case, this might even turn out to be faster than using symlinks.
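
As a rough illustration of the mapping idea, here is a minimal sketch that keeps the id-to-directory mapping in a JSON file. The helper names (load_id_map, save_id_map, read_id_file) and the map file location are assumptions made for this example; the id.txt layout follows the question's test script:

```php
<?php
// Hypothetical helpers: store "id => directory" in a JSON file instead of
// symlinks, so no symlink resolution (and no cache of it) is involved.

function load_id_map(string $mapFile): array
{
    $json = @file_get_contents($mapFile); // false if the map does not exist yet
    if ($json === false) {
        return [];
    }
    $map = json_decode($json, true);
    return is_array($map) ? $map : [];
}

function save_id_map(string $mapFile, array $map): void
{
    // Write to a temporary file and rename it over the old map, so readers
    // never see a half-written file.
    $tmp = $mapFile . '.tmp';
    file_put_contents($tmp, json_encode($map), LOCK_EX);
    rename($tmp, $mapFile);
}

function read_id_file(string $mapFile, string $id): ?string
{
    $map = load_id_map($mapFile);
    if (!isset($map[$id])) {
        return null;
    }
    $contents = file_get_contents($map[$id] . '/id.txt');
    return $contents === false ? null : $contents;
}
```

When a directory is renamed, its entry is updated and save_id_map() is called, instead of unlinking and recreating a symlink.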

Circulation answered 13/11, 2019 at 22:44 Comment(3)
I don't see how this relates to nginx, as there seems to be no HTTP involved between the PHP process and the local file system. Does being the parent process make nginx somehow relevant? - Campagna
@BahramArdalan The fact is, we do not know how the problem was diagnosed, what the symlinks are, or how they are used. So it is conceivable that the content mismatch was detected downstream from nginx and is actually unrelated to PHP. An SSCCE would be of great help. - Circulation
Yes. We have to dig a little into that "how". - Campagna

I am leaving my first answer in place, since it is still valid. Here I am improving @DanBray's answer by using clearstatcache(true, $filename).

There were two issues that caused the problem.

First issue

I already posted this as an edit to the question. It's a problem with the Nginx configuration.

These lines:

fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $document_root;

needed to be replaced with:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;

Second issue

The second issue was I needed to call clearstatcache before calling file_get_contents. I only want to call clearstatcache when it's absolutely necessary, so I wrote a function that only clears the cache when the directory includes a symlink.

function file_get_contents1234_hard_drives($dir_go_1)
{
    $realPath = realpath($dir_go_1);
    if ($realPath === false)
        return false;

    // List the entries of the directory that holds the resolved file
    $dirArray = scandir(dirname($realPath));
    if ($dirArray === false)
        $dirArray = array();

    foreach ($dirArray as $entryName)
    {
        // Clear the cached entries for this path when an entry name contains "Symlink"
        if (strpos($entryName, "Symlink") !== false)
        {
            clearstatcache(true, $dir_go_1);
            break;
        }
    }
    return file_get_contents($dir_go_1);
}

I tested the above code on my web server and it worked.

Billfish answered 29/11, 2019 at 17:16 Comment(5)
Unfortunately, it does not work for me on my web server. - Friedrick
Well, I will go back to the drawing board. @DanBray - Billfish
Thank you very much, but unfortunately there is very little time before the bounty period expires. However, if you think of a solution I am 100% happy with, I will award an extra bounty. Also, file_get_contents1 is part of the framework I have made, so it's used a lot, which makes optimization important. - Friedrick
That may need to be changed to while($dir_go!==null) @DanBray - Billfish
Let us continue this discussion in chat. - Billfish

Try placing the code inside an element that is continuously refreshed using jQuery, as well as forcing revalidation and clearing the stat cache. This code has been modified from @naveed's original answer.

form.php:

 <meta http-equiv="Cache-Control" content="no-store, must-revalidate" />
 <meta http-equiv="Expires" content="0"/>
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
 <script> 
 jQuery(document).ready(function(){
    jQuery('.ajaxform').submit( function() {
        $.ajax({
            url     : $(this).attr('action'),
            type    : $(this).attr('method'),
            dataType: 'json',
            data    : $(this).serialize(),
            success : function( data ) {
                        // loop to set the result(value)
                        // in required div(key)
                        for(var id in data) {
                            jQuery('#' + id).html( data[id] );
                        }
                      }
        });
        return false;
    });
});
var timer, delay = 30;
timer = setInterval(function(){
    $.ajax({
      type    : 'POST',
      url     : 'profile.php',
      dataType: 'json',
      data    : $('.ajaxform').serialize(),
      success : function(data){
                  for(var id in data) {
                    jQuery('#' + id).html( data[id] );
                  }
                }
    }); }, delay);
 </script>
 <form action='profile.php' method='post' class='ajaxform'></form>
 <div id='result'></div>

profile.php:

 <?php
       // All form data is in $_POST
       // Now perform actions on form data here and create an result array something like this
       clearstatcache();
       $arr = array( 'result' => file_get_contents("./myfile.text") );
       echo json_encode( $arr );
 ?>
Billfish answered 14/11, 2019 at 15:45 Comment(0)
