php glob - scan in subfolders for a file

I have a server with a lot of files inside various folders, sub-folders, and sub-sub-folders.

I'm trying to make a search.php page that would be used to search the whole server for a specific file. If the file is found, then return the location path to display a download link.

Here's what I have so far:

$root = $_SERVER['DOCUMENT_ROOT'];
$search = "test.zip";
$found_files = glob("$root/*/test.zip");
$downloadlink = str_replace("$root/", "", $found_files[0]);
if (!empty($downloadlink)) {
    echo "<a href=\"http://www.example.com/$downloadlink\">$search</a>";
} 

The script works perfectly if the file is inside the root of my domain name. Now I'm trying to find a way to make it also scan sub-folders and sub-sub-folders, but I'm stuck here.

Assiduous answered 18/6, 2013 at 4:44 Comment(4)
#8871231Cule
You might have better luck using the file_exists() function. php.net/manual/en/function.file-exists.php (or a mix of).Giverin
That doesn't tell me how to scan all sub-folders and sub-sub-folders for the file...Assiduous
True. Have you had a look at the link messi fan put up? Seems promising. I'm dabbling with it now, and it's showing me all files in starting folder and sub-folders, but not working the way you want it to. Plus, I've got both eyes in the same socket right; needing some sleep, very soon.Giverin
P
96

There are 2 ways.

Use glob to do recursive search:

<?php
 
// Does not support flag GLOB_BRACE
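function rglob($pattern, $flags = 0) {
    // glob() returns false on error, so fall back to an empty array
    $files = glob($pattern, $flags) ?: [];
    $dirs = glob(dirname($pattern) . '/*', GLOB_ONLYDIR | GLOB_NOSORT) ?: [];
    foreach ($dirs as $dir) {
        // Recurse into each subdirectory with the same filename pattern
        $files = array_merge($files, rglob($dir . '/' . basename($pattern), $flags));
    }
    return $files;
}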

// usage: to find the test.zip file recursively
$result = rglob($_SERVER['DOCUMENT_ROOT'] . '/test.zip');
var_dump($result);
// to find all files whose names end with test.zip
$result = rglob($_SERVER['DOCUMENT_ROOT'] . '/*test.zip');
?>

Use RecursiveDirectoryIterator

<?php
// $regPattern must be a regular expression
function rsearch($folder, $regPattern) {
    $dir = new RecursiveDirectoryIterator($folder);
    $ite = new RecursiveIteratorIterator($dir);
    $files = new RegexIterator($ite, $regPattern, RegexIterator::GET_MATCH);
    $fileList = array();
    foreach($files as $file) {
        $fileList = array_merge($fileList, $file);
    }
    return $fileList;
}

// usage: to find the test.zip file recursively
$result = rsearch($_SERVER['DOCUMENT_ROOT'], '/.*\/test\.zip/');
var_dump($result);
?>

RecursiveDirectoryIterator comes with PHP 5, while glob has been available since PHP 4. Both can do the job; it's up to you.
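If the goal is one exact file name, as in the question, a regular expression may not be needed at all. The following is a minimal sketch under that assumption; `find_by_name` is a hypothetical helper, not part of the answer above, and it compares each entry's file name directly.

```php
<?php
// Hypothetical helper (not from the answer): returns the full path of every
// file under $folder whose file name is exactly $name.
function find_by_name(string $folder, string $name): array {
    // SKIP_DOTS leaves out the "." and ".." entries of each directory
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS)
    );
    $matches = [];
    foreach ($iterator as $file) {   // each $file is an SplFileInfo object
        if ($file->isFile() && $file->getFilename() === $name) {
            $matches[] = $file->getPathname();
        }
    }
    return $matches;
}

// Usage: $paths = find_by_name($_SERVER['DOCUMENT_ROOT'], 'test.zip');
```

Because the comparison is a plain string equality, names containing characters such as `(` or `[` need no escaping.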

Promptbook answered 18/6, 2013 at 5:23 Comment(17)
ok but how can i use it to search for a specific file within folders/subfolders/subsubfolders and return the file's path ?Assiduous
rsearch: var_dump(rsearch('/folder/.../', '/.*zip/')); rglob: var_dump(rglob('/folder/*/test.zip')); it returns an array of matched files.Promptbook
can't get it to work... i tried with var_dump(rsearch('/', 'test.zip')); and also with var_dump(rsearch('$root', 'test.zip')); ... could you update your post with a code that works with my example in OP ? I want to search all folders and sub-folders for test.zipAssiduous
@WinstonSmith It does work. If you use rsearch, the $pattern param is a regular expression, which is why my example is wrapped in two slashes. You can use rglob instead, which accepts a wildcard pattern.Promptbook
tried var_dump(rsearch('$root', '/test.zip/')); doesnt work neither... your example search for all zip files, i want to search for a specific file (test.zip in the example, but it can also be somefile.rar or whatever.mp3)Assiduous
use: var_dump(rsearch($root, '/.*\/test.zip/')); DON'T use single quotes around $root.Promptbook
great ! got it to work now... but the script takes 30 seconds to execute itself :( Is it normal that it is so slow ? I actually have around 2000 files on the server. It is a dedicated server (atom single-core 1.2ghz with 2gb of memory)... The server is currently not open to public so there is no traffic and no server load, and it doesnt host any other sites... i guess i will need some sort of cachingAssiduous
ok i fixed the problem with the server, it now executes way faster. But i have another problem, after replacing the function call by rsearch($root, '/.*\/'.$search.'/') where $search is a GET value, if i look for a file with parenthesis (Example = test(test)test[test].zip) it wont return any result even if the file is on the server.Assiduous
[] and () are special characters in regular expressions, see here: regular-expressions.info/reference.html A backslash escapes special characters to suppress their special meaning. Special characters include: [\^$.|?*+(){}, e.g. test\[test\]\(test\)Promptbook
I did some anecdotal tests on a deep directory structure and the rsearch function was an order of magnitude faster...Pretty
Not a right answer. for example rsearch('/folder/', '/.*mp3/') will also match a file named folder/mp3/album/file.mp3 , but returns 'folder/mp3' as a filename...Newcomer
I wish I could say the OO solution is cleaner, but it seems excessively verbose and more difficult to understand. What the heck is a RecursiveIteratorIterator?Labyrinthine
rglob - not the best function.... can't find files with given extension as it applies it to directories as well ... there is a better function on php.net - glob() comments. Called glob_recursive afair.Fellows
@JasonRDalton, retested, with PHP 7.1 ("anecdotally", too :) ), on a 60MB project tree (with two mid-size git worktrees and lots of other small files etc.), and got the exact opposite. Had a priming run for both right before the measurement, and I pretty consistently got numbers like: rglob: 0.02864, rsearch: 0.12413. Which is a lot more plausible, actually, than the other way around, I'd say.Babbitt
Thanks. I used the one with RecursiveDirectoryIterator. I needed to prepend \ to those class names. And I renamed $pattern to $regexPattern = '/.*/'.Wandis
I suggest you edit your answer to include rsearch($root, '/.*\/test.zip/'); to save newbies a lot of wasted time scanning through comments that are not visible by default. Apart from that, this is a nice answer.Euchre
Thank you. I use it with the pattern /.+\.[a-z]+/i to get all files with an ending. Sidenote: using array_merge inside a loop is resource greedy.Beget
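Several of the comments above (special characters in the file name, partial matches on directory names, array_merge inside a loop) could be addressed together. The following is only a sketch: `name_to_regex` and `rsearch_paths` are hypothetical names, and it assumes paths use forward slashes as on Linux.

```php
<?php
// Hypothetical helper: build an anchored regex matching any path that ends
// in "/<name>", with regex metacharacters in <name> escaped by preg_quote().
function name_to_regex(string $name): string {
    // The second argument makes preg_quote() also escape the "#" delimiter
    return '#^.*/' . preg_quote($name, '#') . '$#';
}

// Variant of rsearch() that returns whole paths instead of regex captures.
function rsearch_paths(string $folder, string $regPattern): array {
    $ite = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS)
    );
    // MATCH mode keeps each entry whose path matches the pattern
    $files = new RegexIterator($ite, $regPattern, RegexIterator::MATCH);
    // By default the iterator keys are the full path names
    return array_keys(iterator_to_array($files));
}

// Usage: rsearch_paths($root, name_to_regex('test(test)test[test].zip'));
```

Anchoring with `^` and `$` means a directory called `mp3` no longer counts as a match for an `mp3` extension, and collecting through iterator_to_array() avoids calling array_merge on every iteration.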
O
43

I want to provide another simple alternative for cases where you can predict a max depth. You can use a pattern with braces listing all possible subfolder depths.

This example allows 0-3 arbitrary subfolders:

glob("$root/{,*/,*/*/,*/*/*/}test_*.zip", GLOB_BRACE);

Of course the braced pattern could be procedurally generated.
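As a rough illustration of generating the braced pattern, here is a small sketch. Both function names are hypothetical, and the GLOB_BRACE guard is there because that constant is not defined on every platform.

```php
<?php
// Hypothetical helper: build "{,*/,*/*/,...}" covering 0..$maxDepth subfolders.
function depth_braces(int $maxDepth): string {
    if ($maxDepth < 1) {
        return '';   // depth 0 needs no braces at all
    }
    $alternatives = [];
    for ($depth = 0; $depth <= $maxDepth; $depth++) {
        $alternatives[] = str_repeat('*/', $depth);
    }
    return '{' . implode(',', $alternatives) . '}';
}

// Hypothetical wrapper around glob() using the generated pattern.
function glob_to_depth(string $root, string $filePattern, int $maxDepth): array {
    if (!defined('GLOB_BRACE')) {
        throw new RuntimeException('GLOB_BRACE is not available on this platform');
    }
    $found = glob($root . '/' . depth_braces($maxDepth) . $filePattern, GLOB_BRACE);
    return $found === false ? [] : $found;   // glob() returns false on error
}

// Usage: glob_to_depth($root, 'test_*.zip', 3);
```

With a depth of 3, depth_braces() produces the same brace group as the hand-written pattern above.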

Otter answered 23/1, 2019 at 10:0 Comment(3)
Just be aware that GLOB_BRACE isn't available on all platforms. I only discovered that when my code failed in an automated pipeline.Modernity
Or for multiple file types (for example .pdf, .mp4 and .mp3): glob("$root/{*.pdf,*/*.pdf,*/*/*.pdf,*/*/*/*.pdf,*.mp4,*/*.mp4,*/*/*.mp4,*/*/*/*.mp4,*.mp3,*/*.mp3,*/*/*.mp3,*/*/*/*.mp3}", GLOB_BRACE)Conversable
@Conversable Multiple types would be expressed with a brace pattern as well: "$root/{,*/,*/*/,*/*/*/}test_*.{zip,gz,tgz}"Otter
D
11

This returns the full path to the file:

function rsearch($folder, $pattern) {
    $iti = new RecursiveDirectoryIterator($folder);
    foreach(new RecursiveIteratorIterator($iti) as $file){
         if(strpos($file , $pattern) !== false){
            return $file;
         }
    }
    return false;
}

Call the function:

$filepath = rsearch('/home/directory/thisdir/', "/findthisfile.jpg");

It returns something like:

/home/directory/thisdir/subdir/findthisfile.jpg

You can extend this function to find several files, such as all JPEG files:

function rsearch($folder, $pattern_array) {
    $return = array();
    $iti = new RecursiveDirectoryIterator($folder);
    foreach(new RecursiveIteratorIterator($iti) as $file){
        if (in_array(strtolower($file->getExtension()), $pattern_array)) {
            $return[] = $file;
        }
    }
    return $return;
}

It can be called as:

$filepaths = rsearch('/home/directory/thisdir/', array('jpeg', 'jpg') );

Ref: https://mcmap.net/q/338927/-recursive-file-search-php

Divorcement answered 28/4, 2016 at 10:48 Comment(3)
Probably should use $file->getExtension () rather than array_pop(explode('.', $file)) to avoid "PHP Notice: Only variables should be passed by reference in ...".Bithynia
@Divorcement Thanks for the function it's working well for my project. The only thing i would add is a die in case the folder path doesn't exist so it don't bother moving forward.-Miletus
You may want to use yield instead of building a complete $return array. This will produce a generator and improve performance a lot.Oz
B
9

As a full solution for your problem (this was also my problem):

function rsearch($folder, $pattern) {
    $dir = new RecursiveDirectoryIterator($folder);
    $ite = new RecursiveIteratorIterator($dir);
    $files = new RegexIterator($ite, $pattern, RegexIterator::MATCH);

    foreach($files as $file) {
        yield $file->getPathname();
    }
}

This yields the full path of each item that matches the pattern.

Edit: Thanks to Rousseau Alexandre for pointing out that $pattern must be a regular expression.
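Since the function is a generator, it has to be iterated to produce results. The sketch below repeats the function so that it runs on its own, then shows two hypothetical wrappers (`rsearch_all` and `rsearch_first` are illustrative names, not part of the answer).

```php
<?php
// Copy of the generator from the answer, so this snippet is self-contained.
function rsearch($folder, $pattern) {
    $dir = new RecursiveDirectoryIterator($folder);
    $ite = new RecursiveIteratorIterator($dir);
    $files = new RegexIterator($ite, $pattern, RegexIterator::MATCH);
    foreach ($files as $file) {
        yield $file->getPathname();
    }
}

// Hypothetical wrapper: collect every match into a plain, reindexed array.
function rsearch_all($folder, $pattern) {
    return iterator_to_array(rsearch($folder, $pattern), false);
}

// Hypothetical wrapper: return the first match, or null if there is none.
// Returning early means the rest of the tree is never walked.
function rsearch_first($folder, $pattern) {
    foreach (rsearch($folder, $pattern) as $path) {
        return $path;
    }
    return null;
}

// Usage: $path = rsearch_first($_SERVER['DOCUMENT_ROOT'], '#/test\.zip$#');
```

The lazy form suits the original question, where only one download link is needed, while rsearch_all() is closer to what the array-returning versions do.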

Bey answered 23/1, 2019 at 10:36 Comment(3)
The pattern must be a regular expressionOz
Can you give a brief example of how to call it and iterate through the results?B
Example pattern for getting all files with .html extension: $pattern = '#^.*\.html$#'Urbanism
