Perl - A way to get only the first (.txt) filename from another directory without loading them all?

I have a directory that holds ~5000 .txt files, each about 2,400 in size.

I just want one filename from that directory; order does not matter.

The file will be processed and deleted.

This is not the script's working directory.

The intention is:

  • to open that file,
  • read it,
  • do some stuff,
  • unlink it and then
  • loop to the next file.

My crude attempt does not check for only .txt files, and it also has to get all ~5000 filenames just to use one of them. I am also possibly loading too many modules?

The Verify_Empty sub was intended to validate that there is a directory and that there are files in it, but my attempts are failing, so here I am seeking assistance.

#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use CGI ':standard';
print CGI::header();
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
###
use vars qw(@Files $TheFile $PathToFile);
my $ListFolder = CGI::param('openthisfolder');
Get_File($ListFolder);
###
sub Get_File{
  $ListFolder = shift;
  unless (Verify_Empty($ListFolder)) {
    opendir(DIR,$ListFolder);
    @Files = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
    closedir(DIR);
    foreach(@Files){
      $TheFile = $_;
    }
    #### This is where I go off to process and unlink file (sub not here) ####
    $PathToFile = $ListFolder.'/'.$TheFile;
    OpenFileReadPrepare($PathToFile); 
    #### After unlinked, the OpenFileReadPrepare sub loops back to this script. 
  }
  else {
    print qq~No more files to process~;
    exit;
  }
  exit;
}
    ####
sub Verify_Empty {
  $ListFolder = shift;
  opendir(DIR, $ListFolder) or die "Not a directory";
  return scalar(grep { $_ ne "." && $_ ne ".." } readdir(DIR)) == 0;
  closedir(DIR);
}

Obviously I am very new at this. This method seems quite "hungry"? Seems like a lot to grab one filename and process it! Guidance would be great!

EDIT -Latest Attempt

my $dir = '..';
my @files = glob "$dir/*.txt";
for (0..$#files){
$files[$_] =~ s/\.txt$//;
}
my $PathAndFile =$files[0].'.txt';
print qq~$PathAndFile~;

This "works", but it still gets all the filenames. None of the examples here, so far, have worked for me. I guess I will live with this for today until I figure it out. Perhaps I will revisit and see if anyone came up with anything better.

Cedar answered 8/5, 2013 at 14:38 Comment(7)
Define "first"? It sounds very strange to say that order does not matter when also mentioning first and last. Is this just a very complicated way of saying that you want to iterate over all the files? - Directional
Rather than use vars ( ... $TheFile .. ) and foreach(@Files){ $TheFile = $_; I prefer to write foreach my $TheFile (@Files){. This has the advantage of giving the variable the smallest scope and not using $_. - Garlen
The parameters to calls of Get_File and Verify_Empty are not needed as $ListFolder is in scope within both; their first assignments just overwrite the variable with the value it already contains. - Garlen
First and last do not matter. I DO NOT want to iterate over all the files. Just want one file and it does not matter which one. - Cedar
But, @Student33, the bulleted list in your question states "[then] loop to the next file". What does that phrase mean if not that all files are to be processed? - Garlen
That is handled by another sub. I only wanted to get one filename from the directory without loading them all. Explaining the entire scenario sometimes helps. Also, my question was edited a bit, so it looks like those details were deleted. - Cedar
@Student33 You will find that explaining ALL of your "scenario" rather than part of it helps more than a little. Your current attempt which first gets all the files ending with .txt, then deletes all the .txt endings, then adds .txt ending again... is rather.... weird. Just... my $file = glob "$dir/*.txt". Don't put them in an array, change the array, take the first element of the array and change it back. That's just dumb. - Directional
E
4

You could call readdir inside a while loop. That way readdir does not return all the files at once but gives you one entry at a time:

# opendir(DIR, ...);
my $first_file = "";
while (my $file = readdir(DIR)) {

  next if $file eq "." or $file eq "..";
  $first_file = $file;
  last;
}
print "$first_file\n"; # first file in directory
Estaestablish answered 8/5, 2013 at 15:9 Comment(3)
I could not get this to return any filenames. The files are not in the same directory as the script, but one dir down. - Cedar
@Student33, you're doing something wrong. This code snippet does not specify which directory is opened. Moreover, it is essentially correct. - Renovate
You'll need to opendir first, i.e. my $ListFolder = CGI::param('openthisfolder'); opendir(DIR, $ListFolder); - Estaestablish
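Putting the snippet and the opendir call together, here is a minimal sketch of the whole step. The sub name first_entry, the temporary demo directory, and the file name sample.txt are assumptions made for illustration and are not part of the answer. It uses a lexical directory handle instead of the bareword DIR. It also prepends the directory to the name: readdir returns bare names, which is a common reason this kind of code seems to "find nothing" when the files are not in the script's working directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the full path of one arbitrary entry in $dir (skipping "." and ".."),
# or undef if the directory contains nothing else.
sub first_entry {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my $found;
    while (defined(my $name = readdir($dh))) {   # scalar context: one entry per call
        next if $name eq '.' or $name eq '..';
        $found = "$dir/$name";   # readdir returns bare names, so prepend the directory
        last;                    # stop after the first real entry
    }
    closedir($dh);
    return $found;
}

# Demo against a throwaway directory (hypothetical file name).
my $demo_dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$demo_dir/sample.txt") or die "Cannot create file: $!";
close($fh);

my $path = first_entry($demo_dir);
print defined $path ? "$path\n" : "No entries\n";
```

In the question's script, the directory passed to first_entry would presumably be the value of CGI::param('openthisfolder').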
E
4

You're calling readdir in list context, which returns all of the directory entries. Call it in scalar context instead:

my $file;
while( my $entry = readdir DIR ) {

    $file = $entry, last if $entry =~ /\.txt$/;        
}

if ( defined $file ) {
    print "found $file\n";
    # process....
}

Additionally, you read the directory twice; once to see if it has any entries, then to process it. You don't really need to see if the directory is empty; you get that for free during the processing loop.

Eirene answered 8/5, 2013 at 16:49 Comment(3)
I could not get this to return any filenames. The files are not in the same directory as the script but, one dir down. I tried to modify to no avail. - Cedar
@Student33, this code snippet, too, does not specify which directory to open, and is materially correct. You're doing something wrong in adapting it. - Renovate
The tests for '.' and '..' equality are not needed, since you are already checking /\.txt/ anyway. - Renovate
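The point about not needing a separate emptiness check can be sketched as follows. This is only an illustration under assumptions: the sub name next_txt_file, the demo directory, and the file names are hypothetical, and the processing step is left as a comment. An undef return value doubles as the "no more files" signal, so nothing like Verify_Empty is required.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the full path of one .txt file in $dir, or undef when none is left.
sub next_txt_file {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my $found;
    while (defined(my $entry = readdir($dh))) {
        next unless $entry =~ /\.txt\z/;   # also rules out "." and ".."
        next unless -f "$dir/$entry";      # skip directories named like *.txt
        $found = "$dir/$entry";
        last;
    }
    closedir($dh);
    return $found;
}

# Demo: create three files, then process and delete the .txt ones one at a time.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt notes.log)) {
    open(my $fh, '>', "$demo_dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

my $processed = 0;
while (defined(my $path = next_txt_file($demo_dir))) {
    # ... open, read, and handle $path here ...
    unlink($path) or die "Cannot delete '$path': $!";
    $processed++;
}
print "Processed $processed .txt file(s)\n";
```

Each call reopens the directory and stops at the first match, so it never builds a list of all the names. Because every processed file is deleted, the next call simply finds a different one.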
D
2

Unless I am greatly mistaken, what you want is just to iterate over the files in a directory, and all this about "first or last" and "order does not matter" and deleting files is just confusion about how to do this.

So, let me put it in a very simple way for you, and see if that actually does what you want:

my $directory = "somedir";
for my $file (<$directory/*.txt>) {
    # do stuff with the files
}

The glob does the same as a *nix shell would: it lists the files with the .txt extension. If you want to do further tests on the files inside the loop, that is perfectly fine.

The downside is keeping 5000 file names in memory. Also, if processing this file list takes time, it may conflict with other processes that access these files.

An alternative is to simply read the files with readdir in a while loop, such as mpapec mentioned in his answer. The benefit is that each time you read a new file name, the file will be there. Also, you won't have to keep a large list of files in memory.

Directional answered 8/5, 2013 at 15:9 Comment(4)
No, I do not want to iterate over all files. That is what I am trying to avoid. I just want one filename.txt and that is all. - Cedar
Well, do you want a random file... Or? First means that the file names are sorted, but you say order does not matter. - Directional
Correct. Any filename. By whichever means that uses least resources. - Cedar
Well, why don't you just do my $file = <$dir/*.txt> then? It seems to reliably pick a file off the top of the list. - Directional
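For comparison, a small sketch of the glob variant suggested in the comments. The demo directory and file names are made up for illustration. It uses a list assignment to keep only the first match. In scalar context glob behaves as an iterator that remembers its position between calls, which can surprise you if the same line runs again. As far as I know, glob still expands the whole pattern internally, so this is shorter to write but is unlikely to read less of the directory than the readdir loop. It also assumes the directory path contains no whitespace, since plain glob splits its pattern on spaces.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demo directory with two .txt files and one other file (hypothetical names).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(b.txt a.txt c.log)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

# List assignment: glob returns every match, and only the first is kept.
# The result already includes the directory part of the path.
my ($first) = glob("$dir/*.txt");

print defined $first ? "$first\n" : "No .txt files\n";
```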
