Why doesn't Perl file glob() work outside of a loop in scalar context?

According to the Perl documentation on file globbing, the <*> operator or glob() function, when used in a scalar context, should iterate through the list of files matching the specified pattern, returning the next file name each time it is called or undef when there are no more files.

However, the iteration only seems to work inside a loop. Outside a loop, the operator seems to start over on every call, before all of the values have been read.

From the Perl docs:

In scalar context, glob iterates through such filename expansions, returning undef when the list is exhausted.

http://perldoc.perl.org/functions/glob.html

However, in scalar context the operator returns the next value each time it's called, or undef when the list has run out.

http://perldoc.perl.org/perlop.html#I/O-Operators

Example code:

use warnings;
use strict;

my $filename;

# in scalar context, <*> should return the next file name
# each time it is called or undef when the list has run out

$filename = <*>;
print "$filename\n";
$filename = <*>;      # doesn't work as documented, starts over and
print "$filename\n";  # always returns the same file name
$filename = <*>;
print "$filename\n";

print "\n";

print "$filename\n" while $filename = <*>; # works in a loop, returns next file
                                           # each time it is called

In a directory with three files (file1.txt, file2.txt, and file3.txt), the above code outputs:

file1.txt
file1.txt
file1.txt

file1.txt
file2.txt
file3.txt

Note: The actual perl script should be outside the test directory, or you will see the file name of the script in the output as well.

Am I doing something wrong here, or is this how it is supposed to work?

Twobit answered 13/4, 2010 at 21:44 Comment(0)

Here's a way to capture the magic of the <> glob operator's state into an object that you can manipulate in a normal sort of way: anonymous subs (and/or closures)!

sub all_files {
    return sub { scalar <*> };
}

my $iter = all_files();
print $iter->(), "\n";
print $iter->(), "\n";
print $iter->(), "\n";

or perhaps:

sub dir_iterator {
    my $dir = shift;
    return sub { scalar glob("$dir/*") };
}
my $iter = dir_iterator("/etc");
print $iter->(), "\n";
print $iter->(), "\n";
print $iter->(), "\n";

Then again my inclination is to file this under "curiosity". Ignore this particular oddity of glob() / <> and use opendir/readdir, IO::All/readdir, or File::Glob instead :)
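
As a rough illustration of the opendir/readdir alternative mentioned above, here is a minimal sketch. The helper name readdir_iterator is hypothetical (it is not from the answer), and the dotfile filter is an assumption added to mimic what the * pattern matches:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one directory entry
# per call and undef once the listing is exhausted. The position in the
# listing lives in the directory handle, not in a glob operator.
sub readdir_iterator {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    return sub {
        while (defined(my $entry = readdir $dh)) {
            next if $entry =~ /^\./;   # skip dotfiles, as the * pattern does
            return $entry;
        }
        return;                        # exhausted: undef in scalar context
    };
}

my $next_entry = readdir_iterator('.');
while (defined(my $name = $next_entry->())) {
    print "$name\n";
}
```

Unlike glob, readdir makes no promise about ordering, so sort the names yourself if the order matters.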

Thrall answered 13/4, 2010 at 23:57 Comment(1)
Interesting method and good workaround for capturing the state of the operator. I was wondering how/if that could be done. - Twobit

The following code also seems to create 2 separate instances of the iterator...

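my $filename;

for ( 1..3 )
{
   $filename = <*>;      # first glob operator: keeps its own position
   print "$filename\n" if defined $filename;
   $filename = <*>;      # second glob operator: iterates independently
   print "$filename\n" if defined $filename;
}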

I guess I see the logic there, but it is counterintuitive and seems to contradict the documentation. The docs don't mention anything about having to be in a loop for the iteration to work.
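
The per-operator state can be made visible in isolation. The following is only a sketch: the sub names next_at_site_a and next_at_site_b and the scratch directory are hypothetical, created purely so the result does not depend on the current directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Scratch directory with three empty files (names chosen for illustration).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(file1.txt file2.txt file3.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

# Each sub body contains exactly one glob operator, i.e. one call site.
sub next_at_site_a { return scalar glob($_[0]) }
sub next_at_site_b { return scalar glob($_[0]) }

my @seen = (
    next_at_site_a("$dir/*"),   # site A, first call
    next_at_site_a("$dir/*"),   # site A again: continues its own list
    next_at_site_b("$dir/*"),   # site B: starts a list of its own
);
print basename($_), "\n" for @seen;   # file1.txt, file2.txt, file1.txt
```

Calling the same sub twice re-runs the same operator, which is why site A advances while site B starts over.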

Twobit answered 13/4, 2010 at 22:30 Comment(1)
+1 Great experiment. This is similar to how the range operator (..) behaves, where each use of the operator maintains its own state. Heck if I can find that documented anywhere, though. - Berty

Also from perlop:

A (file)glob evaluates its (embedded) argument only when it is starting a new list.

Calling glob creates a list, which is either returned whole (in list context) or retrieved one element at a time (in scalar context). But each glob operator that appears in the code keeps its own separate list, so a second <*> written elsewhere starts again from the beginning.
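
One way to sidestep the hidden state entirely is to take the whole list in list context and step through it yourself. A minimal sketch, where the helper name matching_files is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: list context makes a single glob call return
# every match at once, so no partial list is left behind in the operator.
sub matching_files {
    my ($pattern) = @_;
    my @files = glob($pattern);
    return @files;
}

my @files = matching_files('*');

# From here on, the position in the list is held by @files itself.
while (defined(my $name = shift @files)) {
    print "$name\n";
}
```

Because the array is an ordinary variable, it can be read from anywhere in the program without restarting the iteration.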

Berty answered 13/4, 2010 at 22:22 Comment(0)

(Scratching away at my rusty memory of Perl...) I think that multiple lexical instances of <*> are treated as independent invocations of glob, whereas in the while loop you are invoking the same "instance" (whatever that means).

Imagine, for instance, if you did this:

while (<*>) { ... }
...
while (<*>) { ... }

You certainly wouldn't expect those two invocations to interfere with each other.

Beret answered 13/4, 2010 at 21:58 Comment(6)
They wouldn't interfere because the first invocation would reset after returning undef, according to the documentation. - Twobit
I would expect separate instances in different scopes, but in the same scope I would expect to invoke the same "instance". - Twobit
What if there's a conditional 'last' (Perl's equivalent of break) in the middle of the first one? At the end of the day, what the Perl interpreter actually does is the "truth". - Beret
Outside of a loop, is there a way to call the first "instance" again? - Twobit
Sorry @Rob, you're outside my sphere of knowledge now. - Beret
@Rob, no, it's a thing in the optree, not an "object" in the normal perl sense. There's no practical way to get at it. I think I know a way to reify it though -- answer forthcoming :) - Thrall
