What reasons are there to prefer glob over readdir (or vice-versa) in Perl?

This question is a spin-off from this one. Some history: when I first learned Perl, I pretty much always used glob rather than opendir + readdir because I found it easier. Then later various posts and readings suggested that glob was bad, and so now I pretty much always use readdir.

After thinking over this recent question I realized that my reasons for one or the other choice may be bunk. So, I'm going to lay out some pros and cons, and I'm hoping that more experienced Perl folks can chime in and clarify. The question in a nutshell: are there compelling reasons to prefer glob to readdir, or readdir to glob (in some or all cases)?

glob pros:

  1. No dotfiles (unless you ask for them)

  2. Order of items is guaranteed

  3. No need to prepend the directory name onto items manually

  4. Better name (c'mon - glob versus readdir is no contest if we're judging by names alone)

  5. (From ysth's answer; cf. glob cons 4 below) Can return non-existent filenames:

    @deck = glob "{A,K,Q,J,10,9,8,7,6,5,4,3,2}{\x{2660},\x{2665},\x{2666},\x{2663}}";
    

glob cons:

  1. Older versions are just plain broken (but 'older' means pre 5.6, I think, and frankly if you're using pre 5.6 Perl, you have bigger problems)

  2. Calls stat each time (i.e., useless use of stat in most cases)

  3. Problems with spaces in directory names (is this still true?)

  4. (From brian's answer) Can return filenames that don't exist:

    $ perl -le 'print glob "{ab}{cd}"'
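
On glob con 3 (spaces in directory names): as far as I know this is still true of the built-in glob, which splits its pattern on whitespace and treats each piece as a separate pattern. perldoc -f glob suggests protecting a spacey name with extra quotes, and File::Glob's bsd_glob does not split at all. A minimal sketch, assuming a scratch directory created with File::Temp (the "My Documents" name is only an illustration):

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Scratch directory whose name contains a space (illustrative name).
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/My Documents";
mkdir $dir or die "mkdir $dir: $!";
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

# Built-in glob splits on whitespace, so this becomes two patterns:
# "$base/My" and "Documents/*.txt" (the latter relative to the cwd).
my @split = glob "$dir/*.txt";

# Workaround from perldoc -f glob: wrap the pattern in extra quotes.
my @quoted = glob qq{"$dir/*.txt"};

# bsd_glob treats the whole string as a single pattern.
my @whole = bsd_glob("$dir/*.txt");

print "split:  @split\nquoted: @quoted\nwhole:  @whole\n";
```

Only the quoted and bsd_glob versions should list both files; the plain call returns something unrelated to what you asked for.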
    

readdir pros:

  1. (From brian's answer) opendir gives you a directory handle, which you can pass around in your program (and reuse), whereas glob simply returns a list
  2. (From brian's answer) readdir is a proper iterator, and directory handles also work with rewinddir, seekdir, and telldir
  3. Faster? (Pure guess based on some of glob's features from above. I'm not really worried about this level of optimization anyhow, but it's a theoretical pro.)
  4. Less prone to edge-case bugs than glob?
  5. Reads everything (dotfiles too) by default (this is also a con)
  6. May convince you not to name a file 0 (a con also - see Brad's answer)
  7. Anyone? Bueller? Bueller?

readdir cons:

  1. If you don't remember to prepend the directory name, you will get bit when you try to do filetests or copy items or edit items or...
  2. If you don't remember to grep out the . and .. items, you will get bit when you count items, or try to walk recursively down the file tree or...
  3. Did I mention prepending the directory name? (A sidenote, but my very first post to the Perl Beginners mail list was the classic, "Why does this code involving filetests not work some of the time?" problem related to this gotcha. Apparently, I'm still bitter.)
  4. Items are returned in no particular order. This means you will often have to remember to sort them in some manner. (This could be a pro if it means more speed, and if it means that you actually think about how and if you need to sort items.) Edit: Horrifically small sample, but on a Mac readdir returns items in alphabetical order, case insensitive. On a Debian box and an OpenBSD server, the order is utterly random. I tested the Mac with Apple's built-in Perl (5.8.8) and my own compiled 5.10.1. The Debian box is 5.10.0, as is the OpenBSD machine. I wonder if this is a filesystem issue, rather than Perl?
  5. Reads everything (dotfiles too) by default (this is also a pro)
  6. Doesn't necessarily deal well with a file named 0 (see pros also - see Brad's answer)
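
The prepend-the-directory gotcha in readdir cons 1 and 3 fits in a few lines. A minimal sketch, assuming a scratch directory that holds one subdirectory and one file: readdir hands back bare names, so a file test on a bare name is resolved against the current working directory, not the directory being read.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory holding one subdirectory and one plain file.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir: $!";
open my $fh, '>', "$dir/file.txt" or die "open: $!";
close $fh;

opendir my $dh, $dir or die "opendir $dir: $!";
my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# Buggy: -d checks each bare name relative to the *current* directory,
# so it usually finds nothing here.
my @wrong = grep { -d } @names;

# Fixed: build the full path before the file test.
my @right = grep { -d "$dir/$_" } @names;

print "without prefix: @wrong\nwith prefix:    @right\n";
```
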
Vennieveno answered 1/10, 2009 at 22:15 Comment(1)
On my Mac with Perl 5.10.1, I was able to create a directory with a space in it and glob returned it as part of its list. I even made a directory name with a newline in it and it worked. :) - Boiney

You missed the most important, biggest difference between them: glob gives you back a list, but opendir gives you a directory handle. You can pass that directory handle around to let other objects or subroutines use it. With the directory handle, the subroutine or object doesn't have to know anything about where it came from, who else is using it, and so on:

 sub use_any_dir_handle {
      my( $dh ) = @_;
      rewinddir $dh;
      # ...do some filtering; for example, skip . and ..
      my @files = grep { ! /\A\.\.?\z/ } readdir $dh;
      return \@files;
      }

With the dirhandle, you have a controllable iterator that you can move around in with seekdir, whereas with glob you just get the next item.
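
A minimal sketch of the controllable-iterator point, assuming a scratch directory from File::Temp. count_files is a hypothetical helper that knows nothing about where the handle came from; telldir records a position and seekdir jumps back to it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: counts plain files, reading from wherever the
# handle happens to be positioned.
sub count_files {
    my ($dh, $dir) = @_;
    my $n = 0;
    while (defined(my $entry = readdir $dh)) {
        $n++ if -f "$dir/$entry";
    }
    return $n;
}

# Scratch directory with three plain files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(one two three)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

opendir my $dh, $dir or die "opendir $dir: $!";
my $start  = telldir $dh;               # remember the starting position
my $first  = count_files($dh, $dir);    # reads the handle to the end
seekdir $dh, $start;                    # jump back to the remembered spot
my $second = count_files($dh, $dir);    # so the second pass sees everything again
closedir $dh;

print "$first $second\n";
```

With a list from glob there is no position to save; you would have to keep the list (or call glob again) yourself.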

As with anything though, the costs and benefits only make sense when applied to a certain context. They do not exist outside of a particular use. You have an excellent list of their differences, but I wouldn't classify those differences without knowing what you were trying to do with them.

Some other things to remember:

  • You can implement your own glob with opendir, but not the other way around.

  • glob uses its own wildcard syntax, and that's all you get.

  • glob can return filenames that don't exist:

    $ perl -le 'print glob "{ab}{cd}"'
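
As a rough illustration of the first point, here is a hypothetical my_glob built on opendir and readdir. It is a sketch, not a replacement: it only understands * and ? (no braces, character classes, or ~), works on a single directory, and should be called in list context because it ends with sort.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical minimal glob built on opendir/readdir. It understands only
# "*" and "?" in a single directory, skips dotfiles the way glob does,
# prepends the directory, and sorts. Call it in list context.
sub my_glob {
    my ($dir, $pattern) = @_;

    # Translate the wildcard pattern into a regex, one character at a time.
    my $re = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta($_)
    } split //, $pattern;

    opendir my $dh, $dir or die "Can't open $dir: $!";
    my @matches = map  { "$dir/$_" }
                  grep { !/\A\./ && /\A$re\z/s } readdir $dh;
    closedir $dh;
    return sort @matches;
}

# Scratch directory to compare against the built-in glob.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.log .hidden.txt)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

my @mine = my_glob($dir, '*.txt');
my @core = glob "$dir/*.txt";
print "mine: @mine\ncore: @core\n";
```

Going the other way is not possible: nothing built on glob gives you a directory handle.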
    
Boiney answered 1/10, 2009 at 22:39 Comment(6)
Thanks for helping me see the point about filehandle versus list. - Vennieveno
List versus dir handle can be significant if the directory is large (has a lot of files) or if it's changing (files getting created and deleted) while the program runs. - Grandniece
+1 to @Loadmaster -- about the only time I notice glob v readdir is when I have a lot of files (>10,000) in the dir. - Codie
There is a scalar context for glob too. You'd have the same problem if you used readdir to make the big list all at once. - Boiney
Can't you overload glob() and give it a different wildcard syntax (demonstrated in perldoc somewhere)? - Eliseoelish
Sure you can overload glob, but why would you? If you don't want to use it, just make your own subroutine to do exactly what you need while leaving the built-in alone. - Boiney
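
To illustrate the scalar context mentioned above: in scalar context glob returns the next match on each call and undef when the matches run out, so it can drive a while loop much like readdir. perlop notes that the pattern is only evaluated when a new list starts, so all values must be read before it starts over. A minimal sketch, assuming a scratch directory from File::Temp:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with three matching files.
my $dir = tempdir(CLEANUP => 1);
for my $name (map { "file$_.txt" } 1 .. 3) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

# In scalar context, glob returns the next match on each call and undef
# once the matches are exhausted, so it can drive a while loop.
my @seen;
while (defined(my $path = glob "$dir/*.txt")) {
    push @seen, $path;
}
print scalar(@seen), " matches\n";
```
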

glob pros: Can return 'filenames' that don't exist:

use feature 'say';
use List::Util ();

my @deck = List::Util::shuffle glob "{A,K,Q,J,10,9,8,7,6,5,4,3,2}{\x{2660},\x{2665},\x{2666},\x{2663}}";
while (my @hand = splice @deck,0,13) {
    say join ",", @hand;
}
__END__
6♥,8♠,7♠,Q♠,K♣,Q♦,A♣,3♦,6♦,5♥,10♣,Q♣,2♠
2♥,2♣,K♥,A♥,8♦,6♠,8♣,10♠,10♥,5♣,3♥,Q♥,K♦
5♠,5♦,J♣,J♥,J♦,9♠,2♦,8♥,9♣,4♥,10♦,6♣,3♠
3♣,A♦,K♠,4♦,7♣,4♣,A♠,4♠,7♥,J♠,9♥,7♦,9♦
Citrus answered 2/10, 2009 at 2:33 Comment(3)
I see that as a con overall. This is a very clever way to create a deck of cards, but I'm having a hard time thinking of more serious cases where I would want pseudo-filenames (but I can imagine lots of cases where I wouldn't want a filename returned that didn't exist). - Vennieveno
@Telemachus: it is both a pro and a con. Any time you use the {} syntax and want only existing files, you need to filter the results, but that's well worth it IMO. (I know I do things like cp -a dirname{,.orig} regularly.) - Citrus
It can be useful sometimes when you want to create the filenames. The typical use case is the creation of a directory hierarchy. For example, in the shell you might do mkdir -p /home/{alice,bob,charlie}/{public_html,mail,docs} - Mol

glob makes it convenient to read all the subdirectories (and any files) at a given fixed depth, as in glob "*/*/*". I've found this handy on several occasions.
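
A sketch of the fixed-depth case, assuming a scratch tree built with File::Path's make_path. Because "*/*/*" matches plain files as well as directories at that depth, a grep { -d } filter keeps only the subdirectories:

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Scratch tree: two directories and one plain file three levels down.
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b/c", "$root/a/x/y", "$root/d/e");
open my $fh, '>', "$root/a/b/file.txt" or die "open: $!";
close $fh;

# "*/*/*" matches every non-hidden entry exactly three levels below $root,
# plain files included, so keep only the directories.
my @depth3 = grep { -d } glob "$root/*/*/*";
print "$_\n" for @depth3;
```

Doing the same with readdir would take three nested loops (or a recursive walk with a depth counter).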

Mol answered 1/3, 2010 at 23:30 Comment(0)

Here is a disadvantage of opendir and readdir.

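{
  # Create a file named "0" in the current directory.
  open my $file, '>', 0 or die "Can't create 0: $!";
  print {$file} 'Breaks while( readdir ){ ... }';
}
opendir my $dir, '.' or die "Can't open .: $!";

my $count_for = 0;
++$count_for for readdir $dir;        # counts every entry
print $count_for, "\n";

rewinddir $dir;

my $count_while = 0;
++$count_while while readdir $dir;    # stops early at the entry "0"
print $count_while, "\n";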

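You would expect this code to print the same number twice, but it doesn't, because there is a file named 0: the while version tests the return value of readdir for truth, and the string "0" is false, so the loop stops as soon as it reaches that entry. On my computer it prints 251 and 188, tested with Perl v5.10.0 and v5.10.1.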

The same problem means that the following just prints a bunch of empty lines, whether or not a file named 0 exists, because on these versions readdir in a while condition doesn't assign to $_:

use 5.10.0;
opendir my $dir, '.';

say while readdir $dir;

Whereas this always works fine:

use 5.10.0;
my $count_for = 0;
++$count_for for glob '*';
say $count_for;

my $count_while = 0;
++$count_while while glob '*';
say $count_while;

say for glob '*';
say while glob '*';

I fixed these issues, and sent in a patch which made it into Perl v5.11.2, so this will work properly with Perl v5.12.0 when it comes out.

My fix converts this:

while( readdir $dir ){ ... }

into this:

while( defined( $_ = readdir $dir ) ){ ... }

This makes it work the same way that readline (<$fh>) has always worked on files. It is actually the same bit of code; I just added another element to the corresponding if statements.
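
Until you can rely on 5.12 or later, spelling out the defined() test yourself gives the same behaviour on every version. A minimal sketch, assuming a scratch directory that contains an entry named 0:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory containing entries named 0, 1 and 2.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(0 1 2)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

opendir my $dh, $dir or die "opendir $dir: $!";
my @entries;
# Testing defined() rather than truth means the entry "0" doesn't end the
# loop early, whichever Perl version runs this.
while (defined(my $entry = readdir $dh)) {
    next if $entry eq '.' or $entry eq '..';
    push @entries, $entry;
}
closedir $dh;
print join(",", sort @entries), "\n";    # prints 0,1,2
```
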

Adagio answered 2/3, 2010 at 17:46 Comment(2)
It's a good edge case to be aware of, but I'm not sure how much sympathy I have for people who create filenames like 0. Once you go down that road, it's a short step to a file named "  did you see the two spaces at the start of this file?.txt". Still, I'll note it in the readdir cons. - Vennieveno
The same fix is going into 5.18 for the each builtin, as in while (each %hash). - Adagio

Well, you pretty much cover it. All that taken into account, I would tend to use glob when I'm throwing together a quick one-off script and its behavior is just what I want, and use opendir and readdir in ongoing production code or libraries where I can take my time and clearer, cleaner code is helpful.

Ours answered 1/10, 2009 at 22:29 Comment(0)

On a similar note, File::Slurp has a function called read_dir.

Since I use File::Slurp's other functions a lot in my scripts, read_dir has also become a habit.

It also has the following options: err_mode, prefix, and keep_dot_dot.

Truthfunction answered 1/10, 2009 at 22:15 Comment(0)

That was a pretty comprehensive list. readdir (and readdir + grep) has less overhead than glob and so that is a plus for readdir if you need to analyze lots and lots of directories.
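
If you want to check the overhead claim on your own system rather than take it on trust, the core Benchmark module can time the two approaches. A sketch, assuming a scratch directory of 300 files (an arbitrary size); no numbers are given here because they depend on the OS, the filesystem and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempdir);

# Scratch directory with a few hundred files (illustrative size).
my $dir = tempdir(CLEANUP => 1);
for my $i (1 .. 300) {
    open my $fh, '>', "$dir/file$i.txt" or die "open: $!";
    close $fh;
}

sub with_glob { my @f = glob "$dir/*.txt"; return scalar @f }

sub with_readdir {
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @f = map { "$dir/$_" } grep { /\.txt\z/ } readdir $dh;
    closedir $dh;
    return scalar @f;
}

# Sanity check that both approaches find the same number of files.
die "mismatch" unless with_glob() == with_readdir();

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, { glob => \&with_glob, readdir => \&with_readdir });
```

Note that the glob version also sorts its results and the readdir version does not, which is part of what such a comparison measures.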

Shaddock answered 1/10, 2009 at 22:28 Comment(0)

For small, simple things, I prefer glob. Just the other day, I used it and a twenty-line Perl script to retag a large portion of my music library. glob, however, has a pretty strange name. Glob? It's not intuitive at all, as far as a name goes.

My biggest hangup with readdir is that it treats a directory in a way that's somewhat odd to most people. Usually, programmers don't think of a directory as a stream; they think of it as a resource, or a list, which is what glob provides. readdir's name is better and its functionality is better, but its interface still leaves something to be desired.

Azure answered 1/10, 2009 at 22:51 Comment(2)
Tastes vary, but I find readdir a bit stuffy (as a name) and glob just about right. Then again, I love Ruby's splat operator (as a name and otherwise) so I guess I'm odd. - Vennieveno
I definitely agree that glob is a great name for it...I just wish it was more intuitive :) - Azure

glob pros:

3) No need to prepend the directory name onto items manually

Exception:

say for glob "*";

--output:--
1perl.pl
2perl.pl
2perl.pl.bak
3perl.pl
3perl.pl.bak
4perl.pl
data.txt
data1.txt
data2.txt
data2.txt.out

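As far as I can tell, the rule for glob is: you get paths back in the same form as the pattern, so you must include the directory (a full path, if that's what you want) in the pattern to get it back in the results. The Perl docs do not seem to mention that, and neither do any of the posts here.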

That means that glob can be used in place of readdir when you want just filenames (rather than full paths), and you don't want hidden files returned, i.e. ones starting with '.'. For example,

use feature 'say';
chdir "../.." or die "Can't chdir: $!";
say for glob "*";
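
A minimal sketch of the rule above, assuming a scratch directory from File::Temp: glob echoes back whatever directory part the pattern had, and File::Basename's basename strips it again if you only want names, without a chdir.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Scratch directory with two files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(data.txt data1.txt)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

# A pattern that includes the directory gives back paths that include it...
my @full = glob "$dir/*.txt";

# ...while a relative pattern, run from inside the directory, gives bare names.
my $old = getcwd();
chdir $dir or die "chdir $dir: $!";
my @bare = glob "*.txt";
chdir $old or die "chdir $old: $!";

# basename() strips the directory part again without any chdir.
my @stripped = map { basename($_) } @full;

print "full:     @full\nbare:     @bare\nstripped: @stripped\n";
```
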
Personalty answered 14/11, 2009 at 6:44 Comment(0)

First, do some reading. Recipe 9.6 of the Perl Cookbook outlines the point I want to get to nicely, just under the discussion heading.

Second, search for glob and dosglob in your Perl directory. While many different sources (ways to get the file list) can be used, the reason I point you to dosglob is that if you happen to be on a Windows platform (and using the dosglob solution), it actually uses opendir/readdir/closedir. Other versions use built-in shell commands or precompiled OS-specific executables.

If you know you are targeting a specific platform, you can use this information to your advantage. Just for reference, I looked into this on Strawberry Perl Portable edition 5.12.2, so things may be slightly different on newer or older versions of Perl.

Orrin answered 1/10, 2009 at 22:15 Comment(0)
