Perl's 'readdir' Function Result Order?

I am running Perl on Windows and getting a list of all the files in a directory using readdir, storing the result in an array. The first two elements in the array always seem to be "." and "..". Is this order guaranteed (assuming the operating system does not change)?

I would like to do the following to remove these values:

my $directory = 'C:\\foo\\bar';

opendir my $directory_handle, $directory 
    or die "Could not open '$directory' for reading: $!\n";

my @files = readdir $directory_handle;
splice ( @files, 0, 2 ); # Remove the "." and ".." elements from the array

But I am worried that it might not be safe to do so. All the solutions I have seen use regular expressions or an if statement for each element in the array, and I would rather not use either approach if I don't have to. Thoughts?

Asked by Scut, 10/5, 2016 at 20:31. Comments (12):
"All the solutions I have seen use regular expressions or if statements for each element in the array and I would rather not use either of those approaches" Why not? - Septal
You should take it as significant that all the solutions you've found use that approach. A regex that matches a leading '.' also lets you ignore files whose names start with '.', which is the standard way of hiding files in the *nix world. - Toreador
@RobK, I did. That's why I was asking the question in the first place. :) - Scut
@ThisSuitIsBlackNot, I try to avoid regexes whenever possible because they are confusing to read for most people (at least most people I work with, myself included). And an if for each iteration seemed like more overhead and more code than necessary, but apparently it is necessary. - Scut
They don't appear to even exist in the root directory in Windows. - Toreador
@RobK, good catch. - Scut
You don't have to use a regex: my @files = grep { $_ ne '.' && $_ ne '..' } readdir $dh; (although any Perl developer should be able to understand simple regexes). - Septal
@RobK: Which root directory? Isn't there one for each mounted device? (` C:\ , D:\ `, ...). (Pardon the backticks, I can't make that come out the way I want it.) - Fleeting
A related question about the C/POSIX readdir() function: https://mcmap.net/q/55147/-does-readdir-guarantee-an-order/827263 - Fleeting
@KeithThompson C: definitely. From my brief testing, it appears actual disks don't have them, but network shares mounted as drives do. - Toreador
@RobK: Sure. My point is that C: is not necessarily the only mounted device. Mounting a second drive as D: is not uncommon, and remember that it's called C: because A: and B: were commonly floppy drives. Windows, unlike Unix, has no single root directory. - Fleeting
Rob K meant "a root directory", not "the root directory". No need to blow this out of proportion. // That said, he's wrong. A root directory can have . and .. in Windows (e.g. subst z: c:\users & dir z:\ & subst /d z:). - Linette

There is no guarantee about the order of readdir. The documentation says only this:

Returns the next directory entry for a directory opened by opendir.

readdir steps through the entries in the directory in whatever order the filesystem provides them. There is no guarantee what that order will be.

The usual way to work around this is to filter by name, with either a regex or string comparisons.

my @dirs = grep { !/^\.{1,2}\z/ } readdir $dh;

my @dirs = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
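A minimal self-contained sketch of the string-comparison version, wrapped in a subroutine that also opens and closes the handle. The subroutine name list_entries is hypothetical. Because it filters by name rather than position, the result does not depend on where "." and ".." appear in the listing.

```perl
use strict;
use warnings;

# List the entries of a directory, skipping "." and ".." by name
# rather than by position, since readdir promises no particular order.
sub list_entries {
    my ($directory) = @_;

    opendir my $dh, $directory
        or die "Could not open '$directory' for reading: $!\n";

    # String comparison: no regex, and no reliance on slot order.
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;

    closedir $dh;
    return @entries;
}

# Example usage (the path is hypothetical):
# my @files = list_entries('C:\\foo\\bar');
```

The returned list is still in filesystem order; sort it yourself if you need a stable order.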

Because this is such a common issue, I'd recommend using Path::Tiny's children method (path($dir)->children) instead of rolling your own. Its authors will have figured out the fastest and safest way to do it, which is to use grep to filter out "." and "..". Path::Tiny fixes a lot of things about Perl file and directory handling.
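If installing Path::Tiny is not an option (the ActivePerl concern raised in the comments), the core File::Spec module, which ships with Perl, provides no_upwards(), documented as stripping "." and ".." from a list of names such as the one readdir returns. This is a swapped-in alternative, not what the answer recommends, and the subroutine name list_entries_core is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Core-only sketch: File::Spec->no_upwards() drops "." and ".."
# from the list wherever they appear, so slot order does not matter.
sub list_entries_core {
    my ($directory) = @_;

    opendir my $dh, $directory
        or die "Could not open '$directory' for reading: $!\n";
    my @entries = File::Spec->no_upwards(readdir $dh);
    closedir $dh;

    return @entries;
}
```

Note that no_upwards only removes the literal names "." and ".."; hidden files such as ".profile" are kept.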

Answered by Roentgenogram, 10/5, 2016 at 20:38. Comments (6):
I'll look into Path::Tiny->children. Thanks! - Scut
If you're going to use a regex, an alternative that I find slightly more readable is /^\.\.?$/. - Fleeting
You apparently don't have any files named ".\n" or "..\n" :) $ is only really useful for matching user or file input; anywhere else, avoid it. - Busty
@Busty Although I've never seen such a thing in the wild, you are technically correct, the best kind of correct! Reason #1928 to not roll your own and use Path::Tiny. - Roentgenogram
@Schwern, I checked out Path::Tiny and it looks perfect for what I'm doing, but it doesn't seem to be available in ActivePerl. So I'm probably going to go with your grep solution. - Scut
@HansGoldman It will work fine on Windows. By "not available for ActivePerl" I assume you mean there's no PPM for it? PPM repositories are often incomplete and out of date... but here it is, and it's even up to date. If it weren't there, Path::Tiny is a pure-Perl module and can be installed on ActivePerl. Also consider switching to Strawberry Perl, which comes with its own properly configured compiler and CPAN suite and is compatible with more of CPAN. - Roentgenogram
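The point about $ in the comments can be checked directly. This sketch compares the pattern /^\.\.?$/ with the answer's /^\.{1,2}\z/ on a few sample names, including hypothetical ones that end in a newline. The helper names is_dot_dollar and is_dot_z are made up for the illustration.

```perl
use strict;
use warnings;

# "$" may also match just before a trailing "\n";
# "\z" only matches at the true end of the string.
sub is_dot_dollar { return $_[0] =~ /^\.\.?$/    ? 1 : 0 }
sub is_dot_z      { return $_[0] =~ /^\.{1,2}\z/ ? 1 : 0 }

my @names = (".", "..", ".\n", "..\n", "...", ".hidden");

for my $name (@names) {
    (my $shown = $name) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-10s  \$: %d   \\z: %d\n",
        "'$shown'", is_dot_dollar($name), is_dot_z($name);
}
```

Only the \z version rejects ".\n" and "..\n"; both versions agree on the other names.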

This PerlMonks thread from 2001 investigated this very issue, and Perl wizard Randal Schwartz concluded:

readdir on Unix returns the underlying raw directory order. Additions and deletions to the directory use and free-up slots. The first two entries to any directory are always created as "dot" and "dotdot", and these entries are never deleted under normal operation.

However, if a directory entry for either of these gets incorrectly deleted (through corruption, or using the perl -U option and letting the superuser unlink it, for example), the next fsck run has to recreate the entry, and it will simply add it. Oops, dot and dotdot are no longer the first two entries!

So, defensive programming mandates that you do not count on the slot order. And there's no promise that dot and dotdot are the first two entries, because Perl can't control that, and the underlying OS doesn't promise it either.
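To make the risk concrete, here is a sketch using a made-up listing in which "." and ".." are not the first two entries, as could happen after the fsck repair described above. The file names are hypothetical and no real directory is read. The positional splice from the question removes the wrong names; filtering by name does not.

```perl
use strict;
use warnings;

# Hypothetical readdir result where "." and ".." are not first.
my @listing = ('report.txt', '.', 'notes.txt', '..');

# Positional removal, as in the question: drops whatever comes first.
my @by_position = @listing;
splice @by_position, 0, 2;

# Name-based removal: independent of the order readdir returned.
my @by_name = grep { $_ ne '.' && $_ ne '..' } @listing;

print "splice: @by_position\n";   # prints "splice: notes.txt .."
print "grep:   @by_name\n";       # prints "grep:   report.txt notes.txt"
```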

Answered by Conjectural, 10/5, 2016 at 20:41. Comments (1):
I was afraid of that. - Scut
