How to normalize a path in Perl? (without checking the filesystem)

6

8

I want Perl's equivalent of Python's os.path.normpath():

Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. […]

For instance, I want to convert '/a/../b/./c//d' into '/b/c/d'.

The path I'm manipulating does NOT represent a real directory in the local file tree. There are no symlinks involved. So a plain string manipulation works fine.

I tried Cwd::abs_path and File::Spec, but they don't do what I want.

my $path = '/a/../b/./c//d';

File::Spec->canonpath($path);
File::Spec->rel2abs($path, '/');
# Both return '/a/../b/c/d'.
# They don't remove '..' because it might change
# the meaning of the path in case of symlinks.

Cwd::abs_path($path);
# Returns undef.
# This checks for the path in the filesystem, which I don't want.

Cwd::fast_abs_path($path);
# Gives an error: No such file or directory

Possibly related link:

Speck answered 11/8, 2017 at 9:25 Comment(0)
S
4

Given that File::Spec is almost what I needed, I ended up writing a function that removes ../ from File::Spec->canonpath(). The full code including tests is available as a GitHub Gist.

use File::Spec;

sub path_normalize_by_string_manipulation {
    my $path = shift;

    # canonpath does string manipulation, but does not remove "..".
    my $ret = File::Spec->canonpath($path);

    # Let's remove ".." by using a regex.
    while ($ret =~ s{
        (^|/)              # Either the beginning of the string, or a slash, save as $1
        (                  # Followed by one of these:
            [^/]|          #  * Any one character (except slash, obviously)
            [^./][^/]|     #  * Two characters where
            [^/][^./]|     #    they are not ".."
            [^/][^/][^/]+  #  * Three or more characters
        )                  # Followed by:
        /\.\./             # "/", followed by "../"
        }{$1}x
    ) {
        # Repeat this substitution until not possible anymore.
    }

    # Re-adding the trailing slash, if needed.
    if ($path =~ m!/$! && $ret !~ m!/$!) {
        $ret .= '/';
    }

    return $ret;
}
Speck answered 11/8, 2017 at 12:51 Comment(0)
L
2

My use case was normalizing include paths inside files relative to another path. For example, I might have a file at '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng' that includes the following file relative to itself:

<include href="../../base/rng/topicMod.rng"/>

and I needed the absolute path of that included file. (The including file path might be absolute or relative.)

Path::Tiny was promising, but I can only use core modules.

I tried using chdir to the include file location then using File::Spec->rel2abs() to resolve the path, but that was painfully slow on my system.

I ended up writing a subroutine to implement a simple string-based method of evaporating '../' components:

#!/usr/bin/perl
use strict;
use warnings;

use Cwd;
use File::Basename;
use File::Spec;

sub adjust_local_path {
 my ($file, $relative_to) = @_;
 return Cwd::realpath($file) if (($relative_to eq '.') || ($file =~ m!^\/!));  # handle the fast cases

 $relative_to = dirname($relative_to) if (-f $relative_to);
 $relative_to = Cwd::realpath($relative_to);
 while ($file =~ s!^\.\./!!) { $relative_to =~ s!/[^/]+$!!; }
 return File::Spec->catdir($relative_to, $file);
}

# The including (source) file is given as an absolute path; the included file is relative to it.
my $source_file   = '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/topic.rng';
my $included_file = '.././base/rng/topicMod.rng';
print adjust_local_path($included_file, $source_file)."\n";

The result of the script above is

$ ./test.pl
/home/me/dita-ot-3.1.3/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/base/rng/topicMod.rng

Using realpath() had the nice side-effect of resolving symlinks, which I needed. In the example above, dita-ot/ is a link to dita-ot-3.1.3/.

You can provide either a file or a path as the second argument; if it's a file, the directory path of that file is used. (This was convenient for my own purposes.)

Longlongan answered 7/1, 2019 at 22:15 Comment(1)
This looks like an interesting solution for a different problem. This solution relies heavily on the filesystem, while my initial problem needed a pure string manipulation (as there were no files involved).Biolysis
F
2

Fixing Tom van der Woerdt's code so that it handles a run of consecutive '..' components:

foreach my $path ("/a/b/c/d/../../../e" , "/a/../b/./c//d") {
    my $absolute = $path =~ m!^/!;
    my @c= reverse split m@/@, $path;
    my @c_new;
    while (@c) {
        my $component= shift @c;
        next unless length($component);
        if ($component eq ".") { next; }
        if ($component eq "..") { 
            my $i=0;
            while ($c[$i] && $c[$i] =~ m/^\.{1,2}$/) {
                $i++
            }
            if ($i > $#c) {
                push @c_new, $component unless $absolute;
            } else {
                splice(@c, $i, 1);
            }
            next 
        }
        push @c_new, $component;
    }
    # Only prefix a slash when the input path was absolute.
    my $prefix = $absolute ? "/" : "";
    print $prefix . join("/", reverse @c_new) . "\n";
}
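The same idea can also be written left-to-right with a stack, which avoids the look-ahead over pending '..' entries. The following is only a sketch using core Perl; the name normalize_path is hypothetical, and it assumes POSIX-style paths with '/' as the only separator. Unlike Python's os.path.normpath(), it collapses a leading '//' to a single '/'.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize a '/'-separated path by string manipulation only.
# It never touches the filesystem, so symlinks are not taken into account.
sub normalize_path {
    my ($path) = @_;
    return '.' if $path eq '';

    my $absolute = $path =~ m{^/} ? 1 : 0;
    my @stack;

    for my $part (split m{/+}, $path) {
        next if $part eq '' || $part eq '.';   # skip empty and "." components
        if ($part eq '..') {
            if (@stack && $stack[-1] ne '..') {
                pop @stack;                    # cancel the previous real component
            } elsif (!$absolute) {
                push @stack, '..';             # relative path: keep a leading ".."
            }
            # absolute path: ".." at the root is simply dropped
            next;
        }
        push @stack, $part;
    }

    my $result = ($absolute ? '/' : '') . join('/', @stack);
    return $result eq '' ? '.' : $result;
}

print normalize_path($_), "\n"
    for '/a/../b/./c//d', '/a/b/c/d/../../../e', '../x', 'a/../..';
# prints:
# /b/c/d
# /a/e
# ../x
# ..
```

A trailing slash is not preserved here; if it matters, it can be re-added afterwards in the same way as in the canonpath-based answer.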
Fonteyn answered 23/11, 2021 at 13:37 Comment(0)
D
1

You mentioned that you tried File::Spec and it didn't do what you want. That's because you were probably using it on a Unix-like system, where if you try to cd to something like path/to/file.txt/.. it will fail unless path/to/file.txt is a legitimate directory path.

However, the command cd path/to/file.txt/.. will work on a Win32 system, provided that path/to is a real directory path -- regardless of whether file.txt is a real subdirectory.

In case you don't see where I'm going yet, it's that the File::Spec module won't do what you want (unless you're on a Win32 system), but the module File::Spec::Win32 will do what you want. And what's cool is, File::Spec::Win32 should be available as a standard module even on non-Win32 platforms!

This code pretty much does what you want:

use strict;
use warnings;
use feature 'say';

use File::Spec::Win32;

my $path = '/a/../b/./c//d';
my $canonpath = File::Spec::Win32->canonpath($path);
say $canonpath;   # This prints:  \b\c\d

Unfortunately, since we're using the Win32 flavor of File::Spec, the \ is used as the directory separator (instead of the Unix /). It should be trivial for you to convert those \ to /, provided that the original $path does not contain any \ to begin with.

And if your original $path does contain legitimate \ characters, it shouldn't be too difficult to preserve them (so that they don't get converted to /). That said, if your paths actually contain \ characters, they have probably caused quite a few headaches already.

Since neither Unix-like systems nor Win32 allow null characters in pathnames, one way to preserve the \ characters is to first convert them to null bytes, then call File::Spec::Win32->canonpath( ... );, and then convert the null bytes back to \ characters. This is straightforward and needs no looping:

use File::Spec::Win32;

my $path = '/a/../b/./c//d';
$path =~ s[\\][\0]g;   # Converts backslashes to null bytes.
$path = File::Spec::Win32->canonpath($path);
$path =~ s[\\][/]g;   # Converts \ to / characters.
$path =~ s[\0][\\]g;   # Converts null bytes back to backslashes.
# $path is now set to:  /b/c/d
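The steps above can be wrapped in a small subroutine. This is a sketch only: the name normalize_via_win32 is hypothetical, and re-adding a trailing slash is an assumption about what the caller wants, since canonpath drops it. Note also that File::Spec::Win32 treats a leading '//' as a UNC volume and a prefix such as 'C:' as a drive letter, so inputs of that shape may not come back the way a Unix reader would expect.

```perl
use strict;
use warnings;
use File::Spec::Win32;

# Hypothetical wrapper around File::Spec::Win32->canonpath for '/'-separated paths.
sub normalize_via_win32 {
    my ($path) = @_;
    my $had_trailing_slash = $path =~ m{/\z};

    (my $tmp = $path) =~ s{\\}{\0}g;               # hide literal backslashes as null bytes
    my $ret = File::Spec::Win32->canonpath($tmp);  # pure string manipulation, no filesystem access
    $ret =~ s{\\}{/}g;                             # Win32 separators back to "/"
    $ret =~ s{\0}{\\}g;                            # restore the original backslashes

    # Assumption: keep a trailing slash if the input had one.
    $ret .= '/' if $had_trailing_slash && $ret !~ m{/\z};
    return $ret;
}

print normalize_via_win32('/a/../b/./c//d'), "\n";   # /b/c/d
print normalize_via_win32('/a/../b/c/'), "\n";       # /b/c/
```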
Determined answered 27/6, 2023 at 21:28 Comment(1)
Who would have thought that the Win32 module solves a non-windows problem‽ That's surprising, and I'd even say it's obscure. But it works. I've added your solution to my tests, and while initially it failed half of the tests, the fix was easy (just preserve the trailing slash, if needed). After that small fix, it passed all tests! And it worked (on Linux) even though no such directories exist in the file system.Biolysis
H
0

Removing '.' and '..' from paths is pretty straightforward if you process the path right-to-left:

use feature 'say';

my $path= "/a/../b/./c//d";
my @c= reverse split m@/@, $path;
my @c_new;
while (@c) {
    my $component= shift @c;
    next unless length($component);   # skip empty components from "//"
    if ($component eq ".") { next; }
    # Drops only the single component before "..";
    # a run such as "../../.." is not handled (see the comments).
    if ($component eq "..") { shift @c; next }
    push @c_new, $component;
}
say "/".join("/", reverse @c_new);

(Assumes the path starts with a /)

Note that this violates the UNIX pathname resolution standards, specifically this part:

A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.
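A normalizer that wants to respect this rule only has to decide which prefix to put back in front of the cleaned-up components. Below is a minimal sketch; the helper name leading_slashes is hypothetical. It keeps exactly two leading slashes as they are, which is also what Python's os.path.normpath() does, and reduces any other run to a single slash.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the prefix to keep for a '/'-separated path.
#   no leading slash        -> ''
#   exactly two slashes     -> '//'  (implementation-defined, so left alone)
#   one, or three and more  -> '/'
sub leading_slashes {
    my ($path) = @_;
    my ($run) = $path =~ m{^(/*)};   # always matches, possibly with an empty run
    return length($run) == 2 ? '//'
         : length($run)      ? '/'
         :                     '';
}

printf "%-8s -> '%s'\n", $_, leading_slashes($_) for 'a/b', '/a/b', '//a/b', '///a/b';
# prints:
# a/b      -> ''
# /a/b     -> '/'
# //a/b    -> '//'
# ///a/b   -> '/'
```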

Highpriced answered 11/8, 2017 at 9:37 Comment(2)
I think this will fail when you have a run of .., e.g. /a/b/c/d/../../../e.Acetabulum
@TimAngus you are right, well spotted! posted a fixed version :)Fonteyn
F
0

The Path::Tiny module does exactly this:

use strict;
use warnings;
use 5.010;

use Path::Tiny;
say path('/a/../b/./c//d');

Output:

/b/c/d
Fulmar answered 11/8, 2017 at 17:35 Comment(1)
Not for me. Path::Tiny seems to behave exactly like File::Spec: /a/../b/c/dBiolysis
