std::fs::canonicalize for files that don't exist
I'm writing a program in Rust that creates a file at a user-defined path. I need to be able to normalize intermediate components (~/ should become $HOME/, ../ should go up a directory, etc.) in order to create the file in the right place. std::fs::canonicalize does almost exactly what I want, but it panics if the path does not already exist.

Is there a function that normalizes components the same way as std::fs::canonicalize but doesn't panic if the file doesn't already exist?

Aggregation answered 2/7, 2021 at 21:14 Comment(6)
Small correction: it does not panic, but returns an Err if the file does not exist. You are likely calling unwrap() or expect() on it. Regardless: good question. I want such a thing as well! – Journalize
@LukasKalbertodt: what would you have it do instead of returning an error? What is the "canonical" path of something nonexistent? What does that even mean?! – Gnomon
@Gnomon Of course, that's the difficulty. I would expect an answer to this question to explain these problems with the requirement. I still think there is something one can do, even if the file does not exist. For example, one could canonicalize the closest existing ancestor of the requested file and then append the remaining path verbatim. I agree this shouldn't be the default behavior of std::fs::canonicalize, but I think it is a useful behavior to have. – Journalize
@Gnomon The behavior I'm looking for is normalizing intermediate components the same way std::fs::canonicalize does. – Aggregation
~/ and $HOME are shell features, not Linux features. – Gauldin
github.com/Canop/broot/blob/master/src/path/normalize.rs#L13 – Gauldin
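The idea from the comments (canonicalize the closest existing ancestor, then append the rest verbatim) could be sketched roughly as follows. This is only an illustration using std: the name canonicalize_existing_prefix is hypothetical, and the sketch assumes that a ".." appearing after a nonexistent component should simply be reported as an error rather than resolved.

```rust
use std::ffi::OsString;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of std): canonicalize the longest existing
/// prefix of `path`, then append the remaining components unchanged.
pub fn canonicalize_existing_prefix(path: &Path) -> io::Result<PathBuf> {
    let mut existing = path;
    // Components that do not exist yet, collected innermost first.
    let mut rest: Vec<OsString> = Vec::new();
    loop {
        // An empty prefix means "relative to the current directory".
        let probe = if existing.as_os_str().is_empty() {
            Path::new(".")
        } else {
            existing
        };
        match probe.canonicalize() {
            Ok(mut resolved) => {
                // Re-attach the missing components in their original order.
                for part in rest.iter().rev() {
                    resolved.push(part);
                }
                return Ok(resolved);
            }
            Err(err) if err.kind() == io::ErrorKind::NotFound => {
                match (existing.parent(), existing.file_name()) {
                    (Some(parent), Some(name)) => {
                        rest.push(name.to_os_string());
                        existing = parent;
                    }
                    // No parent left, or the path ends in `..`: give up.
                    _ => return Err(err),
                }
            }
            // Permission problems and similar errors are passed through.
            Err(err) => return Err(err),
        }
    }
}
```

Because the existing prefix goes through std::fs::canonicalize, symlinks in that part are resolved by the OS; only the nonexistent tail is taken on trust.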

There are good reasons such a function isn't standard:

  1. There's no unique path when you're dealing with both links and nonexistent files. If a/b is a link to c/d/e, then a/b/../f could mean either a/f or c/d/f.

  2. The ~ shortcut is a shell feature. You may want to generalize it (I do), but that's a non-obvious choice, especially when you consider that ~ is a valid file name on most systems.
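To make the first point concrete, here is a small Unix-only sketch that builds the a/b -> c/d/e layout in a scratch directory and returns both readings of a/b/../f. The function name symlink_ambiguity and the directory layout are illustrative assumptions, not something from std.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Creates `root/c/d/e` and a symlink `root/a/b -> root/c/d/e`, then returns
/// the two candidate meanings of `root/a/b/../f`:
/// (lexical reading, reading obtained by following the link on disk).
#[cfg(unix)]
pub fn symlink_ambiguity(root: &Path) -> io::Result<(PathBuf, PathBuf)> {
    // Canonicalize the root first so both results share the same prefix.
    let root = root.canonicalize()?;
    std::fs::create_dir_all(root.join("c/d/e"))?;
    std::fs::create_dir_all(root.join("a"))?;
    std::os::unix::fs::symlink(root.join("c/d/e"), root.join("a/b"))?;

    // Lexical reading: `b/..` cancels out, leaving `a/f`.
    let lexical = root.join("a/f");
    // Filesystem reading: `a/b` is followed to `c/d/e`, so `..` is `c/d`.
    let via_fs = root.join("a/b/..").canonicalize()?.join("f");
    Ok((lexical, via_fs))
}
```

The two results differ even though f itself never exists, which is why a function working on nonexistent paths has to pick one interpretation.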

That said, it's sometimes useful, in cases where those ambiguities aren't a problem because of the nature of your application.

Here's what I do in such a case:

use {
    directories::UserDirs,
    lazy_regex::*,
    log::warn, // assumption: `warn!` is the macro from the `log` crate
    std::path::{Component, Path, PathBuf},
};

/// Build a usable path from user input, which may be absolute
/// (if it starts with `/` or `~`) or relative to the supplied `base_dir`.
/// (We might want to try to detect Windows drives in the future, too.)
pub fn path_from<P: AsRef<Path>>(
    base_dir: P,
    input: &str,
) -> PathBuf {
    let tilde = regex!(r"^~(/|$)");
    if input.starts_with('/') {
        // if the input starts with a `/`, we use it as is
        input.into()
    } else if tilde.is_match(input) {
        // if the input starts with `~` as first token, we replace
        // this `~` with the user home directory
        PathBuf::from(
            &*tilde
                .replace(input, |c: &Captures| {
                    if let Some(user_dirs) = UserDirs::new() {
                        format!(
                            "{}{}",
                            user_dirs.home_dir().to_string_lossy(),
                            &c[1],
                        )
                    } else {
                        warn!("no user dirs found, no expansion of ~");
                        c[0].to_string()
                    }
                })
        )
    } else {
        // we put the input behind the source (the selected directory
        // or its parent) and we normalize so that the user can type
        // paths with `../`
        // `base_dir` is generic over `AsRef<Path>`, so convert before joining
        normalize_path(base_dir.as_ref().join(input))
    }
}


/// Normalize the path by lexically resolving its `..` components.
///
/// This assumes that `a/b/../c` is `a/c` which might be different from
/// what the OS would have chosen when b is a link. This is OK
/// for broot verb arguments but can't be generally used elsewhere
///
/// This function ensures a given path ending with '/' still
/// ends with '/' after normalization.
pub fn normalize_path<P: AsRef<Path>>(path: P) -> PathBuf {
    let ends_with_slash = path.as_ref()
        .to_str()
        .map_or(false, |s| s.ends_with('/'));
    let mut normalized = PathBuf::new();
    for component in path.as_ref().components() {
        match &component {
            Component::ParentDir => {
                if !normalized.pop() {
                    normalized.push(component);
                }
            }
            _ => {
                normalized.push(component);
            }
        }
    }
    if ends_with_slash {
        normalized.push("");
    }
    normalized
}

(This uses the directories crate to get the home directory in a cross-platform way, but other crates exist, and you could also just read the $HOME environment variable on most platforms.)
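For the $HOME variant mentioned above, a dependency-free sketch might look like the following. The names expand_tilde and home_from_env are hypothetical, and the sketch assumes that only a bare ~ or a leading ~/ should be expanded (so ~user and a ~ in the middle of a path are left alone).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: expand a leading `~` (alone or followed by `/`)
/// using the given home directory. Any other input is returned unchanged.
pub fn expand_tilde(input: &str, home: Option<&Path>) -> PathBuf {
    match (input.strip_prefix('~'), home) {
        // Input is exactly `~`.
        (Some(""), Some(home)) => home.to_path_buf(),
        // Input starts with `~/`: join the remainder onto the home directory.
        (Some(rest), Some(home)) if rest.starts_with('/') => {
            home.join(rest.trim_start_matches('/'))
        }
        // No tilde prefix, a `~user` form, or no known home: keep as is.
        _ => PathBuf::from(input),
    }
}

/// Reads the home directory from `$HOME`. This is an assumption that holds
/// on most Unix-like systems but generally not on Windows.
pub fn home_from_env() -> Option<PathBuf> {
    std::env::var_os("HOME").map(PathBuf::from)
}
```

A caller would combine the two as expand_tilde(input, home_from_env().as_deref()); passing the home directory in explicitly keeps the expansion itself easy to test.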

Donnenfeld answered 3/7, 2021 at 5:23 Comment(0)

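std::path::absolute(), stabilized in Rust 1.79, seems to be close to what the OP wants: it makes a path absolute without accessing the filesystem, so it works even if the path does not exist. Note that it does not resolve symlinks, and on Unix-like platforms it leaves .. components in place (they are only resolved on Windows).

As the accepted answer said, ~ is a shell feature, so this function won't handle it.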
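A minimal usage sketch, assuming Rust 1.79 or later; the wrapper name absolute_target is only for illustration:

```rust
use std::io;
use std::path::{absolute, PathBuf};

/// Make a possibly nonexistent path absolute without touching the filesystem.
/// Relative input is resolved against the current directory.
pub fn absolute_target(input: &str) -> io::Result<PathBuf> {
    absolute(input)
}
```

On Unix-like platforms, absolute_target("/foo/test/../bar.rs") keeps the .. as written, so a lexical normalization pass is still needed afterwards if .. should be collapsed.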

Footrace answered 28/6, 2024 at 5:57 Comment(0)