Check whether a path is valid in Python without creating a file at the path's target
Asked Answered
C

6

144

I have a path (including directory and file name).
I need to test if the file-name is a valid, e.g. if the file-system will allow me to create a file with such a name.
The file-name has some unicode characters in it.

It's safe to assume the directory segment of the path is valid and accessible (I was trying to make the question more gnerally applicable, and apparently I wen too far).

I very much do not want to have to escape anything unless I have to.

I'd post some of the example characters I am dealing with, but apparently they get automatically removed by the stack-exchange system. Anyways, I want to keep standard unicode entities like ö, and only escape things which are invalid in a filename.


Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.

Basically I want to check if I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).

As such:

try:
    open(filename, 'w')
except OSError:
    # handle error here

from here

Is not acceptable, because it will overwrite the existent file, which I do not want to touch (if it's there), or create said file if it's not.

I know I can do:

if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        # handle error here

But that will create the file at the filePath, which I would then have to os.unlink.

In the end, it seems like it's spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.


As an aside, I need this to run on (at least) Windows and MacOS, so I'd like to avoid platform-specific stuff.

``

Cismontane answered 2/3, 2012 at 11:26 Comment(12)
If you are wanting to test that the path exists and you can write to it, then simply create and delete some other file. Give it a unique name ( or as unique as you can), to avoid multi user / multi thread issues. Otherwise you are looking at checking out permssions which will drop you straight into the OS specific muddle.Outpost
@Tony Hopkinson - Basically I want to check if I could write to a path without actually writing anything.Cismontane
If you don't have anything to write to the file, then why do you need to know if you're able to?Coagulase
@Karl Knechtel - If I write to it, and there is already a file there, it will damage the existant file.Cismontane
@FakeName - You're always going to have a subtle race condition here. Between checking that the file doesn't exist but could be created, and then creating the file, some other process could create it and you'll clobber the file anyway. Of course, it depends on your usage whether this is a realistic problem or not...Seamanlike
@Seamanlike - Well, the only process that will create a file there would be mine. There may be a file there, from a significant period of time ago.Cismontane
@FakeName: did you ever find a decent solution for this that you can share?Ramah
@KennethHoste - Nope. That's why the question is still open.Cismontane
Partly you could check it with os.path.isabs(PATH), but that does not cover relative path :-(.Conspectus
@Conspectus - And is completely useless, if you had read any of the other answers. I'm trying to determine allowable filenames, which is both OS and filesystem dependent. isabs() verifies neither.Cismontane
This looks like an XY Question. Why would one want to know if a path is writable if they don't want to actually write it? A discussion on this topic (unfortunately, only Win related): bugs.python.org/issue36534.Surcease
This looks like an XY question because I didn't feel like writing 5 pages of boring content on exactly how I wound up needing exactly this functionality. It'd probably be better at this point to change the title to "How can you validate a filename against the local filesystem's charset/filename limitations without having to actually create a file".Cismontane
D
243

tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That's just how we roll.

A Tale of Two Questions

The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither have received a genuinely satisfactory answer here... or, well, anywhere that I could grep.

vikki's answer probably hews the closest, but has the remarkable disadvantages of:

  • Needlessly opening (...and then failing to reliably close) file handles.
  • Needlessly writing (...and then failing to reliable close or delete) 0-byte files.
  • Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
  • Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
  • Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We're gonna fix all that.

Question #0: What's Pathname Validity Again?

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?

By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By "root filesystem," we mean:

  • On POSIX-compatible systems, the filesystem mounted to the root directory (/).
  • On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).

The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

  • Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
  • Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).

Syntactic correctness. Root filesystem. That's it.

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I'm in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn't. Fortunately, unrolling your own ad-hoc solution isn't that gut-wrenching...

O.K., it actually is. It's hairy; it's nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin'.

We'll soon descend into the radioactive abyss of low-level code. But first, let's talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:

  • For pathnames residing in non-existing directories, instances of FileNotFoundError.
  • For pathnames residing in existing directories:
    • Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
    • Under all other OSes:
    • For pathnames containing null bytes (i.e., '\x00'), instances of TypeError.
    • For pathnames containing path components longer than 255 bytes, instances of OSError whose errcode attribute is:
      • Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as "selective interpretation" of the POSIX standard.)
      • Under all other OSes, errno.ENAMETOOLONG.

Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.

Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn't modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:

  1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
  2. For each such component:
    1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog) .
    2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn't that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that's stale, slow, or inaccessible, you've got larger problems than pathname validation.)

Lost? Great. Let's begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")

import errno, os

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" exception was raised, it almost certainly has the
    # error message "embedded NUL character" indicating an invalid pathname.
    except TypeError as exc:
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

Done. Don't squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:

def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits:

Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.

To no one's surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn't Python's fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Note, however, that even this may not be enough.

Thanks to User Access Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user's profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It's time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let's leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False

Beyond sanity. Beyond pain. You will find Python portability concerns.

Descender answered 5/12, 2015 at 8:26 Comment(17)
Yup, it was me! Attempting to kludge together a cross-portable pathname-validating regex is an exercise in futility and guaranteed to fail for common edge cases. Consider pathname length on Windows, for example: "The maximum path of 32,767 characters is approximate, because the '\\?\' prefix may be expanded to a longer string by the system at run time, and this expansion applies to the total length." Given that, it's actually technically infeasible to construct a regex matching only valid pathnames. It's much more reasonable just to defer to Python instead.Descender
Ah. I (reluctantly) see. You're doing something even stranger than hacking up a regex. Yeah, that is guaranteed to fail even harder. That also completely fails to address the question in question, which is not "How do I strip invalid substrings from a Windows-specific basename?" (...which, by your own omission, you fail to solve – again due to edge cases) but "How do I cross-portably test pathname validity and, for valid pathnames, the existence or writability of those paths?"Descender
I get your point about wanting to avoid slow network filesystem access but I'm afraid that changing the filesystem might actually give you incorrect results as different filesystems may have different limitations. Furthermore, I'd say that per SRP you should factor out the path-mutation code into its own function and provide is_path_valid as doing only the checks. Lastly, I find the name is_path_exists_or_creatable utterly unreadable, I would name it to path_exists_or_is_creatable or (less so) is_path_creatable_or_exists. Still, you got closest to answering the OP's question.Calceiform
Filesystem-specific constraints is definitely a valid concern – but it cuts both ways. For front-facing applications consuming arbitrary pathnames from untrusted sources, blindly performing reads is a dicey proposition at best; in this case, forcing root filesystem use is not only sensible but prudent. For other applications, however, the userbase may be trustworthy enough to grant uninhibited filesystem access. It's fairly context-dependent, I'd say. Thanks for astutely noting this, Nobody! I'll add a caveat above.Descender
Factoring path-munging out of is_pathname_valid() is absolutely a valid concern. (Like the rest of your well-thought critiques, of course.) Due to brain-gnawing stackoverflow fatigue, I may leave that as a reader exercise.Descender
As for nomenclature, I'm a pedantic fan of prefixing tester names by is_. This is my character flaw. Nonetheless, duly noted: you can't please everybody, and sometimes you can't please anybody. ;)Descender
This doesn't 100% work for me (Python 3.5.2) -- when lstat is given a long path it throws "ValueError: lstat: path too long for Windows" which doesn't get handled.Diane
On Fedora 24, python 3.5.3, a path name with an embedded null characters throws: ValueError: embedded null byte … need to add: ``` except ValueError as exc: return False ``` before or after the TypeError trap.Beastings
def is_path_sibling_creatable(pathname: str) -> bool: what is the colon and -> bool syntax? Is this pseudocode?Olander
is_pathname_valid(r'C:nice\one.txt') returns True but expected to return False, because C:nice\one.txt is not a valid path in Windows 7 (Python 3.7.2).Touraco
@DmitriyWork is_pathname_valid(r'C:nice\one.txt') returns True because os.path.splitdrive() is permissive of that path syntax and -- as far as I can tell -- behaves as if there's a slash between the drive letter (C:) and the relative drive path (nice\one.txt).Addictive
Just tested C:nice\one.txt path. Works fine in python (open(r'C:nice\one.txt')), but not in either of PowerShell (cat C:nice\one.txt), explorer or notepad (notepad C:nice\one.txt).Touraco
"Did we mention this should be shipped with Python already?" Agree. Have you considered making a PyPI package for this, @CecilCurry?Upstream
Found a bug with "Windows File Streams" (learn.microsoft.com/en-us/windows/win32/fileio/file-streams) involved. os.lstat(r'C:\a:b') raises FileNotFoundError since it's a valid path for a file stream; os.lstat('a:b') raises FileNotFoundError, same reason; then os.lstat(r'C:\a:b\xyz') raises WinError 123, because it's not a path for file stream, it's a path with parent dirname a:b which is invalid, so in this case os.lstat(path_part) isn't reliable; and os.lstat(r'Y:\a:b\xyz') # "Y" drive doesn't exists raises FileNotFoundError, because `Y:` dosen't exists.Motif
I modified your is_path_valid code: gist.github.com/mo-han/240b3ef008d96215e352203b88be40dbMotif
@MinhTran this is called type hinting which is a feature in python 3. The colon after the parameter means that this parameter should be this type, it does not enforce this, but is a way of telling the user what type they should enter. The -> bool means that the function will return a bool, again, the python interpreter will not enforce this, you could still return a string without an error, but it is a way of telling the user that this function should return a boolIntervale
Thank you so much for this excellent and detailed explanation. I have spent hours today trying to find this functionality in os, obviously without any avail. I needed your first function only, and it has to work on any OS, so this was an ideal find. I will have to adapt it to the API I'm working in, but it is incredible and works well, thank you.Loam
C
59
if os.path.exists(filePath):
    #the file is there
elif os.access(os.path.dirname(filePath), os.W_OK):
    #the file does not exists but write privileges are given
else:
    #can not write there

Note that path.exists can fail for more reasons than just the file is not there so you might have to do finer tests like testing if the containing directory exists and so on.


After my discussion with the OP it turned out, that the main problem seems to be, that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed but the OP wants to maintain as much human readablitiy as the filesystem allows.

Sadly I do not know of any good solution for this. However Cecil Curry's answer takes a closer look at detecting the problem.

Calceiform answered 2/3, 2012 at 11:33 Comment(15)
No. I need to return true if the file at the path exists, or can be created. I need to return false if the path in invalid (due to containing invalid characters on windows).Cismontane
or can be created well I did not read that from your question. Reading the permissions will be platfrom-dependent to some extent.Calceiform
Isn't the whole point of the os library to wrap all that stuff in a unified api? It seems like something that should already be available.Cismontane
@Fake Name: Yes it will remove some of the platformdependencies but still some platforms offer things that others do not and there is no easy way to wrap that for all of them. I updated my answer, have a look there.Calceiform
1) I think you mean os.access. There is no os.path.access. 2) os.access(filePath, os.W_OK) returns false if the file does not exist.Cismontane
>>> os.access("test.txt", os.W_OK) False >>> open("test.txt", "w") <open file 'test.txt', mode 'w' at 0x00000000023DA420> >>> os.access("test.txt", os.W_OK) True >>>Cismontane
@Fake Name: Sry I was a bit too fast with my solution. I updated it. Basically you need to check the containing directory for write privileges to test if you can create the file.Calceiform
That still won't work. One of the specific issues I am having is invalid characters in the filename. I can actually be pretty confident I have write permissions in the directory.Cismontane
Basically, I'm scraping web content, and creating directories and files for it with corresponding names. As it's web-content, unicode is screwing everything up. I would prefer to not escape everything, just anything that cannot be used as a valid file-path.Cismontane
let us continue this discussion in chatCalceiform
For example, consider a file with `` in it's title. This works in the browser, but not in the filesystem.Cismontane
@FakeName the issue of invalid characters in filename is not only os dependent but also filesystem dependentAmbo
And the commenting system ate the unicode in my comment above.Cismontane
I have no idea why this answer was upvoted. It doesn't come remotely adjacent to addressing the core question – which, succinctly, is: "Validate pathnames, please?" Validating path permissions is an ancillary (and largely ignorable) question here. While the call to os.path.exists(filePath) does technically raise exceptions on invalid pathnames, those exceptions would need to be explicitly caught and differentiated from other unrelated exceptions. Moreover, the same call returns False on existing paths to which the current user does not have read permissions. In short, badness.Descender
@CecilCurry: To answer your questions: Have a look at the edit history of the question. As with most questions it was not as clear cut in the beginning and even now the wording of the title alone might be understood otherwise than you said.Calceiform
I
14

I found a PyPI module called pathvalidate.

pip install pathvalidate

Inside it has a function called sanitize_filepath which will take a file path and convert it into a valid file path:

from pathvalidate import sanitize_filepath
file1 = "ap:lle/fi:le"
print(sanitize_filepath(file1))
# Output: "apple/file"

It also works with reserved names too. If you feed it the file path con, it will return con_.

With this knowledge, we can check if the entered file path is equal to the sanitized one and that will mean the file path is valid.

import os
from pathvalidate import sanitize_filepath

def check(filePath):
    if os.path.exists(filePath):
        return True
    if filePath == sanitize_filepath(filePath):
        return True
    return False
Intervale answered 16/4, 2021 at 5:50 Comment(4)
This is a good answer. But, I am being able to install pathvalidate, but it is showing errors.Otherwise
what errors. I downloaded it fine with python 3.8Intervale
Don't forget to include platform enum(I have yet to find an enum in the documentation) for sanitize_filepath(platform='xyz')Pooch
nice, elegant solution. Small typo in the code, should read: if os.path.exists(filePath):Distinctive
E
11

With Python 3, how about:

try:
    with open(filename, 'x') as tempfile: # OSError if file exists or is invalid
        pass
except OSError:
    # handle error here

With the 'x' option we also don't have to worry about race conditions. See documentation here.

Now, this WILL create a very shortlived temporary file if it does not exist already - unless the name is invalid. If you can live with that, it simplifies things a lot.

Eighteenth answered 29/1, 2018 at 10:26 Comment(8)
At this point, the project that needed this has moved so far beyond the point where an answer is even relevant that I can't really accept an answer.Cismontane
Ironically the practical answer is not good enough. Regardless I suppose you could see if the file existed. If it does, attempt to copy the file elsewhere, and then try to overwrite.Hammons
This also only validates file names, not folder names. It may also have side-effects and security consequences.Witness
@kfsone, please elaborate or at least provide a link with documentation for your claims about side effects and security concerns.Eighteenth
@StephenMiller The OP asked how to validate a filename. Your solution is open the file, or create it if it doesn't exist. This will fail for a slew of non-filename related reasons that don't answer "is this a legal filename": ownership, path length, disk capacity, etc, etc. The security issues are that while attempting to validate user-input data you are performing file operations with input without regard to what filesystem you are on, and whoever uses this code should be aware that your suggestion fails to consider permissions/ownership.Witness
@StephenMiller The OP didn't say they wanted to create anything, infact in edits they painstakingly expressed that they want to avoid creating things. They just want to validate user input. I get what you think you are saying, but it's not a solution to the question asked, and on a unix filesystem, users could accidentally or maliciously use it to perform all sorts of shenanigans from creating lock files to disrupting file-sensitive applications to sidechannel attacks. It also opens the file in write mode.Witness
@StephenMiller Lastly, the 'except' part of your code glosses over how complicated it is to use this as an effective test. What cases do you expect it to produce? What does it mean if you get FileNotFoundError on Windows? What does it mean if you get that on MacOS? What does it mean if the open never returns?Witness
I mean that to be an elaboration as an honest response to your request, taken at face value. In the effort of fitting it into tweet-sized chunks I apologize for any apparent tone or attitude that may have gotten from poor word-wrangling on my part.Witness
A
6
open(filename,'r')   #2nd argument is r and not w

will open the file or give an error if it doesn't exist. If there's an error, then you can try to write to the path, if you can't then you get a second error

try:
    open(filename,'r')
    return True
except IOError:
    try:
        open(filename, 'w')
        return True
    except IOError:
        return False

Also have a look here about permissions on windows

Ambo answered 2/3, 2012 at 11:57 Comment(7)
To avoid the need to explicitly unlink() the test file, you can use tempfile.TemporaryFile() which will automatically destroy the tempfile when it goes out of scope.Casablanca
@FakeName The code is different, I could have used os.access on the second part but if you followed the link I gave you'd have seen that it's not a good idea, this leaves you with the option of trying to actually open the path for writing.Ambo
I'm building my paths with os.path.join, so I don't have `\` escaping issues. Furthermore, I'm not really having directory permission issues. I'm having directory (and filename) name issues.Cismontane
@FakeName in that case you only need to try and open it(you don't need to write), python gives an error if the filename contains invalid characters. I've edited the answerAmbo
It just seems like a 7 line solution to what should be a one-line problem.Cismontane
Why do you need to try to open the path first? I think opening for writing and closing right away works just fine.Gag
@HelgaIliashenko Opening for writing will overwrite an existing file (make it empty) even if you close it immediately without writing to it. That's why I was opening for reading first because that way, if you don't get an error then you know that there is an existing file.Ambo
C
-9

try os.path.exists this will check for the path and return True if exists and False if not.

Christo answered 2/3, 2012 at 11:33 Comment(4)
No. I need to return true if the file at the path exists, or can be created. I need to return false if the path in invalid (due to containing invalid characters on windows).Cismontane
which type of invalid character?Christo
Dunno - that's platform specific.Cismontane
File system specific, actually.Inconceivable

© 2022 - 2024 — McMap. All rights reserved.