How do I check for illegal characters in a path?

Is there a way to check whether a string meant for a path contains invalid characters in .NET? I know I could iterate over each character in Path.InvalidPathChars to see if my string contained one, but I'd prefer a simple, perhaps more formal, solution.

Is there one?

I've found I still get an exception if I only check against Get

Update:

I've found GetInvalidPathChars does not cover every invalid path character. GetInvalidFileNameChars has 5 more, including '?', which I've come across. I'm going to switch to that, and I'll report back if it, too, proves to be inadequate.

Update 2:

GetInvalidFileNameChars is definitely not what I want. It contains ':', which any absolute path is going to contain ("C:\whatever"). I think I'm just going to have to use GetInvalidPathChars after all, and add in '?' and any other characters that cause me problems as they come up. Better solutions welcome.
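
The approach described in Update 2 could be sketched roughly as follows. This is only a minimal sketch: the class name `PathCharCheck`, the method name `HasInvalidChars` and the contents of `ExtraInvalidChars` are hypothetical, and the extra characters are just the ones mentioned in this thread.

```csharp
using System;
using System.IO;
using System.Linq;

public static class PathCharCheck
{
    // Hypothetical extra characters; extend as new problem characters turn up.
    private static readonly char[] ExtraInvalidChars = { '?', '*' };

    // Built once, because Path.GetInvalidPathChars() returns a new array on every call.
    private static readonly char[] InvalidChars =
        Path.GetInvalidPathChars().Concat(ExtraInvalidChars).Distinct().ToArray();

    // Returns true when the path contains at least one character from the combined set.
    public static bool HasInvalidChars(string path)
    {
        return !string.IsNullOrEmpty(path) && path.IndexOfAny(InvalidChars) >= 0;
    }
}
```

Note that treating '?' as invalid also rejects device paths that start with `\\?\`, so that prefix would need special handling if such paths matter.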

Thalamencephalon answered 12/3, 2010 at 21:4 Comment(6)
Why is it tagged with "regex"? - Sharilyn
I'm not sure. Magnifico added it. - Thalamencephalon
Edited my original post in response to Update 1 and 2. - Averroism
Isn't this a duplicate of #146634? - Pneumatics
? is a valid character in a Windows path, as it can refer to an MS-DOS device name. How often you'll encounter such in the wild is another thing entirely though... - Chauncey
FYI: in .NET 4.0 on Windows, Path.GetInvalidPathChars() is a subset of Path.GetInvalidFileNameChars(). To be precise, Path.GetInvalidFileNameChars() == Path.GetInvalidPathChars().Concat(new[] { ':', '*', '?', '\\', '/' }) - Chauncey

InvalidPathChars is deprecated. Use GetInvalidPathChars() instead:

    public static bool FilePathHasInvalidChars(string path)
    {
        return (!string.IsNullOrEmpty(path) && path.IndexOfAny(System.IO.Path.GetInvalidPathChars()) >= 0);
    }

Edit: Slightly longer, but handles path vs file invalid chars in one function:

    // WARNING: Not tested
    public static bool FilePathHasInvalidChars(string path)
    {
        bool ret = false;
        if(!string.IsNullOrEmpty(path))
        {
            try
            {
                // Careful!
                //    Path.GetDirectoryName("C:\Directory\SubDirectory")
                //    returns "C:\Directory", which may not be what you want in
                //    this case. You may need to explicitly add a trailing \
                //    if path is a directory and not a file path. As written, 
                //    this function just assumes path is a file path.
                string fileName = System.IO.Path.GetFileName(path);
                string fileDirectory = System.IO.Path.GetDirectoryName(path);

                // We don't need to do anything else: if we got here
                // without throwing an exception, then the path does
                // not contain invalid characters.
            }
            catch (ArgumentException)
            {
                // Path functions will throw this
                // if path contains invalid chars.
                ret = true;
            }
        }
        return ret;
    }
Averroism answered 12/3, 2010 at 21:13 Comment(5)
I'm tired now (3AM) but methinks that IndexOfAny returns -1 if no invalid char is found, thus the result is true if NO such char is found in either filename or fileDirectory, exactly the opposite of what is wanted. But, more importantly, how does this solve "c:\first\second:third\test.txt"? Would it catch the second, illegal ':'? - Tollefson
See edits to original post. As to your other question, "C:\first\second:third\test.txt" does not contain any invalid characters for a path, since ":" is a valid path character. True, the path is an invalid path, but the purpose of the function was not to validate proper paths. For that, the best bet would be to test the path string against a regular expression. You could also do: foreach(String s in path.Split('\\')) {// test s for invalid file characters} but that implementation is a little brittle since you have to make an exception for the "C:" - Averroism
The second function does not seem to catch ? or * characters. - Wortham
It might be good to cache the result of Path.GetInvalidPathChars(), since the array is cloned on every call. - Shlomo
I've noticed Path.GetDirectoryName can be quite slow when you give it an invalid path. - Rhodie

As of .NET 4.7.2, Path.GetInvalidFileNameChars() reports the following 41 'bad' characters.

0x0000    0      '\0'   |    0x000d   13      '\r'   |    0x001b   27  '\u001b'
0x0001    1  '\u0001'   |    0x000e   14  '\u000e'   |    0x001c   28  '\u001c'
0x0002    2  '\u0002'   |    0x000f   15  '\u000f'   |    0x001d   29  '\u001d'
0x0003    3  '\u0003'   |    0x0010   16  '\u0010'   |    0x001e   30  '\u001e'
0x0004    4  '\u0004'   |    0x0011   17  '\u0011'   |    0x001f   31  '\u001f'
0x0005    5  '\u0005'   |    0x0012   18  '\u0012'   |    0x0022   34       '"'
0x0006    6  '\u0006'   |    0x0013   19  '\u0013'   |    0x002a   42       '*'
0x0007    7      '\a'   |    0x0014   20  '\u0014'   |    0x002f   47       '/'
0x0008    8      '\b'   |    0x0015   21  '\u0015'   |    0x003a   58       ':'
0x0009    9      '\t'   |    0x0016   22  '\u0016'   |    0x003c   60       '<'
0x000a   10      '\n'   |    0x0017   23  '\u0017'   |    0x003e   62       '>'
0x000b   11      '\v'   |    0x0018   24  '\u0018'   |    0x003f   63       '?'
0x000c   12      '\f'   |    0x0019   25  '\u0019'   |    0x005c   92      '\\'
                        |    0x001a   26  '\u001a'   |    0x007c  124       '|'

As noted by another poster, this is a proper superset of the set of characters returned by Path.GetInvalidPathChars().
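
The relationship between the two sets can be checked on whatever runtime you target, since the exact contents differ between platforms and framework versions. The following is a minimal sketch; the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

public static class InvalidCharSets
{
    // Characters reported by GetInvalidFileNameChars() but not by GetInvalidPathChars().
    public static char[] FileNameOnlyChars()
    {
        return Path.GetInvalidFileNameChars()
                   .Except(Path.GetInvalidPathChars())
                   .OrderBy(c => c)
                   .ToArray();
    }

    // True when every invalid path character is also an invalid file name character.
    public static bool PathCharsAreSubset()
    {
        return !Path.GetInvalidPathChars()
                    .Except(Path.GetInvalidFileNameChars())
                    .Any();
    }
}
```

Printing `FileNameOnlyChars()` shows which characters the file name check adds on the current platform.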

The following function detects the exact set of 41 characters shown above:

public static bool IsInvalidFileNameChar(Char c) => c < 64U ?
        (1UL << c & 0xD4008404FFFFFFFFUL) != 0 :
        c == '\\' || c == '|';
Issiah answered 4/9, 2018 at 0:39 Comment(1)
This is still not enough: perfectly valid characters may still combine into an invalid path or file name, such as AUX, COM1 or LPT1. - Oblation
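
A check for the reserved device names mentioned in the comment above might look like the sketch below. It assumes the Windows list CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9, and assumes a name stays reserved when an extension follows it; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    // Windows reserved device names; matching is case-insensitive.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // Returns true when any segment of the path is a reserved device name.
    public static bool ContainsReservedName(string path)
    {
        if (string.IsNullOrEmpty(path)) return false;

        foreach (string segment in path.Split('\\', '/'))
        {
            // Compare only the part before the first dot, so "AUX.txt" is caught too.
            int dot = segment.IndexOf('.');
            string baseName = dot >= 0 ? segment.Substring(0, dot) : segment;
            if (Reserved.Contains(baseName)) return true;
        }
        return false;
    }
}
```

This only covers names, so it would be combined with a character check rather than replace one.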

Be careful when relying on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

It's no better with the Path.GetInvalidPathChars method; its documentation contains the exact same remark.
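
Since the valid set depends on the file system, one alternative is to ask the file system itself by trying to create a file with the candidate name. The sketch below is only an illustration under that assumption: the names `FileNameProbe` and `CanCreateFileNamed` are hypothetical, and a `false` result can also mean a permissions problem or an already existing file, not just an invalid name.

```csharp
using System;
using System.IO;

public static class FileNameProbe
{
    // Tries to create a file with the given name in the given directory.
    // The file is deleted again when the stream is closed.
    public static bool CanCreateFileNamed(string directory, string fileName)
    {
        try
        {
            string fullPath = Path.Combine(directory, fileName);
            using (new FileStream(fullPath, FileMode.CreateNew, FileAccess.Write,
                                  FileShare.None, 4096, FileOptions.DeleteOnClose))
            {
                return true;
            }
        }
        catch (Exception ex) when (ex is ArgumentException || ex is IOException ||
                                   ex is NotSupportedException || ex is UnauthorizedAccessException)
        {
            // The name was rejected, or the file could not be created for another reason.
            return false;
        }
    }
}
```

This touches the disk, so it is far slower than a character check and only makes sense when the target directory is known and writable.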

Offset answered 16/11, 2011 at 13:34 Comment(1)
The GetInvalid*NameChars methods are neither useful nor reliable. Path validity/invalidity is implicitly tied to the filesystem on which the code is executing, and since System.IO.* doesn't do filesystem sniffing - just returns a hard-coded array - what is invalid on filesystem A may be completely valid on filesystem B. tl;dr: don't rely on these methods, roll your own. - Chauncey

I ended up borrowing and combining a few internal .NET implementations to come up with a performant method:

/// <summary>Determines if the path contains invalid characters.</summary>
/// <remarks>This method is intended to prevent ArgumentExceptions from being thrown when creating a new FileInfo on a file path with invalid characters.</remarks>
/// <param name="filePath">File path.</param>
/// <returns>True if file path contains invalid characters.</returns>
private static bool ContainsInvalidPathCharacters(string filePath)
{
    for (var i = 0; i < filePath.Length; i++)
    {
        int c = filePath[i];

        if (c == '\"' || c == '<' || c == '>' || c == '|' || c == '*' || c == '?' || c < 32)
            return true;
    }

    return false;
}

I then used it like so but also wrapped it up in a try/catch block for safety:

if ( !string.IsNullOrWhiteSpace(path) && !ContainsInvalidPathCharacters(path))
{
    FileInfo fileInfo = null;

    try
    {
        fileInfo = new FileInfo(path);
    }
    catch (ArgumentException)
    {            
    }

    ...
}
Sled answered 8/12, 2015 at 5:55 Comment(0)

It's probably too late for you, but may help somebody else. I faced the same issue and needed to find a reliable way to sanitize a path.

Here is what I ended up using, in 3 steps:

Step 1: Custom cleaning.

public static string RemoveSpecialCharactersUsingCustomMethod(this string expression, bool removeSpecialLettersHavingASign = true)
{
    var newCharacterWithSpace = " ";
    var newCharacter = "";

    // Carriage return / line feed handling
    // ASCII LINE-FEED character (LF)
    expression = expression.Replace("\n", newCharacterWithSpace);
    // ASCII CARRIAGE-RETURN character (CR) 
    expression = expression.Replace("\r", newCharacterWithSpace);

    // less than : used to redirect input, allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"<", newCharacter);
    // greater than : used to redirect output, allowed in Unix filenames, see Note 1
    expression = expression.Replace(@">", newCharacter);
    // colon: used to determine the mount point / drive on Windows; 
    // used to determine the virtual device or physical device such as a drive on AmigaOS, RT-11 and VMS; 
    // used as a pathname separator in classic Mac OS. Doubled after a name on VMS, 
    // indicates the DECnet nodename (equivalent to a NetBIOS (Windows networking) hostname preceded by "\\".). 
    // Colon is also used in Windows to separate an alternative data stream from the main file.
    expression = expression.Replace(@":", newCharacter);
    // quote : used to mark beginning and end of filenames containing spaces in Windows, see Note 1
    expression = expression.Replace(@"""", newCharacter);
    // slash : used as a path name component separator in Unix-like, Windows, and Amiga systems. 
    // (The MS-DOS command.com shell would consume it as a switch character, but Windows itself always accepts it as a separator.)
    expression = expression.Replace(@"/", newCharacter);
    // backslash : Also used as a path name component separator in MS-DOS, OS/2 and Windows (where there are few differences between slash and backslash); allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"\", newCharacter);
    // vertical bar or pipe : designates software pipelining in Unix and Windows; allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"|", newCharacter);
    // question mark : used as a wildcard in Unix, Windows and AmigaOS; marks a single character. Allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"?", newCharacter);
    expression = expression.Replace(@"!", newCharacter);
    // asterisk or star : used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. Marks any sequence of characters 
    // (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or extension 
    // (thus "*.*" in early versions of MS-DOS means "all files"). Allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"*", newCharacter);
    // percent : used as a wildcard in RT-11; marks a single character.
    expression = expression.Replace(@"%", newCharacter);
    // period or dot : allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows. 
    // In other OSes, usually considered as part of the filename, and more than one period (full stop) may be allowed. 
    // In Unix, a leading period means the file or folder is normally hidden.
    expression = expression.Replace(@".", newCharacter);
    // space : allowed (except in MS-DOS), but the space is also used as a parameter separator in command line applications. 
    // This can be solved by quoting, but typing quotes around the name every time is inconvenient.
    //expression = expression.Replace(@"%", " ");
    expression = expression.Replace(@"  ", newCharacter);

    if (removeSpecialLettersHavingASign)
    {
        // Because then issues to zip
        // More at : http://www.thesauruslex.com/typo/eng/enghtml.htm
        expression = expression.Replace(@"ê", "e");
        expression = expression.Replace(@"ë", "e");
        expression = expression.Replace(@"ï", "i");
        expression = expression.Replace(@"œ", "oe");
    }

    return expression;
}

Step 2: Check any invalid characters not yet removed.

As an extra verification step, I use the Path.GetInvalidPathChars() method posted above to detect any potential invalid characters not yet removed.

public static bool ContainsAnyInvalidCharacters(this string path)
{
    return (!string.IsNullOrEmpty(path) && path.IndexOfAny(Path.GetInvalidPathChars()) >= 0);
}

Step 3: Clean any special characters detected in Step 2.

Finally, I use this method as a last step to clean anything left (from How to remove illegal characters from path and filenames?):

public static string RemoveSpecialCharactersUsingFrameworkMethod(this string path)
{
    return Path.GetInvalidFileNameChars().Aggregate(path, (current, c) => current.Replace(c.ToString(), string.Empty));
}

I log any invalid character not cleaned in the first step. I chose to go that way so I can improve my custom method as soon as a 'leak' is detected. I can't rely on Path.GetInvalidFileNameChars() because of the following statement, as reported above (from MSDN):

"The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. "

It may not be the ideal solution, but given the context of my application and the level of reliability required, this is the best solution I found.

Kirby answered 9/11, 2015 at 12:18 Comment(1)
In the part that replaces double spaces with a single space, shouldn't we loop, replacing double spaces until none are left? Three spaces in a row will become two, which should ideally become one. - Antionetteantioxidant
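
The loop suggested in the comment above could look like this. It is a minimal sketch that assumes the intent is to collapse every run of spaces into a single space; the class and method names are hypothetical.

```csharp
using System;

public static class SpaceCollapser
{
    // Repeats the replacement until no double space is left,
    // so runs of any length end up as exactly one space.
    public static string CollapseSpaces(string expression)
    {
        if (string.IsNullOrEmpty(expression)) return expression;

        while (expression.Contains("  "))
        {
            expression = expression.Replace("  ", " ");
        }
        return expression;
    }
}
```

A single `Replace` call halves each run at best, which is why the loop is needed for runs of three or more spaces.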

I recommend using a HashSet for this to increase efficiency:

private static HashSet<char> _invalidCharacters = new HashSet<char>(Path.GetInvalidPathChars());

Then you can simply check that the string isn't null/empty and that there aren't any invalid characters:

public static bool IsPathValid(string filePath)
{
    return !string.IsNullOrEmpty(filePath) && !filePath.Any(pc => _invalidCharacters.Contains(pc));
}

Cyrene answered 26/11, 2019 at 4:34 Comment(1)
Since the number of invalid characters is typically very finite (~40), iterating over it will probably not significantly impact efficiency, especially compared to the I/O operations that are presumably involved when dealing with file names. - Poker

Simple and as correct as it can be considering MS documentation:

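bool IsPathValid(String path)
{
    // Fetch the array once; GetInvalidFileNameChars() allocates a new array on every call.
    char[] invalidChars = Path.GetInvalidFileNameChars();
    for (int i = 0; i < path.Length; ++i)
        if (invalidChars.Contains(path[i]))
            return false;
    return true;
}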
Commute answered 8/7, 2020 at 0:35 Comment(0)

Just for reference the framework has internal methods that do this - but unfortunately they are marked internal.

For reference here are the relevant bits, which are similar to the accepted answer here.

internal static bool HasIllegalCharacters(string path, bool checkAdditional = false) => (AppContextSwitches.UseLegacyPathHandling || !PathInternal.IsDevice(path)) && PathInternal.AnyPathHasIllegalCharacters(path, checkAdditional);

    internal static bool AnyPathHasIllegalCharacters(string path, bool checkAdditional = false)
    {
      if (path.IndexOfAny(PathInternal.InvalidPathChars) >= 0)
        return true;
      return checkAdditional && PathInternal.AnyPathHasWildCardCharacters(path);
    }

    internal static bool HasWildCardCharacters(string path)
    {
      int startIndex = AppContextSwitches.UseLegacyPathHandling ? 0 : (PathInternal.IsDevice(path) ? "\\\\?\\".Length : 0);
      return PathInternal.AnyPathHasWildCardCharacters(path, startIndex);
    }

    internal static bool AnyPathHasWildCardCharacters(string path, int startIndex = 0)
    {
      for (int index = startIndex; index < path.Length; ++index)
      {
        switch (path[index])
        {
          case '*':
          case '?':    
            return true;
          default:
            continue;
        }
      }
      return false;
    }
Hauck answered 4/11, 2020 at 22:23 Comment(0)

I'm also late to this, but if the task is to validate whether the user entered something valid as a path, there is a combined solution.

Path.GetInvalidFileNameChars() returns the list of characters that are illegal in a file name. A directory name follows the same rules, except for the separators (which we can get from the system) and the root specifier (C:, which we can simply exclude from the search). Yes, Path.GetInvalidFileNameChars() does not return the complete set, but it is better than trying to find all of them manually.

So:

private static bool CheckInvalidPath(string targetDir)
{
  string root;
  try
  {
    root = Path.GetPathRoot(targetDir);
  }
  catch
  {
    // the path is definitely invalid if it has crashed
    return false;
  }

  // of course it is better to cache it as it creates
  // new array on each call
  char[] chars = Path.GetInvalidFileNameChars();

  // ignore root
  for (int i = root.Length; i < targetDir.Length; i++)
  {
    char c = targetDir[i];

    // separators are allowed
    if (c == Path.DirectorySeparatorChar || c == Path.AltDirectorySeparatorChar)
      continue;

    // check for illegal chars
    for (int j = 0; j < chars.Length; j++)
      if (c == chars[j])
        return false;
  }

  return true;
}

I've found that methods like Path.GetFileName will not throw for paths like C:\* (which is completely invalid), so even an exception-based check is not enough. The only thing that makes Path.GetPathRoot throw is an invalid root (like CC:\someDir). So everything else has to be checked manually.

Iodide answered 4/7, 2018 at 15:1 Comment(0)
