How to remove illegal characters from path and filenames?
E

30

602

I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the code below, but it doesn't seem to do anything. What am I missing?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}
Envision answered 28/9, 2008 at 15:52 Comment(6)
Trim removes characters from the beginning and end of strings. However, you probably should ask why the data is invalid, and rather than try and sanitize/fix the data, reject the data.Rumen
Unix-style names are not valid on Windows and I don't want to deal with 8.3 short names.Envision
GetInvalidFileNameChars() will strip things like : \ etc from folder paths.Addington
Path.GetInvalidPathChars() doesn't seem to strip * or ?Addington
I tested five answers from this question (timed loop of 100,000) and the following method is the fastest. The regular expression took 2nd place, and was 25% slower : public string GetSafeFilename(string filename) { return string.Join("_", filename.Split(Path.GetInvalidFileNameChars())); }Inkwell
I added a new fast alternative, and some benchmarks in this answerColeridge
C
555

Try something like this instead:

string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());

foreach (char c in invalid)
{
    illegal = illegal.Replace(c.ToString(), ""); 
}

But I have to agree with the comments, I'd probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.

Edit: Or a potentially 'better' solution, using a regex.

string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");

Still, the question has to be asked: why are you doing this in the first place?

Creigh answered 28/9, 2008 at 16:3 Comment(22)
I don't know if I should +1 your answer for having such an ill-performing solution that will push the user away from that path, or if I should +1 your answer for it actually answering his question! :)Rumen
@Michael Stum: they get 'compiled' and should be some sort of state machine, but it would be naive to assume they are guaranteed to be any more efficient under the hood than a loop.Rumen
On something the length of a path, it probably wouldn't make that much of a difference. On a longer string, I imagine the regex would be faster though.Creigh
I'd stick to the non-regex solution: it's likely to be more efficient most of the time. If using the regex solution, change string.Format() to just "["+"...". If you're going to treat illegal as a file name without path after replacing special chars then you'd only need Path.GetInvalidFileNameChars().Slosberg
It's not necessary to append the two lists together. The illegal file name char list contains the illegal path char list and has a few more. Here are lists of both lists cast to int: 34,60,62,124,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,58,42,63,92,47 34,60,62,124,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31Dill
@sjbotha this may be true on Windows and Microsoft's implementation of .NET I'm not willing to make the same assumption for say mono running Linux.Creigh
Regarding the first solution. Shouldn't a StringBuilder be more efficient than the string assignments?Lappet
If the string contains Chinese characters, the solution could fail.Nd
@PerlDev: Have you actually tested that? characters should be multi-byte compatible (sizeof(char) == 2), so it shouldn't be an issue. The regex solution should be fine also.Creigh
What's the problem with sanitization, Bob Tables?Kacikacie
Correct me if I'm wrong, but calling both Path.GetInvalidFileNameChars() and Path.GetInvalidPathChars() is superfluous. Path.GetInvalidFileNameChars() alone should be sufficient.Asti
@JoeyAdams: see my reply to Sarel Botha. In short, one is a superset of the other on Windows. Personally, I'm not willing to make the same bet cross platform and C# and .NET in general is getting a wider and wider audience via Mono all the time.Creigh
For what it's worth, @MatthewScharley, the Mono implementation of GetInvalidPathChars() returns only 0x00 and GetInvalidFileNameChars() returns only 0x00 and '/' when running on non-Windows platforms. On Windows, the list of invalid characters is much longer, and GetInvalidPathChars() is entirely duplicated inside GetInvalidFileNameChars(). This isn't going to change in the foreseeable future, so all you're really doing is doubling the amount of time this function takes to run because you're worried that the definition of a valid path will change sometime soon. Which it won't.Orthotropic
And let's be super-clear about this: This part of the Mono source code hasn't changed in EIGHT YEARS except for a minor perf improvement in 2007.Orthotropic
@Warren: Feel free to dedupe the resultant string if you really are worried, but let's be perfectly honest here: The difference between 20 and 40 iterations against a string the length of your average path (let's say 100 characters to be generous) will make exactly no difference to the runtime of your function. For all practical purposes, there's no need to worry about it. On the other hand, these two functions do serve different purposes and (in my mind at least), it would be perfectly reasonable for one function to not return a superset of the other for some given file system.Creigh
How can doing double the work (whether it's deduplicating the array, or running through almost precisely the same array values twice) take "exactly no difference"? You know as well as I do that this is incorrect, so -don't- -say- -it-. We're trying to be an educational resource at Stackoverflow, not a place for rhetorical flourishes prompted by being told you're wrong. Let's be clear: What you're recommending here is effectively the same as the old Daily WTF canard about providing your own definition of TRUE and FALSE because you can't trust the compiler or libraries to always get it right.Orthotropic
GetInvalidFileNameChars() is always -- ALWAYS, you hear me -- going to include everything in GetInvalidPathChars() because it isn't possible for a file to have a character in that isn't valid in a path name. No file system allows this today, no file system ever will. And anyways, Microsoft's own documentation for these functions is very clear in stating that you should not expect the list of characters to be guaranteed as accurate because file systems might support something different anyways.Orthotropic
I'd probably side with Matthew here and just say that assumption is the mother of all mess ups. You are talking about optimising code which probably doesn't need optimizing over potential correctness. I'd take the correctness over the premature optimisation any dayUnroot
@Unroot this discussion is so unnecessary... code should always be optimized and there is no risk of this to be incorrect. A filename is a part of the path, too. So it is just illogical that GetInvalidPathChars() could contain characters that GetInvalidFileNameChars() wouldn't. You are not taking correctness over "premature" optimisation. You are simply using bad code.Kiley
Personally I would prefer this way: var invalid = Path.GetInvalidFileNameChars().Union(Path.GetInvalidPathChars()); foreach(char c in invalid) illegal = illegal.Replace(c.ToString(), "_");Duly
I'm not sure why you guys are so nosy about why he wants to use it. There are various legit scenarios where this would be useful. Our app for example outputs xlsx files to email as reports and if we don't validate it on entry, you won't know until the scheduled time of creation of the report that the filename was invalid. We've had issues where in the past someone accidentally entered a less-than in the filename and saved it. Plus some of our clients run Linux and some run Windows so the allowed files aren't the same.Norland
@JohnLord another common use case is dealing with filenames coming in from outside emails. You cannot control the file name being sent to you. You can, of course, throw away the original and replace it with something of your own devising, but there are cases where you want to retain as much of the original as possible for AI purposes.Runagate
N
618

The original question asked to "remove illegal characters":

public string RemoveInvalidChars(string filename)
{
    return string.Concat(filename.Split(Path.GetInvalidFileNameChars()));
}

You may instead want to replace them:

public string ReplaceInvalidChars(string filename)
{
    return string.Join("_", filename.Split(Path.GetInvalidFileNameChars()));    
}

This answer was given on another thread by Ceres. I really like it because it is neat and simple.

Nihi answered 20/4, 2014 at 13:6 Comment(4)
To precisely answer the OP's question, you would need to use "" instead of "_", but your answer probably applies to more of us in practice. I think replacing illegal characters with some legal one is more commonly done.Rhyner
I tested five methods from this question (timed loop of 100,000) and this method is the fastest one. The regular expression took 2nd place, and was 25% slower than this method.Inkwell
To address @BH 's comment, one can simply use string.Concat(name.Split(Path.GetInvalidFileNameChars()))Leoraleos
Surprisingly, the Split/Join code is about as fast as a foreach loop; it has the same performance.Conjugal
B
220

I use Linq to clean up filenames. You can easily extend this to check for valid paths as well.

private static string CleanFileName(string fileName)
{
    return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}

Update

Some comments indicate this method is not working for them so I've included a link to a DotNetFiddle snippet so you may validate the method.

https://dotnetfiddle.net/nw1SWY

Boswell answered 12/9, 2011 at 20:38 Comment(6)
This did not work for me. The method is not returning the clean string. It is returning the passed filename as it is.Paucity
What @Paucity said, this does not work, the original string comes back.Teece
You can actually do this with Linq like this though: var invalid = new HashSet<char>(Path.GetInvalidPathChars()); return new string(originalString.Where(s => !invalid.Contains(s)).ToArray()). Performance probably isn't great but that probably doesn't matter.Tallia
@Paucity or Jon What input are you sending this function? See my edit for verification of this method.Boswell
It's easy - guys were passing strings with valid chars. Upvoted for cool Aggregate solution.Vigilante
Very good solution but only cleans up filename (as stated) but not the actual path as it is considering "\" as an illegal character and if you have something like "\\MyServer\e$\demo\Output\Test\1111_joe_soap.pdf", it returns "MyServere$demoOutputTest1111_joe_soap.pdf"Gamb
S
93

You can remove illegal chars using Linq like this:

var invalidChars = Path.GetInvalidFileNameChars();

var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();

EDIT
This is how it looks with the required edit mentioned in the comments:

var invalidChars = Path.GetInvalidFileNameChars();

string invalidCharsRemoved = new string(stringWithInvalidChars
  .Where(x => !invalidChars.Contains(x))
  .ToArray());
Slider answered 24/11, 2010 at 19:41 Comment(6)
I like this way : you keep only the allowed chars in the string (which is nothing else than a char array).Popularly
I know that this is an old question, but this is an awesome answer. However, I wanted to add that in c# you cannot cast from char[] to string either implicitly or explicitly (crazy, I know) so you'll need to drop it into a string constructor.Conservatism
I haven't confirmed this, but I expect Path.GetInvalidPathChars() to be a superset of GetInvalidFileNameChars() and to cover both filenames and paths, so I would probably use that instead.Kannry
@anjdreas actually Path.GetInvalidPathChars() seems to be a subset of Path.GetInvalidFileNameChars(), not the other way round. Path.GetInvalidPathChars() will not return '?', for example.Sterner
This is a good answer. I use both the filename list and the filepath list: ____________________________ string cleanData = new string(data.Where(x => !Path.GetInvalidFileNameChars().Contains(x) && !Path.GetInvalidPathChars().Contains(x)).ToArray());Dantzler
You can also do var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars()) and make it O(n) instead of O(n^2). No reason why not.Westsouthwest
W
44

For file names:

var cleanFileName = string.Join("", fileName.Split(Path.GetInvalidFileNameChars()));

For full paths:

var cleanPath = string.Join("", path.Split(Path.GetInvalidPathChars()));

Note that if you intend to use this as a security feature, a more robust approach would be to expand all paths and then verify that the user supplied path is indeed a child of a directory the user should have access to.
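
As a rough sketch of the "expand and verify" idea above, something like the following could work. The class, method, and parameter names are made up for illustration. It assumes an ordinal, case-sensitive comparison and does not resolve symbolic links, so treat it as a starting point rather than a complete security check. Also note that on older .NET Framework versions Path.GetFullPath can itself throw if the input contains invalid path characters.

```csharp
using System;
using System.IO;

public static class PathGuard
{
    // Returns true when candidatePath, once fully expanded, lies inside baseDirectory.
    // Both parameter names are hypothetical; adapt them to your own code.
    public static bool IsInsideDirectory(string baseDirectory, string candidatePath)
    {
        // GetFullPath makes the path absolute and collapses segments such as "..".
        string fullBase = Path.GetFullPath(baseDirectory);

        // If candidatePath is already rooted, Path.Combine returns it unchanged,
        // so an absolute path outside the base fails the check below.
        string fullCandidate = Path.GetFullPath(Path.Combine(fullBase, candidatePath));

        // Ensure the base ends with a separator so that "data" does not match "database".
        string separator = Path.DirectorySeparatorChar.ToString();
        if (!fullBase.EndsWith(separator, StringComparison.Ordinal))
        {
            fullBase += separator;
        }

        // Assumption: ordinal, case-sensitive comparison. On a case-insensitive
        // file system you may prefer StringComparison.OrdinalIgnoreCase.
        return fullCandidate.StartsWith(fullBase, StringComparison.Ordinal);
    }
}
```

With this sketch, IsInsideDirectory("uploads", "report.txt") passes, while IsInsideDirectory("uploads", "../secret.txt") is rejected because the expanded path escapes the base directory.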

Wellbeloved answered 11/2, 2014 at 2:36 Comment(0)
B
32

These are all great solutions, but they all rely on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

It's not any better with Path.GetInvalidPathChars method. It contains the exact same remark.

Blotch answered 16/11, 2011 at 13:22 Comment(6)
Then what is the point of Path.GetInvalidFileNameChars? I would expect it to return exactly the invalid characters for the current system, relying on .NET to know which filesystem I'm running on and presenting me the fitting invalid chars. If this is not the case and it just returns hardcoded characters, which are not reliable in the first place, this method should be removed since it has zero value.Ousel
I know this is an old comment but, @Ousel you could want to write on another filesystem, maybe this is why there is a warning.Lungworm
@Lungworm good point, but in this case I would want to have an additional enum argument to specify my remote FS. If this is too much maintenance effort (which is most likely case), this whole method is still a bad idea, because it gives you the wrong impression of safety.Ousel
@Ousel I totally agree with you, I was just arguing about the warning.Lungworm
Interestingly this is a sort of "blacklisting" invalid chars. Would it not be better to "whitelist" only the known valid chars here?! Reminds me of the stupid "virusscanner" idea instead of whitelisting allowed apps....Shultz
pay attention to the fact about filenames in the warning. It's actually telling you that it's not validating filenames themselves, just illegal characters. You could still have an illegal filename that is a reserved word. Also how would you whitelist an app? I would just make my virus have your filename and signature.Norland
R
21

The best way to remove illegal characters from user input is to replace them using the Regex class. Either create a method in the code-behind, or validate on the client side using a RegularExpressionValidator control.

public string RemoveSpecialCharacters(string str)
{
    return Regex.Replace(str, "[^a-zA-Z0-9_]+", "_", RegexOptions.Compiled);
}

OR

<asp:RegularExpressionValidator ID="regxFolderName" 
                                runat="server" 
                                ErrorMessage="Enter folder name with  a-z A-Z0-9_" 
                                ControlToValidate="txtFolderName" 
                                Display="Dynamic" 
                                ValidationExpression="^[a-zA-Z0-9_]*$" 
                                ForeColor="Red" />
Roadhouse answered 28/9, 2013 at 6:35 Comment(2)
IMHO this solution is much better than others Instead of searching for all invalid chars just define which are valid.Anne
For POSIX "Fully portable filenames", use "[^a-zA-Z0-9_.-]+"Valida
R
18

For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should evaluate if you really want to remove the offending characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:

StackOverflow question showing how to check if a given string is a valid file name. Note you can use the regex from this question to remove characters with a regular expression replacement (if you really need to do this).
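
To illustrate the point about Trim, here is a minimal sketch contrasting it with a removal that looks at every character. The class and method names are hypothetical. The sample input uses '/' because it is reported as an invalid file name character on both Windows and Unix-like systems.

```csharp
using System;
using System.IO;

public static class TrimDemo
{
    // Trim only strips matching characters from the two ends of the string;
    // anything in the middle is left alone.
    public static string TrimOnly(string s)
    {
        return s.Trim(Path.GetInvalidFileNameChars());
    }

    // Split on every invalid character and join the pieces back together,
    // which removes invalid characters wherever they occur.
    public static string RemoveEverywhere(string s)
    {
        return string.Concat(s.Split(Path.GetInvalidFileNameChars()));
    }
}
```

For the input "/a/b/", TrimOnly returns "a/b" (the inner slash survives), whereas RemoveEverywhere returns "ab".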

Rumen answered 28/9, 2008 at 15:56 Comment(2)
I especially agree with the second advice.Proprietress
I would normally agree with the second, but I have a program which generates a filename and which may contain illegal characters in some situations. Since my program is generating the illegal filenames, I think it's appropriate to remove/replace those characters. (Just pointing out a valid use-case)Magpie
B
15

I use regular expressions to achieve this. First, I dynamically build the regex.

string regex = string.Format(
                   "[{0}]",
                   Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.
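
For completeness, here is one way the call could be wrapped up. This is only a sketch: the class name, the Clean method, and its optional replacement parameter are my own additions for illustration. The regex is built once and cached in a static field.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class RegexCleaner
{
    // Character class matching any single invalid file name character.
    // Regex.Escape is needed because some of these characters (for example
    // '\', '*', '?' and '|' on Windows) have a special meaning in a pattern.
    private static readonly Regex RemoveInvalidChars = new Regex(
        "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]",
        RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Removes invalid characters, or swaps each one for 'replacement' if given.
    public static string Clean(string fileName, string replacement = "")
    {
        return RemoveInvalidChars.Replace(fileName, replacement);
    }
}
```

Calling RegexCleaner.Clean("a/b") gives "ab", and RegexCleaner.Clean("a/b", "_") gives "a_b". Keep in mind that Regex.Replace treats '$' in the replacement string as a substitution marker, so avoid it in the replacement.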

Bonds answered 28/9, 2008 at 18:45 Comment(5)
Strange, it has been working for me. I'll double-check it when I get chance. Can you be more specific and explain what exactly isn't working for you?Bonds
It won't work (properly at the very least) because you aren't escaping the path characters properly, and some of them have a special meaning. Refer to my answer for how to do that.Creigh
@Jeff: Your version is still better than Matthew's, if you slightly modify it. Refer to my answer on how.Ousel
I would also add some other invalid file name patterns that can be found on MSDN and extend your solution to the following regex: new Regex(String.Format("^(CON|PRN|AUX|NUL|CLOCK\$|COM[1-9]|LPT[1-9])(?=\..|$)|(^(\.+|\s+)$)|((\.+|\s+)$)|([{0}])", Regex.Escape(new String(Path.GetInvalidFileNameChars()))), RegexOptions.Compiled | RegexOptions.Singleline | RegexOptions.CultureInvariant);Balaam
Small syntax improvement for @yar_shukan comment: Add @ before string expression, if you faced with error "Unrecognized escape sequence", i.e. String.Format(@"^CON| ... )"Doering
O
13

I absolutely prefer the idea of Jeff Yates. It will work perfectly, if you slightly modify it:

string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

The improvement is just to escape the automatically generated regex.

Ousel answered 15/2, 2011 at 14:21 Comment(0)
E
12

Here's a code snippet that should help for .NET 3 and higher.

using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}
Eparch answered 19/10, 2010 at 16:33 Comment(0)
G
8

Most solutions above combine illegal chars for both path and filename, which is wrong (even when both calls currently return the same set of chars). I would first split the path+filename into path and filename, then apply the appropriate set to each of them, and then combine the two again.

wvd_vegt
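
A minimal sketch of that split, clean, and recombine approach might look like this. All names are hypothetical. It splits manually on the last separator rather than calling Path.GetDirectoryName, because on .NET Framework that method can throw an ArgumentException when the input contains invalid path characters.

```csharp
using System;
using System.IO;

public static class SplitCleaner
{
    // Removes every character in 'invalid' from 'text'.
    private static string Strip(string text, char[] invalid)
    {
        return string.Concat(text.Split(invalid));
    }

    public static string CleanFullPath(string fullPath)
    {
        // Find the last directory separator; everything after it is the file name.
        int lastSeparator = fullPath.LastIndexOfAny(
            new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar });

        string directory = lastSeparator >= 0
            ? fullPath.Substring(0, lastSeparator + 1)
            : string.Empty;
        string fileName = fullPath.Substring(lastSeparator + 1);

        // The directory part keeps its separators; the file name part loses them.
        return Strip(directory, Path.GetInvalidPathChars())
             + Strip(fileName, Path.GetInvalidFileNameChars());
    }
}
```

One caveat: the directory part is only checked against GetInvalidPathChars(), which on Windows does not include characters such as * or ?, so individual folder names may still need a separate pass.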

Gujarati answered 19/6, 2012 at 12:16 Comment(2)
+1: Very true. Today, working in .NET 4.0, the regex solution from the top answer nuked all backslashes in a full path. So I made a regex for the dir path and a regex for just the filename, cleaned separately and recombinedDonny
That might be true but this doesn't answer the question. I'm not sure a vague 'I'd do it like this' is terribly helpful compared to some of the complete solutions already in here (see for example Lilly's answer, below)Gile
B
6

If you remove the invalid characters or replace them with a single character, you can get collisions:

<abc -> abc
>abc -> abc

Here is a simple method to avoid this:

public static string ReplaceInvalidFileNameChars(string s)
{
    char[] invalidFileNameChars = System.IO.Path.GetInvalidFileNameChars();
    foreach (char c in invalidFileNameChars)
        s = s.Replace(c.ToString(), "[" + Array.IndexOf(invalidFileNameChars, c) + "]");
    return s;
}

The result:

 <abc -> [1]abc
 >abc -> [2]abc
Barrada answered 1/10, 2014 at 18:40 Comment(0)
D
6

This seems to be O(n) and does not spend too much memory on strings:

    private static readonly HashSet<char> invalidFileNameChars = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string RemoveInvalidFileNameChars(string name)
    {
        if (!name.Any(c => invalidFileNameChars.Contains(c))) {
            return name;
        }

        return new string(name.Where(c => !invalidFileNameChars.Contains(c)).ToArray());
    }
Dynatron answered 9/2, 2015 at 21:19 Comment(4)
I don't think it's O(n) when you use the 'Any' function.Disturb
@IIARROWS and what is it in your opinion?Dynatron
I don't know, it just didn't felt like that when I wrote my comment... now that I tried to calculate it, looks like you're right.Disturb
I selected this one because of your performance consideration. Thanks.Gombosi
E
5

Throw an exception.

if (fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1)
{
    throw new ArgumentException();
}
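
If you go down this route, a slightly more informative variant could report which character was rejected. This is only a sketch; the class name, method name, and message format are my own choices.

```csharp
using System;
using System.IO;

public static class FileNameGuard
{
    // Throws an ArgumentException that names the first offending character
    // (as a Unicode code point) and its position in the string.
    public static void EnsureValid(string fileName)
    {
        if (fileName == null)
        {
            throw new ArgumentNullException(nameof(fileName));
        }

        int index = fileName.IndexOfAny(Path.GetInvalidFileNameChars());
        if (index >= 0)
        {
            throw new ArgumentException(
                $"Invalid character U+{(int)fileName[index]:X4} at position {index}.",
                nameof(fileName));
        }
    }
}
```

Reporting the code point rather than the raw character keeps the message readable when the offending character is a control character.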
Euphemia answered 12/3, 2009 at 16:14 Comment(1)
I don't think throwing an exception is valuable here as the question states about removing the offending characters, not simply throwing an exception.Calen
G
5

A file name cannot contain characters from Path.GetInvalidPathChars() or Path.GetInvalidFileNameChars(), nor can it be one of the reserved device names. We also disallow the + and # symbols. We combined all checks into one class:

public static class FileNameExtensions
{
    private static readonly Lazy<string[]> InvalidFileNameChars =
        new Lazy<string[]>(() => Path.GetInvalidPathChars()
            .Union(Path.GetInvalidFileNameChars()
            .Union(new[] { '+', '#' })).Select(c => c.ToString(CultureInfo.InvariantCulture)).ToArray());


    private static readonly HashSet<string> ProhibitedNames = new HashSet<string>
    {
        @"aux",
        @"con",
        @"clock$",
        @"nul",
        @"prn",

        @"com1",
        @"com2",
        @"com3",
        @"com4",
        @"com5",
        @"com6",
        @"com7",
        @"com8",
        @"com9",

        @"lpt1",
        @"lpt2",
        @"lpt3",
        @"lpt4",
        @"lpt5",
        @"lpt6",
        @"lpt7",
        @"lpt8",
        @"lpt9"
    };

    public static bool IsValidFileName(string fileName)
    {
        return !string.IsNullOrWhiteSpace(fileName)
            && fileName.All(o => !IsInvalidFileNameChar(o))
            && !IsProhibitedName(fileName);
    }

    public static bool IsProhibitedName(string fileName)
    {
        return ProhibitedNames.Contains(fileName.ToLower(CultureInfo.InvariantCulture));
    }

    private static string ReplaceInvalidFileNameSymbols([CanBeNull] this string value, string replacementValue)
    {
        if (value == null)
        {
            return null;
        }

        return InvalidFileNameChars.Value.Aggregate(new StringBuilder(value),
            (sb, currentChar) => sb.Replace(currentChar, replacementValue)).ToString();
    }

    public static bool IsInvalidFileNameChar(char value)
    {
        return InvalidFileNameChars.Value.Contains(value.ToString(CultureInfo.InvariantCulture));
    }

    public static string GetValidFileName([NotNull] this string value)
    {
        return GetValidFileName(value, @"_");
    }

    public static string GetValidFileName([NotNull] this string value, string replacementValue)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException(@"value should be non empty", nameof(value));
        }

        if (IsProhibitedName(value))
        {
            return (string.IsNullOrWhiteSpace(replacementValue) ? @"_" : replacementValue) + value; 
        }

        return ReplaceInvalidFileNameSymbols(value, replacementValue);
    }

    public static string GetFileNameError(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return CommonResources.SelectReportNameError;
        }

        if (IsProhibitedName(fileName))
        {
            return CommonResources.FileNameIsProhibited;
        }

        var invalidChars = fileName.Where(IsInvalidFileNameChar).Distinct().ToArray();

        if(invalidChars.Length > 0)
        {
            return string.Format(CultureInfo.CurrentCulture,
                invalidChars.Length == 1 ? CommonResources.InvalidCharacter : CommonResources.InvalidCharacters,
                StringExtensions.JoinQuoted(@",", @"'", invalidChars.Select(c => c.ToString(CultureInfo.CurrentCulture))));
        }

        return string.Empty;
    }
}

The GetValidFileName method replaces every invalid character with the replacement value (_ by default) and prefixes a prohibited name with it.
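
One thing the exact-match lookup in IsProhibitedName does not cover: Windows also treats a reserved device name followed by an extension (for example NUL.txt) as reserved. A possible way to catch that, sketched here with an abbreviated name list and a hypothetical helper called IsReservedDeviceName, is to compare only the part before the first dot:

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNameCheck
{
    // Abbreviated list for illustration; a case-insensitive comparer
    // removes the need for a ToLower call.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "aux", "con", "nul", "prn", "com1", "lpt1"
        };

    public static bool IsReservedDeviceName(string fileName)
    {
        // Look only at the part before the first dot: "nul.tar.gz" -> "nul".
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Reserved.Contains(stem);
    }
}
```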

Gingras answered 25/7, 2018 at 12:14 Comment(0)
A
5

If you have to use the method in many places in a project, you could also make an extension method and call it anywhere in the project for strings.

public static class StringExtension
{
    public static string RemoveInvalidChars(this string originalString)
    {
        // Null or empty input yields an empty string.
        if (string.IsNullOrEmpty(originalString))
        {
            return string.Empty;
        }

        // Split on every invalid character, then glue the remaining pieces back together.
        return string.Concat(originalString.Split(Path.GetInvalidFileNameChars()));
    }
}

You can call the above extension method as:

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string afterIllegalChars = illegal.RemoveInvalidChars();
Angelika answered 4/2, 2021 at 20:1 Comment(1)
Because every string is a path. Or why would it make sense to extend string for just one special case?Jiggerypokery
S
4

I wrote this monster for fun; it lets you round-trip:

public static class FileUtility
{
    private const char PrefixChar = '%';
    private static readonly int MaxLength;
    private static readonly Dictionary<char,char[]> Illegals;
    static FileUtility()
    {
        List<char> illegal = new List<char> { PrefixChar };
        illegal.AddRange(Path.GetInvalidFileNameChars());
        MaxLength = illegal.Select(x => ((int)x).ToString().Length).Max();
        Illegals = illegal.ToDictionary(x => x, x => ((int)x).ToString("D" + MaxLength).ToCharArray());
    }

    public static string FilenameEncode(string s)
    {
        var builder = new StringBuilder();
        char[] replacement;
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if(Illegals.TryGetValue(c,out replacement))
                {
                    builder.Append(PrefixChar);
                    builder.Append(replacement);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static string FilenameDecode(string s)
    {
        var builder = new StringBuilder();
        char[] buffer = new char[MaxLength];
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if (c == PrefixChar)
                {
                    reader.Read(buffer, 0, MaxLength);
                    var encoded =(char) ParseCharArray(buffer);
                    builder.Append(encoded);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static int ParseCharArray(char[] buffer)
    {
        int result = 0;
        foreach (char t in buffer)
        {
            int digit = t - '0';
            if ((digit < 0) || (digit > 9))
            {
                throw new ArgumentException("Input string was not in the correct format");
            }
            result *= 10;
            result += digit;
        }
        return result;
    }
}
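
For comparison, the same round-trip idea can be sketched more compactly with a fixed-width hexadecimal escape. This is a separate illustration, not the code above: the class name HexFileNameCodec and the four-digit hex format are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class HexFileNameCodec
{
    private const char Prefix = '%';

    // The prefix itself must be escaped too, otherwise decoding is ambiguous.
    private static readonly HashSet<char> Illegal =
        new HashSet<char>(Path.GetInvalidFileNameChars()) { Prefix };

    public static string Encode(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (Illegal.Contains(c))
                sb.Append(Prefix).Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static string Decode(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] != Prefix)
            {
                sb.Append(s[i]);
                continue;
            }
            if (i + 4 >= s.Length)
                throw new FormatException("Truncated escape sequence.");
            // The four characters after the prefix hold the UTF-16 code unit in hex.
            string hex = s.Substring(i + 1, 4);
            sb.Append((char)int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture));
            i += 4;
        }
        return sb.ToString();
    }
}
```

Because every escape has the same width, Decode(Encode(x)) should give back x for any input, and two different inputs cannot map to the same file name.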
Sheepskin answered 7/12, 2013 at 13:21 Comment(1)
I like this because it avoids having two different strings creating the same resulting path.Julee
L
3

I think it is much easier to validate using a regex that specifies which characters are allowed, instead of trying to check for all bad characters. See these links: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

Also, search for a "regular expression editor"; they help a lot. Some of them even generate the C# code for you.
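
As a minimal sketch of the allow-list approach: the set of permitted characters below (ASCII letters, digits, dot, underscore, hyphen, space) is only an assumption for the example, and the class name AllowListSanitizer is made up. Adjust the character class to whatever your application should accept.

```csharp
using System.Text.RegularExpressions;

public static class AllowListSanitizer
{
    // Matches anything that is NOT a letter, digit, dot, underscore, hyphen or space.
    private static readonly Regex Disallowed =
        new Regex(@"[^A-Za-z0-9._\- ]", RegexOptions.Compiled);

    public static bool IsAllowed(string name)
    {
        return name.Length > 0 && !Disallowed.IsMatch(name);
    }

    public static string Sanitize(string name, string replacement = "_")
    {
        return Disallowed.Replace(name, replacement);
    }
}
```

Note that an ASCII-only allow-list like this one also rejects accented and non-Latin characters, which may or may not be what you want.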

Listel answered 28/9, 2008 at 16:7 Comment(2)
Given that .net is a framework that is intended to allow programs to run on multiple platforms (e.g. Linux/Unix as well as Windows), I feel Path.GetInvalidFileNameChars() is best since it will contain the knowledge of what is or isn't valid on the filesystem your program is being run on. Even if your program will never run on Linux (maybe it's full of WPF code), there's always the chance some new Windows filesystem will come along in the future and have different valid/invalid chars. Rolling your own with regex is reinventing the wheel, and shifting a platform issue into your own code.Tonsure
I agree with your advice on online regex editors/testers though. I find them invaluable (since regexes are tricky things, and full of subtlety that can trip you up easily, giving you a regex that behaves in some wildly unexpected way with edge cases). My favourite is regex101.com (I like how it breaks the regex down and shows you clearly what it expects to match). I also quite like debuggex.com as it's got a compact visual representation of match groups and character classes and whatnot.Tonsure
T
3

Scanning over the answers here, they all** seem to involve using a char array of invalid filename characters.

Granted, this may be micro-optimising - but for the benefit of anyone who might be looking to check a large number of values for being valid filenames, it's worth noting that building a hashset of invalid chars will bring about notably better performance.

I have been very surprised (shocked) in the past just how quickly a hashset (or dictionary) outperforms iterating over a list. With strings, it's a ridiculously low number (about 5-7 items from memory). With most other simple data (object references, numbers etc) the magic crossover seems to be around 20 items.

There are about 40 invalid characters in the array returned by Path.GetInvalidFileNameChars() on Windows. I did a search today, and there's quite a good benchmark here on StackOverflow showing that a hashset takes a little over half the time of an array/list for 40 items: https://mcmap.net/q/65596/-hashset-vs-list-performance

Here's the helper class I use for sanitising paths. I forget now why I had the fancy replacement option in it, but it's there as a cute bonus.

Additional bonus method "IsValidLocalPath" too :)

(** those which don't use regular expressions)

public static class PathExtensions
{
    private static HashSet<char> _invalidFilenameChars;
    private static HashSet<char> InvalidFilenameChars
    {
        get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
    }


    /// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the 
    /// specified replacement character.</summary>
    /// <param name="text">Text to make into a valid filename. The same string is returned if 
    /// it is valid already.</param>
    /// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
    /// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
    /// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
    public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        HashSet<char> invalids = InvalidFilenameChars;
        bool changed = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (invalids.Contains(c))
            {
                changed = true;
                char repl = replacement ?? '\0';
                if (fancyReplacements)
                {
                    if (c == '"') repl = '”'; // U+201D right double quotation mark
                    else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
                    else if (c == '/') repl = '⁄'; // U+2044 fraction slash
                }
                if (repl != '\0')
                    sb.Append(repl);
            }
            else
                sb.Append(c);
        }

        if (sb.Length == 0)
            return "_";

        return changed ? sb.ToString() : text;
    }


    /// <summary>
    /// Returns TRUE if the specified path is a valid, local filesystem path.
    /// </summary>
    /// <param name="pathString"></param>
    /// <returns></returns>
    public static bool IsValidLocalPath(this string pathString)
    {
        // From solution at https://mcmap.net/q/65597/-in-c-check-that-filename-is-possibly-valid-not-that-it-exists-duplicate
        Uri pathUri;
        Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
        return isValidUri && pathUri != null && pathUri.IsLoopback;
    }
}
Tonsure answered 8/9, 2017 at 0:14 Comment(0)
C
3

Here is my small contribution: a method that replaces the invalid characters in a single pass over a char array, without creating intermediate strings or StringBuilders. It's fast, easy to understand, and a good alternative to the other approaches mentioned in this post.

private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
  get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}

private static string ReplaceInvalidChars(string fileName, string newValue)
{
  char newChar = newValue[0];

  char[] chars = fileName.ToCharArray();
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
      chars[i] = newChar;
  }

  return new string(chars);
}

You can call it like this:

string illegal = "\"M <>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string legal = ReplaceInvalidChars(illegal, "_");

and returns:

_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

It's worth noting that this method always replaces invalid chars with the given value and never removes them. If you want to remove invalid chars, this alternative will do the trick:

private static string RemoveInvalidChars(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
  bool remove = newChar == char.MinValue;

  char[] chars = fileName.ToCharArray();
  char[] newChars = new char[chars.Length];
  int i2 = 0;
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
    {
      if (!remove)
        newChars[i2++] = newChar;
    }
    else
      newChars[i2++] = c;

  }

  return new string(newChars, 0, i2);
}

BENCHMARK

I ran timed tests of most methods found in this post, in case performance is what you are after. Some of these methods don't replace with a given char, since the OP was asking to clean the string. I added one run that replaces with a given char and another that replaces with an empty value, for scenarios where you only need to remove the unwanted chars. The code used for this benchmark is at the end, so you can run your own tests.

Note: Methods Test1 and Test2 are both proposed in this post.

First Run

replacing with '_', 1000000 iterations

Results:

============Test1===============
Elapsed=00:00:01.6665595
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test2===============
Elapsed=00:00:01.7526835
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test3===============
Elapsed=00:00:05.2306227
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test4===============
Elapsed=00:00:14.8203696
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test5===============
Elapsed=00:00:01.8273760
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test6===============
Elapsed=00:00:05.4249985
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test7===============
Elapsed=00:00:07.5653833
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test8===============
Elapsed=00:12:23.1410106
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test9===============
Elapsed=00:00:02.1016708
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test10===============
Elapsed=00:00:05.0987225
Result=M ary had a little lamb.

============Test11===============
Elapsed=00:00:06.8004289
Result=M ary had a little lamb.

Second Run

removing invalid chars, 1000000 iterations

Note: Test1 will not remove, only replace.

Results:

============Test1===============
Elapsed=00:00:01.6945352
Result= M     a ry  h  ad    a          li tt le   la mb.

============Test2===============
Elapsed=00:00:01.4798049
Result=M ary had a little lamb.

============Test3===============
Elapsed=00:00:04.0415688
Result=M ary had a little lamb.

============Test4===============
Elapsed=00:00:14.3397960
Result=M ary had a little lamb.

============Test5===============
Elapsed=00:00:01.6782505
Result=M ary had a little lamb.

============Test6===============
Elapsed=00:00:04.9251707
Result=M ary had a little lamb.

============Test7===============
Elapsed=00:00:07.9562379
Result=M ary had a little lamb.

============Test8===============
Elapsed=00:12:16.2918943
Result=M ary had a little lamb.

============Test9===============
Elapsed=00:00:02.0770277
Result=M ary had a little lamb.

============Test10===============
Elapsed=00:00:05.2721232
Result=M ary had a little lamb.

============Test11===============
Elapsed=00:00:05.2802903
Result=M ary had a little lamb.

BENCHMARK RESULTS

Methods Test1, Test2 and Test5 are the fastest. Method Test8 is the slowest.

CODE

Here's the complete code of the benchmark:

private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
  get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}

private static string _invalidCharsValue;
private static string InvalidCharsValue
{
  get { return _invalidCharsValue ?? (_invalidCharsValue = new string(Path.GetInvalidFileNameChars())); }
}

private static char[] _invalidChars;
private static char[] InvalidChars
{
  get { return _invalidChars ?? (_invalidChars = Path.GetInvalidFileNameChars()); }
}

static void Main(string[] args)
{
  string testPath = "\"M <>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

  int max = 1000000;
  string newValue = "";

  TimeBenchmark(max, Test1, testPath, newValue);
  TimeBenchmark(max, Test2, testPath, newValue);
  TimeBenchmark(max, Test3, testPath, newValue);
  TimeBenchmark(max, Test4, testPath, newValue);
  TimeBenchmark(max, Test5, testPath, newValue);
  TimeBenchmark(max, Test6, testPath, newValue);
  TimeBenchmark(max, Test7, testPath, newValue);
  TimeBenchmark(max, Test8, testPath, newValue);
  TimeBenchmark(max, Test9, testPath, newValue);
  TimeBenchmark(max, Test10, testPath, newValue);
  TimeBenchmark(max, Test11, testPath, newValue);

  Console.Read();
}

private static void TimeBenchmark(int maxLoop, Func<string, string, string> func, string testString, string newValue)
{
  var sw = new Stopwatch();
  sw.Start();
  string result = string.Empty;

  for (int i = 0; i < maxLoop; i++)
    result = func?.Invoke(testString, newValue);

  sw.Stop();

  Console.WriteLine($"============{func.Method.Name}===============");
  Console.WriteLine("Elapsed={0}", sw.Elapsed);
  Console.WriteLine("Result={0}", result);
  Console.WriteLine("");
}

private static string Test1(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];

  char[] chars = fileName.ToCharArray();
  for (int i = 0; i < chars.Length; i++)
  {
    if (InvalidCharsHash.Contains(chars[i]))
      chars[i] = newChar;
  }

  return new string(chars);
}

private static string Test2(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
  bool remove = newChar == char.MinValue;

  char[] chars = fileName.ToCharArray();
  char[] newChars = new char[chars.Length];
  int i2 = 0;
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
    {
      if (!remove)
        newChars[i2++] = newChar;
    }
    else
      newChars[i2++] = c;

  }

  return new string(newChars, 0, i2);
}

private static string Test3(string filename, string newValue)
{
  foreach (char c in InvalidCharsValue)
  {
    filename = filename.Replace(c.ToString(), newValue);
  }

  return filename;
}

private static string Test4(string filename, string newValue)
{
  Regex r = new Regex(string.Format("[{0}]", Regex.Escape(InvalidCharsValue)));
  filename = r.Replace(filename, newValue);
  return filename;
}

private static string Test5(string filename, string newValue)
{
  return string.Join(newValue, filename.Split(InvalidChars));
}

private static string Test6(string fileName, string newValue)
{
  return InvalidChars.Aggregate(fileName, (current, c) => current.Replace(c.ToString(), newValue));
}

private static string Test7(string fileName, string newValue)
{
  string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
  return Regex.Replace(fileName, regex, newValue, RegexOptions.Compiled);
}

private static string Test8(string fileName, string newValue)
{
  string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
  Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
  return removeInvalidChars.Replace(fileName, newValue);
}

private static string Test9(string fileName, string newValue)
{
  StringBuilder sb = new StringBuilder(fileName.Length);
  bool changed = false;

  for (int i = 0; i < fileName.Length; i++)
  {
    char c = fileName[i];
    if (InvalidCharsHash.Contains(c))
    {
      changed = true;
      sb.Append(newValue);
    }
    else
      sb.Append(c);
  }

  if (sb.Length == 0)
    return newValue;

  return changed ? sb.ToString() : fileName;
}

private static string Test10(string fileName, string newValue)
{
  if (!fileName.Any(c => InvalidChars.Contains(c)))
  {
    return fileName;
  }

  return new string(fileName.Where(c => !InvalidChars.Contains(c)).ToArray());
}

private static string Test11(string fileName, string newValue)
{
  string invalidCharsRemoved = new string(fileName
    .Where(x => !InvalidChars.Contains(x))
    .ToArray());

  return invalidCharsRemoved;
}
Coleridge answered 29/9, 2020 at 14:6 Comment(0)
P
2
public static class StringExtensions
{
    public static string RemoveUnnecessary(this string source)
    {
        string result = string.Empty;
        string regex = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
        Regex reg = new Regex(string.Format("[{0}]", Regex.Escape(regex)));
        result = reg.Replace(source, "");
        return result;
    }
}

You can then call the method directly on any string.

Philodendron answered 22/2, 2018 at 11:25 Comment(0)
A
2

A one-liner that strips any illegal characters from a string for Windows file naming:

public static string CleanIllegalName(string p_testName) => new Regex(string.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars())))).Replace(p_testName, "");
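
The one-liner builds a new Regex on every call. If it is called often, one option is to build the pattern once and reuse it. The following is a sketch of that variation; the class name CachedRegexCleaner and the chosen RegexOptions are assumptions rather than part of the original answer.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class CachedRegexCleaner
{
    // Built once at type initialization and reused for every call.
    private static readonly Regex IllegalChars = new Regex(
        "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())
                         + new string(Path.GetInvalidPathChars())) + "]",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static string CleanIllegalName(string name)
    {
        return IllegalChars.Replace(name, "");
    }
}
```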
Androsterone answered 2/12, 2018 at 1:49 Comment(0)
N
2

I've rolled my own method, which seems to be a lot faster than the others posted here (especially the regex, which is so sloooooow), but I didn't test all the methods posted.

https://dotnetfiddle.net/haIXiY

The first method (mine) and the second (also mine, but an older one) also do an added check on backslashes, so the benchmarks are not perfectly comparable, but it's just to give you an idea.

Result on my laptop (for 100 000 iterations):

StringHelper.RemoveInvalidCharacters 1: 451 ms  
StringHelper.RemoveInvalidCharacters 2: 7139 ms  
StringHelper.RemoveInvalidCharacters 3: 2447 ms  
StringHelper.RemoveInvalidCharacters 4: 3733 ms  
StringHelper.RemoveInvalidCharacters 5: 11689 ms  (==> Regex!)

The fastest method:

public static string RemoveInvalidCharacters(string content, char replace = '_', bool doNotReplaceBackslashes = false)
{
    if (string.IsNullOrEmpty(content))
        return content;

    var idx = content.IndexOfAny(InvalidCharacters);
    if (idx >= 0)
    {
        var sb = new StringBuilder(content);
        while (idx >= 0)
        {
            if (sb[idx] != '\\' || !doNotReplaceBackslashes)
                sb[idx] = replace;
            idx = content.IndexOfAny(InvalidCharacters, idx+1);
        }
        return sb.ToString();
    }
    return content;
}

The method doesn't compile "as is" because of the InvalidCharacters property; check the fiddle for the full code.
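
For readers who cannot open the fiddle, here is a self-contained sketch of the same loop. It assumes that InvalidCharacters is simply the array from Path.GetInvalidFileNameChars(), which may differ from the fiddle, and it leaves out the backslash option for brevity. The class name StringHelperSketch is made up.

```csharp
using System.IO;
using System.Text;

public static class StringHelperSketch
{
    // Assumption: stands in for the fiddle's InvalidCharacters property.
    private static readonly char[] InvalidCharacters = Path.GetInvalidFileNameChars();

    public static string RemoveInvalidCharacters(string content, char replace = '_')
    {
        if (string.IsNullOrEmpty(content))
            return content;

        int idx = content.IndexOfAny(InvalidCharacters);
        if (idx < 0)
            return content; // nothing to replace, no allocation

        var sb = new StringBuilder(content);
        while (idx >= 0)
        {
            // Overwrite in place, then jump straight to the next invalid char.
            sb[idx] = replace;
            idx = content.IndexOfAny(InvalidCharacters, idx + 1);
        }
        return sb.ToString();
    }
}
```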

Newmint answered 23/10, 2020 at 7:29 Comment(0)
S
1
public static bool IsValidFilename(string testName)
{
    return !new Regex("[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]").IsMatch(testName);
}
Styria answered 18/11, 2013 at 13:28 Comment(0)
T
0

This will do what you want, and avoid collisions:

static string SanitiseFilename(string key)
{
    var invalidChars = Path.GetInvalidFileNameChars();
    var sb = new StringBuilder();
    foreach (var c in key)
    {
        // Position of c in the invalid-character array, or -1 if c is valid.
        var invalidCharIndex = Array.IndexOf(invalidChars, c);
        if (invalidCharIndex > -1)
        {
            // Encode an invalid char as "_" followed by its index.
            sb.Append("_").Append(invalidCharIndex);
            continue;
        }

        if (c == '_')
        {
            // Double literal underscores so they differ from the escape prefix.
            sb.Append("__");
            continue;
        }

        sb.Append(c);
    }
    return sb.ToString();
}
Tutu answered 19/9, 2014 at 15:4 Comment(0)
S
0

I don't think the question has been fully answered yet. The existing answers clean either a file name or a path, not both. Here is my solution:

private static string CleanPath(string path)
{
    string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
    Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    List<string> split = path.Split('\\').ToList();
    string returnValue = split.Aggregate(string.Empty, (current, s) => current + (r.Replace(s, "") + @"\"));
    returnValue = returnValue.TrimEnd('\\');
    return returnValue;
}
Shantung answered 7/7, 2015 at 9:37 Comment(0)
S
0

I created an extension method that combines several suggestions:

  1. Holding illegal characters in a hash set
  2. Filtering out characters outside the ASCII range 32 to 127, since Path.GetInvalidFileNameChars does not include every invalid character possible with codes from 0 to 255. See here and MSDN
  3. Possibility to define the replacement character

Source:

public static class FileNameCorrector
{
    private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string ToValidFileName(this string name, char replacement = '\0')
    {
        var builder = new StringBuilder();
        foreach (var cur in name)
        {
            if (cur > 31 && cur < 128 && !invalid.Contains(cur))
            {
                builder.Append(cur);
            }
            else if (replacement != '\0')
            {
                builder.Append(replacement);
            }
        }

        return builder.ToString();
    }
}
Septuple answered 14/6, 2018 at 7:11 Comment(0)
H
0

Here is a function that replaces all illegal characters in a file name with a replacement character:

public static string ReplaceIllegalFileChars(string FileNameWithoutPath, char ReplacementChar)
{
  const string IllegalFileChars = "*?/\\:<>|\"";
  StringBuilder sb = new StringBuilder(FileNameWithoutPath.Length);
  char c;

  for (int i = 0; i < FileNameWithoutPath.Length; i++)
  {
    c = FileNameWithoutPath[i];
    if (IllegalFileChars.IndexOf(c) >= 0)
    {
      c = ReplacementChar;
    }
    sb.Append(c);
  }
  return (sb.ToString());
}

For example the underscore can be used as a replacement character:

NewFileName = ReplaceIllegalFileChars(FileName, '_');
Hydatid answered 14/5, 2020 at 12:58 Comment(1)
In addition to the answer you've provided, please consider providing a brief explanation of why and how this fixes the issue.Pipeline
L
-7

Or you can just do

[YOUR STRING].Replace('\\', ' ').Replace('/', ' ').Replace('"', ' ').Replace('*', ' ').Replace(':', ' ').Replace('?', ' ').Replace('<', ' ').Replace('>', ' ').Replace('|', ' ').Trim();
Lecithinase answered 15/1, 2014 at 21:24 Comment(0)
