How to make a valid Windows filename from an arbitrary string?
R

17

115

I've got a string like "Foo: Bar" that I want to use as a filename, but on Windows the ":" char isn't allowed in a filename.

Is there a method that will turn "Foo: Bar" into something like "Foo- Bar"?

Ranchero answered 6/3, 2009 at 22:5 Comment(1)
I did this same thing today. I didn't check SO for some reason, but found the answer anyway.Crib
C
180

Try something like this:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}

Edit:

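Since GetInvalidFileNameChars() returns several dozen characters on Windows (41 in the list quoted in the comments below), the loop calls Replace that many times. A StringBuilder that makes a single pass over the name can take less time and memory than the original version, although for short names the difference is small.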
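
A single-pass StringBuilder variant of the loop above might look like the following. This is only a minimal sketch: the class name FileNameSketch and the method name ReplaceInvalid are made up for illustration, and the invalid set comes from Path.GetInvalidFileNameChars(), so results depend on the platform the code runs on.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileNameSketch
{
    // Fetch the invalid set once instead of on every call.
    private static readonly char[] Invalid = Path.GetInvalidFileNameChars();

    // Hypothetical helper: replaces every invalid character in one pass.
    public static string ReplaceInvalid(string name, char replacement = '_')
    {
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            // Array.IndexOf returns -1 when c is not in the invalid set.
            sb.Append(Array.IndexOf(Invalid, c) >= 0 ? replacement : c);
        }
        return sb.ToString();
    }
}
```

For example, FileNameSketch.ReplaceInvalid("a/b") gives "a_b", since '/' is invalid on both Windows and Unix-like systems.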

Carine answered 6/3, 2009 at 22:9 Comment(11)
Good call on S.I.P.GIFNC. The loop is roughly what I ended up doing, but I'm not crazy about calling string.Replace in a loop -- I was hoping there would be a builtin that was both simple and efficient.Ranchero
You could use a StringBuilder if you wish, but if the names are short I guess it's not worth it. You could also write your own method that builds a char[] and replaces all the invalid chars in one pass. It's always better to keep it simple unless that doesn't work; you might have worse bottlenecks elsewhere.Carine
I don't know C#, but is it not possible to use a remove() method that takes a set of characters? This set of characters appears to be handily provided by GetInvalidFileNameChars(). Also, realistically, how many times will that loop iterate? 6 usually, 40 at most if the function also returns non-printable ASCII, maybe? Caveat: the MSDN page for that function also mentions that you should use GetInvalidPathChars, as GIFNC doesn't return a '\' or '/', which are invalid filename chars.Psephology
I don't know of any "Remove" method like the one you describe; even if it existed, how would it be any faster? The only thing it could do is copy the result of GIFNC to an array to avoid the call overhead (if any). Regarding the other comment, you should use GIFNC because it is the one that returns \ and /. Use Reflector to check the Path class's static constructor if you wish. Here's the declaration on Windows (Mono on Linux might be different):Carine
InvalidFileNameChars = new char[] { '"', '<', '>', '|', '\0', '\x0001', '\x0002', '\x0003', '\x0004', '\x0005', '\x0006', '\a', '\b', '\t', '\n', '\v', '\f', '\r', '\x000e', '\x000f', '\x0010', '\x0011', '\x0012', '\x0013', '\x0014', '\x0015', '\x0016', '\x0017', '\x0018', '\x0019', '\x001a', '\x001b', '\x001c', '\x001d', '\x001e', '\x001f', ':', '*', '?', '\\', '/' };Carine
The probability to have 2+ different invalid chars in the string is so small that caring about performance of string.Replace() is pointless.Selfevident
There's an additional cost to creating a StringBuilder object on top of working with the plain string. I doubt it's worth using a StringBuilder in this particular scenario, as the string size and loop count are so tiny.Bain
@Bain that's a good point. Others have mentioned this as well. It all depends on the average invalid characters you expect to have. If it will be near zero, then use a string. If you expect to always have one or more, then I would go with a StringBuilder.Carine
Great solution, interesting aside, resharper suggested this Linq version: fileName = System.IO.Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c, '_')); I wonder if there are any possible performance improvements there. I have kept the original for readability purposes as performance is not my biggest concern. But if anyone is interested, might be worth benchmarkingAffront
This doesn't change . (period/fullstop) characters. You may want to ensure there is only one of these for the final file extension.Doro
@Doro No need to. file.name.txt.pdf is a valid pdf. Windows reads only the last . for the extension.Carine
K
37
fileName = fileName.Replace(":", "-");

However ":" is not the only illegal character for Windows. You will also have to handle:

/, \, :, *, ?, ", <, > and |

These are contained in System.IO.Path.GetInvalidFileNameChars();

Also (on Windows), a filename cannot consist only of "." characters (".", "..", "...", and so on are all invalid). Be careful when naming files that end with ".", for example:

echo "test" > .test.

Will generate a file named ".test"

Lastly, if you really want to do things correctly, there are some special file names you need to look out for. On Windows you can't create files named:

CON, PRN, AUX, CLOCK$, NUL
COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
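
The checks described above can be sketched roughly as follows. The class and method names (ReservedNameSketch, AvoidReservedName) are invented for this example, and prefixing an underscore is just one arbitrary way to step around a reserved name. The sketch assumes reserved names are matched case-insensitively and compares only the part of the name before the first period; it does not replace invalid characters.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNameSketch
{
    // Names taken from the list above, compared case-insensitively.
    private static readonly HashSet<string> Reserved = BuildReserved();

    private static HashSet<string> BuildReserved()
    {
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
            { "CON", "PRN", "AUX", "CLOCK$", "NUL" };
        for (int i = 0; i <= 9; i++)
        {
            set.Add("COM" + i);
            set.Add("LPT" + i);
        }
        return set;
    }

    // Hypothetical helper: trims trailing dots and sidesteps reserved names.
    public static string AvoidReservedName(string fileName)
    {
        // Trailing dots are dropped by Windows anyway (".test." becomes ".test").
        string trimmed = fileName.TrimEnd('.');
        if (trimmed.Length == 0) return "_"; // the name was only dots, or empty

        // Compare the part before the first period against the reserved list.
        int dot = trimmed.IndexOf('.');
        string stem = dot < 0 ? trimmed : trimmed.Substring(0, dot);
        return Reserved.Contains(stem) ? "_" + trimmed : trimmed;
    }
}
```

With this sketch, "con.air.avi" becomes "_con.air.avi", "..." becomes "_", and "report.txt" is returned unchanged.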
Katharynkathe answered 6/3, 2009 at 22:14 Comment(5)
I never knew about the reserved names. Makes sense thoughNorthey
Also, for what it's worth, you cannot create a filename that starts with one of these reserved names followed by a period, e.g. con.air.aviIncome
".foo" is a valid filename. Didn't know about the "CON" filename - what is it for?Acetamide
Scratch that. CON is for console.Acetamide
Thanks configurator; I've updated the answer, you are correct ".foo" is valid; however ".foo." leads to possible, unwanted results. Updated.Katharynkathe
H
19

This isn't more efficient, but it's more fun :)

var fileName = "foo:bar";
var invalidChars = System.IO.Path.GetInvalidFileNameChars();
var cleanFileName = new string(fileName.Where(m => !invalidChars.Contains(m)).ToArray<char>());
Hachmann answered 10/11, 2011 at 16:16 Comment(0)
W
16

In case anyone wants an optimized version based on StringBuilder, use this. Includes rkagerer's trick as an option.

static char[] _invalids;

/// <summary>Replaces characters in <c>text</c> that are not allowed in 
/// file names with the specified replacement character.</summary>
/// <param name="text">Text to make into a valid filename. The same string is returned if it is valid already.</param>
/// <param name="replacement">Replacement character, or null to simply remove bad characters.</param>
/// <param name="fancy">Whether to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
/// <returns>A string that can be used as a filename. If the output string would otherwise be empty, returns "_".</returns>
public static string MakeValidFileName(string text, char? replacement = '_', bool fancy = true)
{
    StringBuilder sb = new StringBuilder(text.Length);
    var invalids = _invalids ?? (_invalids = Path.GetInvalidFileNameChars());
    bool changed = false;
    for (int i = 0; i < text.Length; i++) {
        char c = text[i];
        if (invalids.Contains(c)) {
            changed = true;
            var repl = replacement ?? '\0';
            if (fancy) {
                if (c == '"')       repl = '”'; // U+201D right double quotation mark
                else if (c == '\'') repl = '’'; // U+2019 right single quotation mark (not reached on Windows: ' is a valid filename char)
                else if (c == '/')  repl = '⁄'; // U+2044 fraction slash
            }
            if (repl != '\0')
                sb.Append(repl);
        } else
            sb.Append(c);
    }
    if (sb.Length == 0)
        return "_";
    return changed ? sb.ToString() : text;
}
Weakness answered 9/8, 2014 at 22:56 Comment(2)
+1 for nice and readable code. It makes it very easy to read and notice the bugs :P. This function will always return the original string, as changed will never be true.Morava
Thanks, I think it's better now. You know what they say about open source, "many eyes make all bugs shallow so I don't have to write unit tests"...Weakness
F
11

Here's a version of the accepted answer using LINQ's Enumerable.Aggregate:

string fileName = "something";

// Aggregate returns the result; assign it, since strings are immutable.
fileName = Path.GetInvalidFileNameChars()
    .Aggregate(fileName, (current, c) => current.Replace(c, '_'));
Flowing answered 10/12, 2015 at 17:48 Comment(0)
M
10

A simple one-liner:

var validFileName = Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));

You can wrap it in an extension method if you want to reuse it.

public static string ToValidFileName(this string fileName) => Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));
Motta answered 12/3, 2020 at 6:19 Comment(0)
Q
9

Here's a slight twist on Diego's answer.

If you're not afraid of Unicode, you can retain a bit more fidelity by replacing the invalid characters with valid Unicode symbols that resemble them. Here's the code I used in a recent project involving lumber cutlists:

static string MakeValidFilename(string text) {
  text = text.Replace('\'', '’'); // U+2019 right single quotation mark
  text = text.Replace('"',  '”'); // U+201D right double quotation mark
  text = text.Replace('/', '⁄');  // U+2044 fraction slash
  foreach (char c in System.IO.Path.GetInvalidFileNameChars()) {
    text = text.Replace(c, '_');
  }
  return text;
}

This produces filenames like 1⁄2” spruce.txt instead of 1_2_ spruce.txt

Yes, it really works:

[Image: Explorer sample]

Caveat Emptor

I knew this trick would work on NTFS but was surprised to find it also works on FAT and FAT32 partitions. That's because long filenames are stored in Unicode, even as far back as Windows 95/NT. I tested on Win7, XP, and even a Linux-based router and they showed up OK. Can't say the same for inside a DOSBox.

That said, before you go nuts with this, consider whether you really need the extra fidelity. The Unicode look-alikes could confuse people or old programs, e.g. older OS's relying on codepages.

Quaternary answered 1/8, 2014 at 8:9 Comment(0)
L
8

Diego does have the correct solution but there is one very small mistake in there. The version of string.Replace being used should be string.Replace(char, char), there isn't a string.Replace(char, string)

I can't edit the answer or I would have just made the minor change.

So it should be:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}
Limburg answered 6/3, 2009 at 22:5 Comment(0)
F
6

Another simple solution:

private string MakeValidFileName(string original, char replacementChar = '_')
{
  var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars());
  return new string(original.Select(c => invalidChars.Contains(c) ? replacementChar : c).ToArray());
}
Fanion answered 5/4, 2017 at 19:11 Comment(0)
F
5

Here's a version that uses StringBuilder and IndexOfAny with bulk appends for efficiency. When the input contains no invalid characters, it returns the original string rather than creating a duplicate.

Last but not least, it has a switch statement that returns look-alike characters which you can customize any way you wish. Check out Unicode.org's confusables lookup to see what options you might have, depending on the font.

public static string GetSafeFilename(string arbitraryString)
{
    var invalidChars = System.IO.Path.GetInvalidFileNameChars();
    var replaceIndex = arbitraryString.IndexOfAny(invalidChars, 0);
    if (replaceIndex == -1) return arbitraryString;

    var r = new StringBuilder();
    var i = 0;

    do
    {
        r.Append(arbitraryString, i, replaceIndex - i);

        switch (arbitraryString[replaceIndex])
        {
            case '"':
                r.Append("''");
                break;
            case '<':
                r.Append('\u02c2'); // '˂' (modifier letter left arrowhead)
                break;
            case '>':
                r.Append('\u02c3'); // '˃' (modifier letter right arrowhead)
                break;
            case '|':
                r.Append('\u2223'); // '∣' (divides)
                break;
            case ':':
                r.Append('-');
                break;
            case '*':
                r.Append('\u2217'); // '∗' (asterisk operator)
                break;
            case '\\':
            case '/':
                r.Append('\u2044'); // '⁄' (fraction slash)
                break;
            case '\0':
            case '\f':
            case '?':
                break;
            case '\t':
            case '\n':
            case '\r':
            case '\v':
                r.Append(' ');
                break;
            default:
                r.Append('_');
                break;
        }

        i = replaceIndex + 1;
        replaceIndex = arbitraryString.IndexOfAny(invalidChars, i);
    } while (replaceIndex != -1);

    r.Append(arbitraryString, i, arbitraryString.Length - i);

    return r.ToString();
}

It doesn't check for ., .., or reserved names like CON because it isn't clear what the replacement should be.

Fulltime answered 8/5, 2015 at 14:20 Comment(0)
A
3

After cleaning up my code and refactoring a little, I created an extension method for the string type:

public static string ToValidFileName(this string s, char replaceChar = '_', char[] includeChars = null)
{
  var invalid = Path.GetInvalidFileNameChars();
  if (includeChars != null) invalid = invalid.Union(includeChars).ToArray();
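  // invalid.Contains(o) replaces o.In(invalid), which relies on a custom In() extension not shown here
  return string.Join(string.Empty, s.ToCharArray().Select(o => invalid.Contains(o) ? replaceChar : o));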
}

Now it's easier to use with:

var name = "Any string you want using ? / \\ or even +.zip";
var validFileName = name.ToValidFileName();

If you want to replace with a different char than "_" you can use:

var validFileName = name.ToValidFileName(replaceChar:'#');

You can also add more characters to replace; for example, if you don't want spaces or commas:

var validFileName = name.ToValidFileName(includeChars: new [] { ' ', ',' });

Hope it helps...

Cheers

Arabella answered 1/8, 2013 at 10:47 Comment(0)
S
1

I needed a system that couldn't create collisions so I couldn't map multiple characters to one. I ended up with:

public static class Extension
{
    /// <summary>
    /// Characters allowed in a file name. Note that curly braces don't show up here
    /// because they are used for escaping invalid characters.
    /// </summary>
    private static readonly HashSet<char> CleanFileNameChars = new HashSet<char>
    {
        ' ', '!', '#', '$', '%', '&', '\'', '(', ')', '+', ',', '-', '.',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', '@',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
        '[', ']', '^', '_', '`',
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    };

    /// <summary>
    /// Creates a clean file name from one that may contain invalid characters in 
    /// a way that will not collide.
    /// </summary>
    /// <param name="dirtyFileName">
    /// The file name that may contain invalid filename characters.
    /// </param>
    /// <returns>
    /// A file name that does not contain invalid filename characters.
    /// </returns>
    /// <remarks>
    /// <para>
    /// Escapes invalid characters by converting their ASCII values to hexadecimal
    /// and wrapping that value in curly braces. Curly braces are escaped by doubling
    /// them, for example '{' => "{{".
    /// </para>
    /// <para>
    /// Note that although NTFS allows unicode characters in file names, this
    /// method does not.
    /// </para>
    /// </remarks>
    public static string CleanFileName(this string dirtyFileName)
    {
        string EscapeHexString(char c) =>
            "{" + (c > 255 ? $"{(uint)c:X4}" : $"{(uint)c:X2}") + "}";

        return string.Join(string.Empty,
                           dirtyFileName.Select(
                               c =>
                                   c == '{' ? "{{" :
                                   c == '}' ? "}}" :
                                   CleanFileNameChars.Contains(c) ? $"{c}" :
                                   EscapeHexString(c)));
    }
}
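
Because the escaping is collision-free, it can also be reversed. The decoder below is not part of the original answer; it is a sketch of what the inverse might look like, assuming the input was produced by the CleanFileName scheme described above (doubled braces for literal braces, {XX} or {XXXX} hex escapes for everything else). The names CleanFileNameDecoderSketch and DecodeCleanFileName are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CleanFileNameDecoderSketch
{
    // Hypothetical inverse of the escaping scheme above.
    public static string DecodeCleanFileName(string clean)
    {
        var sb = new StringBuilder(clean.Length);
        int i = 0;
        while (i < clean.Length)
        {
            char c = clean[i];
            if ((c == '{' || c == '}') && i + 1 < clean.Length && clean[i + 1] == c)
            {
                sb.Append(c); // a doubled brace stands for one literal brace
                i += 2;
            }
            else if (c == '{')
            {
                // A single '{' opens a hex escape that runs to the next '}'.
                int close = clean.IndexOf('}', i + 1);
                if (close < 0)
                    throw new FormatException("Unterminated escape at index " + i);
                string hex = clean.Substring(i + 1, close - i - 1);
                sb.Append((char)ushort.Parse(hex, NumberStyles.HexNumber,
                                             CultureInfo.InvariantCulture));
                i = close + 1;
            }
            else if (c == '}')
            {
                throw new FormatException("Unescaped '}' at index " + i);
            }
            else
            {
                sb.Append(c); // ordinary character, copied as-is
                i++;
            }
        }
        return sb.ToString();
    }
}
```

For instance, "a{3A}b" decodes to "a:b" and "{{x}}" decodes to "{x}".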
Summons answered 28/8, 2019 at 19:57 Comment(0)
A
0

I needed to do this today... in my case, I needed to concatenate a customer name with the date and time for a final .kmz file. My final solution was this:

 string name = "Whatever name with valid/invalid chars";
 char[] invalid = System.IO.Path.GetInvalidFileNameChars();
 string validFileName = string.Join(string.Empty,
                            string.Format("{0}.{1:G}.kmz", name, DateTime.Now)
                            .ToCharArray().Select(o => invalid.Contains(o) ? '_' : o)); // Contains (System.Linq) in place of the custom In() extension

You can even make it replace spaces if you add the space char to the invalid array.

Maybe it's not the fastest, but as performance wasn't an issue, I found it elegant and understandable.

Cheers!

Arabella answered 1/8, 2013 at 10:25 Comment(0)
E
0

There are no valid answers in this topic yet. The author said: "...I want to use as a filename...". Removing or replacing invalid characters is not enough to make something usable as a filename. You should at least check that:

  1. A file with that name doesn't already exist in the folder where you want to create the new one.
  2. The total path to the file (folder path + filename + extension) is not longer than MAX_PATH (260 characters). Yes, there are ways around this limit on recent versions of Windows, but if you want your app to work reliably, you should check it.
  3. You don't use any special filenames (see answer by @Phil Price)

Probably the best way would be to:

  1. Remove bad characters using one of the other answers here.
  2. Make sure the total path is less than 260 characters (if not, remove the last N characters).
  3. Make sure a file with the given filename doesn't already exist (if it does, replace the last N characters until you find an available filename).
  4. Make sure you don't use any reserved filenames (if you do, replace the last N characters until you find a proper and available filename).

As always, things are more complicated than they look. It may be better to use an existing function, like GetTempFileNameW.
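
Steps 2 and 3 above could be sketched roughly like this. Everything here is an assumption made for illustration: the names UniqueNameSketch and MakeUnique, the " (n)" suffix style, the 10000-attempt cap, and the exists callback (in real code you would probably pass File.Exists). The sketch expects a name that has already been cleaned of invalid characters and does not handle reserved names (step 4).

```csharp
using System;
using System.IO;

public static class UniqueNameSketch
{
    // Hypothetical helper: truncates the name so the full path stays under
    // maxPath characters, then appends " (n)" until the name is free.
    public static string MakeUnique(string directory, string fileName,
                                    Func<string, bool> exists, int maxPath = 260)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string ext = Path.GetExtension(fileName);

        for (int n = 0; n < 10000; n++)
        {
            string suffix = n == 0 ? "" : " (" + n + ")";

            // Characters left for the stem: total must be < maxPath, minus the
            // directory, one separator, the suffix and the extension.
            int room = maxPath - 1 - directory.Length - 1 - suffix.Length - ext.Length;
            if (room < 1) throw new PathTooLongException();

            string shortStem = stem.Length > room ? stem.Substring(0, room) : stem;
            string candidate = shortStem + suffix + ext;

            if (!exists(Path.Combine(directory, candidate)))
                return candidate;
        }
        throw new IOException("No free file name found.");
    }
}
```

Passing the existence check in as a delegate keeps the sketch testable without touching the disk. Note that a check-then-create sequence like this is still subject to races if another process creates the same file in between.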

Elmer answered 23/9, 2021 at 15:30 Comment(0)
E
0

Here is yet another solution, which I have been using for the last ~10 years. It is very similar to the previous solutions, without the 'fancy' parts. The main method takes the special characters as input, since I was also using it for other purposes, e.g. getting web-compatible names, especially back when renaming files for SharePoint/OneDrive.

I'm not sure how much it improves the speed, but I also chose to check the filename for any special characters with IndexOfAny() BEFORE using the StringBuilder.

private static string SanitizeFilename(this string filename) 
   => filename.RemoveOrReplaceSpecialCharacters(Path.GetInvalidFileNameChars(), '_');

private static string RemoveOrReplaceSpecialCharacters(this string str, char[] specialCharacters, char? replaceChar)
{
    if (string.IsNullOrEmpty(str))
        return str;
    if (specialCharacters == null || specialCharacters.Length == 0)
        return str;

    if (str.IndexOfAny(specialCharacters) < 0) // IndexOfAny returns -1 when nothing is found
        return str;

    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        if (!specialCharacters.Contains(c))
            sb.Append(c);
        else if (replaceChar.HasValue)
            sb.Append(replaceChar.Value);
    }
    return sb.ToString();         
}

I tried also

return new string(str.Except(specialCharacters).ToArray());

but it behaved strangely: Except is a set operation, so duplicate characters are dropped as well, among other issues. For instance, "Bla-ID" became "BlaI" when specifying - as the single special character.
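
The difference is easy to reproduce. The sketch below (with made-up names ExceptVersusWhereSketch, RemoveWithExcept and RemoveWithWhere) puts the Except approach next to a Where-based filter that keeps duplicates.

```csharp
using System;
using System.Linq;

public static class ExceptVersusWhereSketch
{
    // Except treats the string as a set: each distinct character survives at most once.
    public static string RemoveWithExcept(string s, char[] special) =>
        new string(s.Except(special).ToArray());

    // Where filters character by character, so repeated characters are kept.
    public static string RemoveWithWhere(string s, char[] special) =>
        new string(s.Where(c => !special.Contains(c)).ToArray());
}
```

With "aa-bb" and '-' as the only special character, the Except version yields "ab" while the Where version yields "aabb".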

Era answered 19/5, 2022 at 15:41 Comment(0)
L
0

An efficient way to do this is

    string.Join("_", fileName.Split(System.IO.Path.GetInvalidFileNameChars(), StringSplitOptions.RemoveEmptyEntries))
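
One thing to be aware of with this approach: RemoveEmptyEntries collapses runs of invalid characters into a single "_" and drops leading and trailing ones entirely, so different inputs can map to the same output. The sketch below wraps both variants for comparison; the class and method names are invented here.

```csharp
using System;
using System.IO;

public static class SplitJoinSketch
{
    // As in the answer: runs of invalid characters collapse into one "_",
    // and invalid characters at either end disappear.
    public static string Sanitize(string fileName) =>
        string.Join("_", fileName.Split(Path.GetInvalidFileNameChars(),
                                        StringSplitOptions.RemoveEmptyEntries));

    // Variant that keeps empty entries, so each invalid character
    // is replaced by its own "_".
    public static string SanitizeKeepRuns(string fileName) =>
        string.Join("_", fileName.Split(Path.GetInvalidFileNameChars(),
                                        StringSplitOptions.None));
}
```

For example, Sanitize("a//b") gives "a_b" and Sanitize("/a/") gives "a", whereas SanitizeKeepRuns("a//b") gives "a__b".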
Lenard answered 2/2, 2024 at 10:20 Comment(1)
Thank you for your interest in contributing to the Stack Overflow community. This question already has quite a few answers—including one that has been extensively validated by the community. Are you certain your approach hasn’t been given previously? If so, it would be useful to explain how your approach is different, under what circumstances your approach might be preferred, and/or why you think the previous answers aren’t sufficient. Can you kindly edit your answer to offer an explanation?Histoplasmosis
M
-2

You can do this with a sed command:

 sed -e "
 s/[?()\[\]=+<>:;©®”,*|]/_/g
 s/"$'\t'"/ /g
 s/–/-/g
 s/\"/_/g
 s/[[:cntrl:]]/_/g"
Menes answered 11/12, 2010 at 1:0 Comment(3)
also see a more complicated but related question at: #4413927Menes
Why does this need to be done in C# rather than Bash? I see now a tag of C# on the original question, but why?Menes
I know, right, why not just shell out from the C# application to Bash that might not be installed to accomplish this?Kerekes
