Allowed characters in filename

241

Where can I find a list of allowed characters in filenames, depending on the operating system? (e.g., on Linux, the character : is allowed in filenames, but not on Windows)

Drury answered 27/1, 2011 at 8:15 Comment(8)
.NET provides that info for Windows. – Valerlan
#2680199 – Cyclopedia
@kreker note that your question is about Android. – Intonation
@Intonation en.wikipedia.org/wiki/Comparison_of_file_systems – Cyclopedia
Possible duplicate of What characters are forbidden in Windows and Linux directory names? – Surratt
Not sure how this could be considered a "recommendation for books, tools, software libraries, and more". It's clearly asking what the allowed characters are for a variety of filesystems, something that's quite handy if you're looking to use a common base. I see this as no different than asking what any specific limitation is. I suspect the recommendation reason for closure is more suited for actual requests for recommendations, such as "What's a good book for learning Python programming?". – Destinydestitute
@Destinydestitute Just voted to reopen. – Pantagruel
I have also voted to reopen; this is a valid question. It does not ask for a recommendation; it asks for a source of information. – Bannerman
157

You should start with the Wikipedia Filename page. It has a decent-sized table (Comparison of filename limitations), listing the reserved characters for quite a lot of file systems.

It also has a plethora of other information about each file system, including reserved file names such as CON under MS-DOS. I mention that only because I was bitten by that once when I shortened an include file from const.h to con.h and spent half an hour figuring out why the compiler hung.

Turns out, DOS ignored extensions on device names, so con.h was exactly the same as con, the input console (meaning, of course, the compiler was waiting for me to type in the header file before it would continue).
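A minimal Python sketch of that pitfall turned into a check, assuming the commonly documented DOS/Windows device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) and, as with con.h above, that everything after the first dot is ignored. The function and constant names are hypothetical, and the exact reserved set has varied between Windows versions, so treat the list as a starting point rather than a definitive one.

```python
# Commonly documented DOS/Windows device names (assumed list; the exact
# set has varied between Windows versions).
RESERVED_DEVICE_NAMES = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def is_reserved_windows_name(filename: str) -> bool:
    """Return True if filename collides with a DOS/Windows device name.

    Everything from the first dot onward is ignored, so "con.h" counts
    as "CON", and the comparison is case-insensitive.
    """
    stem = filename.split(".", 1)[0]
    return stem.upper() in RESERVED_DEVICE_NAMES


print(is_reserved_windows_name("con.h"))    # True
print(is_reserved_windows_name("const.h"))  # False
```

Ignoring everything after the first dot also catches multi-extension names such as nul.tar.gz.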

Destinydestitute answered 27/1, 2011 at 8:22 Comment(15)
I find the Wikipedia page somewhat vague and confusing, e.g. "Some operating systems prohibit some particular characters...". I'm actually looking for a complete table that lists all allowed and disallowed characters. – Drury
@python, don't look at that table, look at the big honkin' one underneath it (entitled "Comparison of file name limitations"). That's not so vague in its content. – Destinydestitute
Probably all you need is to look at the POSIX "Fully portable filenames" entry, which lists these: A–Z, a–z, 0–9, period (.), underscore (_) and hyphen (-). – Artina
@VladimirKornea thanks! Links: pubs.opengroup.org/onlinepubs/9699919799/basedefs/… || pubs.opengroup.org/onlinepubs/9699919799/basedefs/… – Tripos
I'd like to see the reasoning behind the POSIX "Fully portable filenames" thing. I've noticed that # works on OSX + Ubuntu and it's not in the list of "reserved characters" for Windows. Brackets also seem to work, so what gives? – Romilly
@Romilly There are more OSs than just Windows, OSX and Linux... some have very simple file systems. – Ambert
@elegant dice Yeah, but on the web that's all we care about, and so do most app developers, I imagine. The assumptions for the different OSs would be handy, so one could build the list for just the OSs one cares about. – Romilly
@Romilly you ask for the reasoning behind POSIX "Fully portable filenames", and that is the reason. The world is bigger than just the web... Also note that the web is also made up of IoT devices, which can have some very limited and specific OSs. Someone has to write embedded and mainframe software. I agree that a list for the different OSs would be useful, though. – Ambert
@Romilly Notice that # must be urlencoded in a web context or it will break URLs, since # in the URL indicates the start of the hash fragment. None of the POSIX "Fully portable filenames" need to be urlencoded. Even if all the OSs you care about allow # characters in the filename, you might have been better off allowing only "portable" characters, for some other reason that you haven't considered yet. – Artina
@elegant-dice I wanted to know the details of the reasoning so I can ignore the OSs that are not in the big three or are defunct, etc. – Romilly
You are right, @Vladimir-Kornea; I forgot about the special meaning of # in URLs. – Romilly
@VladimirKornea the question states "depending on the operating system", not URLs. You should always pass your filenames through a URL encoder/decoder in any case. – Romilly
Even though the table on Wikipedia is correct for file systems, it is missing OS reserved names, which are only listed in the notes (for Windows and OS/2, see note i, for example). For example, you cannot name a file COM1, and you can crash Windows by naming a file with a reserved GUID in specific places. – Bannerman
@AaA: hence my comment about the plethora of other information on that page, specifically calling out things such as reserved names. – Destinydestitute
@paxdiablo, no argument about your answer, which is correct; I'm just adding extra information that bit me a few days ago and that I figured out an hour ago: something like {ED7BA470-8E54-465E-825C-99712043E01C} as a file extension in Windows. – Bannerman
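The POSIX "Fully portable filenames" set from the comments above (A–Z, a–z, 0–9, period, underscore, hyphen) boils down to a whitelist check. A minimal Python sketch; the function name is hypothetical, and rejecting . and .. as well as a leading hyphen are extra assumptions (POSIX itself advises against a leading hyphen because such names can be mistaken for command-line options).

```python
import re

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def is_posix_portable(name: str) -> bool:
    """Return True if name uses only portable characters and is a usable name."""
    if name in (".", ".."):      # special directory entries, not ordinary names
        return False
    if name.startswith("-"):     # could be parsed as a command-line option
        return False
    # fullmatch requires every character to be in the set; "" never matches "+".
    return _PORTABLE.fullmatch(name) is not None


print(is_posix_portable("report_2011.txt"))  # True
print(is_posix_portable("notes#1.txt"))      # False
```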
109

OK, so looking at Comparison of file systems, if you only care about the main players' file systems:

So: any byte except NUL, \, /, :, *, ?, ", <, >, and |; you can't have files/folders called . or ..; and no control characters (of course).
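That rule can be written as a character-level validity check. This is a sketch, assuming "control characters" means code points below 0x20 (which covers NUL) plus DEL (0x7F); the helper name is hypothetical, and, as the comments below point out, it is stricter than any single file system and still ignores reserved names such as CON.

```python
# Characters rejected by at least one of the "main player" file systems.
FORBIDDEN = set('\\/:*?"<>|')


def is_safe_common_name(name: str) -> bool:
    """Conservative character-level check (ignores reserved names like CON)."""
    if name in ("", ".", ".."):
        return False
    for ch in name:
        # NUL and the other ASCII control characters (0x00-0x1F), plus DEL (0x7F).
        if ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False
        if ch in FORBIDDEN:
            return False
    return True


print(is_safe_common_name("notes #1.txt"))  # True
print(is_safe_common_name("a:b"))           # False
```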

Romilly answered 12/2, 2016 at 0:19 Comment(9)
This is not correct. Linux doesn't allow /. Windows doesn't allow backslash and some strings (e.g. CON). – Flouncing
Yeah, hence I said except. – Romilly
On Mac (running HFS+), I am able to create files with :s in their names. – Obligatory
This is not correct. See this answer for more characters that Windows does not allow. – Denaedenarius
Windows does not allow any control chars, either (but the Mac does, other than NUL). – Peptonize
The Mac does allow "/" in a file name when using the classic (Carbon) APIs, and ":" when using the POSIX APIs (and it swaps them, so if you enter a name with "/" in the Finder, which is legal, it'll show up as a ":" when checking the name in Terminal, for instance). – Peptonize
This is wrong for ext[2-4]. Per the link you provided, it says "Any byte except NUL, /". – Lederhosen
Using characters like %, $ and # in paths will cause issues in bash scripts (cd $mydir); using % in paths will cause issues in Windows scripts (cd %1). – Aunt
@Systemsplanet's comment can be interpreted in two ways, so to clarify: if you remember to quote/escape those characters, they will not cause issues for your scripts. – Towns
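To illustrate that last point for POSIX shells such as bash: Python's standard shlex.quote wraps a name in single quotes whenever it contains characters the shell would interpret, so $, # and spaces lose their special meaning. This sketch only covers POSIX shells; cmd.exe uses different quoting rules (% expansion, for instance) and is not handled here. The directory name is a made-up example.

```python
import shlex

dir_name = "budget $2024 #final"          # hypothetical directory name
command = f"cd {shlex.quote(dir_name)}"   # safe to hand to sh/bash
print(command)                            # cd 'budget $2024 #final'
```

Names made only of "safe" characters are returned unchanged, and embedded single quotes are escaped as '"'"', so the result always parses as one shell word.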
31

On Windows, create a file and try to give it a name containing an invalid character such as \. A popup will appear listing all the characters that are invalid in a file name.

[Screenshot: the Windows popup listing the invalid characters \ / : * ? " < > |]

Remember answered 1/9, 2016 at 13:59 Comment(0)
7

To be more precise about Mac OS X (now called macOS): a / in the Finder is mapped to a : in the Unix file system.

This was done for backward compatibility when Apple moved from Classic Mac OS.

It is legitimate to use a / in a file name in the Finder; looking at the same file in the Terminal, it will show up as a :.

And it works the other way around too: you can't use a / in a file name with the terminal, but a : is OK and will show up as a / in the Finder.

Some applications are more restrictive and prohibit both characters, either to avoid confusion, because they kept logic from Classic Mac OS, or for name compatibility across platforms.
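The Finder/Terminal behaviour described above amounts to swapping the two characters within a single name. A toy model in Python, not an Apple API (the function name is hypothetical): applying it to a Finder name gives what the Terminal shows, and applying it again gives the original back.

```python
# Swap "/" and ":" in a single file name component; the mapping is its own inverse.
_SWAP = str.maketrans("/:", ":/")


def swap_finder_posix(name: str) -> str:
    """Convert a Finder display name to its POSIX form, or vice versa."""
    return name.translate(_SWAP)


print(swap_finder_posix("Q1/Q2 report"))  # Q1:Q2 report  (what Terminal shows)
print(swap_finder_posix("10:30 notes"))   # 10/30 notes   (what Finder shows)
```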

Oppilate answered 3/2, 2018 at 12:46 Comment(0)
2

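Rather than trying to identify all the unwanted characters, you could remove anything that is not an acceptable character. Here's a regex that strips everything except the POSIX portable filename characters (Python's re module does not support POSIX classes such as [:alnum:], so the ranges are spelled out):

import re
cleaned_name = re.sub(r'[^A-Za-z0-9._-]', '', name)  # keep only A-Z a-z 0-9 . _ -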

Khelat answered 15/5, 2022 at 21:54 Comment(1)
Or "re.sub(r'[^a-zA-Z0-9._-]', '', name)", since Python's re module does not support POSIX classes like [:alnum:]. – County
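If you would rather keep accented Latin letters than drop them, one option (an assumption, not part of the answer above) is to decompose the name with Unicode NFKD normalization first, so "é" becomes "e" plus a combining accent that the ASCII filter then discards. The sketch below also collapses each run of disallowed characters into one underscore, strips leading hyphens, and falls back to a placeholder (the hypothetical fallback parameter) when nothing usable is left, since a whitelist can reduce a name to "" or "..".

```python
import re
import unicodedata


def portable_name(name: str, fallback: str = "file") -> str:
    """Reduce name to POSIX portable characters (a sketch, not a full validator)."""
    # Decompose accented letters ("é" -> "e" + combining accent),
    # then drop everything that is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of non-portable characters with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name)
    # Avoid a leading hyphen, which command-line tools may read as an option.
    cleaned = cleaned.lstrip("-")
    # "", "." and ".." are not usable as ordinary file names.
    if cleaned in ("", ".", ".."):
        return fallback
    return cleaned


print(portable_name("café menu.pdf"))  # cafe_menu.pdf
```

Scripts without a decomposition to ASCII (CJK, for example) still vanish entirely, which is where the fallback kicks in.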
0

For "English locale" file names, this works nicely. I use it to sanitize uploaded file names. The file name is not meant to refer to anything on disk; it is only used when the file is downloaded, hence there are no path checks.

// PCRE has no /g modifier; preg_replace() already replaces every match.
$file_name = preg_replace('/[^\x20-~]+|[\\\\\/:*?"<>|]+/', '_', $client_specified_file_name);

It replaces every run of non-printable or non-ASCII characters, and every run of characters reserved on Windows (\ / : * ? " < > |), with an underscore. You can easily extend the pattern to support other locales and functionalities.
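For comparison, a sketch of the same pattern in Python, including backslash and * in the reserved set. It makes the "English locale" caveat visible: every non-ASCII character falls outside \x20-~ and becomes an underscore. The function name is hypothetical.

```python
import re

# Runs of non-printable/non-ASCII characters, or runs of Windows-reserved characters.
_UNSAFE = re.compile(r'[^\x20-~]+|[\\/:*?"<>|]+')


def download_file_name(client_name: str) -> str:
    """Replace each unsafe run with a single underscore (no path handling)."""
    return _UNSAFE.sub("_", client_name)


print(download_file_name("résumé.pdf"))   # r_sum_.pdf
print(download_file_name("a:b\tc?.txt"))  # a_b_c_.txt
```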

Oenomel answered 24/10, 2018 at 3:35 Comment(0)
0

I took a different approach. Instead of checking whether the string contains only valid characters, I look for invalid characters.

NOTE: I needed to validate a path string, not a filename. But if you need to check a filename, simply add / to the set.

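def check_path_validity(path: str) -> bool:
    # Reject the path if it contains any of these characters; the backslash
    # must be written as '\\' ('\?' is an invalid escape). Add '/' for file names.
    for char in '\\?%*:|"<>':
        if char in path:
            print(f"Illegal character {char} found in path")
            return False
    return True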
Compulsive answered 9/9, 2022 at 7:56 Comment(0)
-1

Here is code to clean a file name in Python.

import re
import unicodedata

def clean_name(name, replace_space_with=None):
    """
    Remove invalid file name chars from the specified name

    :param name: the file name
    :param replace_space_with: if not None, replace spaces with this string
    :return: a valid name for Win/Mac/Linux
    """

    # ref: https://en.wikipedia.org/wiki/Filename
    # ref: https://mcmap.net/q/116609/-allowed-characters-in-filename
    # No control chars, no: /, \, ?, %, *, :, |, ", <, >

    # remove control chars
    name = ''.join(ch for ch in name if unicodedata.category(ch)[0] != 'C')

    cleaned_name = re.sub(r'[/\\?%*:|"<>]', '', name)
    if replace_space_with is not None:
        return cleaned_name.replace(' ', replace_space_with)
    return cleaned_name
Blocker answered 8/6, 2018 at 15:20 Comment(1)
The code does not check for invalid (reserved) names, nor for invalid characters in replace_space_with. File name length is also out of scope. So ":return: a valid name for Win/Mac/Linux" is not true in all circumstances. – Ephemerid

© 2022 - 2024 — McMap. All rights reserved.