Where can I find a list of allowed characters in filenames, depending on the operating system? (For example, on Linux the character : is allowed in filenames, but not on Windows.)
You should start with the Wikipedia Filename page. It has a decent-sized table (Comparison of filename limitations) listing the reserved characters for quite a lot of file systems.
It also has a plethora of other information about each file system, including reserved file names such as CON under MS-DOS. I mention that only because I was bitten by it once when I shortened an include file from const.h to con.h and spent half an hour figuring out why the compiler hung.
Turns out DOS ignored extensions for devices, so con.h was exactly the same as con, the input console (meaning, of course, the compiler was waiting for me to type in the header file before it would continue).
See also the POSIX "Fully portable filenames" entry in that table, which lists these characters: A–Z a–z 0–9 . _ -
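If you want to stay within that POSIX portable set, a validity check is just a set-membership test. This is a minimal Python sketch with a hypothetical function name; it checks characters only and does not apply any other POSIX guidance (such as avoiding a leading hyphen).

```python
import string

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_posix_portable(filename: str) -> bool:
    """Return True if every character is in the POSIX portable filename set."""
    # An empty string is never a usable filename, so reject it explicitly.
    return bool(filename) and all(ch in PORTABLE_CHARS for ch in filename)


print(is_posix_portable("report_2024-01.txt"))  # True
print(is_posix_portable("my report.txt"))       # False (contains a space)
```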
Windows has similar restrictions: for example, you cannot name a file COM1, and you can crash Windows by naming a file with a reserved GUID in specific places.
OK, so looking at Comparison of file systems, if you only care about the main players' file systems:
- Windows (FAT32, NTFS): any Unicode except NUL, \, /, :, *, ?, ", <, >, |. Also, no space character at the start or end, and no period at the end.
- Mac (HFS, HFS+): any valid Unicode except : or /
- Linux (ext[2-4]): any byte except NUL or /
So, to be safe on all of them: any byte except NUL, \, /, :, *, ?, ", <, >, |; you can't have files or folders called . or ..; and no control characters (of course).
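The per-system rules and the combined "safe everywhere" rule can be encoded as data. A minimal Python sketch, assuming "control characters" means ASCII 0x00–0x1F plus DEL; the names FORBIDDEN, offending_chars and is_safe_everywhere are hypothetical, and the Windows leading/trailing space and trailing-period rules are not enforced here.

```python
# Forbidden characters per file-system family, taken from the list above.
FORBIDDEN = {
    "windows": set('\0\\/:*?"<>|'),  # NUL \ / : * ? " < > |
    "mac": set(":/"),                # HFS, HFS+
    "linux": set("\0/"),             # ext2-4
}

# Union of all families: characters to avoid if the name must work everywhere.
FORBIDDEN_ANYWHERE = set().union(*FORBIDDEN.values())


def is_control(ch: str) -> bool:
    # Assumption: ASCII C0 controls (0x00-0x1F) and DEL (0x7F).
    return ord(ch) < 0x20 or ord(ch) == 0x7F


def offending_chars(filename: str, family: str) -> set:
    """Return the characters in filename that the given family forbids."""
    return {ch for ch in filename if ch in FORBIDDEN[family]}


def is_safe_everywhere(filename: str) -> bool:
    """Apply the combined cross-platform rule (characters, '.', '..')."""
    if filename in ("", ".", ".."):
        return False
    return not any(ch in FORBIDDEN_ANYWHERE or is_control(ch) for ch in filename)


print(offending_chars("a:b", "windows"))  # {':'}
print(offending_chars("a:b", "linux"))    # set()
```

Keeping the rules in a dictionary makes it easy to target only the systems you actually care about, rather than always applying the strictest union.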
(Unix-like systems forbid only NUL and /.) Windows doesn't allow backslash and some strings (e.g. CON).
… :s in their names.
Obligatory: NUL, /, "
On Windows, create a file in Explorer and try to give it a name containing an invalid character such as \. You will get a popup listing all the characters that are invalid in a filename.
To be more precise about Mac OS X (now called macOS): a / in the Finder is translated to : in the Unix file system.
This was done for backward compatibility when Apple moved from Classic Mac OS.
It is legitimate to use a / in a file name in the Finder; looking at the same file in the terminal, it will show up with a :.
And it works the other way around too: you can't use a / in a file name in the terminal, but a : is OK and will show up as a / in the Finder.
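The swap can be modelled as a plain character substitution. This Python sketch only illustrates the display mapping described above; the function names are hypothetical and it does not call any macOS API.

```python
def finder_to_posix(display_name: str) -> str:
    """Name as stored in the Unix layer for a name typed in the Finder."""
    # A '/' typed in the Finder is stored as ':' underneath.
    return display_name.replace("/", ":")


def posix_to_finder(posix_name: str) -> str:
    """Name as the Finder displays it for a name created in the terminal."""
    # A ':' created in the terminal is shown as '/' in the Finder.
    return posix_name.replace(":", "/")


print(finder_to_posix("2024/01 notes"))  # 2024:01 notes
```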
Some applications may be more restrictive and prohibit both characters, to avoid confusion, because they kept logic from Classic Mac OS, or for name compatibility between platforms.
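Rather than trying to identify all the characters that are unwanted, you could just look for anything except the acceptable characters. Here's a regex that removes anything except the POSIX portable filename characters (Python's re module doesn't support [:alnum:]-style POSIX classes, so the ranges are spelled out):
import re
cleaned_name = re.sub(r'[^A-Za-z0-9._-]', '', name)  # keep only A-Z a-z 0-9 . _ -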
For "English locale" file names, this works nicely. I'm using it to sanitize uploaded file names. The file name is not meant to be linked to anything on disk; it's used when the file is downloaded, hence there are no path checks.
$file_name = preg_replace('/([^\x20-~]+)|([\\\\\/:*?"<>|]+)/', '_', $client_specified_file_name);
Basically, it replaces every run of non-printable (or non-ASCII) characters, and every run of characters reserved on Windows and other OSs, with an underscore. (PHP's preg_replace replaces all matches by default; a /g modifier is not valid in PCRE patterns.) You can easily extend the pattern to support other locales and functionalities.
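For readers working in Python, here is a rough equivalent of that PHP sanitizer. It is a sketch with a hypothetical function name, under the same assumption that only printable ASCII survives; each run of unsafe characters collapses to a single underscore.

```python
import re

# Runs of non-printable/non-ASCII characters, or runs of reserved characters
# (\ / : * ? " < > |), mirroring the PHP pattern.
_UNSAFE = re.compile(r'[^\x20-~]+|[\\/:*?"<>|]+')


def sanitize_download_name(client_name: str) -> str:
    """Replace each run of unsafe characters with a single underscore."""
    return _UNSAFE.sub("_", client_name)


print(sanitize_download_name("a/b:c.txt"))  # a_b_c.txt
print(sanitize_download_name("héllo.txt"))  # h_llo.txt
```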
I took a different approach: instead of checking whether the string contains only valid characters, I look for invalid characters.
NOTE: I needed to validate a path string, not a filename. If you need to check a filename, simply add / to the set.
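def check_path_validity(path: str) -> bool:
    """Return False (and report it) if path contains a character treated as illegal."""
    # Raw string so the backslash is an explicit member of the set; the original
    # '\?' only worked because Python keeps unknown escapes as a literal backslash.
    for char in set(r'\?%*:|"<>'):
        if char in path:
            print(f"Illegal character {char} found in path")
            return False
    return True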
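Following the note about filenames, here is a minimal sketch of the filename variant: the same character set plus /. The function name is hypothetical; sorting the set just makes the reported character deterministic when several are present.

```python
# Characters the path check rejects, plus '/' for single file names.
ILLEGAL_IN_FILENAME = set(r'\?%*:|"<>') | {"/"}


def check_filename_validity(filename: str) -> bool:
    """Like check_path_validity, but also rejects '/'."""
    for char in sorted(ILLEGAL_IN_FILENAME):
        if char in filename:
            print(f"Illegal character {char} found in filename")
            return False
    return True
```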
Here is code to clean a file name in Python:
import re
import unicodedata


def clean_name(name, replace_space_with=None):
    """
    Remove invalid file name chars from the specified name
    :param name: the file name
    :param replace_space_with: if not None, replace spaces with this string
    :return: a valid name for Win/Mac/Linux
    """
    # ref: https://en.wikipedia.org/wiki/Filename
    # ref: https://mcmap.net/q/116609/-allowed-characters-in-filename
    # No control chars, no: /, \, ?, %, *, :, |, ", <, >

    # drop every char in Unicode category C (control, format, surrogate, private use, unassigned)
    name = ''.join(ch for ch in name if unicodedata.category(ch)[0] != 'C')
    cleaned_name = re.sub(r'[/\\?%*:|"<>]', '', name)
    if replace_space_with is not None:
        return cleaned_name.replace(' ', replace_space_with)
    return cleaned_name
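The docstring's ":return: a valid name for Win/Mac/Linux" is not true in all circumstances (for example, reserved names such as CON, or names ending in a period, pass through unchanged).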
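One gap that comment points at: clean_name keeps characters that are legal on their own but not at the edges of a Windows name. A minimal post-processing sketch, assuming the Windows rules quoted earlier in the thread (no leading or trailing space, no trailing period); the function name and the "_" placeholder are hypothetical, and reserved names such as CON would still need a separate check.

```python
def fix_windows_edges(name: str) -> str:
    """Apply the Windows edge rules: no leading/trailing space, no trailing period."""
    # rstrip(" .") removes any mix of trailing spaces and periods in one call.
    cleaned = name.lstrip(" ").rstrip(" .")
    # Assumption: fall back to a placeholder rather than returning an empty name
    # (this also covers inputs like "." and "..").
    return cleaned or "_"


print(fix_windows_edges(" notes. "))  # notes
```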