Extract file name from path, no matter what the os/path format
H

23

1324

Which Python library can I use to extract filenames from paths, no matter what the operating system or path format could be?

For example, I'd like all of these paths to return me c:

a/b/c/
a/b/c
\a\b\c
\a\b\c\
a\b\c
a/b/../../a/b/c/
a/b/../../a/b/c
Hukill answered 5/12, 2011 at 11:39 Comment(3)
This question is crazy. There is no reliable way to parse a path with forward and backward slashes on all operating systems. On Unix you CAN have a backslash in a folder name. You can only implement something that will work "most of the time", aka bug. Better find a way to avoid such crazy paths. Use system libraries for parsing paths, but also for building paths to begin with. The best solution to this problem is to eliminate such ambiguous paths. Good Luck!Cyclo
As an example of what @Cyclo said: \a\b\c is a valid filename on Linux. Returning just c instead may be invalid and dangerous.L
@Cyclo I think the question covers the possibility that the OS/filesystem/separator is specified within the function call. So this makes the question not crazy at all.Excipient
U
1060

Using os.path.split or os.path.basename as others suggest won't work in all cases: if you're running the script on Linux and attempt to process a classic windows-style path, it will fail.

Windows paths can use either backslash or forward slash as path separator. Therefore, the ntpath module (which is equivalent to os.path when running on windows) will work for all(1) paths on all platforms.

import ntpath
ntpath.basename("a/b/c")

Of course, if the path ends with a slash, the basename will be empty, so make your own function to deal with it:

def path_leaf(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

Verification:

>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c', 
...     'a/b/../../a/b/c/', 'a/b/../../a/b/c']
>>> [path_leaf(path) for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']


(1) There's one caveat: Linux filenames may contain backslashes. So on linux, r'a/b\c' always refers to the file b\c in the a folder, while on Windows, it always refers to the c file in the b subfolder of the a folder. So when both forward and backward slashes are used in a path, you need to know the associated platform to be able to interpret it correctly. In practice it's usually safe to assume it's a windows path since backslashes are seldom used in Linux filenames, but keep this in mind when you code so you don't create accidental security holes.
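
The caveat can be seen by running the same mixed-separator string through both path modules. This is only a small sketch; the variable name `ambiguous` is made up for illustration:

```python
import ntpath
import posixpath

ambiguous = r'a/b\c'

# Windows rules: both '/' and '\' separate components.
print(ntpath.basename(ambiguous))     # c

# POSIX rules: only '/' separates, so the backslash is part of the file name.
print(posixpath.basename(ambiguous))  # b\c
```

Both answers are correct for their own platform, which is why the string alone cannot tell you which one is meant.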

Usufruct answered 5/12, 2011 at 11:45 Comment(8)
on Windows, os.path just loads the ntpath module internally. Using this module, it is possible to handle the '\\' path separators even on Linux machines. For Linux the posixpath module (resp. os.path) will simplify the path operations to allow only posix style '/' separators.Exhort
@Exhort So we could use Stranac's answer, and it is reliable? ("Using os.path.split or os.path.basename as others suggest won't work in all cases: if you're running the script on Linux and attempt to process a classic windows-style path, it will fail" -- the quote is from Lauritz's post -- and I don't understand, does this warning concerns Stranac's answer, or not).Spokane
@johnc.j. Only when you need to parse Windows style paths (e.g., r'C:\path\to\file.txt') on a Linux machine, you need to use the ntpath module. Otherwise, you can use the functions from os.path. This is because Linux systems normally allow the use of the backslash characters in filenames (as explained in the answer).Exhort
Isn't your solution equivalent to os.path.basename(os.path.normpath(path)) ?Joycelynjoye
For what it's worth to future visitors to this question, I ran into the situation Lauritz was warning about and his solution was the only one that worked. No finangling with os could output just the filename. So imho, ntpath is the way to go.Breeder
Why would you use a Windows-style path on Linux? Aren't those invalid on Linux, meaning that you are probably doing something wrong if you're doing that?Kowtko
Failed case: import ntpath print(ntpath.basename("C:\a\b\s\d\f_a\aasdas_o_g.json")) Results: d_aasdas_o_g.jsonCaitlin
@Kowtko I have to process Windows-style paths on Linux when processing data from Windows systems (like event logs) on a Linux system. This is probably much more common than you think, especially in the security world.Bonucci
O
1892

There's a function that returns exactly what you want

import os
print(os.path.basename(your_path))

WARNING: When os.path.basename() is used on a POSIX system to get the base name from a Windows-styled path (e.g. "C:\\my\\file.txt"), the entire path will be returned.

Example below from interactive python shell running on a Linux host:

Python 3.8.2 (default, Mar 13 2020, 10:14:16)
[GCC 9.3.0] on Linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> filepath = "C:\\my\\path\\to\\file.txt" # A Windows style file path.
>>> os.path.basename(filepath)
'C:\\my\\path\\to\\file.txt'
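
As a rough illustration of why the result depends on the host, the sketch below calls the two modules that os.path can stand for directly, so it gives the same output on any machine. The names `posix_result` and `nt_result` are hypothetical:

```python
import ntpath
import posixpath

filepath = "C:\\my\\path\\to\\file.txt"  # A Windows style file path.

# os.path is posixpath on Linux/macOS and ntpath on Windows.
posix_result = posixpath.basename(filepath)
nt_result = ntpath.basename(filepath)

print(posix_result)  # C:\my\path\to\file.txt
print(nt_result)     # file.txt
```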
Occupancy answered 5/12, 2011 at 11:49 Comment(9)
If you want to process paths in OS independent way, then for os.path.basename(u"C:\\temp\\bla.txt") you are expecting to get 'bla.txt' . The question is not about obtaining a valid filename, but extracting the name for a path.Rum
On my Google search for finding the filename of a path, this answer was the most helpful. My use case is only on Windows anyway.Arianearianie
os.path.basename(your_path) This worked! I wanted script path: os.path.dirname(os.path.realpath(__file__)) and script name: os.path.basename(os.path.realpath(__file__)). Thanks!Gratulate
@AdiRoiban Could you please elaborate your comment? I tested it on Windows 7 and I actually get "bla.txt'. Simply saying, I don't see any problem (for myself).Spokane
@johnc.j. The point is, when you attempted this on Linux, you'd get 'C:\\temp\\bla.txt' instead.Exhort
Note, this doesn't work on all the cases listed by OP. If the your_path ends in a /, then os.path.basename(your_path) returns ''. To get around this, you need to remove any trailing path separators first.Lawrenson
@Occupancy You're right, that's awfully egocentric of the Linux implementation, to not consider backslashes in the path as proper pathing separators. On the bright side, Windows-style paths do work on Linux, but you have to use forward slashes only (so you could do filepath.replace('\\', '/') to get some plat-independence here)Osi
@Osi The Linux implementation is the only correct way to implement this functionality. The issue here is that POSIX paths and Windows paths are incompatible and can't be processed without knowing the OS to which they belong. It's impossible to disambiguate the path a\b.txt which is a valid single component path on Linux but also a valid double component path on windows. What you're criticizing here is impossible to solve with paths alone.Improvisator
@AdiRoiban "process paths in OS independent way" - this is just not possible. On Linux I can create a file with filename r'c:\temp\bla.txt', so returning bla.txt on Linux would be just invalid.L
A
390

os.path.split is the function you are looking for

head, tail = os.path.split("/tmp/d/a.dat")

>>> print(tail)
a.dat
>>> print(head)
/tmp/d
Arachnoid answered 5/12, 2011 at 11:45 Comment(3)
Just for other users to be careful, this returns "" if the paths ends in "/" or "\"Hukill
When I try "C:\Users\Dell\Desktop\ProjectShadow\button\button.py" it returns "ProjectShadowuttontton"; for every other path it returns the correct result.Hammon
@Hammon - Either do this: r"C:\Users\Dell\Desktop\ProjectShadow\button\button.py" or this: "C:\\Users\\Dell\\Desktop\\ProjectShadow\\button\\button.py" - "\b" is an escape sequence (backspace), similar to how \n or \r signify newline/carriage return. Prefixing the string with r, as in r"C:\...", makes Python take the string literally.Pulchia
W
324

In python 3.4 or later, with pathlib.Path:

>>> from pathlib import Path    
>>> Path("/tmp/d/a.dat").name
'a.dat'

The .name property will give the full name of the final child element in the path, regardless of whether it is a file or a folder.
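
A short sketch of the related attributes, using PurePosixPath so that the output does not depend on the machine running it (on a POSIX system, Path behaves the same way for these attributes):

```python
from pathlib import PurePosixPath

p = PurePosixPath("/tmp/d/a.dat")
print(p.name)    # a.dat
print(p.stem)    # a
print(p.suffix)  # .dat

# A trailing slash is dropped when the path is parsed,
# so .name still returns the last component.
print(PurePosixPath("a/b/c/").name)  # c
```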

Wellington answered 3/2, 2018 at 4:6 Comment(3)
3.4 to 3.6 or later, depending exactly which pathlib items you use.Adriel
can also use Path("some/path/to/file.dat").stem to get the filename without the file extensionAlleris
...and pathlib.Path('some/path/to/file.dat').suffix yields the extension.Downstage
B
89
import os
head, tail = os.path.split('path/to/file.exe')

tail is what you want, the filename.

See python os module docs for detail

Brisance answered 5/12, 2011 at 11:45 Comment(1)
Just for other users to be careful, this returns "" if the paths ends in "/" or "\"Hukill
T
43
import os
file_location = '/srv/volume1/data/eds/eds_report.csv'
file_name = os.path.basename(file_location )  #eds_report.csv
location = os.path.dirname(file_location )    #/srv/volume1/data/eds
Trifle answered 3/9, 2019 at 10:39 Comment(1)
As sweet and concise as an answer can get! Thank you!Sharpnosed
P
20

My personal favourite is:

filename = fullname.split(os.sep)[-1]
Purser answered 17/12, 2020 at 17:31 Comment(0)
C
17

If you want to get the filename automatically you can do

import glob
import os

for f in glob.glob('/your/path/*'):
    print(os.path.split(f)[-1])
Comprehension answered 25/10, 2019 at 10:52 Comment(0)
L
15

In your example you will also need to strip the slash from the right side to return c:

>>> import os
>>> path = 'a/b/c/'
>>> path = path.rstrip(os.sep) # strip the slash from the right side
>>> os.path.basename(path)
'c'

Second level:

>>> os.path.basename(os.path.dirname(path))
'b'

update: I think lazyr has provided the right answer. My code will not work with Windows-like paths on Unix systems, and vice versa with Unix-like paths on Windows systems.

Lanfranc answered 5/12, 2011 at 11:51 Comment(6)
Your answer won't work for r"a\b\c" on linux, nor for "a/b/c" on windows.Usufruct
of course, os.path.basename(path) will only work if os.path.isfile(path) is True. Therefore path = 'a/b/c/' is not a valid filename at all...Exhort
@fmaas os.path.basename is purely a string-processing function. It does not care if the file exists or whether it's a file or dir. os.path.basename("a/b/c/") returns "" because of the trailing slash.Usufruct
lazyr you are right! I didn't think about that. Would it be safe to just do path = path.replace('\\', '/') ?Lanfranc
@Skirmantas I suppose, but it doesn't feel right. I think path processing should be done with the built-in tools that were made for the job. There's a lot more to paths than meets the eye.Usufruct
indeed, I think lazyr has the most powerful function ;)Hukill
W
15
fname = r"C:\Windows\paint.exe".split('\\')[-1]

this will return : paint.exe

Change the separator passed to split() to match your path format or OS.

Wassail answered 3/11, 2014 at 7:39 Comment(1)
This is the answer I liked, but why not just do the following? fname = str(path).split('/')[-1]Turgeon
W
12

File name with extension

import os

filepath = './dir/subdir/filename.ext'
basename = os.path.basename(filepath)
print(basename)
# filename.ext

print(type(basename))
# <class 'str'>

File name without extension

basename_without_ext = os.path.splitext(os.path.basename(filepath))[0]
print(basename_without_ext)
# filename
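
One detail worth knowing is that os.path.splitext only removes the last extension and treats a leading dot as part of the name. A minimal sketch, with a made-up archive path:

```python
import os.path

name = os.path.basename('./dir/subdir/archive.tar.gz')
root, ext = os.path.splitext(name)
print(root)  # archive.tar
print(ext)   # .gz

# A leading dot is not treated as an extension separator.
dotfile = os.path.splitext('.bashrc')
print(dotfile)  # ('.bashrc', '')
```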
Watershed answered 4/3, 2022 at 10:30 Comment(0)
U
11

This works on both Linux and Windows, using only the standard library:

paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c']

def path_leaf(path):
    return path.strip('/').strip('\\').split('/')[-1].split('\\')[-1]

[path_leaf(path) for path in paths]

Results:

['c', 'c', 'c', 'c', 'c', 'c', 'c']
Unfailing answered 18/6, 2015 at 15:39 Comment(0)
W
11

This works:

os.path.basename(name)

But you can't reliably get the file name from a path written for the other operating system, because os.path loads a different module depending on the operating system:

  • Linux - posixpath
  • Windows - ntpath

So os.path always gives the correct result for paths in the native format of the system it runs on.

Worth answered 18/8, 2021 at 1:41 Comment(1)
Please make sure that your solution was not already proposed in another answer, like the top one. Also, there are some caveats described in the top answers and their comments.Fastening
M
10

Here's a regex-only solution, which seems to work with any OS path on any OS.

No other module is needed, and no preprocessing is needed either :

import re

def extract_basename(path):
  """Extracts basename of a given path. Should Work with any OS Path on any OS"""
  basename = re.search(r'[^\\/]+(?=[\\/]?$)', path)
  if basename:
    return basename.group(0)


paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c']

print([extract_basename(path) for path in paths])
# ['c', 'c', 'c', 'c', 'c', 'c', 'c']


extra_paths = ['C:\\', 'alone', '/a/space in filename', 'C:\\multi\nline']

print([extract_basename(path) for path in extra_paths])
# ['C:', 'alone', 'space in filename', 'multi\nline']

Update:

If you only want a potential filename, if present (i.e., /a/b/ is a dir and so is c:\windows\), change the regex to: r'[^\\/]+(?![\\/])$'. For the "regex challenged," this changes the positive lookahead for an optional slash into a negative lookahead, so pathnames that end with a slash return nothing instead of the last sub-directory in the pathname. Of course, there is no guarantee that the potential filename actually refers to a file; for that, os.path.isdir() or os.path.isfile() would need to be employed.

This will match as follows:

/a/b/c/             # nothing, pathname ends with the dir 'c'
c:\windows\         # nothing, pathname ends with the dir 'windows'
c:hello.txt         # matches 'c:hello.txt' (the drive prefix is kept)
~it_s_me/.bashrc    # matches potential filename '.bashrc'
c:\windows\system32 # matches potential filename 'system32', except
                    # that is obviously a dir. os.path.isdir()
                    # should be used to tell us for sure
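
For reference, the negative-lookahead variant could be wrapped roughly like this. The names `FILENAME_RE` and `extract_filename` are hypothetical and not part of the original answer:

```python
import re

# Negative lookahead: the last component must not be followed by a separator.
FILENAME_RE = re.compile(r'[^\\/]+(?![\\/])$')

def extract_filename(path):
    """Return the last component, or None if the path ends with a separator."""
    match = FILENAME_RE.search(path)
    return match.group(0) if match else None

print(extract_filename('/a/b/c/'))                # None
print(extract_filename('c:\\windows\\'))          # None
print(extract_filename('~it_s_me/.bashrc'))       # .bashrc
print(extract_filename('c:\\windows\\system32'))  # system32
```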


Maltese answered 28/11, 2016 at 13:33 Comment(2)
you are using re, why not os module ?Trifle
@SaurabhChandraPatel it's been a long time. If I remember correctly, regex is used as a cross platform solution in this case. You can process windows file names on a Linux server, for example.Maltese
J
10

If your file path does not end with "/" and its directories are separated by "/", use the following code. Generally, a path does not end with "/".

import os
path_str = "/var/www/index.html"
print(os.path.basename(path_str))

But in some cases, such as URLs, the path ends with "/"; then use the following code:

import os
path_str = "/home/some_str/last_str/"
split_path = path_str.rsplit("/",1)
print(os.path.basename(split_path[0]))

But when your path is separated by "\", as is usual for Windows paths, you can use the following code (run it on Windows, where os.path understands backslashes):

import os
path_str = r"c:\var\www\index.html"
print(os.path.basename(path_str))

import os
path_str = "c:\\home\\some_str\\last_str\\"
split_path = path_str.rsplit("\\",1)
print(os.path.basename(split_path[0]))

You can combine both into one function by checking the OS type and returning the result.
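
One possible way to combine them is to let the os module say which separators the current platform uses, instead of checking the OS name by hand. This is only a sketch, and `last_component` is a hypothetical name:

```python
import os

def last_component(path_str):
    """Strip trailing separators valid on this OS, then take the basename."""
    # os.altsep is '/' on Windows and None on POSIX systems.
    seps = os.sep + (os.altsep or "")
    return os.path.basename(path_str.rstrip(seps))

print(last_component("/var/www/index.html"))       # index.html
print(last_component("/home/some_str/last_str/"))  # last_str
```

Like os.path itself, this only understands backslash-separated paths when it runs on Windows.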

Jeer answered 19/2, 2019 at 5:26 Comment(0)
M
6

Maybe just my all-in-one solution without importing anything new (tempfile is used here only to create a temporary file :D )

import tempfile
abc = tempfile.NamedTemporaryFile(dir='/tmp/')
abc.name
abc.name.replace("/", " ").split()[-1] 

Getting the values of abc.name will be a string like this: '/tmp/tmpks5oksk7' So I can replace the / with a space .replace("/", " ") and then call split(). That will return a list and I get the last element of the list with [-1]

No other module needs to be imported.

Mosesmosey answered 21/8, 2014 at 15:23 Comment(2)
What if the filename or a directory contains a space ?Neediness
What about a direct split("/")[-1] ?Blowbyblow
S
6

If you have a number of files in a directory and want to store their names in a list, use the code below.

import glob
import os
path = 'mypath'
file_list = []
# '*' matches every entry inside the directory; glob.glob(path) alone would only match the directory itself
for file in glob.glob(os.path.join(path, '*')):
    file_list.append(os.path.basename(file))
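
A pathlib-based alternative is sketched below, assuming you only want regular files and not sub-directories. The helper name `list_file_names` is made up, and the demo runs on a throwaway directory so that it is self-contained:

```python
import tempfile
from pathlib import Path

def list_file_names(directory):
    """Names of the regular files directly inside directory, sorted."""
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())

# Demo on a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("x")
    (Path(tmp) / "b.csv").write_text("y")
    names = list_file_names(tmp)

print(names)  # ['a.txt', 'b.csv']
```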
Stockist answered 13/2, 2021 at 8:0 Comment(0)
B
4

I have never seen double-backslashed paths; do they exist? The built-in functions of Python's os module fail for those. All the others work, including the caveat you gave, with os.path.normpath():

import os

paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c', 'a/./b/c', 'a\\b/c']
for path in paths:
    print(os.path.basename(os.path.normpath(path)))
Baber answered 19/3, 2015 at 21:18 Comment(1)
Those are not double backslashes. They are single backslashes, and they need to be escaped.Maltese
E
4
import os

path = r"C:\py_auto_script\testing.xlsx"
os.path.basename(path)
=> 'testing.xlsx'

os.path.dirname(os.path.realpath(path))
=>'C:\\py_auto_script'
Elga answered 26/6, 2023 at 14:35 Comment(1)
Please do not provide a code-only answer; provide an explanation as well. Moreover, I fail to see how your answer adds anything beyond the 22 already existing answers on this 11-year-old question.Melonie
R
3

The Windows separator can be in a Unix filename or Windows Path. The Unix separator can only exist in the Unix path. The presence of a Unix separator indicates a non-Windows path.

The following will strip (cut trailing separator) by the OS specific separator, then split and return the rightmost value. It's ugly, but simple based on the assumption above. If the assumption is incorrect, please update and I will update this response to match the more accurate conditions.

a.rstrip("\\" if a.count("/") == 0 else '/').split("\\" if a.count("/") == 0 else '/')[-1]

sample code:

b = ['a/b/c/','a/b/c','\\a\\b\\c','\\a\\b\\c\\','a\\b\\c','a/b/../../a/b/c/','a/b/../../a/b/c']

for a in b:

    print (a, a.rstrip("\\" if a.count("/") == 0 else '/').split("\\" if a.count("/") == 0 else '/')[-1])
Ratiocination answered 16/5, 2016 at 14:29 Comment(1)
Also, feel free to send me pointers on how to format in this venue. Took half a dozen tries to get the sample code in place.Ratiocination
H
2

For completeness' sake, here is the pathlib solution for Python 3.4+ (pathlib was added to the standard library in 3.4):

>>> from pathlib import PureWindowsPath

>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c', 
...          'a/b/../../a/b/c/', 'a/b/../../a/b/c']

>>> [PureWindowsPath(path).name for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']

This works on both Windows and Linux.
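
If the caller knows which platform a path came from, the flavour can be made an explicit argument. A minimal sketch; the helper `leaf` and its `windows` parameter are hypothetical:

```python
from pathlib import PurePosixPath, PureWindowsPath

def leaf(path, windows=True):
    """Return the last path component under an explicitly chosen flavour."""
    flavour = PureWindowsPath if windows else PurePosixPath
    return flavour(path).name

print(leaf('a\\b\\c'))                 # c
print(leaf('a\\b\\c', windows=False))  # a\b\c
print(leaf('a/b/c/', windows=False))   # c
```

Under POSIX rules the backslashes are ordinary characters, so the whole string is a single file name.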

Haeckel answered 24/8, 2017 at 14:31 Comment(0)
M
2

In both Python 2 and 3, using the module pathlib2:

import posixpath  # to generate unix paths
from pathlib2 import PurePath, PureWindowsPath, PurePosixPath

def path2unix(path, nojoin=True, fromwinpath=False):
    """From a path given in any format, converts to posix path format
    fromwinpath=True forces the input path to be recognized as a Windows path (useful on Unix machines to unit test Windows paths)"""
    if not path:
        return path
    if fromwinpath:
        pathparts = list(PureWindowsPath(path).parts)
    else:
        pathparts = list(PurePath(path).parts)
    if nojoin:
        return pathparts
    else:
        return posixpath.join(*pathparts)

Usage:

In [9]: path2unix('lala/lolo/haha.dat')
Out[9]: ['lala', 'lolo', 'haha.dat']

In [10]: path2unix(r'C:\lala/lolo/haha.dat')
Out[10]: ['C:\\', 'lala', 'lolo', 'haha.dat']

In [11]: path2unix(r'C:\lala/lolo/haha.dat') # works even with malformatted cases mixing both Windows and Linux path separators
Out[11]: ['C:\\', 'lala', 'lolo', 'haha.dat']

With your testcase:

In [12]: testcase = paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
    ...:      'a/b/../../a/b/c/', 'a/b/../../a/b/c']

In [14]: for t in testcase:
    ...:     print(path2unix(t)[-1])
    ...:
    ...:
c
c
c
c
c
c
c

The idea here is to convert all paths into the unified internal representation of pathlib2, with different decoders depending on the platform. pathlib2 includes a generic class called PurePath, which picks the flavour of the platform the code runs on (PureWindowsPath on Windows, PurePosixPath elsewhere). When that is not what you want, for example when handling backslash-separated paths on a Unix machine, you can force the recognition of Windows paths using fromwinpath=True. Either way the input string is split into parts, and the last one is the leaf you are looking for, hence the path2unix(t)[-1].

If the argument nojoin=False, the path will be joined back, so that the output is simply the input string converted to a Unix format, which can be useful to compare subpaths across platforms.

Magistery answered 28/12, 2018 at 22:3 Comment(0)
P
1

I use this method on Windows and Ubuntu (WSL), and it works as I expected using only 'import os'. Basically, replace() turns backslashes into the path separator of the platform you are currently running on.

If the path ends with a slash '/', then it's not a file but a directory, so an empty string is returned.

import os

my_fullpath = r"D:\MY_FOLDER\TEST\20201108\20201108_073751.DNG"
os.path.basename(my_fullpath.replace('\\',os.sep))

my_fullpath = r"/MY_FOLDER/TEST/20201108/20201108_073751.DNG"
os.path.basename(my_fullpath.replace('\\',os.sep))

my_fullpath = r"/MY_FOLDER/TEST/20201108/"
os.path.basename(my_fullpath.replace('\\',os.sep))

my_fullpath = r"/MY_FOLDER/TEST/20201108"
os.path.basename(my_fullpath.replace('\\',os.sep))

On Windows (left) and Ubuntu (via WSL, right): [screenshot not available]

Perform answered 8/9, 2021 at 3:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.