Extracting extension from filename in Python
C

33

1820

Is there a function to extract the extension from a filename?

Caseate answered 12/2, 2009 at 14:11 Comment(0)
C
2615

Use os.path.splitext:

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
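A minimal sketch of how this might be wrapped in a small helper, assuming you sometimes want the extension without its leading dot. The name get_extension and its with_dot parameter are illustrative only, not part of the standard library:

```python
import os.path


def get_extension(path, with_dot=True):
    """Return the last extension of `path`, or '' if it has none."""
    ext = os.path.splitext(path)[1]
    # Slicing an empty string is safe, so '' stays '' either way.
    return ext if with_dot else ext[1:]


print(get_extension('/path/to/somefile.ext'))            # .ext
print(get_extension('archive.tar.gz', with_dot=False))   # gz
print(repr(get_extension('/a/b.c/d')))                   # ''
```

Because the helper only delegates to os.path.splitext, it inherits the same handling of dotted directories and of names like .bashrc.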
Cartogram answered 12/2, 2009 at 14:12 Comment(11)
the use of basename is a little confusing here since os.path.basename("/path/to/somefile.ext") would return "somefile.ext"Jeopardy
see also ideas below concerning lower() and double extensionsAnnabelle
Wouldn't endswith() be more portable and Pythonic?Timeserver
You can't rely on that if you have files with "double extensions", like .mp3.asd for example, because it will return you only the "last" extension!Millda
@Millda Well, in that case, .asd is really the extension!! If you think about it, foo.tar.gz is a gzip-compressed file (.gz) which happens to be a tar file (.tar). But it is a gzip file in first place. I wouldn't expect it to return the dual extension at all.Cartogram
The standard Python function naming convention is really annoying - almost every time I re-look this up, I mistake it as being splittext. If they would just do anything to signify the break between parts of this name, it'd be much easier to recognize that it's splitExt or split_ext. Surely I can't be the only person who has made this mistake?Ralf
a tuple is returned and if you want to get the extension use: file_extension=os.path.splitext('/path/to/somefile.ext')[1] or if you want the filename use: filename=os.path.splitext('/path/to/somefile.ext')[0]Olmsted
@Ralf Given the small size of the os.path submodule, you could conceivably remap the names manually in your own module saved on your Python path. E.g. myospath.py containing things like splitExt = os.path.splitext.Abolish
Although it goes beyond what the OP asked, it would be a great improvement if you also provided the opposite solution, or a link to it. By "the opposite" I mean getting the filename with no extension.Changchun
@FranciscoMariaCalisto I'm not sure what you mean. There is already one example of a file without extension in the answer, the /a/b.c/d file. If by "opposite" of splitting you mean joining the filename with an extension, that can be done by normal concatenation: filename + file_extensionCartogram
If you just need the extension, without the . prefixed, use os.path.splitext('filename.ext')[1][1:]. Obvious but just a reminder for those of us who like to code mainly using [ctrl]+C/V.Audry
C
680

New in version 3.4.

import pathlib

print(pathlib.Path('yourPath.example').suffix) # '.example'
print(pathlib.Path("hello/foo.bar.tar.gz").suffixes) # ['.bar', '.tar', '.gz']
print(pathlib.Path('/foo/bar.txt').stem) # 'bar'

I'm surprised no one has mentioned pathlib yet, pathlib IS awesome!
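Since .suffixes returns every dot-separated piece, a name such as foo.bar.tar.gz yields more than the archive extension. The following is only a sketch, under the assumption that you maintain your own set of compound extensions you care about; full_suffix and KNOWN_COMPOUND are hypothetical names, and PurePosixPath is used so the result does not depend on the platform:

```python
from pathlib import PurePosixPath

# Assumption: only these endings are treated as "double extensions".
KNOWN_COMPOUND = {'.tar.gz', '.tar.bz2', '.tar.xz'}


def full_suffix(path):
    """Return a known compound suffix if present, else the last suffix."""
    suffixes = PurePosixPath(path).suffixes
    last_two = ''.join(suffixes[-2:])
    if last_two.lower() in KNOWN_COMPOUND:
        return last_two
    return suffixes[-1] if suffixes else ''


print(full_suffix('hello/foo.bar.tar.gz'))   # .tar.gz
print(full_suffix('cmocka-1.1.0.tar.xz'))    # .tar.xz
print(full_suffix('report.v2.txt'))          # .txt
```

Restricting the join to a known set avoids treating version numbers or other dotted name parts as extensions.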

Curare answered 3/2, 2016 at 21:41 Comment(5)
example for getting .tar.gz: ''.join(pathlib.Path('somedir/file.tar.gz').suffixes)Historiography
Great answer. I found this tutorial more useful than the documentation: zetcode.com/python/pathlibGodsey
@user3780389 Wouldn't a "foo.bar.tar.gz" still be a valid ".tar.gz"? If so your snippet should be using .suffixes[-2:] to ensure only getting .tar.gz at most.Curare
there are still cases when this does not work as expected like "filename with.a dot inside.tar". This is the solution i am using currently: "".join([s for s in pathlib.Path('somedir/file.tar.gz').suffixes if not " " in s])Thursby
this one should be acceptable answerCensus
K
494
import os.path
extension = os.path.splitext(filename)[1]
Keyte answered 12/2, 2009 at 14:15 Comment(10)
Out of curiosity, why import os.path instead of from os import path?Pean
@Pean - I suppose you could do it that way. I've seen more code using import os.path though.Keyte
It depends, really. If you use from os import path, the name path is taken up in your local scope, and others reading the code may not immediately know that path comes from the os module. Whereas if you use import os.path, it stays within the os namespace, and wherever you make the call people immediately know it's path from the os module.Incapacitate
but what's the point of importing os.path if we could just import os?Arp
@IvanVirabyan See this question: #2724848Keyte
I know it's not semantically any different, but I personally find the construction _, extension = os.path.splitext(filename) to be much nicer-looking.Impromptu
@TimGilbert Hmm... same exact number of characters, but I feel _, is less heavy/distracting than [1], and maybe even a bit clearer. I ran both through timeit... your form takes between 2.02 usec and 2.04 usec on my computer, while the form in this answer takes between 2.04 usec and 2.06 usec. So performance is infinitesimally improved at best, or exactly the same at worst. I can't come up with any reason to not use your form.Ralf
If you want the extension as part of a more complex expression the [1] may be more useful: if check_for_gzip and os.path.splitext(filename)[1] == '.gz':Gomorrah
@Pean The docs literally say to just import os: help(os.path) -> "Instead of importing this module directly, import os and refer to this module as os.path."Hendrika
Make sure to append [1:] to the result, to remove the leading dot since that's not part of the extension: en.wikipedia.org/wiki/Filename_extensionOdont
P
145
import os.path
extension = os.path.splitext(filename)[1][1:]

To get only the text of the extension, without the dot.

Precentor answered 26/8, 2011 at 9:37 Comment(1)
This will return an empty string both for file names that end with . and for file names without an extension.Oxygen
K
113

For simple use cases one option may be splitting from dot:

>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'

No error when file doesn't have an extension:

>>> "filename".split(".")[-1]
'filename'

But you must be careful:

>>> "png".split(".")[-1]
'png'    # But file doesn't have an extension

Also will not work with hidden files in Unix systems:

>>> ".bashrc".split(".")[-1]
'bashrc'    # But this is not an extension

For general use, prefer os.path.splitext
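To make the difference concrete, here is a small side-by-side comparison. It is only an illustration; naive_ext and splitext_ext are made-up helper names:

```python
import os.path


def naive_ext(path):
    # Plain string split: takes whatever follows the last dot, if any.
    return path.split('.')[-1]


def splitext_ext(path):
    # os.path.splitext, with the leading dot removed for comparison.
    return os.path.splitext(path)[1][1:]


for path in ['example.jpeg', 'png', '.bashrc', '/home/example/.config/README']:
    print(f'{path!r}: split -> {naive_ext(path)!r}, '
          f'splitext -> {splitext_ext(path)!r}')
```

The two agree on example.jpeg, but only splitext reports an empty extension for the other three inputs.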

Kumiss answered 9/4, 2012 at 18:48 Comment(13)
This would get upset if you're uploading x.tar.gzMonkey
Not actually. Extension of a file named "x.tar.gz" is "gz" not "tar.gz". os.path.splitext gives ".gz" as extension too.Colorado
This works when you are processing files for platforms other than the one you run.Hudgins
can we use [1] rather than [-1]. I could not understand [-1] with splitHexateuch
[-1] gets the last of the items produced by splitting on the dot. Example: "my.file.name.js".split('.') => ['my','file','name','js']Colorado
@MuratÇorlu rsplit[1] is the better approach for this. I agree with you, though, the LAST extension represents the current wrapper or encoding of the file. myfile.tar.gz is a gzipped file before it is a tar.Cane
@MuratÇorlu Whoops: rsplit('.', 1) is what I meant! Then you check if the length of the list being returned is > 1 or not as a test.Cane
@BenjaminR I couldn't get the motivation behind using rsplit instead of split for this case. rsplit('.', 1) also returns an array. So again you'll need to get last item of that array to get extension only, right? Also you can check the length of the array that split produce to test it has an extension or not.Colorado
@MuratÇorlu The return value from rsplit in this case is more reflective of what your intent is. You only care about what's split on the last period, not any other periods. Why have a list (it's not an array fyi) with bits you don't care about? You only care about what's before and after that last period.Cane
@BenjaminR ah ok, you are making an optimisation about result list. ['file', 'tar', 'gz'] with 'file.tar.gz'.split('.') vs ['file.tar', 'gz'] with 'file.tar.gz'.rsplit('.', 1). yeah, could be.Colorado
just a heads up, this method fails if you have any directories in here that have . in them along with an extensionless file (which both are common in a *nix env) "/home/example/.config/README".split(".")[-1] == "config/README"Hydrangea
@Hydrangea Good point but question is about "extracting extension from filename". What is the extension of README? So, if there is no extension, yes, this method will not work. To avoid the situation about directory names, a solution can be first find the "filename" from full path.Colorado
@MuratÇorlu The extension of README is "" because it has no extension. Whereas with this solution the extension of README would be README -- which is wrong. That does appear already stated in the answer though -- I just wanted to also point out the condition with directories.Hydrangea
M
42

worth adding a lower in there so you don't find yourself wondering why the JPG's aren't showing up in your list.

os.path.splitext(filename)[1][1:].strip().lower()
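As a rough sketch of the filtering use case described here, assuming a hypothetical allow-list of image extensions (IMAGE_EXTENSIONS and is_image are illustrative names):

```python
import os.path

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}  # hypothetical allow-list


def is_image(filename, allowed=IMAGE_EXTENSIONS):
    # Lower-case the extension so 'PHOTO.JPG' matches '.jpg'.
    return os.path.splitext(filename)[1].lower() in allowed


names = ['a.JPG', 'b.png', 'notes.txt', 'jpg']
print([n for n in names if is_image(n)])  # ['a.JPG', 'b.png']
```

Only the comparison is case-insensitive; the original filename is left untouched, which matters on case-sensitive filesystems.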
Mersey answered 28/12, 2012 at 7:25 Comment(2)
The strip() will break in rare edge-cases where the filename extension includes whitespace.Mccallum
Some filesystems are case-sensitive (like the ones on Linux), and even NTFS is case-sensitive, although Windows tries to treat it in a case-insensitive manner. Be careful with case.Mccallum
I
22

Any of the solutions above work, but on Linux I have found a newline at the end of the extension string (typically because the filename itself still carried its line ending), which prevents matches from succeeding. Add the strip() method to the end. For example:

import os.path
extension = os.path.splitext(filename)[1][1:].strip() 
Interinsurance answered 10/10, 2011 at 22:48 Comment(3)
To aid my understanding, please could you explain what additional behaviour the second index/slice guards against? (i.e. the [1:] in .splitext(filename)[1][1:]) - thank you in advanceReeher
Figured it out for myself: splitext() (unlike if you split a string using '.') includes the '.' character in the extension. The additional [1:] gets rid of it.Reeher
This will break if the file extension contains whitespace. That's a rare case, I know, but it's still an edge-case that should be considered.Mccallum
G
22

With splitext there are problems with files that have a double extension (e.g. file.tar.gz, file.tar.bz2, etc.):

>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension 
'.gz'

but should be: .tar.gz

The possible solutions are here

Gavial answered 5/2, 2013 at 0:19 Comment(5)
do it twice to get the 2 extensions ?Anette
@Anette yep. gunzip somefile.tar.gz what's the output filename?Tamas
This is why we have the extension 'tgz' which means: tar+gzip ! :DHerv
@Tamas The filename should obviously be somefile.tar. For tar -xzvf somefile.tar.gz the filename should be somefile.Zedekiah
@Zedekiah I don't think you want your python script to be aware of the application used to create the filename. It's a bit out of scope of the question. Don't pick on the example, 'filename.csv.gz' is also quite valid.Tamas
A
22

You can find some great stuff in the pathlib module (available in Python 3.4 and later).

import pathlib
x = pathlib.PurePosixPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x)

# Output 
'.txt'
Allpurpose answered 11/8, 2018 at 19:23 Comment(1)
Using PosixPath for a windows path is wrong.Neoarsphenamine
C
19

Just join all pathlib suffixes.

>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'
Caseate answered 12/8, 2018 at 15:5 Comment(0)
F
17

Although this is an old topic, I wonder why no one has mentioned a very simple Python string method called rpartition for this case:

to get extension of a given file absolute path, you can simply type:

filepath.rpartition('.')[-1]

example:

path = '/home/jersey/remote/data/test.csv'
print(path.rpartition('.')[-1])

will give you: 'csv'
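Note that rpartition returns the whole string as its last element when there is no dot. A guarded variant might look like the sketch below; ext_via_rpartition is a hypothetical helper, and taking the basename first is an added assumption so that dots in directory names are ignored:

```python
import os.path


def ext_via_rpartition(path):
    name = os.path.basename(path)           # ignore dots in directory names
    head, sep, tail = name.rpartition('.')
    # sep == '' means no dot at all; head == '' means only a leading dot.
    if not sep or not head:
        return ''
    return tail


print(ext_via_rpartition('/home/jersey/remote/data/test.csv'))  # csv
print(repr(ext_via_rpartition('README')))                        # ''
print(repr(ext_via_rpartition('/a/b.c/d')))                      # ''
```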

Fluster answered 27/2, 2017 at 3:53 Comment(1)
For those not familiar with the API, rpartition returns a tuple: ("string before the right-most occurrence of the separator", "the separator itself", "the rest of the string"). If there's no separator found, the returned tuple will be: ("", "", "the original string").Forbade
O
12

Surprised this wasn't mentioned yet:

import os
fn = '/some/path/a.tar.gz'

basename = os.path.basename(fn)  # os independent
Out[] a.tar.gz

base = basename.split('.')[0]
Out[] a

ext = '.'.join(basename.split('.')[1:])   # <-- main part

# if you want a leading '.', and if no result `None`:
ext = '.' + ext if ext else None
Out[] .tar.gz

Benefits:

  • Works as expected for anything I can think of
  • No modules
  • No regex
  • Cross-platform
  • Easily extendible (e.g. no leading dots for extension, only last part of extension)

As function:

def get_extension(filename):
    basename = os.path.basename(filename)  # os independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None
Orpine answered 20/12, 2015 at 0:24 Comment(4)
This results in an exception when the file doesn't have any extension.Coachwork
This answer absolutely ignore a variant if a filename contains many points in name. Example get_extension('cmocka-1.1.0.tar.xz') => '.1.0.tar.xz' - wrong.Germayne
@PADYMKO, IMHO one should not create filenames with full stops as part of the filename. The code above is not supposed to result in 'tar.xz'Flemish
Just change to [-1] then.Orpine
C
12

You can use a split on a filename:

f_extns = filename.split(".")
print ("The extension of the file is : " + repr(f_extns[-1]))

This does not require additional library

Census answered 15/3, 2018 at 18:34 Comment(0)
S
11
filename='ext.tar.gz'
extension = filename[filename.rfind('.'):]
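Because rfind returns -1 when there is no dot, the slice above needs a guard. A minimal sketch, assuming the input is a bare filename without directory components (ext_via_rfind is an illustrative name):

```python
def ext_via_rfind(filename):
    dot = filename.rfind('.')
    # dot == -1: no dot at all; dot == 0: leading dot only (e.g. '.bashrc').
    return filename[dot:] if dot > 0 else ''


print(ext_via_rfind('ext.tar.gz'))      # .gz
print(repr(ext_via_rfind('README')))    # ''
print(repr(ext_via_rfind('.bashrc')))   # ''
```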
Stavanger answered 18/2, 2014 at 10:55 Comment(1)
This results in the last char of filename being returned if the filename has no . at all. This is because rfind returns -1 if the string is not found.Claytor
O
10

Extracting extension from filename in Python

Python os module splitext()

splitext() function splits the file path into a tuple having two values – root and extension.

import os
# unpacking the tuple
file_name, file_extension = os.path.splitext("/Users/Username/abc.txt")
print(file_name)
print(file_extension)

Get File Extension using Pathlib Module

Pathlib module to get the file extension

import pathlib
pathlib.Path("/Users/pankaj/abc.txt").suffix
#output:'.txt'
Octagon answered 23/11, 2021 at 2:31 Comment(0)
L
6

Even though this question is already answered, I'd add a regex solution.

>>> import re
>>> file_suffix = r".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'
Lemures answered 30/10, 2017 at 8:42 Comment(1)
Or \.[0-9a-z]+$ as in this post.Berri
C
6

This is a direct string manipulation technique. I see a lot of solutions mentioned, but most of them use split, which splits at every occurrence of ".". What you are really looking for is partition.

string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]
Cobia answered 18/4, 2018 at 11:6 Comment(1)
rpartition was already suggested by @weiyixie.Forbade
M
5

Another solution with right split:

# to get extension only

s = 'test.ext'

if '.' in s: ext = s.rsplit('.', 1)[1]

# or, to get file name and extension

def split_filepath(s):
    """
    get filename and extension from filepath 
    filepath -> (filename, extension)
    """
    if '.' not in s: return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])
Menchaca answered 3/1, 2014 at 7:32 Comment(0)
R
5

you can use following code to split file name and extension.

    import os.path
    filenamewithext = os.path.basename(filepath)
    filename, ext = os.path.splitext(filenamewithext)
    #print file name
    print(filename)
    #print file extension
    print(ext)
Regulable answered 11/10, 2021 at 11:46 Comment(0)
B
3

Well , i know im late

that's my simple solution

file = '/foo/bar/whatever.ext'
extension = file.split('.')[-1]
print(extension)

#output will be ext
Bopp answered 29/9, 2022 at 10:3 Comment(2)
@NsaNinja but the malware.pdf.exe is [exe] type ! also for tar.gz !Bopp
I agree that there are drawbacks for completeness, HOWEVER, this is a "simple" solution and for simple uses it works. In my case, for example, I've already confirmed that the file exists and is one of several filtered file types. I just need to know which one. For that application, this works.Convertible
A
2

A true one-liner, if you like regex. And it doesn't matter even if you have additional "." in the middle

import re

file_ext = re.search(r"\.([^.]+)$", filename).group(1)
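One caveat: re.search returns None when the name contains no dot, so calling .group(1) directly raises AttributeError. A guarded sketch follows; regex_ext is a hypothetical helper, and the pattern is slightly changed from the one above to also exclude path separators, so that a dot inside a directory name is not mistaken for an extension. Hidden files such as .bashrc still match, unlike with os.path.splitext:

```python
import re

# Last dot followed by characters that are not dots or path separators.
_EXT_RE = re.compile(r"\.([^./\\]+)$")


def regex_ext(filename):
    match = _EXT_RE.search(filename)
    return match.group(1) if match else None


print(regex_ext('archive.tar.gz'))   # gz
print(regex_ext('README'))           # None
print(regex_ext('/a/b.c/d'))         # None
```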


Abbotsun answered 9/3, 2020 at 2:1 Comment(0)
D
2

You can use endswith to check the file extension in Python, as in the example below:

import os
import pandas as pd

frames = []
for file in os.listdir():
    if file.endswith('.csv'):
        frames.append(pd.read_csv(file))
if frames:  # pd.concat raises ValueError on an empty list
    result = pd.concat(frames)
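str.endswith also accepts a tuple of suffixes, which is handy when several extensions are acceptable. A small sketch, with has_suffix and its default tuple chosen purely for illustration:

```python
def has_suffix(filename, suffixes=('.csv', '.tsv')):
    # endswith returns True if any suffix in the tuple matches.
    return filename.lower().endswith(suffixes)


names = ['data.csv', 'DATA.TSV', 'archive.tar.gz', 'csv']
print([n for n in names if has_suffix(n)])          # ['data.csv', 'DATA.TSV']
print(has_suffix('archive.tar.gz', ('.tar.gz',)))   # True
```

This tests for an extension rather than extracting it, so it also handles compound endings like .tar.gz without any splitting.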
Dendrochronology answered 23/5, 2022 at 6:13 Comment(0)
L
1

try this:

files = ['file.jpeg','file.tar.gz','file.png','file.foo.bar','file.etc']
pen_ext = ['foo', 'tar', 'bar', 'etc']

for file in files: #1
    if (file.split(".")[-2] in pen_ext): #2
        ext =  file.split(".")[-2]+"."+file.split(".")[-1]#3
    else:
        ext = file.split(".")[-1] #4
    print (ext) #5
  1. get all file name inside the list
  2. splitting file name and check the penultimate extension, is it in the pen_ext list or not?
  3. if yes then join it with the last extension and set it as the file's extension
  4. if not then just put the last extension as the file's extension
  5. and then check it out
Ladoga answered 20/4, 2020 at 23:50 Comment(5)
This breaks for a bunch of special cases. See the accepted answer. It's reinventing the wheel, only in a buggy way.Lauds
Hello! While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.Mejias
@Brian like that?Ladoga
You're only making it worse, breaking it in new ways. foo.tar is a valid file name. What happens if I throw that at your code? What about .bashrc or foo? There is a library function for this for a reason...Lauds
just create a list of extension file for the penultimate extension, if not in list then just put the last extension as the file's extensionLadoga
S
1

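Another option is the mimetypes module. Note that it guesses the MIME type from the filename rather than returning the extension itself:

import mimetypes

mt = mimetypes.guess_type("example.txt")  # returns a (type, encoding) tuple
mime_type = mt[0]
print(mime_type)  # e.g. text/plain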
Socman answered 29/9, 2022 at 9:51 Comment(0)
O
1

I'm definitely late to the party, but in case anyone wanted to achieve this without the use of another library:

file_path = "example_tar.tar.gz"
file_name, file_ext = [file_path if "." not in file_path else file_path.split(".")[0], "" if "." not in file_path else file_path[file_path.find(".") + 1:]]
print(file_name, file_ext)

The 2nd line is basically just the following code but crammed into one line:

def name_and_ext(file_path):
    if "." not in file_path:
        file_name = file_path
    else:
        file_name = file_path.split(".")[0]
    if "." not in file_path:
        file_ext = ""
    else:
        file_ext = file_path[file_path.find(".") + 1:]
    return [file_name, file_ext]

Even though this works, it might not work with all types of files, specifically ones like .zshrc. I would recommend using the os.path.splitext function instead, as in the example below:

import os
file_path = "example.tar.gz"
file_name, file_ext = os.path.splitext(file_path)
print(file_name, file_ext)

Cheers :)

Overwrite answered 11/2, 2023 at 23:9 Comment(0)
M
0

For funsies... just collect the extensions in a dict, and track all of them in a folder. Then just pull the extensions you want.

import os

search = {}

for f in os.listdir(os.getcwd()):
    fn, fe = os.path.splitext(f)
    search.setdefault(fe, []).append(f)

extensions = ('.png','.jpg')
for ex in extensions:
    found = search.get(ex,'')
    if found:
        print(found)
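The same grouping idea can be sketched with collections.defaultdict. Here group_by_extension is a hypothetical helper that takes a list of names instead of reading a directory, and lower-casing the key is an added assumption:

```python
import os.path
from collections import defaultdict


def group_by_extension(filenames):
    """Map each lower-cased extension to the list of names that use it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[os.path.splitext(name)[1].lower()].append(name)
    return dict(groups)


groups = group_by_extension(['a.png', 'b.JPG', 'c.png', 'README'])
print(groups.get('.png', []))   # ['a.png', 'c.png']
print(groups.get('.gif', []))   # []
```

Names without an extension end up under the empty-string key, and looking up an extension that never occurred simply yields the default.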
Melaniamelanic answered 14/2, 2020 at 16:42 Comment(1)
That's a terrible idea. Your code breaks for any file extension you haven't previously added!Lauds
M
0

This approach needs a dictionary, list, or set of known extensions. You can then use the built-in string method endswith to check whether a filename ends with one of them, e.g. str.endswith(fileName[index]). It is meant for testing and comparing extensions rather than extracting them.

https://docs.python.org/3/library/stdtypes.html#string-methods

Example 1:

dictonary = {0:".tar.gz", 1:".txt", 2:".exe", 3:".js", 4:".java", 5:".python", 6:".ruby",7:".c", 8:".bash", 9:".ps1", 10:".html", 11:".html5", 12:".css", 13:".json", 14:".abc"} 
for x in dictonary.values():
    str = "file" + x
    str.endswith(x, str.index("."), len(str))

Example 2:

set1 = {".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"}
for x in set1:
   str = "file" + x
   str.endswith(x, str.index("."), len(str))

Example 3:

fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
for x in range(0, len(fileName)):
    str = "file" + fileName[x]
    str.endswith(fileName[x], str.index("."), len(str))

Example 4

fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
str = "file.txt"
str.endswith(fileName[1], str.index("."), len(str))

Examples 5, 6, 7 with output (image not included)

Example 8

fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"]
exts = []
name = "file.txt"
for ext in fileName:
    if name.endswith(ext):
        exts.append(ext)  # collect every known extension that matches
Mongolism answered 20/7, 2022 at 1:46 Comment(0)
E
0

Here is how to extract the last file extension when a filename has several:

import os

class functions:
    def listdir(self, filepath):
        return os.listdir(filepath)
    
func = functions()

os.chdir(r"C:\Users\Asus-pc\Downloads") #absolute path, change this to your directory
current_dir = os.getcwd()

for i in range(len(func.listdir(current_dir))): #i is set to numbers of files and directories on path directory
    if os.path.isfile((func.listdir(current_dir))[i]): #check if it is a file
        fileName = func.listdir(current_dir)[i] #put the current filename into a variable
        rev_fileName = fileName[::-1] #reverse the filename
        currentFileExtension = rev_fileName[:rev_fileName.index('.')][::-1] #extract from beginning until before .
        print(currentFileExtension) #output can be mp3,pdf,ini,exe, depends on the file on your absolute directory

The output is mp3; this also works when the file has only one extension.

Eck answered 17/12, 2022 at 13:46 Comment(0)
G
-2
# try this, it works for anything, any length of extension
# e.g www.google.com/downloads/file1.gz.rs -> .gz.rs

import os.path

class LinkChecker:

    @staticmethod
    def get_link_extension(link: str)->str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""
Glycogenesis answered 1/4, 2015 at 16:56 Comment(0)
R
-2
a = ".bashrc"
b = "text.txt"
extension_a = a.split(".")
extension_b = b.split(".")
print(extension_a[-1])  # bashrc
print(extension_b[-1])  # txt
Read answered 15/2, 2021 at 9:15 Comment(1)
Please add explanation of the code, rather than simply just the code snippets.Uveitis
I
-3
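import os

# Return a filename that does not exist yet by adding -(1), -(2), ... before the extension.
def NewFileName(fichier):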
    cpt = 0
    fic , *ext =  fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier
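A variant of the same idea built on os.path.splitext is sketched below. unique_name is a hypothetical name, and the exists parameter is an assumption added so the function can be tried without touching the filesystem. Unlike splitting on the first dot, this keeps only the last extension and does not leave a trailing dot on names that have no extension:

```python
import os.path


def unique_name(path, exists=os.path.exists):
    """Return `path`, or a numbered variant of it that does not exist yet."""
    root, ext = os.path.splitext(path)
    candidate, counter = path, 0
    while exists(candidate):
        counter += 1
        candidate = f'{root}-({counter}){ext}'
    return candidate


taken = {'a.txt', 'a-(1).txt'}
print(unique_name('a.txt', exists=taken.__contains__))   # a-(2).txt
```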
Immensurable answered 6/11, 2015 at 20:24 Comment(0)
P
-3

This is The Simplest Method to get both Filename & Extension in just a single line.

fName, ext = 'C:/folder name/Flower.jpeg'.split('/')[-1].split('.')

>>> print(fName)
Flower
>>> print(ext)
jpeg

Unlike other solutions, you don't need to import any package for this.

Prologize answered 4/1, 2020 at 10:7 Comment(2)
this doesnt work for all files or types for example 'archive.tar.gzSafar
Windows uses \ as a path separator, but other operating systems use /Mccallum
G
-4
name_only = filename[:filename.index(".")]

That will give you the file name up to the first ".", which would be the most common.

Golding answered 22/8, 2014 at 19:19 Comment(2)
first, he needs not the name, but extension. Second, even if he would need name, it would be wrong by files like: file.name.extContentious
As mentioned by @ya_dimon, this wont work for files names with dots. Plus, he needs the extension!Concrescence
