Python glob multiple filetypes

C

42

256

Is there a better way to use glob.glob in Python to get a list of files of multiple types, such as .txt, .mdown, and .markdown? Right now I have something like this:

projectFiles1 = glob.glob( os.path.join(projectDir, '*.txt') )
projectFiles2 = glob.glob( os.path.join(projectDir, '*.mdown') )
projectFiles3 = glob.glob( os.path.join(projectDir, '*.markdown') )

Caulicle answered 31/12, 2010 at 6:39 Comment(3)

Very related: https://mcmap.net/q/111352/-how-to-glob-two-patterns-with-pathlib/880783 – Alecto 28/3, 2018 at 7:16

Why not main_file = projectFiles1 + projectFiles2 + projectFiles3 ? which will also lead to a main list with all the types by concatenation – Arabist 4/9, 2020 at 15:29

Never saw a file *.mdown ..;) – Footway 25/12, 2020 at 15:53

I

239

Maybe there is a better way, but how about:

import glob
types = ('*.pdf', '*.cpp') # the tuple of file types
files_grabbed = []
for files in types:
    files_grabbed.extend(glob.glob(files))

# files_grabbed is the list of pdf and cpp files

Perhaps there is another way, so wait in case someone else comes up with a better answer.
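One way to package the loop above, as a rough sketch only. The helper name `glob_many` is made up for illustration, and collecting into a set is an assumption about what you want when two patterns can match the same file (a point raised in the comments):

```python
import glob
import os
import tempfile


def glob_many(directory, patterns):
    """Return sorted, de-duplicated paths in `directory` matching any pattern."""
    matches = set()
    for pattern in patterns:
        # One glob call per pattern; the set drops paths matched more than once.
        matches.update(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.cpp", "c.txt"):
        open(os.path.join(tmp, name), "w").close()
    # "*.pdf" and "a.*" both match a.pdf, but it is reported only once.
    found = [os.path.basename(p) for p in glob_many(tmp, ("*.pdf", "*.cpp", "a.*"))]
    print(found)
```

If duplicates cannot occur with your patterns, the plain list with extend() is just as good and keeps the per-pattern order.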

Incredulous answered 31/12, 2010 at 6:53 Comment(8)

files_grabbed = [glob.glob(e) for e in ['*.pdf', '*.cpp']] – Binal 10/11, 2016 at 6:22

Novitoll's solution is short, but it ends up creating nested lists. – Brindabrindell 29/1, 2017 at 20:4

you could always do this ;) [f for f_ in [glob.glob(e) for e in ('*.jpg', '*.mp4')] for f in f_] – Provost 10/6, 2017 at 4:50

files_grabbed = [glob.glob(e) for e in ['.pdf', '*.cpp']] – Exhilarative 20/4, 2018 at 11:27

This loops twice through the list of files. In the first iteration it checks for *.pdf and in the second it checks for *.cpp. Is there a way to get it done in one iteration? Check the combined condition each time? – Nathannathanael 9/11, 2018 at 12:31

How does it play out in either of the above solutions if 2 or more extensions match the same file. In that case we would have duplicates that need to be accounted for... I think the task implies that we want every unique file so the solution should account for that. – Hypothermal 28/9, 2019 at 16:6

@AlexG is that a nested loop or one loop? – Pulchritude 21/7, 2020 at 1:57

Actually @AlexG Can you please explain what type of wizardry you just did. – Pulchritude 21/7, 2020 at 2:12

T

128

glob returns a list: why not just run it multiple times and concatenate the results?

from glob import glob
project_files = glob('*.txt') + glob('*.mdown') + glob('*.markdown')
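Since the comments go back and forth on this, here is a small sketch of the difference: glob.glob returns lists, which concatenate with +, while pathlib's Path.glob returns lazy iterators, which have to be chained or wrapped in list(). The file names and the temporary directory are made up for the demonstration:

```python
import glob
import itertools
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    for name in ("notes.txt", "readme.markdown"):
        (Path(tmp) / name).touch()

    # glob.glob returns lists, so + concatenates them directly.
    as_list = glob.glob(os.path.join(tmp, "*.txt")) + glob.glob(os.path.join(tmp, "*.markdown"))

    # Path.glob returns lazy iterators; chain them instead of using +.
    root = Path(tmp)
    as_paths = list(itertools.chain(root.glob("*.txt"), root.glob("*.markdown")))

    names_from_glob = sorted(os.path.basename(p) for p in as_list)
    names_from_pathlib = sorted(p.name for p in as_paths)
    print(names_from_glob, names_from_pathlib)
```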

Thisbe answered 29/12, 2015 at 8:31 Comment(5)

This is possibly the most readable solution given. I would change the case of ProjectFiles to projectFiles, but great solution. – Logistic 2/6, 2017 at 22:38

Note that in python 3x Path.glob('*') returns a generator, so you need to put a list(...) around it to use this trick. – Belligerence 9/8, 2021 at 20:18

@MarcMaxmeister Not true! glob does return a generator, but concatenation works as expected, at least in Python 3.5+. I don't have a quick way already set up to test earlier Pythons, though. – Thisbe 24/2, 2022 at 7:7

@Thisbe in Python 3.10, Path().glob("*") + Path().glob("*") gives "TypeError: unsupported operand type(s) for +: 'generator' and 'generator'". – Alecto 30/3, 2022 at 14:57

@Alecto Yup, but that's because Path.glob() doesn't have the same semantics as glob.glob(). My comment was about glob.glob(), which works as-is in Python 3.10: glob.glob('*.md') + glob.glob('*.jpg') works fine in Python 3.10. Works the same way in Python 3.7: Path.glob() returns a generator, but glob.glob() returns a list. – Thisbe 3/4, 2022 at 20:39

S

90

So many answers that suggest globbing as many times as number of extensions, I'd prefer globbing just once instead:

from pathlib import Path

files = (p.resolve() for p in Path(path).glob("**/*") if p.suffix in {".c", ".cc", ".cpp", ".hxx", ".h"})
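A slightly more defensive variant of the same single-pass idea, as a sketch rather than a drop-in replacement. The function name `find_sources` and the example suffix set are hypothetical. It assumes you want regular files only (the "**/*" pattern also yields directories) and a case-insensitive suffix comparison:

```python
import tempfile
from pathlib import Path

SUFFIXES = {".c", ".h", ".cpp"}  # example set; adjust as needed


def find_sources(root, suffixes=SUFFIXES):
    """Walk `root` once and yield files whose lowercased suffix is in `suffixes`."""
    for p in Path(root).rglob("*"):  # rglob("*") is the same as glob("**/*")
        # Directories are yielded too, so keep regular files only.
        if p.is_file() and p.suffix.lower() in suffixes:
            yield p


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "dir.h").mkdir()             # a directory that looks like a header
    (root / "main.c").touch()
    (root / "sub" / "util.CPP").touch()  # upper-case suffix
    (root / "notes.txt").touch()
    names = sorted(p.name for p in find_sources(root))
    print(names)
```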

Sage answered 16/7, 2019 at 9:24 Comment(5)

Use a set of extensions instead of a list to improve performance. – Nogood 6/3, 2021 at 18:32

Fastest answer so far. You should use a set of extensions and you may change to Path(path).iterdir() to disallow recursive iteration. – Tenne 1/4, 2021 at 11:18

@LouisLac I tested this with a pure set-based implementation and a pure list-based implementation using 8 extensions and searching over many thousands of files. There was no significant difference in performance. – Hundredweight 28/8, 2021 at 13:54

@MoutainX sets only really begin to outperform lists at a significantly higher number of extensions (a few thousand, I think). People usually do not look up that many extensions, so this won’t make a difference here, but it is good practice. – Tenne 28/8, 2021 at 14:9

@LouisLac - also in terms of overall speed, my test results are similar to https://mcmap.net/q/109443/-python-glob-multiple-filetypes -- the fastest solution uses nested for-loops instead of glob. Like for root, dirs, files in walk(path): for file in files: for ext in extensions: – Hundredweight 28/8, 2021 at 15:28

G

74

from glob import glob

files = glob('*.gif')
files.extend(glob('*.png'))
files.extend(glob('*.jpg'))

print(files)

If you need to specify a path, loop over match patterns and keep the join inside the loop for simplicity:

from os.path import join
from glob import glob

files = []
for ext in ('*.gif', '*.png', '*.jpg'):
   files.extend(glob(join("path/to/dir", ext)))

print(files)

Greenness answered 16/10, 2014 at 11:23 Comment(1)

The last example is great. Any idea how to make that recursive? – Rasheedarasher 7/11, 2022 at 9:20

P

48

Chain the results:

import itertools as it, glob

def multiple_file_types(*patterns):
    return it.chain.from_iterable(glob.iglob(pattern) for pattern in patterns)

Then:

for filename in multiple_file_types("*.txt", "*.sql", "*.log"):
    print(filename)  # do stuff

Palaeontography answered 28/1, 2011 at 14:12 Comment(2)

glob.glob -> glob.iglob so that the iterators chain is fully lazy evaluated – Alliterative 6/8, 2013 at 13:57

I found the same solution but didn't know about chain.from_iterable. So this is similar, but less readable: it.chain(*(glob.iglob(pattern) for pattern in patterns)). – Exhilarative 20/4, 2018 at 11:35

L

37

For example, for *.mp3 and *.flac on multiple folders, you can do:

mask = r'music/*/*.[mf][pl][3a]*'
glob.glob(mask)

The idea can be extended to more file extensions, but you have to check that the combinations won't match any other unwanted file extension you may have on those folders. So, be careful with this.

To automatically combine an arbitrary list of extensions into a single glob pattern, you can do the following:

def multi_extension_glob_mask(mask_base, *extensions):
    mask_ext = ['[{}]'.format(''.join(set(c))) for c in zip(*extensions)]
    if not mask_ext or len(set(len(e) for e in extensions)) > 1:
        mask_ext.append('*')
    return mask_base + ''.join(mask_ext)

mask = multi_extension_glob_mask('music/*/*.', 'mp3', 'flac', 'wma')
print(mask)  # music/*/*.[mfw][pml][a3]*
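To see how easily such a mask over-matches, you can test it without touching the filesystem by using the standard library's fnmatch module, which implements the same wildcard syntax. The file names below are invented for the illustration:

```python
from fnmatch import fnmatch

mask = "*.[mf][pl][3a]*"  # the extension part of the mask above

# Names the mask is meant to match.
wanted = [name for name in ("song.mp3", "song.flac") if fnmatch(name, mask)]

# Names it also matches, although they are neither mp3 nor flac.
unwanted = [name for name in ("song.ml3", "song.fpa", "song.mp3.bak") if fnmatch(name, mask)]

print(wanted)
print(unwanted)
```

Each bracket matches one character independently, so any mix of the listed characters gets through, and the trailing * accepts anything after them.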

Liebfraumilch answered 22/3, 2016 at 23:11 Comment(2)

This also worked for me: mask = r'music/*/*[mf|pl|3a]' – Byline 27/6, 2023 at 13:44

@JeffBezos if you got the right result that way, it was only by accident. The | symbols have no special meaning here, and there is also nothing to look for the .. That glob means to match any file whose name (whether or not it is part of an extension) ends with any of m, f, p, l, 3, a or |. A glob is not anything like a regular expression. Anyway, this answer is a fundamentally bad idea because of how brittle it is. Many files with other extensions could match the same pattern. – Celebrate 24/2 at 13:33

H

23

This is not possible with a single glob pattern. glob supports only these wildcards:
* matches everything
? matches any single character
[seq] matches any character in seq
[!seq] matches any character not in seq

Use os.listdir and a regexp to check patterns instead:

import os
import re

for x in os.listdir('.'):
    if re.match(r'.*\.txt|.*\.sql', x):
        print(x)
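Following the suggestions in the comments, here is a sketch of the same idea with the pattern anchored at the end of the name and compiled once. Making it case-insensitive is an assumption on my part, and the file names are invented:

```python
import os
import re
import tempfile

# Anchored, case-insensitive pattern: the name must end in .txt or .sql.
WANTED = re.compile(r".*\.(txt|sql)$", re.IGNORECASE)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.SQL", "c.txt.bak", "d.log"):
        open(os.path.join(tmp, name), "w").close()
    # c.txt.bak is rejected because of the trailing $.
    matched = sorted(n for n in os.listdir(tmp) if WANTED.match(n))
    print(matched)
```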

Hydrometallurgy answered 31/12, 2010 at 7:7 Comment(3)

end your regex with $ to match only the end of the filenames – Cement 31/12, 2010 at 9:0

I like this approach - if glob's expressiveness isn't powerful enough, upgrade to a more powerful regex system, don't hack on it using e.g. itertools because subsequent pattern changes also have to be hacky (say you want to allow upper and lower case). Oh, and it might be cleaner to write '.*\.(txt|sql)' – Malisamalison 29/8, 2013 at 12:39

Is there any reason to prefer os.listdir('.') over glob.iglob('.')? – Morez 11/2, 2016 at 16:55

P

22

While Python's default glob doesn't really follow after Bash's glob, you can do this with other libraries. We can enable braces in wcmatch's glob.

>>> from wcmatch import glob
>>> glob.glob('*.{md,ini}', flags=glob.BRACE)
['LICENSE.md', 'README.md', 'tox.ini']

You can even use extended glob patterns if that is your preference:

>>> from wcmatch import glob
>>> glob.glob('*.@(md|ini)', flags=glob.EXTGLOB)
['LICENSE.md', 'README.md', 'tox.ini']

Pentecostal answered 11/2, 2020 at 13:58 Comment(3)

This doesn't take the recursive flag – Ahead 24/3, 2020 at 16:9

@Ahead No, it takes the glob.GLOBSTAR flag – Pentecostal 24/3, 2020 at 17:31

@Ahead - Recursive example starting at path would use ** like: glob.glob("**/*.{md,ini}", root_dir=path, flags=glob.GLOBSTAR|glob.BRACE) – Ceram 8/10, 2022 at 16:30

T

14

Same answer as @BPL (which is computationally efficient) but which can handle any glob pattern rather than extension:

import os
from fnmatch import fnmatch

folder = "path/to/folder/"
patterns = ("*.txt", "*.md", "*.markdown")

files = [f.path for f in os.scandir(folder) if any(fnmatch(f, p) for p in patterns)]

This solution is both efficient and convenient. It also closely matches the behavior of glob (see the documentation).

Note that this is simpler with the built-in package pathlib:

from pathlib import Path

folder = Path("/path/to/folder")
patterns = ("*.txt", "*.md", "*.markdown")

files = [f for f in folder.iterdir() if any(f.match(p) for p in patterns)]
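Regarding the DirEntry discussion in the comments: a variant that sidesteps the question is to match on the entry's name explicitly. This is only a sketch. The helper name `scan` is hypothetical, and skipping directories is an assumption about what is wanted:

```python
import os
import tempfile
from fnmatch import fnmatch

patterns = ("*.txt", "*.md", "*.markdown")


def scan(folder, patterns):
    """List files directly inside `folder` whose name matches any pattern."""
    with os.scandir(folder) as entries:
        return sorted(
            e.path
            for e in entries
            # Match on the bare name and skip directories.
            if e.is_file() and any(fnmatch(e.name, p) for p in patterns)
        )


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "d.md"))  # a directory with a matching name
    for name in ("a.txt", "b.md", "c.py"):
        open(os.path.join(tmp, name), "w").close()
    names = [os.path.basename(p) for p in scan(tmp, patterns)]
    print(names)
```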

Tenne answered 1/4, 2021 at 13:51 Comment(6)

Nice solution, thank you! Shouldn't fnmatch(f, p) actually be fnmatch(f.name, p) - as f is an nt.DirEntry object which cannot be tested by fnmatch. – Agosto 14/5, 2021 at 9:17

Thanks, actually this works without the .name so I guess a DirEntry can be tested with fnmatch. – Tenne 14/5, 2021 at 10:3

Interesting. Mine threw an error. Anyhow - thanks! – Agosto 14/5, 2021 at 10:5

I’m using Python 3.9, maybe this was fixed in an earlier version. – Tenne 14/5, 2021 at 10:16

Fair play. The system on which I'm using this is still on 3.5. (Ya, I know ... ) – Agosto 14/5, 2021 at 10:21

You could just directly write files = [f for f in folder.iterdir() if f.suffix.casefold() in [".txt", ".md", ".markdown"]] – Alvinia 28/11, 2022 at 21:55

F

9

Here is a one-line list-comprehension variant of Pat's answer (which also takes into account that you wanted to glob in a specific project directory):

import os, glob
exts = ['*.txt', '*.mdown', '*.markdown']
files = [f for ext in exts for f in glob.glob(os.path.join(project_dir, ext))]

You loop over the extensions (for ext in exts), and then for each extension you take each file matching the glob pattern (for f in glob.glob(os.path.join(project_dir, ext))).

This solution is short, and without any unnecessary for-loops, nested list-comprehensions, or functions to clutter the code. Just pure, expressive, pythonic Zen.

This solution allows you to have a custom list of exts that can be changed without having to update your code. (This is always a good practice!)

The list-comprehension is the same used in Laurent's solution (which I've voted for). But I would argue that it is usually unnecessary to factor out a single line to a separate function, which is why I'm providing this as an alternative solution.

Bonus:

If you need to search not just a single directory, but also all sub-directories, you can pass recursive=True and use the multi-directory glob symbol ** ¹:

files = [f for ext in exts 
         for f in glob.glob(os.path.join(project_dir, '**', ext), recursive=True)]

This will invoke glob.glob('<project_dir>/**/*.txt', recursive=True) and so on for each extension.

¹ Technically, the ** glob symbol simply matches one or more characters including forward-slash / (unlike the singular * glob symbol). In practice, you just need to remember that as long as you surround ** with forward slashes (path separators), it matches zero or more directories.
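A quick way to convince yourself of the "zero or more directories" behaviour is to run the recursive comprehension against a small throwaway tree. The directory layout and file names below are invented for the demonstration:

```python
import glob
import os
import tempfile

exts = ["*.txt", "*.markdown"]

with tempfile.TemporaryDirectory() as project_dir:
    os.makedirs(os.path.join(project_dir, "docs", "deep"))
    for rel in ("top.txt",
                os.path.join("docs", "guide.markdown"),
                os.path.join("docs", "deep", "notes.txt"),
                "skip.py"):
        open(os.path.join(project_dir, rel), "w").close()

    files = [f for ext in exts
             for f in glob.glob(os.path.join(project_dir, "**", ext), recursive=True)]

    # Show paths relative to the project directory, with / as separator.
    relative = sorted(os.path.relpath(f, project_dir).replace(os.sep, "/") for f in files)
    print(relative)
```

Note that top.txt, which sits directly in the project directory, is found as well as the nested files.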

Fenestrated answered 9/5, 2018 at 17:13 Comment(0)

T

9

Python 3

We can use pathlib; .glob still doesn't support globbing multiple arguments or within braces (as in POSIX shells) but we can easily filter the result.

For example, where you might ideally like to do:

# NOT VALID
Path(config_dir).glob("*.{ini,toml}")
# NOR IS
Path(config_dir).glob("*.ini", "*.toml")

you can do:

filter(lambda p: p.suffix in {".ini", ".toml"}, Path(config_dir).glob("*"))

which isn't too much worse.

Topology answered 14/11, 2020 at 17:34 Comment(2)

is or also dead in python glob? /**/(*.txt|*.jpg) ? – Kata 7/1, 2021 at 18:0

Definitely the best answer from the lot. If anyone wanted to iterate over a very large number of files, calling glob multiple times as in the other answers is extremely suboptimal. This retrieves files only once and makes do for pattern matching (although {} should really be included). Thanks! – Maggard 25/10, 2023 at 7:35

B

6

A one-liner, Just for the hell of it..

import glob

folder = "C:\\multi_pattern_glob_one_liner"
files = [item for sublist in [glob.glob(folder + ext) for ext in ["/*.txt", "/*.bat"]] for item in sublist]

output:

['C:\\multi_pattern_glob_one_liner\\dummy_txt.txt', 'C:\\multi_pattern_glob_one_liner\\dummy_bat.bat']

Blanche answered 22/7, 2017 at 15:7 Comment(0)

F

5

files = glob.glob('*.txt')
files.extend(glob.glob('*.dat'))

Fiona answered 12/6, 2018 at 17:33 Comment(1)

Good answers also provide some explanation of code and perhaps even some of your reasoning behind the code. – Secede 12/6, 2018 at 17:43

O

5

My empirical tests suggest that glob.glob is not the best way to filter files by extension. Some of the reasons are:

The globbing "language" does not allow an exact specification of multiple extensions.
As a consequence, results can be incorrect depending on the file extensions involved.
The globbing methods were empirically slower than the other methods tested.
Strange as it may seem, other filesystem objects, including folders, can have "extensions" too.

I tested (for correctness and time efficiency) the following 4 methods, each of which filters files by extension and puts them in a list:

from glob import glob, iglob
from re import compile, findall
from os import walk
from os.path import join as path_join  # used by the walk_* functions below


def glob_with_storage(args):

    elements = ''.join([f'[{i}]' for i in args.extensions])
    globs = f'{args.target}/**/*{elements}'
    results = glob(globs, recursive=True)

    return results


def glob_with_iteration(args):

    elements = ''.join([f'[{i}]' for i in args.extensions])
    globs = f'{args.target}/**/*{elements}'
    results = [i for i in iglob(globs, recursive=True)]

    return results


def walk_with_suffixes(args):

    results = []
    for r, d, f in walk(args.target):
        for ff in f:
            for e in args.extensions:
                if ff.endswith(e):
                    results.append(path_join(r,ff))
                    break
    return results


def walk_with_regs(args):

    reg = compile('|'.join([f'{i}$' for i in args.extensions]))

    results = []
    for r, d, f in walk(args.target):
        for ff in f:
            if len(findall(reg,ff)):
                results.append(path_join(r, ff))

    return results

Running the code above on my laptop, I obtained the following self-explanatory results.

Elapsed time for '7 times glob_with_storage()':  0.365023 seconds.
mean   : 0.05214614
median : 0.051861
stdev  : 0.001492152
min    : 0.050864
max    : 0.054853

Elapsed time for '7 times glob_with_iteration()':  0.360037 seconds.
mean   : 0.05143386
median : 0.050864
stdev  : 0.0007847381
min    : 0.050864
max    : 0.052859

Elapsed time for '7 times walk_with_suffixes()':  0.26529 seconds.
mean   : 0.03789857
median : 0.037899
stdev  : 0.0005759071
min    : 0.036901
max    : 0.038896

Elapsed time for '7 times walk_with_regs()':  0.290223 seconds.
mean   : 0.04146043
median : 0.040891
stdev  : 0.0007846776
min    : 0.04089
max    : 0.042885

Results sizes:
0 2451
1 2451
2 2446
3 2446

Differences between glob() and walk():
0 E:\x\y\z\venv\lib\python3.7\site-packages\Cython\Includes\numpy
1 E:\x\y\z\venv\lib\python3.7\site-packages\Cython\Utility\CppSupport.cpp
2 E:\x\y\z\venv\lib\python3.7\site-packages\future\moves\xmlrpc
3 E:\x\y\z\venv\lib\python3.7\site-packages\Cython\Includes\libcpp
4 E:\x\y\z\venv\lib\python3.7\site-packages\future\backports\xmlrpc

Elapsed time for 'main':  1.317424 seconds.

The fastest way to filter files by extension also happens to be the ugliest one: nested for loops and string comparison using the endswith() method.

Moreover, as you can see, the globbing approaches (with the pattern E:\x\y\z\**/*[py][pyc]) also return incorrect results, even with only 2 extensions given (py and pyc).
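As a side note on the nested-loop walk: str.endswith accepts a tuple of suffixes, so the innermost loop over extensions can be dropped. The sketch below makes no timing claim. The function name is hypothetical, and it assumes the extensions are given with their leading dot so that a file named "numpy" does not count as "py":

```python
import os
import tempfile


def walk_with_suffix_tuple(target, extensions):
    """Collect files under `target` whose name ends with one of `extensions`."""
    suffixes = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    results = []
    for root, _dirs, files in os.walk(target):
        for name in files:  # only file names, so directories never match
            if name.endswith(suffixes):
                results.append(os.path.join(root, name))
    return results


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "pkg.py"))  # a directory that looks like a module
    for rel in ("a.py", "b.pyc", "c.cpp", os.path.join("pkg.py", "d.txt")):
        open(os.path.join(tmp, rel), "w").close()
    names = sorted(os.path.basename(p)
                   for p in walk_with_suffix_tuple(tmp, [".py", ".pyc"]))
    print(names)
```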

Orientation answered 16/6, 2019 at 12:51 Comment(1)

The walk versions appear to be doing some caching because they get much faster after the first run. Even so, the walk_with_suffixes version was still the fastest in my testing when comparing first runs. Is there a way to clear the cache so repeated testing like you did isn't skewed? – Hundredweight 28/8, 2021 at 17:44

S

4

I have released Formic which implements multiple includes in a similar way to Apache Ant's FileSet and Globs.

The search can be implemented:

import formic
patterns = ["*.txt", "*.markdown", "*.mdown"]
fileset = formic.FileSet(directory=projectDir, include=patterns)
for file_name in fileset.qualified_files():
    print(file_name)  # Do something with file_name

Because the full Ant glob is implemented, you can include different directories with each pattern, so you could choose only those .txt files in one subdirectory, and the .markdown in another, for example:

patterns = [ "/unformatted/**/*.txt", "/formatted/**/*.mdown" ]

I hope this helps.

Shampoo answered 15/5, 2012 at 9:30 Comment(0)

L

4

After coming here for help, I made my own solution and wanted to share it. It's based on user2363986's answer, but I think this is more scalable. Meaning, that if you have 1000 extensions, the code will still look somewhat elegant.

from glob import glob

directoryPath  = "C:\\temp\\*." 
fileExtensions = [ "jpg", "jpeg", "png", "bmp", "gif" ]
listOfFiles    = []

for extension in fileExtensions:
    listOfFiles.extend( glob( directoryPath + extension ))

for file in listOfFiles:
    print(file)   # Or do other stuff

Logistic answered 10/2, 2015 at 3:13 Comment(2)

Doesn't work for me. I use directoryPath = "/Users/bla/bla/images_dir*." – Torietorii 13/12, 2018 at 14:12

I would need more info to debug this for you... Are you getting an exception? Also, if you're on Windows, that path doesn't look like it would work (missing drive letter). – Logistic 9/1, 2019 at 0:39

S

4

This is a Python 3.4+ pathlib solution:

import os
import pathlib

exts = ".pdf", ".doc", ".xls", ".csv", ".ppt"
filelist = (str(i) for i in map(pathlib.Path, os.listdir(src)) if i.suffix.lower() in exts and not i.stem.startswith("~"))

Also it ignores all file names starting with ~.

Stratovision answered 9/10, 2015 at 7:42 Comment(0)

C

3

Not glob, but here's another way using a list comprehension:

extensions = 'txt mdown markdown'.split()
projectFiles = [f for f in os.listdir(projectDir) 
                  if os.path.splitext(f)[1][1:] in extensions]

Casiecasilda answered 6/12, 2012 at 3:36 Comment(0)

P

3

The following function _glob globs for multiple file extensions.

import glob
import os
def _glob(path, *exts):
    """Glob for multiple file extensions

    Parameters
    ----------
    path : str
        A file name without extension, or directory name
    exts : tuple
        File extensions to glob for

    Returns
    -------
    files : list
        list of files matching extensions in exts in path

    """
    path = os.path.join(path, "*") if os.path.isdir(path) else path + "*"
    return [f for files in [glob.glob(path + ext) for ext in exts] for f in files]

files = _glob(projectDir, ".txt", ".mdown", ".markdown")

Polyandrous answered 15/1, 2013 at 15:18 Comment(0)

R

2

You can try to make a manual list comparing the extension of existing with those you require.

ext_list = ['gif','jpg','jpeg','png']
file_list = []
for file in glob.glob('*.*'):
  if file.rsplit('.',1)[1] in ext_list :
    file_list.append(file)

Raila answered 19/1, 2012 at 5:46 Comment(0)

B

2

From previous answer

glob('*.jpg') + glob('*.png')

Here is a shorter one,

from glob import glob
extensions = ['jpg', 'png'] # to find these filename extensions

# Method 1: loop one by one and extend to the output list
output = []
for name in extensions:
    output.extend(glob(f'*.{name}'))
print(output)

# Method 2: even shorter
# loop filename extension to glob() it and flatten it to a list
output = [p for p2 in [glob(f'*.{name}') for name in extensions] for p in p2]
print(output)

Brachycephalic answered 28/8, 2020 at 2:14 Comment(1)

Adding an explanation to this code sample would help improve this answer. – Muraida 28/8, 2020 at 6:37

M

1

import os    
import glob
import operator
from functools import reduce

types = ('*.jpg', '*.png', '*.jpeg')
lazy_paths = (glob.glob(os.path.join('my_path', t)) for t in types)
paths = reduce(operator.add, lazy_paths, [])

https://docs.python.org/3.5/library/functools.html#functools.reduce https://docs.python.org/3.5/library/operator.html#operator.add

Mainland answered 24/4, 2017 at 4:27 Comment(0)

P

1

To glob multiple file types, you need to call the glob() function several times in a loop. Since this function returns a list, you need to concatenate the lists.

For instance, this function does the job:

import glob
import os


def glob_filetypes(root_dir, *patterns):
    return [path
            for pattern in patterns
            for path in glob.glob(os.path.join(root_dir, pattern))]

Simple usage:

project_dir = "path/to/project/dir"
for path in sorted(glob_filetypes(project_dir, '*.txt', '*.mdown', '*.markdown')):
    print(path)

You can also use glob.iglob() to have an iterator:

Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

def iglob_filetypes(root_dir, *patterns):
    return (path
            for pattern in patterns
            for path in glob.iglob(os.path.join(root_dir, pattern)))

Protozoan answered 13/9, 2017 at 13:8 Comment(0)

S

1

One glob, many extensions... but imperfect solution (might match other files).

filetypes = ['tif', 'jpg']

filetypes = zip(*[list(ft) for ft in filetypes])
filetypes = ["".join(ch) for ch in filetypes]
filetypes = ["[%s]" % ch for ch in filetypes]
filetypes = "".join(filetypes) + "*"
print(filetypes)
# => [tj][ip][fg]*

glob.glob("/path/to/*.%s" % filetypes)

Stillage answered 11/10, 2017 at 21:3 Comment(0)

I

1

I had the same issue and this is what I came up with

import os, sys, re

#without glob

src_dir = '/mnt/mypics/'
src_pics = []
# match file names ending in .png, .jpeg or .jpg
ext = re.compile(r'.*\.({})$'.format('|'.join(['png', 'jpeg', 'jpg'])))
for root, dirnames, filenames in os.walk(src_dir):
  for filename in filter(lambda name:ext.search(name),filenames):
    src_pics.append(os.path.join(root, filename))

Instil answered 8/5, 2018 at 13:17 Comment(0)

V

1

Use a list of extensions and iterate through it:

from os.path import join
from glob import glob

files = []
extensions = ['*.gif', '*.png', '*.jpg']
for ext in extensions:
   files.extend(glob(join("path/to/dir", ext)))

print(files)

Villiform answered 26/7, 2018 at 11:46 Comment(0)

B

0

You could use filter:

import os
import glob

projectFiles = filter(
    lambda x: os.path.splitext(x)[1] in [".txt", ".mdown", ".markdown"],
    glob.glob(os.path.join(projectDir, "*"))
)

Bernard answered 28/5, 2015 at 21:12 Comment(0)

E

0

You could also use reduce() like so:

import glob
from functools import reduce  # reduce is not a builtin in Python 3

file_types = ['*.txt', '*.mdown', '*.markdown']
project_files = reduce(lambda list1, list2: list1 + list2, (glob.glob(t) for t in file_types))

This creates a list from glob.glob() for each pattern and reduces them to a single list.

Emelineemelita answered 7/11, 2016 at 19:35 Comment(0)

C

0

Yet another solution (use glob to get paths using multiple match patterns and combine all paths into a single list using reduce and add):

import functools, glob, operator
paths = functools.reduce(operator.add, [glob.glob(pattern) for pattern in [
    "path1/*.ext1",
    "path2/*.ext2"]])

Clap answered 10/11, 2018 at 16:7 Comment(0)

L

0

If you use pathlib try this:

import pathlib

extensions = ['.py', '.txt']
root_dir = './test/'

files = filter(lambda p: p.suffix in extensions, pathlib.Path(root_dir).glob('**/*'))

print(list(files))

Lianneliao answered 2/5, 2019 at 10:4 Comment(0)

S

0

This worked for me!

split('.')[-1]

The code above extracts the file name suffix (*.xxx), which you can then compare against the extensions you want:

    for filename in glob.glob(folder + '*.*'):
        print(filename)  # glob already returns the path including folder
        if  filename.split('.')[-1] != 'tif' and \
            filename.split('.')[-1] != 'tiff' and \
            filename.split('.')[-1] != 'bmp' and \
            filename.split('.')[-1] != 'jpg' and \
            filename.split('.')[-1] != 'jpeg' and \
            filename.split('.')[-1] != 'png':
                continue
        # Your code

Shenashenan answered 24/2, 2021 at 11:39 Comment(0)

B

0

Easiest way is using itertools.chain

from pathlib import Path
import itertools

cwd = Path.cwd()

for file in itertools.chain(
    cwd.rglob("*.txt"),
    cwd.rglob("*.md"),
):
    print(file.name)

Babbie answered 25/8, 2022 at 11:44 Comment(0)

J

0

You can use this:

project_files = []
file_extensions = ['txt','mdown','markdown']
for file_extension in file_extensions:
    project_files.extend(glob.glob(projectDir  + '*.' + file_extension))

Julissajulita answered 20/1, 2023 at 15:7 Comment(0)

P

-1

Maybe I'm missing something but if it's just plain glob maybe you could do something like this?

projectFiles = glob.glob(os.path.join(projectDir, '*.{txt,mdown,markdown}'))

Premundane answered 25/10, 2022 at 18:23 Comment(1)

Have you tried this? Python glob does not support parentheses. – Shelburne 28/3 at 12:16

M

-2

This Should Work:

import glob
extensions = ('*.txt', '*.mdown', '*.markdown')
for i in extensions:
    for files in glob.glob(i):
        print (files)

Magisterial answered 5/11, 2014 at 12:45 Comment(0)

S

-2

For example:

import glob
lst_img = []
base_dir = '/home/xy/img/'

# get all the jpg file in base_dir 
lst_img += glob.glob(base_dir + '*.jpg')
print(lst_img)
# ['/home/xy/img/2.jpg', '/home/xy/img/1.jpg']

# append all the png file in base_dir to lst_img
lst_img += glob.glob(base_dir + '*.png')
print(lst_img)
# ['/home/xy/img/2.jpg', '/home/xy/img/1.jpg', '/home/xy/img/3.png']

A function:

import glob
def get_files(base_dir='/home/xy/img/', lst_extension=['*.jpg', '*.png']):
    """
    :param base_dir:base directory
    :param lst_extension:lst_extension: list like ['*.jpg', '*.png', ...]
    :return:file lists like ['/home/xy/img/2.jpg','/home/xy/img/3.png']
    """
    lst_files = []
    for ext in lst_extension:
        lst_files += glob.glob(base_dir+ext)
    return lst_files

Studer answered 26/7, 2018 at 11:23 Comment(0)

G

-2

In one line :

IMG_EXTS = (".jpg", ".jpeg", ".jpe", ".jfif", ".jfi", ".jif",".JPG")

directory = './'

files = [ file for file in glob.glob(directory+'/*') if file.endswith(IMG_EXTS)]

Gyrostabilizer answered 30/11, 2022 at 11:38 Comment(0)

G

-2

glob.glob('Folder/*[.png,jpg,jpeg,pdf]')

Worked for me to search for images and pdf

Glottalized answered 8/1 at 3:52 Comment(2)

It seems you are missing dots for the last three filetypes – Purgative 8/1 at 10:2

From the documentation: "[seq] matches any character in seq". So "*[.png,jpg,jpeg,pdf]" would for example match "foo,f". – Shelburne 28/3 at 12:14
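The point about [seq] can be checked with the standard library's fnmatch module, which uses the same wildcard syntax. This is a sketch with invented file names:

```python
from fnmatch import fnmatch

pattern = "*[.png,jpg,jpeg,pdf]"
names = ["photo.png", "foo,f", "archive.zip", "notes.txt"]

# The bracket is a set of single characters, so only the LAST character
# of the name is tested against it.
matches = [name for name in names if fnmatch(name, pattern)]
print(matches)
```

archive.zip is matched because it ends in p, while notes.txt is rejected only because t is not in the set.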

A

-2

.glob('*[.jpg][.jpeg][.png][.gif]'):

Amoreta answered 24/2 at 13:9 Comment(1)

Thank you for your interest in contributing to the Stack Overflow community. This question already has quite a few answers—including one that has been extensively validated by the community. Are you certain your approach hasn’t been given previously? If so, it would be useful to explain how your approach is different, under what circumstances your approach might be preferred, and/or why you think the previous answers aren’t sufficient. Can you kindly edit your answer to offer an explanation? – Collencollenchyma 25/2 at 0:10

G

-3

import glob
import pandas as pd

# Raw strings keep the backslashes in the Windows paths literal.
patterns = [r'C:\dir\path\*.txt', r'C:\dir\path\*.mdown', r'C:\dir\path\*.markdown']

# DataFrame.append was removed in pandas 2.0, so collect the paths first.
paths = [path for pattern in patterns for path in glob.glob(pattern)]
df1 = pd.DataFrame({'A': paths})

Gilead answered 20/1, 2020 at 17:47 Comment(2)

Hi Sway Wu, welcome. Please consider adding an explanation. – Iapetus 20/1, 2020 at 18:9

"If the only tool you have is Pandas, you tend to see every problem as a DataFrame." - Abraham Maslow – Shelburne 28/3 at 12:20

P

-4

import os
import glob

projectFiles = [i for i in glob.glob(os.path.join(projectDir,"*")) if os.path.splitext(i)[-1].lower() in ['.txt','.markdown','.mdown']]

os.path.splitext returns a (filename, extension) pair, where the extension includes the leading dot:

filename, extension = os.path.splitext('filename.extension')  # ('filename', '.extension')

.lower() converts a string to lowercase

Pteridology answered 6/4, 2021 at 11:28 Comment(0)

P

-7

this worked for me:

import glob
images = glob.glob('*.JPG' or '*.jpg' or '*.png')

Prakash answered 20/4, 2018 at 15:44 Comment(1)

This cannot possibly work as you intend it to. The or operator returns the first "non-falsy" value, so in your case: *.JPG. This turns your call into glob.glob('*.JPG'), meaning it will only return *.JPG files, completely forgetting about the other extensions. – Ileneileo 17/8, 2018 at 8:33
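The behaviour of `or` described in this comment is easy to check, and the presumably intended result needs one glob call per pattern. A sketch follows; using a set to avoid duplicates on case-insensitive filesystems is my own assumption:

```python
import glob

# `or` returns its first truthy operand; any non-empty string is truthy.
pattern = '*.JPG' or '*.jpg' or '*.png'
print(pattern)

# What was presumably intended: glob each pattern separately.
# The set avoids duplicates where *.JPG and *.jpg match the same file.
images = sorted({path for p in ('*.JPG', '*.jpg', '*.png') for path in glob.glob(p)})
```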
