Make pathlib.glob() and pathlib.rglob() case-insensitive for platform-agnostic application

I am using pathlib.Path.glob() and pathlib.Path.rglob() to match files in a directory and in its subdirectories, respectively. The target files are both lower-case .txt and upper-case .TXT files. The corresponding file paths are read from the file system as follows:

import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt', '*.TXT']

for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()
    
for suffix in suffixes_to_test:
    files = [fp.relative_to(directory) for fp in directory.glob(suffix)]
    print(f'{suffix}: {files}')

The majority of the code base was developed on a Windows 10 machine (running Python 3.7.4) and has now been moved to macOS Monterey 12.0.1 (running Python 3.10.1).

On Windows, both files a.txt and b.TXT match each pattern:

*.txt: [WindowsPath('a.txt'), WindowsPath('b.TXT')]
*.TXT: [WindowsPath('a.txt'), WindowsPath('b.TXT')]

In contrast, on macOS only one file matches each pattern:

*.txt: [PosixPath('a.txt')]
*.TXT: [PosixPath('b.TXT')]

Therefore, I assume that the macOS file system is case-sensitive, whereas the Windows one is not. However, according to Apple's User Guide, the default macOS file system is not case-sensitive, although it can be configured as such. Something similar might apply to Linux or Unix file systems, as discussed here and here.

Regardless of the reason for this differing behavior, I need a platform-agnostic way to get both upper-case TXT and lower-case txt files. A rather naive workaround could be something like this:

results = {fp.relative_to(directory) for suffix in suffixes_to_test for fp in directory.glob(suffix)}

Which gives the desired output on both macOS and Windows:

{PosixPath('b.TXT'), PosixPath('a.txt')}

However, is there a more elegant way? I could not find any option like ignore_case in pathlib's documentation.

Forklift answered 14/3, 2022 at 15:53 Comment(0)

What about something like:

suffix = '*.[tT][xX][tT]'
files = [fp.relative_to(directory) for fp in directory.glob(suffix)]

It does not generalize to a fully "case-insensitive glob", but it works well for a limited, specific use case such as globbing for one particular extension.
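
The character classes can also be generated instead of typed by hand. The following is only a minimal sketch: case_insensitive_pattern is a hypothetical helper, not part of pathlib, and it assumes the incoming pattern contains no bracket expressions of its own (letters inside an existing [...] would be wrapped a second time and break the pattern).

```python
from pathlib import Path


def case_insensitive_pattern(pattern: str) -> str:
    """Hypothetical helper: turn each cased letter into a class, e.g. 't' -> '[tT]'."""
    parts = []
    for char in pattern:
        lower, upper = char.lower(), char.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")
        else:
            # Wildcards, dots, digits and caseless characters pass through unchanged.
            parts.append(char)
    return "".join(parts)


print(case_insensitive_pattern("*.txt"))  # *.[tT][xX][tT]

directory = Path()
files = [
    fp.relative_to(directory)
    for fp in directory.glob(case_insensitive_pattern("*.txt"))
]
```

Because the generated pattern spells out both cases explicitly, it should behave the same whether pathlib matches case-sensitively or not.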

Curiosa answered 9/1, 2023 at 20:42 Comment(0)

You could also use the syntax:

import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']

for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()

files = [fp.relative_to(directory) for fp in directory.glob("*.[tT][xX][tT]")]

This would match all case combinations of txt.
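
On Python 3.12 or newer the character classes may not be needed: Path.glob() and Path.rglob() accept a keyword-only case_sensitive argument there. The sketch below wraps that in a hypothetical helper, glob_ignore_case (the name and the recursive flag are my own, not pathlib's), and it raises on older interpreters such as the 3.7 and 3.10 versions mentioned in the question.

```python
import sys
from pathlib import Path


def glob_ignore_case(directory: Path, pattern: str, recursive: bool = False) -> list[Path]:
    """Hypothetical helper: case-insensitive glob, requires Python 3.12+."""
    if sys.version_info < (3, 12):
        raise RuntimeError("Path.glob(case_sensitive=...) needs Python 3.12 or newer")
    # rglob() searches subdirectories as well, glob() only the directory itself.
    matcher = directory.rglob if recursive else directory.glob
    return sorted(matcher(pattern, case_sensitive=False))
```

With this, glob_ignore_case(pathlib.Path(), "*.txt") should return both a.txt and b.TXT on Windows and macOS alike.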

Baughman answered 6/1, 2023 at 14:24 Comment(0)

If you do not need case sensitivity, you could normalize the case to lower or upper.

import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt']

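for filename in files_to_create:
    # Normalize the file name to lower case before creating it.
    filepath = directory / filename.lower()
    filepath.touch()

for suffix in suffixes_to_test:
    # Lower-case the pattern as well so it matches the normalized names.
    files = [fp.relative_to(directory) for fp in directory.glob(suffix.lower())]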
    print(f'{suffix}: {files}')

This gives different output from your desired example, but it might still be what you are asking for.

You might also draw inspiration from answers here: Ignore case in glob() on Linux

If you do want case-sensitive matching, you might adapt those answers and use the fnmatch.fnmatchcase function to make the Windows behaviour case-sensitive.

Untested, but probably something like:

import fnmatch
import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt', '*.TXT']

for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()

for suffix in suffixes_to_test:
    # List everything, then filter on the file name with a case-sensitive match.
    files = [fp.relative_to(directory) for fp in directory.glob('*') if fnmatch.fnmatchcase(fp.name, suffix)]
    print(f'{suffix}: {files}')
Lubricious answered 30/3, 2022 at 13:20 Comment(2)
However, this solution will only match either lower or upper case. If your extensions are e.g. .Txt it will not work. So it's not really case-insensitive. - Lonni
That's true, but it does address the OP's target. If the solution should handle any permutation of cases, then a different approach should probably be taken. - Lubricious

For the case where someone just wants case-insensitive matches for arbitrary file suffixes, in a Unicode-safe way, the suffix can be tested directly:

from pathlib import Path

search_dir = Path("your/directory")
extensions = [".png", ".jpg"]

filepaths = []
for ext in extensions:
    ext = ext.lower()
    filepaths.extend([
        fp.resolve()  # make sure it's absolute
        for fp in search_dir.glob('*')  # list all
        if fp.is_file() and fp.suffix.lower() == ext
    ])
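
The loop above lists the directory once per extension. A variant that walks it only once and can optionally recurse, as rglob() does, might look like the sketch below. find_by_suffix and its parameters are hypothetical names chosen for illustration, and str.casefold() is used here instead of lower() as a slightly more aggressive Unicode normalization.

```python
from pathlib import Path


def find_by_suffix(search_dir: Path, extensions, recursive: bool = False) -> list[Path]:
    """Hypothetical helper: collect files whose suffix matches, ignoring case."""
    # Normalize the wanted suffixes once so each file needs a single set lookup.
    wanted = {ext.casefold() for ext in extensions}
    candidates = search_dir.rglob("*") if recursive else search_dir.glob("*")
    return sorted(
        fp for fp in candidates
        if fp.is_file() and fp.suffix.casefold() in wanted
    )
```

For example, find_by_suffix(Path("your/directory"), [".png", ".jpg"], recursive=True) would also pick up files such as IMG.PNG in subdirectories.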
Possible answered 28/8 at 4:9 Comment(0)
