Using endswith with case insensitivity in Python

I have a list of file extensions and need to write an if condition that checks filenames against them, something like:

ext = (".dae", ".xml", ".blend", ".bvh", ".3ds", ".ase",
           ".obj", ".ply", ".dxf", ".ifc", ".nff", ".smd",
           ".vta", ".mdl", ".md2", ".md3",
           ".pk3", ".mdc", ".x",
           ".q3o", ".q3s", ".raw",
           ".ac", ".dxf", ".irrmesh",
           ".irr", ".off", ".ter",
           ".mdl", ".hmp", ".mesh.xml",
           ".skeleton.xml", ".material", ".ms3dv",
           ".lwo", ".lws", ".lxo",
           ".csm", ".cob", ".scn",
           ".xgl", ".zgl")
for folder, subfolders, filenames in os.walk(directory):
    if any([filename.endswith(tuple(ext)) for filename in filenames]):
        ...  # do something

I realized that endswith is case sensitive. How could I treat, for instance, ".xml" and ".XML" as the same extension?

Orazio answered 11/8, 2017 at 14:25 Comment(0)

Simply call lower to make the string lowercase before calling endswith:

# Every entry needs a trailing comma; without one, adjacent string
# literals are silently concatenated (".md3" ".pk3" becomes ".md3.pk3").
ext = (".dae", ".xml", ".blend", ".bvh", ".3ds", ".ase",
           ".obj", ".ply", ".dxf", ".ifc", ".nff", ".smd",
           ".vta", ".mdl", ".md2", ".md3",
           ".pk3", ".mdc", ".x",
           ".q3o", ".q3s", ".raw",
           ".ac", ".dxf", ".irrmesh",
           ".irr", ".off", ".ter",
           ".mdl", ".hmp", ".mesh.xml",
           ".skeleton.xml", ".material", ".ms3dv",
           ".lwo", ".lws", ".lxo",
           ".csm", ".cob", ".scn",
           ".xgl", ".zgl")
for folder, subfolders, filenames in os.walk(directory):
    # ext is already a tuple, so it can be passed to endswith directly
    if any([filename.lower().endswith(ext) for filename in filenames]):
        ...  # do something
Dionne answered 11/8, 2017 at 14:26 Comment(1)
Don't forget to first ensure the variable is not None; otherwise calling lower() on it raises an AttributeError. - Deadpan
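
A minimal sketch of the guard this comment describes. The helper name `has_known_extension` and the shortened `ext` tuple are illustrative assumptions, not part of the original answer. Names produced by os.walk are always strings, so the guard mostly matters when the name comes from somewhere else.

```python
# Shortened extension tuple for illustration; all entries are lowercase.
ext = (".dae", ".xml", ".blend", ".mesh.xml")


def has_known_extension(name):
    """Return True if name ends with one of ext, ignoring case.

    Hypothetical helper: returns False for None instead of letting
    None.lower() raise AttributeError.
    """
    if name is None:
        return False
    return name.lower().endswith(ext)


print(has_known_extension("Scene.XML"))  # True
print(has_known_extension("notes.txt"))  # False
print(has_known_extension(None))         # False
```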

This is a really old question, but since Python 3.3 there is the casefold() method, which is more aggressive than lower() and is intended for caseless string matching. So something like the following is an option:

filename.casefold().endswith(ext)
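
As a rough illustration of how casefold() behaves here, the sketch below uses a shortened `ext` tuple and a made-up filename (both assumptions). It also case-folds the suffixes themselves, which changes nothing for plain lowercase ASCII extensions but keeps both sides of the comparison consistent.

```python
# Shortened, illustrative extension tuple (the full tuple works the same way).
ext = (".xml", ".blend")

# Case-fold the suffixes once so both sides of the comparison are folded.
folded_ext = tuple(e.casefold() for e in ext)

name = "Model.BLEND"
matches = name.casefold().endswith(folded_ext)
print(matches)  # True

# Where lower() and casefold() disagree: the German sharp s.
lower_equal = "STRASSE".lower() == "straße".lower()
fold_equal = "STRASSE".casefold() == "straße".casefold()
print(lower_equal)  # False
print(fold_equal)   # True
```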

On a tangential note, any() short-circuits: it stops at the first True value. Passing it a generator expression instead of a list can therefore be faster, especially when a matching string sits near the start of a long list. With a generator expression, the endswith checks stop at the first match; with a list, every string is checked before any() even starts evaluating.

So instead of

any([filename.lower().endswith(ext) for filename in filenames])
#   ^                                                        ^  <--- list

use

any(filename.lower().endswith(ext) for filename in filenames)
#  ^                                                        ^   <--- genexpr
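
To make the difference visible, here is a small sketch that records which names actually get checked. The `is_match` helper and the sample filenames are assumptions for illustration only.

```python
def is_match(name, seen):
    """Record that name was checked, then test its extension."""
    seen.append(name)
    return name.lower().endswith((".xml",))


filenames = ["a.XML", "b.txt", "c.txt", "d.txt"]

# List comprehension: the whole list is built before any() runs.
seen_list = []
any([is_match(f, seen_list) for f in filenames])

# Generator expression: any() stops pulling items at the first True.
seen_gen = []
any(is_match(f, seen_gen) for f in filenames)

print(len(seen_list))  # 4
print(len(seen_gen))   # 1
```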

Finally, since this question is tagged regex, here's a regex solution as well. Compile a pattern that ignores case and search each filename for it.

import os
import re
# Escape each extension so "." is literal, and anchor the match at the end
pat = re.compile(fr"({'|'.join(re.escape(e) for e in ext)})$", re.I)
for folder, subfolders, filenames in os.walk('.'):
    if any(pat.search(filename) for filename in filenames):
        pass  # do something
Uterus answered 15/1 at 20:28 Comment(1)
It is important to remember that case insensitivity with str.casefold() differs from case insensitivity in regular expressions with re.I. The former uses full case folding, the latter simple case folding, so results differ for some languages. One of the gotchas in Python. - Einberger
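
One way to see this difference is with the German sharp s, which casefold() expands to "ss" while re.I leaves it as a single character. The filename and suffix below are contrived assumptions chosen only to show the mismatch.

```python
import re

name = "report.ß"   # contrived filename ending in a sharp s
suffix = ".ss"

# Full case folding: "ß" becomes "ss", so the suffix matches.
folded_match = name.casefold().endswith(suffix)
print(folded_match)  # True

# re.I does not expand "ß" to "ss", so the pattern finds nothing.
pat = re.compile(re.escape(suffix) + "$", re.I)
regex_match = pat.search(name) is not None
print(regex_match)  # False
```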
