Getting glob to follow symlinks in Python

3

24

Suppose I have a subdirectory of symlinks that looks like the following:

subdir/
    folder/
        readme.txt
    symlink/ => ../hidden/
hidden/
    readme.txt

If I run the following code:

>>> from pathlib import Path
>>> list(Path('./subdir/').glob('**/readme.txt'))

I would expect the outcome to be:

subdir/folder/readme.txt
subdir/symlink/readme.txt

But the actual result is:

subdir/folder/readme.txt

I found out that this is because (for some undocumented reason) the ** operator doesn't follow symlinks.

Is there a way to change this behavior programmatically?

Diaspore answered 2/10, 2017 at 16:31 Comment(3)
That's strange, since there's an open issue asking for glob to optionally not follow symlinks. - Interject
This behavior seems to have been caused as a side effect of fixing this issue: bugs.python.org/issue26012. The method _iterate_directories() in class _RecursiveWildcardSelector of pathlib.py explicitly ignores symlinks. - Aldosterone
Probably of interest: github.com/python/cpython/issues/77609#issuecomment-1567306837 - Weatherboarding
18

Path.glob doesn't work for me with ** and symlinks either. I found a related issue: https://bugs.python.org/issue33428.

As an alternative on Python 3, you can use glob.glob with ** and the recursive=True option, which does descend into the symlinked directory in the example below (see https://docs.python.org/3/library/glob.html for details):

In [67]: from glob import glob
In [71]: list(glob("./**/readme.txt", recursive=True))
Out[71]:
['./hidden/readme.txt',
 './subdir/folder/readme.txt',
 './subdir/symlink/readme.txt']

In [73]: list(glob("./**/readme.txt", recursive=False))
Out[73]: ['./hidden/readme.txt']

Compare to:

In [72]: list(Path('.').glob('**/readme.txt'))
Out[72]: [PosixPath('hidden/readme.txt'), PosixPath('subdir/folder/readme.txt')]
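
If you still want Path objects back, the glob.glob call above can be wrapped in a small helper. This is only a sketch: the function name glob_following_symlinks is made up for illustration, and it assumes the glob.glob behavior shown in the output above (descending into symlinked directories when recursive=True).

```python
import glob
import os
from pathlib import Path


def glob_following_symlinks(root, pattern):
    """Return sorted Path objects under root that match pattern.

    Hypothetical helper: it relies on glob.glob(..., recursive=True),
    which descended into symlinked directories in the session above.
    """
    # Escape the root so that special characters in it are taken literally.
    full_pattern = os.path.join(glob.escape(str(root)), "**", pattern)
    return sorted(Path(match) for match in glob.glob(full_pattern, recursive=True))
```

Calling glob_following_symlinks('./subdir', 'readme.txt') on the layout from the question should then include the file reached through subdir/symlink. Note that nothing here guards against symlinks that form a directory loop.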
Fusion answered 21/8, 2019 at 15:10 Comment(0)
1

I've never used pathlib before, so you may extend this solution to take advantage of some of its features, but I got this to work using glob only.

from glob import glob
list(glob('./subdir/*/readme.txt'))

Output:

['./subdir/folder/readme.txt', './subdir/symlink/readme.txt']

If you're set on using glob with more than one depth of subdirectory, the hackish solution is to include variations with extra */ (e.g. ./subdir/*/*/*/readme.txt) up to some arbitrary depth, and concatenate the results from each variation.
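
A rough sketch of that hackish approach, assuming a hypothetical helper name glob_fixed_depths and an arbitrary default of three levels:

```python
import os
from glob import glob


def glob_fixed_depths(root, filename, max_depth=3):
    """Collect root/filename, root/*/filename, root/*/*/filename, ... up to max_depth."""
    results = []
    for depth in range(max_depth + 1):
        # depth 0 gives root/filename, depth 1 gives root/*/filename, and so on.
        pattern = os.path.join(root, *(["*"] * depth), filename)
        results.extend(glob(pattern))
    return sorted(results)
```

Because each pattern uses plain * rather than **, symlinked directories are traversed like any other directory, but anything deeper than max_depth is silently missed.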

The more appropriate way to do what you want would be to write a custom function that has the behavior you want (searches through symlinks to arbitrary depth), and handles the case of circular paths in the way you want. See this question for tips on doing this with os.walk (remember to set followlinks=True).
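
A minimal sketch of such a function follows. The name walk_glob is hypothetical, and the way it handles circular paths (visit each real directory once, then stop descending) is just one possible policy, not the only reasonable one.

```python
import fnmatch
import os


def walk_glob(root, pattern):
    """Yield paths under root whose file name matches pattern, following symlinks."""
    seen = set()  # real paths of the directories already visited
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            # Reached this directory before (for example through a symlink loop):
            # clear dirnames in place so os.walk does not descend any further.
            dirnames[:] = []
            continue
        seen.add(real)
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)
```

For the layout in the question, list(walk_glob('./subdir', 'readme.txt')) should yield both the file in folder and the one reached through symlink. A side effect of this policy is that a directory reachable by two different routes is only reported under the first route os.walk happens to take.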

Interject answered 2/10, 2017 at 16:52 Comment(5)
Path.glob and glob use the same functionality - the issue I have found is that glob doesn't like following symlinks when using **. - Diaspore
Is there a reason you can't use * instead of **, since your directory structure has a specific depth? Interesting to note that glob in JavaScript has the same behavior for **. - Interject
I am guessing the reason for this behavior is to avoid issues with infinite depth in the case of symlinks that create directory loops. - Interject
I'm trying to write a solution abstract enough that it can be used in a few different scenarios. There will be instances where the readme.txt file may be buried deep. - Diaspore
Updated solution to reflect your need for deeper paths. - Interject
1

With Python 3.6, a call to rglob works for me:

# rglob lives on pathlib.Path
from pathlib import Path

p = Path("./subdir").rglob("readme.txt")
Forethoughtful answered 14/7, 2022 at 7:0 Comment(0)