Avoiding infinite recursion with os.walk

I'm using os.walk with followlinks=True, but I hit a place where a symbolic link refers to its own directory, causing an infinite loop. The culprit in this case is /usr/bin/X11, which is listed as follows:

lrwxrwxrwx 1 root root           1 Apr 24  2015 X11 -> .

Is there any way to avoid following links to either . or .., which I assume would cause similar problems? I think I could check this with os.readlink and then compare the result against the current path. Is there any other solution?
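A minimal sketch of the path-comparison idea from the question, assuming a POSIX system. Instead of os.readlink it uses os.path.realpath, which also resolves relative targets such as . and ..; the helper name walk_skip_self_links is hypothetical. It prunes any symlinked subdirectory that resolves to the directory being listed or to its parent. As the first comment points out, this does not catch longer cycles such as two directories that link to each other.

```python
import os


def walk_skip_self_links(top):
    """Hypothetical helper: os.walk(top, followlinks=True), but skip symlinked
    subdirectories that resolve to the current directory or its parent."""
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        here = os.path.realpath(dirpath)
        parent = os.path.dirname(here)
        keep = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # realpath resolves relative link targets such as "." or ".."
            if os.path.islink(path) and os.path.realpath(path) in (here, parent):
                continue  # e.g. X11 -> . or up -> ..
            keep.append(name)
        dirnames[:] = keep  # prune in place so os.walk does not descend into them
        yield dirpath, dirnames, filenames
```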

Haslam answered 2/5, 2016 at 7:34 Comment(3)
What about links like a -> b and b -> a? - Eustatius
Yes, this would probably cause even bigger problems, like having to maintain a list of searched directories, which gets large and ugly fast. - Haslam
@Eric: Why would that be ugly? - Copyedit

There is no way to avoid storing a set of all the directories visited if you want to avoid infinite recursion. You do not need to use readlink, however: you can just store inodes. This avoids the problem of path canonicalization altogether.

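import os

dirs = set()
# Record the starting directory too, so a link back to it (like X11 -> .) is skipped.
st = os.stat('.')
dirs.add((st.st_dev, st.st_ino))
for dirpath, dirnames, filenames in os.walk('.', followlinks=True):
    scandirs = []
    for dirname in dirnames:
        # os.stat follows symlinks, so a link and its target give the same key.
        st = os.stat(os.path.join(dirpath, dirname))
        # Inode numbers are only unique per filesystem, so pair them with the device.
        dirkey = st.st_dev, st.st_ino
        if dirkey not in dirs:
            dirs.add(dirkey)
            scandirs.append(dirname)
    # Prune in place so os.walk does not descend into directories already seen.
    dirnames[:] = scandirs
    print(dirpath)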
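To illustrate why the (st_dev, st_ino) key sidesteps path canonicalization, here is a small sketch (POSIX only, since it creates a symlink; dir_key is a hypothetical helper name). It builds a directory with an X11 -> . link shaped like the one in the question: the two path strings differ, yet os.stat, which follows symlinks, reports the same device and inode for both.

```python
import os
import tempfile


def dir_key(path):
    """Hypothetical helper: (device, inode) of the directory `path` resolves to."""
    st = os.stat(path)  # follows symlinks
    return st.st_dev, st.st_ino


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "bin")
    os.mkdir(target)
    link = os.path.join(target, "X11")
    os.symlink(".", link)  # same shape as /usr/bin/X11 -> .

    # The path strings differ, so comparing paths would need canonicalization...
    print(link == target)                    # False
    # ...but both stat to the same directory, hence the same key.
    print(dir_key(link) == dir_key(target))  # True
```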
Copyedit answered 2/5, 2016 at 7:57 Comment(12)
Ok, it doesn't have to be ugly =) - Haslam
Isn't this risky if your symlinks cross filesystem boundaries? You could have different files with the same inode on two distinct filesystems, no? - Precession
@gimboland: Look at the code: dirkey = st.st_dev, st.st_ino. - Copyedit
Ah yes, sorry I missed that; nice one. - Precession
What if I don't mind the same directory being included several times through symlinks, but still want to avoid infinite recursion? - Horsa
@Ivan: Then you don't need anything from this answer, you can just use ordinary os.walk() by itself. - Copyedit
@DietrichEpp The os.walk() documentation explicitly mentions that it is vulnerable to recursive symlinks with followlinks=True. - Horsa
@Ivan: The os.walk function does not make that easy. You'll need to check only the ancestor directories, which requires something like maintaining a separate stack of inodes (and figuring out how many to pop each time through the loop), maintaining a map from paths to inodes (and walking up the tree), or writing your own version of os.walk. - Copyedit
@DietrichEpp I thought of converting the full path of every item os.walk() finds into a list of inodes and discarding items whose own inode already appears on the path leading to them, but I quickly dismissed the idea because it won't stop os.walk() itself from continuing down a recursive path. Now the only idea I have is to give up on os.walk and implement the whole thing manually with os.listdir(). - Horsa
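@Ivan: Don't dismiss that idea so quickly: os.walk won't traverse a recursive path if you remove it from dirnames. This is what the dirnames[:] = scandirs line does in the example above. - Copyedit
@DietrichEpp Thanks. I'll try that. BTW, why dirnames[:] = scandirs and not dirnames = scandirs? - Horsa
Slice assignment modifies the existing list in place, and os.walk reads that same list object when deciding where to recurse. Plain assignment would only rebind the local name to a new list that os.walk never sees. - Copyedit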
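For the case raised in these comments (the same directory may be walked more than once, but no path may loop back into one of its own ancestors), here is a minimal sketch of the "map from paths to inodes" idea combined with the in-place dirnames pruning discussed above. The name walk_no_ancestor_loops is hypothetical, and it assumes os.walk builds child paths with os.path.join(dirpath, name), which it does in CPython.

```python
import os


def walk_no_ancestor_loops(top):
    """Hypothetical helper: like os.walk(top, followlinks=True), but prunes a
    subdirectory only if it is the same directory as one of its ancestors on
    the current path. A directory reached via different, non-cyclic links is
    still walked each time."""
    st = os.stat(top)
    # Map each pending path to the (st_dev, st_ino) keys of itself and its ancestors.
    chains = {top: frozenset([(st.st_dev, st.st_ino)])}
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        chain = chains.pop(dirpath)
        keep = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # follows symlinks
            key = st.st_dev, st.st_ino
            if key in chain:
                continue  # descending would re-enter an ancestor: a cycle
            chains[path] = chain | {key}
            keep.append(name)
        dirnames[:] = keep  # prune in place so os.walk never follows the cycle
        yield dirpath, dirnames, filenames
```

With a/tob -> ../b and b/toa -> ../a, this walks a, a/tob, b and b/toa and stops there. The visited-set version in the answer would instead skip both links, since a and b are already in its set by the time the links are reached.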

To completely avoid the problem of infinite recursion (with links pointing wherever), you need to store the files and/or directories you have already visited.

The developers of the pynotify module had the same issue and used the method described above. The patch is in the link ;)

Simplicidentate answered 2/5, 2016 at 7:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.