I am using os.walk to build a map of a data store (this map is used later in the tool I am building).
This is the code I currently use:
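import os

def find_children(tickstore):
    # Collect the path of every directory under tickstore.
    children = []
    dir_list = os.walk(tickstore)  # lazy generator; nothing is read yet
    for i in dir_list:
        children.append(i[0])  # i is (dirpath, dirnames, filenames)
    return children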
I have done some analysis on it: dir_list = os.walk(tickstore) runs instantly, and if I do nothing with dir_list the function completes instantly.
It is iterating over dir_list that takes a long time. Even if I don't append anything, just iterating over it is what takes the time.
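The observation above can be checked with a small timing sketch. It assumes nothing about the real datastore: time_walk is a hypothetical helper, and the throwaway directory tree exists only to make the example self-contained. It shows that creating the os.walk generator is cheap and that the directory reads happen during iteration.

```python
import os
import tempfile
import time


def time_walk(root):
    """Return (seconds to create the generator, seconds to exhaust it, directory count)."""
    start = time.perf_counter()
    walker = os.walk(root)          # no directory is read here
    created = time.perf_counter() - start

    start = time.perf_counter()
    count = sum(1 for _ in walker)  # every directory is read during iteration
    iterated = time.perf_counter() - start
    return created, iterated, count


# Small self-contained demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    os.makedirs(os.path.join(root, "c"))
    created, iterated, count = time_walk(root)
    print(count)  # 4 directories: root, a, a/b, c
```

Running time_walk against the real tickstore path would show how much of the 35 minutes is pure traversal, before any appending or filtering.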
Tickstore is a big datastore, with ~10,000 directories.
Currently it takes approximately 35 minutes to complete this function.
Is there any way to speed it up?
I've looked at alternatives to os.walk, but none of them seemed to provide much of an advantage in terms of speed.
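For reference, one alternative that could be measured against os.walk is a directory-only traversal built on os.scandir. This is only a sketch, assuming Python 3.6 or later and assuming that just the directory paths are needed; find_children_scandir is a hypothetical name, and whether it is faster on this datastore is untested. On Python 3.5 and later os.walk is itself built on os.scandir, so any gain may be small.

```python
import os


def find_children_scandir(root):
    """Return root and every directory beneath it (order differs from os.walk)."""
    children = [root]
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # is_dir() can usually use type information returned with
                    # the directory listing, avoiding an extra stat call.
                    if entry.is_dir(follow_symlinks=False):
                        children.append(entry.path)
                        stack.append(entry.path)
        except OSError:
            # Unreadable directories are skipped, as os.walk does by default.
            continue
    return children
```

Symbolic links to directories are not followed or reported here, which matches the set of directory paths that os.walk visits with its default followlinks=False.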
return [dir for dir, _, _ in os.walk(tickstore)] might be a bit more efficient, but hard drive access is relatively slow in general. – Acceptant
os.walk? Are you then going back to get the files for each directory (which you've just thrown away)? This is starting to seem a lot like an XY problem... – Acceptant
dir_list is a generator, not a list (c: So, it'll only access the drive when iterated over. – Anya
os.walk creates a list of directories in which files can be stored. The structure of this datastore also allows me to filter, so the whole directory name is needed. The next function I call on children uses the directories along with os.listdir to filter out invalid files within those directories. – Equestrienne
os.walk() gets you the file list already. Don't discard the info, use it. – Cheek
os.walk returns three things: the directory you're currently in; a list of the sub-directories in that directory; and a list of the files in that directory. It is also a generator, so can be used more efficiently if you don't need everything at once. It seems certain that there is a smarter way to do what you're currently doing that will give both performance and readability/maintainability benefits. – Acceptant
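The suggestion in the last two comments could look roughly like the sketch below: a single pass that keeps the file names os.walk already returns, instead of discarding them and calling os.listdir on each directory afterwards. The names map_store and is_valid are hypothetical, and the .csv rule in the usage line is only an illustrative assumption, since the real validity check is not shown in the question.

```python
import os


def map_store(root, is_valid):
    """Map each directory under root to the valid file names it contains.

    is_valid is a hypothetical predicate taking (directory, filename).
    """
    store_map = {}
    # os.walk yields (dirpath, dirnames, filenames) for every directory,
    # so no separate os.listdir call is needed to see the files.
    for dirpath, _dirnames, filenames in os.walk(root):
        store_map[dirpath] = [
            name for name in filenames if is_valid(dirpath, name)
        ]
    return store_map


# Usage with an assumed rule: keep only .csv files under a placeholder path.
csv_map = map_store("tickstore", lambda directory, name: name.endswith(".csv"))
```

This reads each directory once rather than twice, which matters when disk access dominates the run time.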