Is there a way to move many files quickly in Python?
S

3

6

I have a little script that moves files around in my photo collection, but it runs a bit slowly.

I think it's because I'm doing one file move at a time. I'm guessing I can speed this up if I do all file moves from one dir to another at the same time. Is there a way to do that?

If that's not the reason for my slowness, how else can I speed this up?

Update:

I don't think my problem is being understood. Perhaps listing my source code will help explain:

# ORF is the file extension of the files I want to move;
# These files live in dirs shared by JPEG files,
# which I do not want to move.
import os
import re
from glob import glob
import shutil

DIGITAL_NEGATIVES_DIR = ...
DATE_PATTERN = re.compile(r'\d{4}-\d\d-\d\d')

# Move a single ORF.
def move_orf(src):
    # Avoid shadowing the built-in dir(); only the directory part is needed.
    dirname = os.path.dirname(src)
    shutil.move(src, os.path.join('raw', dirname))

# Move all ORFs in a single directory.
def move_orfs_from_dir(src):
    orfs = glob(os.path.join(src, '*.ORF'))
    if not orfs:
        return
    os.mkdir(os.path.join('raw', src))
    print('Moving %3d ORF files from %s to raw dir.' % (len(orfs), src))
    for orf in orfs:
        move_orf(orf)

# Scan for dirs that contain ORFs that need to be moved, and move them.
def main():
    os.chdir(DIGITAL_NEGATIVES_DIR)
    src_dirs = filter(DATE_PATTERN.match, os.listdir(os.curdir))
    for src_dir in src_dirs:
        move_orfs_from_dir(src_dir)

if __name__ == '__main__':
    main()
Subhead answered 9/10, 2010 at 7:31 Comment(5)
Could you provide your script (or just the core of it)? What are you using to do the copy or move? Trestle
I don't want to move entire directories, although a single dir may contain many files that should be moved. Subhead
I'm using shutil.move to move individual files. Subhead
Hmm, should be pretty fast then. Are you using os.walk to traverse the directory to find the files? Trestle
@Trestle I'm pretty sure I am (I don't have access to the file at the moment). Subhead
P
4

What platform are you on? And does it really have to be Python? If not, you can simply use system tools like mv (*nix) or move (Windows).

$ stat -c "%s" file
382849574

$ time python -c 'import shutil;shutil.move("file","/tmp")'

real    0m29.698s
user    0m0.349s 
sys     0m1.862s 

$ time mv file /tmp

real    0m29.149s
user    0m0.011s 
sys     0m1.607s 

$ time python -c 'import shutil;shutil.move("file","/tmp")'

real    0m30.349s
user    0m0.349s 
sys     0m2.015s 

$ time mv file /tmp

real    0m28.292s
user    0m0.015s 
sys     0m1.702s 

$ cat test.py
#!/usr/bin/env python
import shutil
shutil.move("file","/tmp")
shutil.move("/tmp/file",".")

$ cat test.sh
#!/bin/bash
mv file /tmp
mv /tmp/file .

$ time python test.py

real    1m1.175s
user    0m0.641s
sys     0m4.110s

$ time bash test.sh

real    1m1.040s
user    0m0.026s
sys     0m3.242s

$ time python test.py

real    1m3.348s
user    0m0.659s
sys     0m4.024s

$ time bash test.sh

real    1m1.740s
user    0m0.017s
sys     0m3.276s
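
In the runs above, real time is far larger than user plus sys time for both tools, which suggests the wait is on I/O rather than on Python or mv. To see how fast moves are when no data has to be copied, here is a minimal timing sketch. The helper name `time_moves` and the temporary directory layout are assumptions for illustration, not part of the original script, and no particular timing is claimed.

```python
import os
import shutil
import tempfile
import time


def time_moves(n_files=1000):
    """Create n_files empty files, move them one by one, return (moved, seconds)."""
    root = tempfile.mkdtemp()          # hypothetical scratch area
    src = os.path.join(root, "src")
    dst = os.path.join(root, "dst")
    os.mkdir(src)
    os.mkdir(dst)
    for i in range(n_files):
        open(os.path.join(src, "P%04d.ORF" % i), "w").close()

    start = time.perf_counter()
    for name in os.listdir(src):
        # src and dst share a filesystem here, so each move is a rename.
        shutil.move(os.path.join(src, name), os.path.join(dst, name))
    elapsed = time.perf_counter() - start

    moved = len(os.listdir(dst))
    shutil.rmtree(root)                # clean up the scratch area
    return moved, elapsed


if __name__ == "__main__":
    moved, elapsed = time_moves()
    print("moved %d files in %.3f s" % (moved, elapsed))
```

If this runs quickly on your machine but the real script does not, the source and destination are probably on different filesystems, or the time is going into finding the files rather than moving them.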
Pralltriller answered 9/10, 2010 at 8:25 Comment(5)
There's no particular reason that would be any faster than doing it in Python; it's generally going to be I/O-bound. Alsace
In all the tests on my Linux box, the system mv is faster than Python's shutil.move. Pralltriller
@user131527: It sounds like he has a script that locates particular files and moves them. In that case (since he's already in Python), shutil.move(stuff) is cleaner and safer to write than os.system('mv stuff'). Once you're already running the Python interpreter, the difference is moot, since shutil.move just calls the system's move. Trestle
That's why I asked whether Python is a definite must. If not, use the shell's mv command instead of Python. Pralltriller
I'm sure this can be done in shell (I assume it's Turing complete), but I see no reason why this should be slow in Python. Subhead
S
3

Edit:

In my own state of confusion (which JoshD helpfully remedied), I forgot that shutil.move accepts directories, so you can (and should) just use that to move your directory as a batch.
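
A minimal sketch of the batch move described here, assuming Python 3. The helper name `move_dir` and the demo paths are hypothetical. When the destination is an existing directory, `shutil.move` places the source directory inside it.

```python
import os
import shutil
import tempfile


def move_dir(src_dir, dst_parent):
    """Move src_dir and everything in it under dst_parent with one call."""
    os.makedirs(dst_parent, exist_ok=True)
    # dst_parent exists, so src_dir ends up at dst_parent/<basename of src_dir>.
    return shutil.move(src_dir, dst_parent)


if __name__ == "__main__":
    root = tempfile.mkdtemp()                  # hypothetical scratch area
    src = os.path.join(root, "2010-10-09")
    os.mkdir(src)
    for i in range(3):
        open(os.path.join(src, "P%d.ORF" % i), "w").close()

    new_path = move_dir(src, os.path.join(root, "raw"))
    print(sorted(os.listdir(new_path)))
    shutil.rmtree(root)
```

Note that this moves every file in the directory, so it only fits cases where nothing in the source directory needs to stay behind.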

Subfusc answered 9/10, 2010 at 7:33 Comment(3)
I think he wants to move rather than copy... maybe. In that case a simple move is much faster than a copy followed by a delete. Trestle
Technically, 'copy' should be faster than 'move', since move does 'copy+delete'. That's one way to speed up your program. Banditry
@movieyoda: I take it you've never moved a 20GB directory and then copied the same 20GB directory? A move (on the same disk) is simply a rename. Trestle
T
2

If you just want to move the directory, you can use shutil.move. It'll be pretty freakin' quick (if it's on the same filesystem) because it's just a rename operation.
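
To make the same-filesystem point concrete, here is a rough sketch of the idea: try a rename first, and only copy and delete when the rename fails. The helper names `same_filesystem` and `move_file` are hypothetical, and this is a simplified illustration, not `shutil.move`'s actual source.

```python
import os
import shutil


def same_filesystem(path_a, path_b):
    """Return True if two existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_file(src, dst):
    """Move one file to the full path dst; report which strategy was used."""
    try:
        os.rename(src, dst)       # cheap: only directory entries change
        return "renamed"
    except OSError:
        shutil.copy2(src, dst)    # expensive: every byte is read and written
        os.unlink(src)
        return "copied"
```

Checking `same_filesystem(source_dir, target_dir)` before a big batch is one way to predict whether the moves will be near-instant renames or full copies.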

Trestle answered 9/10, 2010 at 7:41 Comment(3)
On the same filesystem, to be exact. Anaconda
Oh, and by the way, shutil.move does try: os.rename(...) except OSError: ...copy and delete... automatically, so there's no reason to use os.rename in 99% of cases. Anaconda
@AndiDog: Thanks for clarifying those details. I'll update the answer with the more accurate information. Trestle
