How to use GNU parallel with find -exec?

I want to unzip multiple files.

Using this answer, I found the following command.

find -name '*.zip' -exec sh -c 'unzip -d "${1%.*}" "$1"' _ {} \;
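For reference, the `${1%.*}` expansion in that command removes the shortest trailing `.suffix` from the path, which is what gives the target directory its name. A minimal sketch that only prints the resulting command instead of running it; `target_dir` is a hypothetical helper name used for illustration, and the file name is the one from this question:

```shell
#!/bin/sh
# Hypothetical helper: print the directory name that "${1%.*}" produces.
target_dir() {
    printf '%s\n' "${1%.*}"
}

# File name taken from the question; nothing is unzipped here.
zip=./abcd_sdfa_fasfasd_dasd14.zip
echo "unzip -d $(target_dir "$zip") $zip"
```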

How do I use GNU Parallel with the above command to unzip multiple files?


Edit 1: In response to the questions from user Mark Setchell:

Where are the files?

All the zip files are generally in a single directory.

But, as far as I understand, the command finds all matching files, recursively or not, depending on the depth given to the find command.

How are the files named?

abcd_sdfa_fasfasd_dasd14.zip

How do you normally unzip a single one?

unzip abcd_sdfa_fasfasd_dasd14.zip -d abcd_sdfa_fasfasd_dasd14

Criminology answered 24/12, 2019 at 10:40 Comment(1)
Where are the files - in a single directory or spread across a hierarchy of directories? How are the files named and how do you normally unzip a single one? Maleate

You can first use find with the -print0 option to NUL-delimit the file names, then read them back in GNU parallel with the -0 option and apply unzip:

find . -type f -name '*.zip' -print0 | parallel -0 unzip -d {/.} {}

The {/.} replacement string expands to the basename of the file with its extension removed (the part after the last .), as described in the GNU parallel documentation; see 7. Get basename, and remove last ({.}) or any ({:}) extension.

You can further set the number of parallel jobs with the -j flag, e.g. -j8 or -j64.
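To see what the replacement strings expand to without running parallel, the same transformations can be approximated with POSIX parameter expansion. This is only a sketch: the function names are hypothetical, the path is made up around the file name from the question, and the approximation assumes the last path component has an extension (which holds for anything matched by `*.zip`).

```shell
#!/bin/sh
# Approximate GNU Parallel's replacement strings with parameter expansion.
# Assumption: the last path component has an extension (true for *.zip).

# {.}  : input with the last extension removed
strip_ext() { printf '%s\n' "${1%.*}"; }

# {/}  : basename of the input
base_name() { printf '%s\n' "${1##*/}"; }

# {/.} : basename with the last extension removed
base_no_ext() {
    b=${1##*/}
    printf '%s\n' "${b%.*}"
}

# Illustrative path only; the directory name is made up.
f='./sub dir/abcd_sdfa_fasfasd_dasd14.zip'
echo "{}   -> $f"
echo "{.}  -> $(strip_ext "$f")"
echo "{/}  -> $(base_name "$f")"
echo "{/.} -> $(base_no_ext "$f")"
```

One consequence worth noting: because {/.} drops the directory part, `unzip -d {/.}` extracts into a directory under the current working directory, whereas `unzip -d {.}` would extract next to the archive.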

Floor answered 24/12, 2019 at 11:10 Comment(1)
You can use -j150% to use as many threads as 150% of the number of cores, and I'm told that you can use -j+4 to use as many threads as the number of cores plus 4. Newkirk

You could also use the + variant of -exec. It starts parallel after find has completed, but still lets you use -print/-printf/-ls/etc. and possibly abort the find before the command is executed:

find . -type f -name '*.zip' -ls -exec parallel unzip -d {.} ::: {} \+

Note that GNU Parallel also uses {} to specify the input arguments. In this case, however, we use {.} to strip the extension, as shown in your example. You can override GNU Parallel's replacement string {} with -I (for example, -I@@ lets you use @@ instead of {}).

I recommend using GNU Parallel's --dry-run flag or prepending unzip with an echo to test the command first and see what would be executed.
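A minimal sketch of such a preview, assuming GNU Parallel is installed and using the NUL-delimited pipeline form. `preview_unzip` is a hypothetical wrapper name, and the exact quoting in the printed commands can differ between GNU Parallel versions:

```shell
#!/bin/sh
# Hypothetical wrapper: print the unzip commands GNU Parallel would run
# for every *.zip below a directory, without extracting anything.
preview_unzip() {
    dir=${1:-.}
    if ! command -v parallel >/dev/null 2>&1; then
        echo "parallel not found" >&2
        return 127
    fi
    # -print0 / -0 keep names with spaces or newlines intact;
    # --dry-run prints each command instead of executing it.
    find "$dir" -type f -name '*.zip' -print0 |
        parallel -0 --dry-run unzip -d {/.} {}
}

preview_unzip . || echo "preview skipped" >&2
```

Once the printed commands look right, dropping --dry-run runs them for real.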

Stearic answered 27/6, 2021 at 2:57 Comment(1)
Unfortunately, it does not work using GNU Parallel 20221122, for example. There, the --dry-run option shows that it tries to execute commands like unzip -d ./NAME_WITHOUT_EXTENSION (without the name of the zip file). Also, this construction lacks the safeguards of the find . -type f -name '*.zip' -print0 | parallel -0 unzip -d {/.} {} answer against spaces and carriage returns in filenames. Herculean
