parallel check md5 file

2

5

I have an md5sum checksum file with many lines, and I want to use GNU parallel to speed up verification. When md5sum is given no file argument, it reads the checksum list from stdin. I tried this:

cat checksums.md5 | parallel md5sum -c {}

But I get this error:

md5sum 445350b414a8031d9dd6b1e68a6f2367 testing.gz: No such file or directory

How can I parallelize the md5sum check?

Tetrafluoroethylene answered 4/12, 2015 at 6:37 Comment(0)
12

Assuming checksums.md5 has the format:

d41d8cd98f00b204e9800998ecf8427e  My file name

Run:

cat checksums.md5 | parallel --pipe -N1 md5sum -c

If your files are small, pass more lines to each job, for example with -N100.

If that does not speed things up, make sure your disks are fast enough: md5sum can process 500 MB/s, and iostat -dkx 1 can tell you whether your disks are the bottleneck.
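As a minimal sketch of the command above, the following script builds a few throwaway files, writes a checksum list, and verifies it one line per job. The file names (a.txt, b.txt, c.txt) and the temporary directory are made up for illustration, and the serial fallback is my own addition for machines where GNU parallel is not installed.

```shell
#!/bin/sh
# Hypothetical demo: file names and the temp directory are illustrative only.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create three small sample files and a checksum list in md5sum's format.
for name in a.txt b.txt c.txt; do
    printf 'contents of %s\n' "$name" > "$name"
done
md5sum a.txt b.txt c.txt > checksums.md5

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # One checksum line per job; each md5sum reads its line on stdin.
    result=$(parallel --pipe -N1 md5sum -c < checksums.md5)
else
    # Assumption: fall back to a serial check when GNU parallel is missing.
    result=$(md5sum -c checksums.md5)
fi
printf '%s\n' "$result"
```

The order of the output lines may vary between runs, because the jobs finish independently.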

Encomiastic answered 5/12, 2015 at 1:13 Comment(2)
Thanks, guys. I tried both --block and -N and used top to check CPU usage. --block uses only 1 CPU regardless of the value I give it (1M, 10M, 100M). -N1 used many CPUs, -N10 used only a few, and -N0 and -N100 used only 1. I am not sure why, but I will use -N1 in the future. - Tetrafluoroethylene
The reason is that you have only a few files (i.e. checksums.md5 is far smaller than 1 MB), so all of its lines end up in a single job. - Encomiastic
1

You need the --pipe option. In this mode, parallel splits stdin into blocks and feeds each block to the command on its stdin (see man parallel for details):

cat checksums.md5 | parallel --pipe md5sum -c -

The default block size is 1 MB; it can be changed with the --block option.
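To see why the data has to arrive on stdin, here is a small sketch that uses md5sum alone. It assumes a throwaway file called testing.txt (a hypothetical name) and compares passing a checksum line as an argument, which is what {} does without --pipe, with passing the same line on stdin.

```shell
#!/bin/sh
# Hypothetical demo file; only md5sum from coreutils is needed.
set -u
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'hello\n' > testing.txt
line=$(md5sum testing.txt)    # "<hash>  testing.txt"

# As an argument, the whole line is taken as the name of a checksum file,
# which does not exist, so md5sum fails.
if md5sum -c "$line" 2>/dev/null; then
    as_argument=ok
else
    as_argument=failed
fi

# On stdin ("-"), the line is parsed as a checksum entry and verified.
if printf '%s\n' "$line" | md5sum -c - >/dev/null; then
    on_stdin=ok
else
    on_stdin=failed
fi

echo "as argument: $as_argument"
echo "on stdin:    $on_stdin"
```

The first call fails with the same kind of "No such file or directory" error as in the question, while the second one succeeds.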

Bufflehead answered 4/12, 2015 at 6:56 Comment(0)
