I have to compare the checksums of all files in the `/primary` and `/secondary` folders on `machineA` with the files in the folder `/bat/snap/` on the remote server `machineB`. The remote server will have lots of other files along with the files we have on `machineA`.
- If there is any mismatch in checksum, I want to report all the files on `machineA` that have issues, with their full paths, and exit with a non-zero status code.
- If everything matches, exit with status zero.
I wrote one command (not sure whether there is a better way to write it) that I am running on `machineA`, but it's very slow. Is there any way to make it faster?
(cd /primary && find . -type f -exec md5sum {} +; cd /secondary && find . -type f -exec md5sum {} +) | ssh machineB '(cd /bat/snap/ && md5sum -c)'
Also, it prints out file names like this: `./abc_monthly_1536_proc_7.data: OK`. Is there any way it can print the full path of that file on `machineA`?
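One possible way to get full paths and a meaningful exit status is sketched below. It is only a sketch: it assumes GNU `md5sum` on `machineB` (for `--quiet`), file names without newlines or backslashes, and at least one file in each tree. The names `remote_check`, `check_tree` and `verify_all` are made up for illustration. Each tree is checked separately so the local root can be put back in front of every failing name.

```shell
#!/bin/sh
# Sketch only. All function names here are hypothetical.

# Hypothetical checker: reads an md5sum list on stdin and verifies it on
# machineB. --quiet suppresses the "OK" lines, so only failures come back.
remote_check() {
    ssh machineB 'cd /bat/snap/ && LC_ALL=C md5sum -c --quiet'
}

# check_tree ROOT CHECKER
# Hashes every file under ROOT, feeds the list to CHECKER, and prints the
# full local path of each file that CHECKER reports as FAILED.
# Exits with CHECKER's status (non-zero on any mismatch or ssh error).
check_tree() (
    root=$1
    checker=$2
    cd "$root" || exit 2
    report=$(find . -type f -exec md5sum {} + | "$checker" 2>/dev/null)
    status=$?    # status of the last pipeline command, i.e. the checker
    printf '%s\n' "$report" | while IFS= read -r line; do
        case $line in
            *': FAILED'*)
                name=${line%: FAILED*}      # strip the ": FAILED..." suffix
                printf '%s/%s\n' "$root" "${name#./}"
                ;;
        esac
    done
    exit "$status"
)

# verify_all: check both trees; return non-zero if either had a problem.
verify_all() {
    rc=0
    check_tree /primary remote_check || rc=1
    check_tree /secondary remote_check || rc=1
    return "$rc"
}
```

Running `verify_all` on `machineA` would then print lines such as `/primary/abc_monthly_1536_proc_7.data` for each failing file and return a non-zero status if anything failed. This costs two ssh connections instead of one, which should be negligible next to the hashing itself.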
Doing an ssh to the remote host for every file definitely isn't very efficient. `parallel` could speed it up by processing more files concurrently, but the more efficient way is likely to tweak the command a bit so that it does a single ssh to `machineB` and gets all the md5sums in one shot. Is this possible to do?
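A rough sketch of the "one shot" idea: fetch the remote checksums in a single ssh call, then do the comparison locally so the report can use local paths. `compare_sums` is a hypothetical helper, and the temporary file names in the usage comment are placeholders. It assumes the plain `md5sum` output format (32 hex digits, two separator characters, then the name) and names without newlines or backslashes.

```shell
#!/bin/sh
# compare_sums LOCAL_LIST REMOTE_LIST ROOT   (hypothetical helper)
# Prints ROOT/<name> for every entry of LOCAL_LIST whose checksum is missing
# from REMOTE_LIST or differs from it. Returns 1 if anything was printed.
compare_sums() {
    awk -v root="$3" '
        {
            sum  = substr($0, 1, 32)   # the 32 hex digits of the MD5
            name = substr($0, 35)      # name follows two separator characters
        }
        # First file on the command line is the remote list: remember it.
        FILENAME == ARGV[1] { remote[name] = sum; next }
        # Second file is the local list: report missing or different sums.
        !(name in remote) || remote[name] != sum {
            sub(/^\.\//, "", name)
            print root "/" name
            bad = 1
        }
        END { exit bad + 0 }
    ' "$2" "$1"
}

# Hypothetical usage on machineA, one ssh call per tree. Only the files that
# exist locally are hashed on machineB, not everything in /bat/snap/:
#   cd /primary
#   find . -type f -exec md5sum {} + > /tmp/local.md5
#   find . -type f -print0 |
#       ssh machineB 'cd /bat/snap/ && xargs -0 md5sum' > /tmp/remote.md5
#   compare_sums /tmp/local.md5 /tmp/remote.md5 /primary
```

Extra files on `machineB` are ignored because only entries from the local list are looked up. A file that is missing on `machineB` produces no line in the remote list and is therefore reported as well.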
`find $(pwd) -type f` ... – Screeching
Instead of `cd /primary && find . ...`, just use `find /full/path/primary`. find does not care what your current directory is as long as you pass absolute paths. – Waites
If `/primary` and `/secondary` are on different physical disks, you may be able to get a slight speedup by changing the `;` before `cd /secondary` to a `&`. Otherwise you're already running at very close to max speed AFAICT. – Footbridge
`shopt -s globstar; time md5sum /primary/**/*` plus `time md5sum /secondary/**/*`? – Footbridge
`rsync -ncav` (which uses MD4 instead of MD5 but, more to the point, implements most if not all of what's needed here). If that doesn't work, my second try would be to compare file sizes before calculating MD5 (or perhaps cksum or CRC?); a size mismatch fails without needing to be checksummed. – Service
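The `&` suggestion above can be sketched as follows. Instead of letting both `md5sum` jobs write to the same pipe, where their buffered output could in principle interleave mid-line, each job writes to its own temporary file and the results are concatenated once both have finished. `hash_trees` is a hypothetical name, and any speedup only applies if the trees really sit on different disks.

```shell
#!/bin/sh
# hash_trees ROOT...   (hypothetical helper)
# Hashes every tree in its own background job and prints the combined
# md5sum list (names relative to each ROOT) once all jobs have finished.
hash_trees() (
    tmpdir=$(mktemp -d) || exit 1
    n=0
    for root in "$@"; do
        n=$((n + 1))
        # One output file per job, so lines from different jobs cannot mix.
        (cd "$root" && find . -type f -exec md5sum {} +) > "$tmpdir/$n.md5" &
    done
    wait                      # block until every background job is done
    cat "$tmpdir"/*.md5
    rm -rf "$tmpdir"
)

# Hypothetical usage, same remote check as the original command:
#   hash_trees /primary /secondary | ssh machineB 'cd /bat/snap/ && md5sum -c'
```

The trade-off is that nothing is sent to `machineB` until both trees have been hashed, whereas the original pipeline streams checksums as they are produced.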