Compare 2 Folders and Find Files with Differing Byte Counts

4

12

Using Gnome in Linux Mint 12, I copied a Folder of about 9.7 GB (containing a complex tree of subfolders) from one NTFS Flash Drive to another NTFS Flash Drive. According to Gnome the file counts match, but according to du (and other programs) the byte counts don't match. (I've had the same problem copying folders in other Linux distros and Windows XP.)

I only want to know which files don't have matching byte counts. (I don't want to compare the contents of each file, because that would take way too long.) What's the best, easiest and fastest way to find the byte-count-mismatched files?

Murry answered 18/6, 2012 at 16:37 Comment(1)
One-liner solutions found for a related Unix StackExchange question: unix.stackexchange.com/q/62140 Kmeson
11

I would adapt the answer by @user1464130 as it has trouble handling spaces in file names.

(cd dir1 && find . -type f -printf "%p %s\n" | sort) > ~/dir1.txt
(cd dir2 && find . -type f -printf "%p %s\n" | sort) > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt
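
If you only want the mismatched paths rather than a full diff, the same listing can be joined on the path. This is a minimal sketch, not a tested tool: the function names list_sizes and size_mismatches are made up for illustration, it assumes GNU find, and it assumes no file name contains a tab or a newline.

```shell
# Hypothetical helper: print "path<TAB>size" for every regular file
# under directory $1, sorted by path (byte order, so join accepts it).
list_sizes() {
    (cd "$1" && find . -type f -printf '%p\t%s\n') |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Hypothetical helper: for paths present in both trees, print
# "path<TAB>size_in_first<TAB>size_in_second" when the sizes differ.
size_mismatches() {
    tab=$(printf '\t')
    left=$(mktemp) && right=$(mktemp) || return 1
    list_sizes "$1" > "$left"
    list_sizes "$2" > "$right"
    # join pairs lines with the same path; awk keeps unequal sizes only.
    LC_ALL=C join -t "$tab" "$left" "$right" | awk -F '\t' '$2 != $3'
    rm -f "$left" "$right"
}

# Usage: size_mismatches dir1 dir2
```

Files that exist in only one of the two trees are not reported by this sketch; the plain diff above still shows those.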

If you want to launch a command on each file and use the result in the report, you can use the while Bash construct. This example uses md5sum to compute a checksum for each file.

find . -maxdepth 1 -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done

Each $() is executed separately, which lets us compute the checksum for each file. tr squeezes each run of consecutive spaces into a single space, and cut extracts the n-th word, here the first. Without that step we would get the file name twice, because md5sum also prints it on stdout.
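
To see the tr and cut step on its own, here is a small sketch. The function name checksum_only is hypothetical, and it assumes GNU md5sum, which prints the hash, two spaces, and then the file name.

```shell
# Hypothetical helper: print only the MD5 hash of file $1.
# md5sum prints "<hash>  <name>"; tr -s squeezes the two spaces
# into one, and cut keeps the first space-separated field.
checksum_only() {
    md5sum "$1" | tr -s " " | cut -f 1 -d " "
}

# Usage: checksum_only ./thread.c
```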

Here is an example without the comparison (no diff). Note that I've used a dash - to separate the three pieces of data we output for each file, but that could be a problem if you want to feed the output to another program.

$ find . -maxdepth 1 -name "*.c" -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done
./thread.c - 5f2b7b12c7cd12fcb9e9796078e5d15b - 584
./utils.c - d61bc1dbc72768e622a04f03e3b8f7a2 - 3413

EDIT: To handle spaces in file names and still get the checksum and the size, you can use the following code.

$ find . -maxdepth 1 -name "*.c" -type f -print0 | xargs -0 -n 1 md5sum | while read -r checksum path; do echo "$path $(stat --printf="%s" "$path") $checksum" ; done
./ini tia li za tion.c 84 31626123e9056bac2e96b472bd62f309
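
A variant that never re-parses the file name from md5sum's output is to let find hand each path to a small shell snippet. This is only a sketch: the function name report_files is made up for illustration, and it assumes GNU find, GNU stat (for -c %s), and md5sum.

```shell
# Hypothetical helper: for each regular file directly inside
# directory $1, print "path size md5". Paths arrive as separate
# arguments from find, so spaces in names are preserved as-is.
report_files() {
    find "$1" -maxdepth 1 -type f -exec sh -c '
        for path in "$@"; do
            size=$(stat -c %s "$path")
            # Reading from stdin keeps the file name out of the output.
            sum=$(md5sum < "$path" | cut -d " " -f 1)
            printf "%s %s %s\n" "$path" "$size" "$sum"
        done' sh {} +
}

# Usage: report_files .
```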
Eraste answered 7/3, 2015 at 14:45 Comment(5)
How difficult would it be to adjust this script to print a checksum for each file in the listing? Anchorage
I've edited my answer to provide a solution. I've just added the checksum without doing the diff. Do you want to diff on the checksum? If so, you don't need the byte count, which differs a little from the OP's question. Moreover, the checksum is the better choice if we want to be sure whether both files are the same. We could even add the file modification time. Eraste
Very helpful. Thanks so much. Bahadur
Fails if the file names contain spaces. Circumstantiality
Added an example to handle spaces in file names. Eraste
8

Did you check if both partitions have the same attributes? (block size, size, reserved space for deletions or bad blocks, etc.)

For your specific case, I would recommend rsync with option -n (or --dry-run). It will tell you which files are different. That is:

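$ rsync -rvn --size-only /source/ /target/

The option --size-only makes rsync compare files by size alone, ignoring timestamps; -r recurses into subfolders and -v lists the files that would be transferred. (-I, or --ignore-times, does the opposite of what is wanted here: it makes rsync treat every file as changed.) If you drop -n and use -a instead of -r, the same command actually copies the files and makes the target match the source (timestamps, permissions, etc.).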

Check the manual of rsync or try the option --help to get more options and examples on how to use it. It is very powerful.

Barracks answered 24/6, 2012 at 22:16 Comment(0)
3

Assuming you need to compare dir1 and dir2, here are the console commands:

(cd dir1 && find . -type f | sort | xargs ls -l | awk '{print $5,$8}') > ~/dir1.txt
(cd dir2 && find . -type f | sort | xargs ls -l | awk '{print $5,$8}') > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt

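You may need to edit the awk fields so that it prints the file size and path properly: depending on the date format ls uses, the path is field $8 or $9. Note that this approach breaks on file names that contain spaces.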

Wattage answered 18/6, 2012 at 17:10 Comment(0)
0

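One-liner, following the advice in Filesize difference of same name folders (with GNU du, -b reports apparent sizes in bytes instead of disk blocks, which is what a byte-count comparison needs):

diff -u <(cd dir1 && du -ab | sort -k2) <(cd dir2 && du -ab | sort -k2)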
Dipody answered 10/9 at 8:56 Comment(0)
