Why does "find . -name *.txt | xargs du -hc" give multiple totals?
B

7

11

I have a large set of directories for which I'm trying to calculate the sum total size of several hundred .txt files. I tried this, which mostly works:

find . -name *.txt | xargs du -hc

But instead of giving me one total at the end, I get several. My guess is that the pipe will only pass on so many lines of find's output at a time, and du just operates on each batch as it comes. Is there a way around this?

Thanks! Alex

Betaine answered 24/8, 2009 at 17:29 Comment(1)
Hm, ok. I tried: find . -name *.txt | xargs -n 100000 du -hc But that doesn't seem to work: I get more subtotals, not fewer. Trying find . -name *.txt | xargs -L 1000 du -hc doesn't seem to work well either. I get either "xargs: argument list too long", or it operates on only a very few files. Any other thoughts? Thanks! AlexBetaine
B
12

How about using the --files0-from option to du? You'd have to generate the null-terminated file output appropriately:

find . -name "*.txt" -exec echo -n -e {}"\0" \; | du -hc --files0-from=-

works correctly on my system.
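
As a rough sketch of the same idea, find's -print0 can produce the NUL-terminated list directly. This assumes GNU du (--files0-from is a GNU extension) and a find that supports -print0; the scratch directory and file names are made up for illustration.

```shell
# Hypothetical scratch tree; one name contains spaces on purpose.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub dir"
echo one > "$tmpdir/a.txt"
echo two > "$tmpdir/sub dir/b c.txt"
echo skip > "$tmpdir/notes.log"

# A single du process reads every NUL-terminated name from stdin,
# so only one grand total is printed, however long the list is.
report=$(find "$tmpdir" -name '*.txt' -print0 | LC_ALL=C du -hc --files0-from=-)
printf '%s\n' "$report"

rm -rf "$tmpdir"
```

Because du is started only once, there are no batches and therefore no subtotals.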

Batho answered 24/8, 2009 at 17:52 Comment(3)
ah, this worked for me, but I used find's -print0 instead of the exec echo stuff.Sternmost
Hmm...I didn't know about the -print0 option. That's much cleaner. Thanks!Batho
-print0 is technically not POSIX (for the bizarre reason that not all commands can handle file names ending in $'\0'), though most implementations seem to have added the option.Amylose
B
9

find . -iname '*.txt' -print0 | du --files0-from=-

and if you want to have several different extensions to search for it's best to do:

find . -type f -print0 | grep -azZEi '\.(te?xt|rtf|docx?|wps)$' | du --files0-from=-
Bursa answered 21/10, 2010 at 19:57 Comment(3)
Far easier to remember than -exec echo {}"=0";. no wait. That's not right. Uhhh -exec echo -n {}"\0" \;. No? -exec echo $#&@*#(@!@#$@#!!! (Much better)Caulk
The first way you listed will never get to the -iname *.txt test and the glob *.txt will expand before the find is executed if you have .txt files in your working directory.Amylose
You are correct. Thanks for pointing out the typo. I've corrected it.Bursa
L
6

The xargs program splits its input into batches so that no single command line exceeds the system's maximum length. That is still more efficient than running your subcommand once per file, but for a long list of inputs xargs runs the command several times, each time with a batch short enough to fit.

Because of this, you're likely seeing one "total" line per batch that xargs runs.
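
A small sketch of that effect, forcing tiny batches with -n so it shows up with only five files (the scratch directory and file names are made up for illustration):

```shell
# Hypothetical scratch directory with five small .txt files.
tmpdir=$(mktemp -d)
for n in 1 2 3 4 5; do
    echo "sample" > "$tmpdir/f$n.txt"
done

# -n 2 allows at most two file names per du invocation, so du runs
# three times and prints three separate "total" lines.
total_lines=$(find "$tmpdir" -name '*.txt' | LC_ALL=C xargs -n 2 du -c | grep -c 'total$')
echo "du printed $total_lines total lines"

rm -rf "$tmpdir"
```

With the default settings the batches are much larger, but the same thing happens once the file list no longer fits on one command line.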

Because you may find it useful/interesting, the man page can be found online here: http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs


One other thing to note (and this may be a typo in your post or my misunderstanding) is that the "*.txt" is neither escaped nor quoted. That is, you have

find . -name *.txt | xargs du -hc

where you probably want

find . -name \*.txt | xargs du -hc

The difference is that the shell may expand the * into the list of matching filenames in the current directory, rather than passing the * through to find, which would use it as a pattern.
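
To see what find would actually receive, echo can stand in for it. This is only a sketch; the scratch directory and the two file names are made up for illustration.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.txt b.txt

# Unquoted: the shell replaces *.txt with the matching names before
# the command runs, so find would be handed "-name a.txt b.txt".
unquoted=$(echo find . -name *.txt)

# Quoted: the pattern is passed through untouched.
quoted=$(echo find . -name '*.txt')

echo "$unquoted"
echo "$quoted"

cd - >/dev/null && rm -rf "$tmpdir"
```

If no .txt files happen to sit in the current directory, the unquoted form is passed through unchanged in a default shell setup, which is why the mistake often goes unnoticed.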

Liftoff answered 24/8, 2009 at 17:31 Comment(0)
C
3

Another simple solution:

find . -name '*.txt' -print0 | xargs -0 du -hc
Custodian answered 23/8, 2012 at 16:0 Comment(1)
To improve the quality of your post please include how/why your post will solve the problem.Aglow
P
1

One alternate solution is to use a bash loop:

find . -name '*.txt' -print0 | while IFS= read -r -d '' i; do du -h "$i"; done

This is good for when you need more control of what happens in the loop.

Penton answered 30/3, 2015 at 10:45 Comment(0)
M
0

xargs splits its input into reasonable-sized chunks - what you're seeing are totals for each of those chunks. Check the man page for xargs on ways to configure its handling of input.

Mitran answered 24/8, 2009 at 17:31 Comment(0)
H
0

One alternate solution is to use awk:

find . -name "*.txt" -exec ls -lt {} \; | awk -F " " 'BEGIN { sum=0 } { sum+=$5 } END { print sum }'
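
A variant of the same idea, sketched with -exec ... + so that ls is started once per batch rather than once per file. Batching does no harm here because awk adds up every line it sees. Note that this sums apparent file sizes in bytes (column 5 of ls -l), not disk usage as du reports it; the scratch files are made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'hello\n'  > "$tmpdir/a.txt"   # 6 bytes
printf 'world!\n' > "$tmpdir/b.txt"   # 7 bytes

# ls -l on plain files prints one line per file and no "total" line,
# so summing column 5 gives the combined size in bytes.
sum_bytes=$(find "$tmpdir" -name '*.txt' -exec ls -l {} + | awk '{ sum += $5 } END { print sum }')
echo "$sum_bytes"

rm -rf "$tmpdir"
```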
Hasheem answered 24/8, 2009 at 17:47 Comment(0)
