Measure disk space of certain file types in aggregate
I have some files across several folders:

/home/d/folder1/a.txt
/home/d/folder1/b.txt
/home/d/folder1/c.mov
/home/d/folder2/a.txt
/home/d/folder2/d.mov
/home/d/folder2/folder3/f.txt

How can I measure the grand total amount of disk space taken up by all the .txt files in /home/d/?

I know du will give me the total space of a given folder, and ls -l will give me the size of individual files, but how do I get one grand total for all the .txt files in /home/d/, including folder1 and folder2 and their subfolders such as folder3?

Grosso answered 31/8, 2009 at 19:5 Comment(1)
If you needed it to run on HP-UX, why did you use the linux tag?Molder

This will do it:

total=0
for file in *.txt
do
    space=$(ls -l "$file" | awk '{print $5}')
    let total+=space
done
echo $total
Amaurosis answered 31/8, 2009 at 19:10 Comment(5)
Will that find the files in subfolders folder1 and folder2?Positive
Used a slight variation. Removed the first -l in ls. This still doesn't do any recursion, and it'll bomb on anything with spaces, but it is the closest thing I have. ThanksGrosso
no problem... I missed the subfolder requirement, but that's easily handled by changing the for command to something like find . -name *.txt -exec ls {} ;\Amaurosis
that ls *.txt in the for loop is redundant. just use shell expansion. --> for file in *.txtElamite
there is a typo in your statement, Amaurosis: not ';\' but find . -name "*.txt" -exec ls {} \;Footton
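
Putting the comments' fixes together (recursion via find, summing with awk rather than a subshell per file), one possible variant is the sketch below; it still trusts the size column of ls -l, so it assumes user and group names contain no spaces:

find /home/d -type f -name '*.txt' -exec ls -l {} + | awk '{sum += $5} END {print sum " bytes"}'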

find folder1 folder2 -iname '*.txt' -print0 | du --files0-from - -c -s | tail -1

Molder answered 31/8, 2009 at 19:10 Comment(12)
du doesn't appear to have a --files-from optionAmaurosis
I meant a --files0-from optionAmaurosis
du --version du (GNU coreutils) 5.93 - works on my machine.Molder
And on my Cygwin install: du --version du (GNU coreutils) 6.10Molder
on my linux box I'm running coreutils 4.5.3 so it's a bit outdatedAmaurosis
Of course, if compatibility with POSIX were required, then this wouldn't apply. But my (limited) experience with non-GNU userland has indicated that life's pretty harsh out there, best stick with GNU.Molder
HP-UX Release 11i: November 2000... Ya, I don't have several of those options you used. Otherwise it is a nice one line solution, just not going to work for me.Grosso
However, this number is rounded up to the nearest block for all files introducing a small error. With many small files, the error margin can be quite noticeable. For instance, du will print 4K for a file with 788 bytes of contents.Institutor
@Institutor the question is about disk space, so rounding up to the nearest block is correct, and not a small error. But actually, it's even more than that: du, by default, will (normally) report on-disk size even for filesystems with transparent compression. Again, this is what we want. du also doesn't double-count hard-linked files.Molder
@Molder: If it is about disk space, then you are right. If you are considering reading all those files into RAM, however, the error margin still applies.Institutor
I would be surprised if the physical IO layer implements partial reads below the block granularity. It would depend on the command sequence on the wire; at the very least, I'd expect minimum of 512 bytes granularity and 4K on modern hard drives (the drives wouldn't know how to interpret finer grained addresses). I'd expect even bigger read granularity on SSDs due to how they implement writes.Molder
The example is the best answer but has a subtle flaw: if there is a subdirectory named e.g. "my.txt" then du will total all of the files in it regardless of file name. Adding -type f to the find will select only files.Affective
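
With the -type f refinement from the last comment folded in, the command would presumably read:

find folder1 folder2 -type f -iname '*.txt' -print0 | du --files0-from - -c -s | tail -1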

This will report the total size in bytes for each file extension:

find . -type f -printf "%f %s\n" |
  awk '{
      PARTSCOUNT=split( $1, FILEPARTS, "." );
      EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
      FILETYPE_MAP[EXTENSION]+=$2
    }
   END {
     for( FILETYPE in FILETYPE_MAP ) {
       print FILETYPE_MAP[FILETYPE], FILETYPE;
      }
   }' | sort -n

Output:

3250 png
30334451 mov
57725092729 m4a
69460813270 3gp
79456825676 mp3
131208301755 mp4
Keelykeen answered 8/2, 2013 at 11:4 Comment(1)
This works wonderfully. To get human readable output, like 123GiB mp4, pipe the output to numfmt --field=1 --to=iec-i --format "%8f" --suffix B.Cirrose
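
For illustration, the numfmt step from that comment can be tried on its own (a sketch, assuming GNU coreutils numfmt; exact padding may vary):

# convert a raw byte count in field 1 to a human-readable IEC size
printf '131208301755 mp4\n' | numfmt --field=1 --to=iec-i --format "%8f" --suffix=B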

Simple:

du -ch *.txt

If you just want the total space taken to show up, then:

du -ch *.txt | tail -1
Claudicant answered 4/10, 2013 at 17:39 Comment(2)
Try du -ch /home/d/**/*.txt | tail -1Lewiss
@Lewiss While that would work, it relies on shopt -s globstar, at least with Bash.Megaspore
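
Spelled out for Bash, the globstar variant from the comments might look like:

shopt -s globstar       # make ** match files in subdirectories recursively
du -ch /home/d/**/*.txt | tail -1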

Here's a way to do it (in Linux, using GNU coreutils du and Bash syntax), avoiding bad practice:

total=0
while read -r size _    # du prints "size<TAB>path"; keep just the size
do
    (( total += size ))
done < <( find . -iname "*.txt" -exec du -b {} + )
echo "$total"

If you want to exclude the current directory, use -mindepth 2 with find.

Another version that doesn't require Bash syntax:

find . -iname "*.txt" -exec du -b {} + | awk '{total += $1} END {print total}'

Note that these won't work properly with file names which include newlines (but those with spaces will work).

Defoliate answered 31/8, 2009 at 21:13 Comment(2)
As noted in its manual, POSIX later removed the -b option from du; POSIX du prints usage in 512-byte blocks (the default) or 1024-byte blocks with -k. The -b option is absent from the BSD/macOS version and appears to be Linux-only. The situation for stat is not much better (note -c vs. -f for the "format" option). My opinion is that it's best to avoid du (or stat) for this, unless you only care about one OS or you're prepared to smooth over the inconsistencies among Unixes.Megaspore
@TheDudeAbides: This is a better source for the information you posted. Note that it doesn't proscribe -b from being included in implementations of du and GNU continues to include it.Defoliate
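
If file names may contain newlines, the same idea can be rewritten with NUL delimiters (a sketch, assuming GNU find and du):

# -print0 and --files0-from pass NUL-terminated names, so any character in a name is safe
find . -iname '*.txt' -print0 | du -b -c --files0-from - | tail -n1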

macOS

  • use the tool du with its -I mask parameter (ignore files and directories matching mask) to exclude all other file types

Linux

-X, --exclude-from=FILE
              exclude files that match any pattern in FILE

--exclude=PATTERN
              exclude files that match PATTERN
Clupeid answered 1/9, 2009 at 9:15 Comment(2)
I can't find any mention of -I on the man page. Could you provide an example?Ionopause
@Ionopause Please read my answer again. It should be clear that -I only exists for du running on macOS.Clupeid
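
Note that the Linux options quoted above select by exclusion, so they fit the inverse of the question; a sketch:

# total for /home/d while ignoring .mov files (everything except .mov is counted)
du -ch --exclude='*.mov' /home/d | tail -1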

With GNU find:

find /home/d -type f -name "*.txt" -printf "%s\n" | awk '{s+=$0}END{print "total: "s" bytes"}'
Elamite answered 1/9, 2009 at 11:4 Comment(1)
BSD (macOS) find doesn't have a -printf option, so, as noted, this is a GNU/Linux-only option. If you install findutils with MacPorts or Homebrew, you can use gfind instead.Megaspore
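
Without installing anything, a rough BSD/macOS equivalent can be sketched with stat (assuming BSD stat's %z format, which prints a file's size in bytes):

find /home/d -type f -name "*.txt" -exec stat -f '%z' {} + | awk '{s+=$0}END{print "total: "s" bytes"}'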

Building on Amaurosis's answer, this will handle spaces in names. I needed to do this and get a little report:

find -type f -name "*.wav" | grep export | ./calc_space

#!/bin/bash
# calc_space: read file names on stdin, print each file's size and a grand total
echo "SPACE USED IN MEGABYTES"
echo
total=0
while IFS= read -r FILE
do
    space=$(du -m "$FILE" | awk '{print $1}')   # size of this file in megabytes
    echo "$space $FILE"                         # per-file line of the report
    (( total += space ))
done
echo "$total"
Reni answered 22/8, 2010 at 16:50 Comment(0)

A one-liner for those with GNU tools on Bash:

for i in $(find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u); do echo "$i"": ""$(du -hac **/*."$i" | tail -n1 | awk '{print $1;}')"; done | sort -h -k 2 -r

You must have globstar enabled, so that ** matches recursively:

shopt -s globstar

If you want dot files to be included, you must also run

shopt -s dotglob

Sample output:

d: 3.0G
swp: 1.3G
mp4: 626M
txt: 263M
pdf: 238M
ogv: 115M
i: 76M
pkl: 65M
pptx: 56M
mat: 50M
png: 29M
eps: 25M

etc.

Geophysics answered 28/8, 2015 at 12:20 Comment(1)
I dig where you're going with this, but by the time you've wrapped a for loop around a call to find, perl, sort, du, tail, awk, and sort again, it's probably time to let go of the "one-liner" badge of pride and break this over multiple lines with some backslashes. No one can admire your clever solution when most of it is buried in the right margin. :)Megaspore
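
In that spirit, the same pipeline laid out over several lines, with behavior unchanged:

for i in $(find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u); do
    echo "$i: $(du -hac **/*."$i" | tail -n1 | awk '{print $1}')"
done | sort -h -k 2 -r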

My solution to get the total size of all text files in a given path and its subdirectories (using a Perl one-liner):

find /path -iname '*.txt' | perl -lane '$sum += -s $_; END {print $sum}'
Thermocouple answered 27/7, 2016 at 22:52 Comment(0)
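
A NUL-delimited variant of the same one-liner, in case file names contain newlines (a sketch, assuming find's -print0):

find /path -iname '*.txt' -print0 | perl -0ne 'chomp; $sum += -s $_; END {print "$sum\n"}'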

I like to use find in combination with xargs:

find . -name "*.txt" -print0 |xargs -0 du -ch

Add tail if you only want to see the grand total:

find . -name "*.txt" -print0 |xargs -0 du -ch | tail -n1
Handicraft answered 22/7, 2016 at 9:42 Comment(0)
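
One caveat: with a very large number of files, xargs may run du more than once, in which case du -c prints a total per batch and tail -n1 shows only the last one. A variant that sums per-file counts instead (a sketch, sizes in 1K blocks):

find . -name "*.txt" -print0 | xargs -0 du -k | awk '{kb += $1} END {print kb " Kbytes"}'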

For anyone wanting to do this on macOS at the command line, you need a variation based on -print0 instead of GNU find's -printf. Some of the answers above address that, but this will do it comprehensively, by extension:

find . -type f -print0 | xargs -0 stat -f "%N %z" |
  awk '{
      PARTSCOUNT=split( $1, FILEPARTS, "." );
      EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
      FILETYPE_MAP[EXTENSION]+=$2
    }
   END {
     for( FILETYPE in FILETYPE_MAP ) {
       print FILETYPE_MAP[FILETYPE], FILETYPE;
      }
   }' | sort -n
Armitage answered 27/12, 2018 at 20:5 Comment(0)

There are several potential problems with the accepted answer:

  1. it does not descend into subdirectories (without relying on non-standard shell features like globstar)
  2. in general, as pointed out by Defoliate below, you should avoid parsing the output of ls
    • namely, if the user or group name (columns 3 and 4) contains spaces, column 5 will not be the file size
  3. if you have a million such files, this will spawn two million subshells, and it'll be sloooow

As proposed in Elamite's answer, you can use the GNU-specific -printf option to find to achieve a more robust solution, avoiding all the excessive pipes, subshells, Perl, and weird du options:

# the '%s' format string means "the file's size"
find . -name "*.txt" -printf "%s\n" \
  | awk '{sum += $1} END{print sum " bytes"}'

Yes, yes, solutions using paste or bc are also possible, but not any more straightforward.

On macOS, you would need to use Homebrew or MacPorts to install findutils, and call gfind instead. (I see the "linux" tag on this question, but it's also tagged "unix".)

Without GNU find, you can still fall back to using du:

find . -name "*.txt" -exec du -k {} + \
  | awk '{kbytes+=$1} END{print kbytes " Kbytes"}'

…but you have to be mindful of the fact that du's default output is in 512-byte blocks for historical reasons (see the "RATIONALE" section of the man page), and some versions of du (notably, macOS's) will not even have an option to print sizes in bytes.

Many other fine solutions here (see Barn's answer in particular), but most suffer the drawback of being unnecessarily complex or depending too heavily on GNU-only features—and maybe in your environment, that's OK!

Megaspore answered 12/9, 2019 at 19:5 Comment(2)
I get find: -exec: no terminating ";" or "+" on both bash and zshNatatory
@Natatory Good catch, sorry about that. A find -exec always needs a {} so it knows where to put the incoming arguments. I get lazy because this is inferred with xargs and GNU Parallel.Megaspore
