Suppressing summary information in `wc -l` output
M

12

5

I use the command wc -l to count the number of lines in my text files (I also want to sort everything through a pipe), like this:

wc -l $directory-path/*.txt | sort -rn

The output includes "total" line, which is the sum of lines of all files:

10 total
5 ./directory/1.txt
3 ./directory/2.txt
2 ./directory/3.txt

Is there any way to suppress this summary line? Or even better, to change the way the summary line is worded? For example, instead of "10", the word "lines" and instead of "total" the word "file".

Mars answered 29/12, 2016 at 18:32 Comment(4)
The man page for wc doesn't mention any such functionality. You can whip up a script (or probably use pipes and awk) to change the appearance of the output.Karilla
Pipe it to tail +2 to skip the first line.Chilson
@Barmar: That's unreliable. It only prints the total line if there's more than one file. And at least on my system, the total line is printed last -- as POSIX specifically requires. ipo: Do you really get the output you show, with the 10 total line at the top?Archerfish
Based on your comments, I think you're seeing 10 total at the top because you're sorting the output. You need to mention that in the question. Show us the exact command you're running, and its exact output. And $directory-path is not a valid variable name.Archerfish
J
6

Yet another sed solution!

1. short and quick

Since the total comes on the last line, $d is the sed command that deletes the last line.

wc -l $directory-path/*.txt | sed '$d'

2. with header line addition:

wc -l $directory-path/*.txt | sed '$d;1ilines total'

Unfortunately, there is no alignment.

3. With alignment: formatting left column at 11 char width.

wc -l $directory-path/*.txt |
    sed -e '
        s/^ *\([0-9]\+\)/          \1/;
        s/^ *\([0-9 ]\{11\}\) /\1 /;
        /^ *[0-9]\+ total$/d;
        1i\      lines filename'

This will do the job:

      lines filename
          5 ./directory/1.txt
          3 ./directory/2.txt
          2 ./directory/3.txt

4. But if your wc version really does put the total on the first line:

This one is just for fun, because I don't believe there is a wc version that puts the total on the first line, but...

This version drops the total line wherever it appears and adds a header line at the top of the output.

wc -l $directory-path/*.txt |
    sed -e '
        s/^ *\([0-9]\+\)/          \1/;
        s/^ *\([0-9 ]\{11\}\) /\1 /;
        1{
            /^ *[0-9]\+ total$/ba;
            bb;
           :a;
            s/^.*$/      lines file/
        };
        bc;
       :b;
        1i\      lines file' -e '
       :c;
        /^ *[0-9]\+ total$/d
    '

This is more complicated because the first line cannot simply be deleted, even when it is the total line: it has to be replaced by the header instead.

Jaenicke answered 30/12, 2016 at 0:0 Comment(5)
I'm reasonably sure he's seeing the total on the first line because he's sorting the output. He's mentioned this in comments, but needs to say so in the question. And there's no indication that he wants or needs the "lines filename" header that your solutions produce.Archerfish
Seems too complicated for such a small operation.Bamford
@KeithThompson Asker said: For example, instead of "10", the word "lines" and instead of "total" the word "file" !Jaenicke
@Bamford It seems complicated because you won't try to understand. It's not as simple, but it's very quick and works as a standalone solutionJaenicke
@Bamford OK, there is a simpler sed version: 2 characters!Jaenicke
A
1

This is actually fairly tricky.

I'm basing this on the GNU coreutils version of the wc command. Note that the total line is normally printed last, not first (see my comment on the question).

wc -l prints one line for each input file, consisting of the number of lines in the file followed by the name of the file. (The file name is omitted if there are no file name arguments; in that case it counts lines in stdin.)

If and only if there's more than one file name argument, it prints a final line containing the total number of lines and the word total. The documentation indicates no way to inhibit that summary line.

Other than the fact that it's preceded by other output, that line is indistinguishable from output for a file whose name happens to be total.

So to reliably filter out the total line, you'd have to read all the output of wc -l, and remove the final line only if the output is more than one line long. (Even that can fail if you have files with newlines in their names, but you can probably ignore that possibility.)

A more reliable method is to invoke wc -l on each file individually, avoiding the total line:

for file in $directory-path/*.txt ; do wc -l "$file" ; done

And if you want to sort the output (something you mentioned in a comment but not in your question):

for file in $directory-path/*.txt ; do wc -l "$file" ; done | sort -rn
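
Note that the shell reads $directory-path as $directory followed by the literal text -path. The following is a minimal sketch of the same loop with a valid, quoted variable name; dir and the sample files are made up for illustration and are not part of the question.

```shell
#!/bin/sh
# "dir" is a hypothetical stand-in for the asker's directory variable.
dir=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$dir/1.txt"
printf '1\n2\n3\n' > "$dir/2.txt"
printf '1\n2\n' > "$dir/3.txt"

# One wc call per file: wc never sees two operands, so it never prints a total.
sorted=$(for file in "$dir"/*.txt; do wc -l "$file"; done | sort -rn)
printf '%s\n' "$sorted"
rm -r "$dir"
```

Quoting "$dir" and "$file" keeps the loop working when names contain spaces.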

If you happen to know that there are no files named total, a quick-and-dirty method is:

wc -l $directory-path/*.txt | grep -v ' total$'

If you want to run wc -l on all the files and then filter out the total line, here's a bash script that should do the job. Adjust the *.txt as needed.

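#!/bin/bash

# Save wc's output so its lines can be counted before deciding what to drop.
wc -l *.txt > .wc.out
lines=$(wc -l < .wc.out)
if [[ $lines -eq 1 ]] ; then
    # Only one line: wc printed no total, so keep everything.
    cat .wc.out
else
    # More than one line: the last one is the total, so print all but that.
    (( lines-- ))
    head -n "$lines" .wc.out
fi
rm .wc.out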

Another option is this Perl one-liner:

wc -l *.txt | perl -e '@lines = <>; pop @lines if scalar @lines > 1; print @lines'

@lines = <> slurps all the input into an array of strings. pop @lines discards the last line if there are more than one, i.e., if the last line is the total line.
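
The same "drop the last line only if there is more than one" rule can probably also be written as a single-pass awk filter, without a temporary file. This is only a sketch: drop_total is a hypothetical helper name, and the sample files are made up.

```shell
#!/bin/sh
# Hypothetical helper: hold each line back by one, so the final line is only
# printed when it was also the first (i.e. wc printed no total).
drop_total() {
    awk 'NR > 1 { print prev } { prev = $0 } END { if (NR == 1) print prev }'
}

demo=$(mktemp -d)                      # scratch directory for the demo
printf 'a\nb\nc\n' > "$demo/1.txt"
printf 'a\n' > "$demo/2.txt"

many=$(wc -l "$demo"/*.txt | drop_total | sort -rn)   # two files: total removed
one=$(wc -l "$demo/1.txt" | drop_total)               # one file: line kept
printf '%s\n' "$many" "$one"
rm -r "$demo"
```

Like the script and the Perl one-liner, this still assumes no file names contain newlines.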

Archerfish answered 29/12, 2016 at 20:21 Comment(5)
Thanks for the detailed comment. But I have to use wc -l at the end, because I also have to sort them. That's not possible when I do wc -l on each file. The quick-and-dirty method is also not so good. Maybe I have a file named 'total'.Mars
@ipo: Sure you can sort the output: for file in $directory-path/*.txt ; do wc -l "$file" ; done | sort -rn. (I'm assuming you're using a Bourne-derived shell like bash.)Archerfish
@gniourf_gniourf: Done. (I thought I had; not sure how I missed that.)Archerfish
You miss: /bin/ls -1 *.txt | xargs -n1 wc -l and/or find . -maxdepth 1 -name '*.txt' -exec wc -l {} \; ;-)Jaenicke
@F.Hauri: I wouldn't say I "missed" those. I didn't intend to show all possible solutions.Archerfish
C
1

The wc program always displays the total when there are two or more files (fragment of wc.c):

if (argc > 2)
     report ("total", total_ccount, total_wcount, total_lcount);
   return 0;

So the easiest approach is to give wc only one file at a time and let find present the files to wc one after the other:

find $dir -name '*.txt' -exec wc -l {} \;

Or, as suggested by liborm:

dir="."
find $dir -name '*.txt' -exec wc -l {} \; | sort -rn | sed 's/\.txt$//'
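
One detail worth checking with -exec is the terminator: \; runs wc once per file, while + batches several files into one wc call, which brings the total line back. A small sketch with made-up sample files:

```shell
#!/bin/sh
dir=$(mktemp -d)      # scratch directory with made-up sample files
printf 'a\nb\n' > "$dir/1.txt"
printf 'a\n' > "$dir/2.txt"

# "\;" runs wc once per file: no total line.
per_file=$(find "$dir" -name '*.txt' -exec wc -l {} \; | sort -rn)
# "+" passes both files to a single wc: the total line is back.
batched=$(find "$dir" -name '*.txt' -exec wc -l {} + | sort -rn)

printf '%s\n' "per file:" "$per_file" "batched:" "$batched"
rm -r "$dir"
```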
Columbary answered 29/12, 2016 at 20:39 Comment(7)
That's nearly the solution! But I need to pipe this one as well to | sort -rn | sed 's/\.txt$//'. Where should I place this pipe? I tried find $dicitonary-path/*.txt-exec wc -l {} \ | sort -rn | sed 's/\.txt$//'; ...but this is wrong.Mars
I think you're missing a -name argument in your find command.Archerfish
@ipo like that, but without the typos.. find $PATH -name '*.txt' -exec wc -l {} \; | sort -rn | sed 's/\.txt$//'Danged
@Danged: Thanks, I have put your command in my answer. If that's a problem, I can remove it.Columbary
@Keith Thompson: You're right, thanks for your help.Columbary
@ipo: Why would you want to strip the .txt portion of the file names (sed 's/\.txt$//')? You really need to update your question and state the problem more precisely. Read this: minimal reproducible exampleArcherfish
It's 2 or more files, not more than 2 files. argc is the number of arguments including argv[0], which is the program name ("wc").Archerfish
L
1

This is a job tailor-made for head:

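wc -l *.txt | head --lines=-1

This way, you can still run wc in a single process. Note that a negative line count is a GNU head extension, and that the last line is dropped even when wc is given a single file and prints no total.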

Laundrywoman answered 2/5, 2022 at 12:49 Comment(1)
There are a lot of complicated solutions from people having fun with the problem, but head -n -1 before sorting seems best. Surprising that wc does not have a quiet or script use mode.Squarerigger
O
0

Can you use another wc?

The POSIX wc man page (man -s1p wc) says:
If more than one input file operand is specified, an additional line shall be written, of the same format as the other lines, except that the word total (in the POSIX locale) shall be written instead of a pathname and the total of each column shall be written as appropriate. Such an additional line, if any, is written at the end of the output.

You said the total line was the first line; the manual states it's the last, and other wc's don't show it at all. Removing the first or last line is dangerous, so I would grep -v the line with the total (in the POSIX locale...), or just grep for the slash that's part of all the other lines:

wc -l $directory-path/*.txt | grep "/"
Och answered 29/12, 2016 at 20:16 Comment(0)
V
0

Not the most optimized way since you can use combinations of cat, echo, coreutils, awk, sed, tac, etc., but this will get you what you want:

wc -l ./*.txt | awk 'BEGIN{print "Line\tFile"}1' | sed '$d'

wc -l ./*.txt will extract the line count. awk 'BEGIN{print "Line\tFile"}1' will add the header titles. The 1 is an always-true pattern, so awk prints every input line unchanged after the header. sed '$d' will print all lines except the last one.

Example Result

Line    File
      6 ./test1.txt
      1 ./test2.txt
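
In awk, a bare 1 is a pattern that is always true, so every input line is printed after the BEGIN header. The sketch below feeds made-up wc-style lines through the same awk and sed stages to show each step's effect; the input is invented for illustration.

```shell
#!/bin/sh
# awk prints the header once (BEGIN), then every input line (pattern "1").
# sed '$d' then deletes the last line, which is the total here.
out=$(printf '3 a.txt\n2 b.txt\n5 total\n' |
    awk 'BEGIN{print "Line\tFile"}1' |
    sed '$d')
printf '%s\n' "$out"
```

Because sed '$d' always removes the last line, this only behaves as intended when the total is printed last and more than one file was counted.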
Venial answered 29/12, 2016 at 21:2 Comment(2)
All I get is something like this: 'Line File' above '10 total'. So like your example, but with the total information again.Mars
@ipo: what kind of system are you running? I'm using zsh on an OSX system. My total line count appears at the end. Try using this: wc -l ./*.txt | awk 'BEGIN{print "Line\tFile"}1' | sed '2d'. The only difference is that the sed should delete the 2nd line, not the last line now.Venial
B
0

The simplicity of using just grep -c

I rarely use wc -l in my scripts because of these issues. I use grep -c instead. Though it is not as efficient as wc -l, we don't need to worry about other issues like the summary line, white space, or forking extra processes.

For example:

/var/log# grep -c '^' *
alternatives.log:0
alternatives.log.1:3
apache2:0
apport.log:160
apport.log.1:196
apt:0
auth.log:8741
auth.log.1:21534
boot.log:94
btmp:0
btmp.1:0
<snip>

Very straightforward for a single file:

line_count=$(grep -c '^' my_file.txt)

Performance comparison: grep -c vs wc -l

/tmp# ls -l *txt
-rw-r--r-- 1 root root 721009809 Dec 29 22:09 x.txt
-rw-r----- 1 root root 809338646 Dec 29 22:10 xyz.txt

/tmp# time grep -c '^' *txt

x.txt:7558434
xyz.txt:8484396

real    0m12.742s
user    0m1.960s
sys     0m3.480s

/tmp/# time wc -l *txt
   7558434 x.txt
   8484396 xyz.txt
  16042830 total

real    0m9.790s
user    0m0.776s
sys     0m2.576s
Bamford answered 29/12, 2016 at 22:0 Comment(3)
But grep -c . counts non-empty lines. You'll probably want grep -c '' as an approximation of wc -l (the two differ by one if the last “line” doesn't end with a newline).Marci
Wonderful observation, @gniourf_gniourf. I changed the command to grep -c '^'.Bamford
grep -c '^' also differs by one from wc -l if the last line doesn't end with a newline. In fact grep (at least the GNU version) always silently appends a newline if the last line doesn't have one.Archerfish
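
The off-by-one difference discussed in these comments can be reproduced with a file whose last line has no trailing newline. A minimal sketch, using a scratch file with made-up content:

```shell
#!/bin/sh
f=$(mktemp)                 # scratch file; the content is made up
printf 'one\ntwo' > "$f"    # the second line has no trailing newline

wc_count=$(wc -l < "$f")        # counts newline characters
grep_count=$(grep -c '' "$f")   # counts lines, including the unterminated one
echo "wc -l: $wc_count, grep -c: $grep_count"
rm "$f"
```

Here wc -l should report 1 and grep -c should report 2, since wc counts newline characters while grep counts the unterminated line as well.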
S
0

You can solve it (and many other problems that appear to need a for loop) quite succinctly using GNU Parallel like this:

parallel wc -l ::: tmp/*txt

Sample Output

   3 tmp/lines.txt
   5 tmp/unfiltered.txt
  42 tmp/file.txt
   6 tmp/used.txt
Squabble answered 29/12, 2016 at 22:16 Comment(2)
parallel -j1 if your files are really big, otherwise you'll clog your disk with parallel requests for data..Danged
Possibly, though many folk run very fast SSDs nowadays and there was no indication that OP is using excessively large files and it could actually be an advantage to use GNU Parallel there anyway.Squabble
H
0

Similar to Mark Setchell's answer, you can also use xargs with an explicit replacement string:

ls | xargs -I% wc -l %

With -I, xargs does not pass all the inputs to a single wc; it runs wc once per input line, one operand at a time.

Halford answered 3/5, 2021 at 14:30 Comment(0)
I
0

Shortest answer:

ls | xargs -l wc
Impound answered 31/3, 2022 at 12:14 Comment(0)
N
0

What about using sed with a pattern-based delete, as below? It only removes the total line if it is present (but it also removes any file whose name contains total).

wc -l $directory-path/*.txt | sort -rn | sed '/total/d'

Nevsa answered 8/6, 2022 at 7:20 Comment(0)
M
0

While most of the answers center around removing the unneeded line, or using a version of wc that allows suppressing it, there's something to be said in favor of never producing it in the first place.

So you want to count lines in $directory-path/*.txt files, however feeding several files to wc will produce the total — which you don't want.

I would change your pipeline to find the files and feed them to wc one by one, in this manner:

find $directory-path -name "*.txt" | xargs -L 1 wc -l | sort -rn

In this case, find is tasked with locating files, while xargs -L 1 is tasked with feeding them to wc one by one.
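
One caveat: xargs -L 1 splits each input line on whitespace, so a file name containing a space would be passed to wc as two operands. A possible variant, assuming your find and xargs support -print0 and -0 (GNU and BSD versions do), is sketched below with made-up file names:

```shell
#!/bin/sh
dir=$(mktemp -d)                       # scratch directory, made-up file names
printf 'a\nb\n' > "$dir/my notes.txt"  # this name contains a space
printf 'a\n' > "$dir/plain.txt"

# -print0 / -0 separate names with NUL bytes; -n 1 gives wc one file per call.
counts=$(find "$dir" -name '*.txt' -print0 | xargs -0 -n 1 wc -l | sort -rn)
printf '%s\n' "$counts"
rm -r "$dir"
```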

Moises answered 4/8, 2023 at 10:4 Comment(0)
