Bash: Find file with max lines count

This is my attempt:

  • Find all *.java files
    find . -name '*.java'
  • Count lines
    wc -l
  • Delete last line
    sed '$d'
  • Use AWK to find max lines-count in wc output
    awk 'max=="" || data=="" || $1 > max {max=$1 ; data=$2} END{ print max " " data}'

Then combine them into a single pipeline:

find . -name '*.java' | xargs wc -l | sed '$d' | awk 'max=="" || data=="" || $1 > max {max=$1 ; data=$2} END{ print max " " data}'

Is there a way to count only non-blank lines?

Eastlake answered 13/12, 2011 at 11:21 Comment(1)
Your solution as written will probably fail on unusual file names. Something like find . -name '*.java' -print0 | xargs -0 wc -l | sort -n | tail -2 | head -1 is safer: it uses -print0 in find together with the -0 option of xargs. - Rune
26
find . -type f -name "*.java" -exec grep -H -c '[^[:space:]]' {} \; | \
    sort -nr -t":" -k2 | awk -F: '{print $1; exit;}'

Replace the awk command with head -n1 if you also want to see the number of non-blank lines.
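
As a rough sketch of that head -n1 variant, the pipeline can be wrapped in a function. The name largest_nonblank and its directory argument are assumptions made for illustration, not part of the answer. It also uses -exec ... {} + so grep runs once per batch of files rather than once per file. Like the original, it assumes paths contain no colons or newlines.

```shell
# Hypothetical helper (name and argument are assumptions for illustration).
# Prints "path:count" for the *.java file with the most non-blank lines.
largest_nonblank() {
    find "${1:-.}" -type f -name '*.java' \
        -exec grep -H -c '[^[:space:]]' {} + |
        sort -nr -t: -k2 |
        head -n 1
}
```

Calling largest_nonblank src would search a directory named src; with no argument it searches the current directory.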


Breakdown of the command:

find . -type f -name "*.java" -exec grep -H -c '[^[:space:]]' {} \; 
'---------------------------'       '-----------------------'
             |                                   |
   for each *.java file             Use grep to count non-empty lines
                                   -H includes filenames in the output
                                 (output = ./full/path/to/file.java:count)

| sort -nr -t":" -k2  | awk -F: '{print $1; exit;}'
  '----------------'    '-------------------------'
          |                            |
  Sort the output in         Print filename of the first entry (largest count)
reverse order using the         then exit immediately
  second column (count)
Cusec answered 13/12, 2011 at 12:28 Comment(5)
Great, I like this more because it introduced me to find's -exec option, which is more useful than looping. - Eastlake
It'd fail for file names that contain colons or newlines. - Distinguishing
@EdMorton Do your filenames contain newlines? - Eastlake
@MarekSebera I don't deliberately create file names containing newlines, but I do come across them on various systems I work on. Assuming your software will never have to handle file names that contain newlines is one of the ways your code can fail and be exploited. Using -print0 and xargs -0, as @Rune suggested and as I did in my answer, is one way to avoid such problems. - Distinguishing
So the three answers to "do your filenames contain newlines?" are: a) no, not the filenames I create manually; b) maybe, e.g. filenames I create when a customer requires a tool that builds file names from input that can itself contain newlines, such as fields in a CSV exported from Excel; and c) yes, the filenames I don't create but that can exist on machines my software runs on. - Distinguishing
18
find . -name "*.java" -type f | xargs wc -l | sort -rn | grep -v ' total$' | head -1
Modular answered 13/12, 2011 at 12:21 Comment(3)
Not bad, but it needs an edit to show only the file with the most lines of code; right now it shows all files with their counts. - Eastlake
Yeah, you are right. I just forgot to add one more pipe; added now. - Modular
Super helpful for getting a "top 10" of files with the most lines, by changing head -1 to head -10. - Stiffler
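
The "top 10" idea from the comment above could be parameterized roughly as follows. The function name top_java_files and its two arguments are hypothetical. It counts all lines (as wc -l does), passes names NUL-delimited to xargs, and assumes at least one file matches and that no file name ends in " total".

```shell
# Hypothetical helper: list the N *.java files with the most lines.
# $1 = how many files to show (default 10), $2 = directory (default .)
top_java_files() {
    find "${2:-.}" -type f -name '*.java' -print0 |
        xargs -0 wc -l |
        grep -v ' total$' |   # drop the summary line(s) printed by wc
        sort -rn |
        head -n "${1:-10}"
}
```

For example, top_java_files 5 src would list the five longest files under a directory named src.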
0

Something like this might work:

find . -name '*.java' | while IFS= read -r filename; do
    # count the lines that are not empty or whitespace-only
    nlines=$(grep -v -E '^[[:space:]]*$' "$filename" | wc -l)
    echo "$nlines $filename"
done | sort -nr | head -1

(edited as per Ed Morton's comment. I must have had too much coffee :-) )
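
Comments under another answer point out that file names can contain colons or newlines. A variant of the loop above that tolerates such names might look like the sketch below. It requires bash (read -d '' and process substitution), and the name max_nonblank and its directory argument are assumptions for illustration. The maximum is tracked inside the loop, so no line-based sort of file names is needed.

```shell
# Sketch only: requires bash. max_nonblank is a hypothetical name.
# Prints "count path" for the *.java file with the most non-blank lines.
max_nonblank() {
    local file count max=-1 best=
    while IFS= read -r -d '' file; do
        # grep -c prints how many lines contain a non-whitespace character
        count=$(grep -c '[^[:space:]]' < "$file")
        if (( count > max )); then
            max=$count
            best=$file
        fi
    done < <(find "${1:-.}" -type f -name '*.java' -print0)
    # Print nothing if no file matched
    (( max >= 0 )) && printf '%s %s\n' "$max" "$best"
}
```

On ties, this keeps the first file that find reports with the highest count.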

Francophile answered 13/12, 2011 at 11:30 Comment(0)
0

Getting the line count of each of your files using awk is just:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
{ size[FILENAME]++ }
END { for (file in size) print size[file], file }
'

To get the count of the non-empty lines, simply make the line where you increment the size[] conditional:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
NF { size[FILENAME]++ }
END { for (file in size) print size[file], file }
'

(With NF, lines that contain only blanks are treated as "empty". If you want to count those lines as non-empty, replace NF with /^./.)
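
To see how the two conditions differ, here is a small sketch on made-up input; the function names sample, count_nf and count_any are illustrative only. With awk's default field splitting, NF is 0 for both empty and whitespace-only lines, while /^./ matches any line that has at least one character.

```shell
# Sample input: a code line, an empty line, a whitespace-only line, a code line.
sample() { printf 'int a;\n\n   \t\nint b;\n'; }

# NF is the number of fields; it is 0 for empty and whitespace-only lines.
count_nf()  { awk 'NF   { n++ } END { print n + 0 }'; }

# /^./ matches any line with at least one character, whitespace included.
count_any() { awk '/^./ { n++ } END { print n + 0 }'; }

sample | count_nf    # prints 2
sample | count_any   # prints 3
```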

To get only the file with the most non-empty lines just tweak again:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
NF { size[FILENAME]++ }
END {
   for (file in size) {
      if (size[file] >= maxSize) {
         maxSize = size[file]
         maxFile = file
      }
   }
   print maxSize, maxFile
}
'
Distinguishing answered 26/11, 2012 at 22:20 Comment(0)
