Count number of lines in a git repository
P

17

1039

How would I count the total number of lines present in all the files in a git repository?

git ls-files gives me a list of files tracked by git.

I'm looking for a command to cat all those files. Something like

git ls-files | [cat all these files] | wc -l
Phooey answered 27/1, 2011 at 22:7 Comment(0)
S
1594

xargs will let you cat all the files together before passing them to wc, like you asked:

git ls-files | xargs cat | wc -l

But skipping the intermediate cat gives you more information and is probably better:

git ls-files | xargs wc -l
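
Several comments below bring up file names containing spaces and filtering by file type. Both can be handled in one go. This is only a minimal sketch, assuming a POSIX-like shell and an xargs that supports -0 (GNU and BSD xargs both do); the function name tracked_wc is made up for illustration.

```shell
# Hypothetical helper: per-file line counts for files tracked by git.
# Optional arguments are git pathspecs such as '*.cpp' or '*.h'.
tracked_wc() {
    # -z and -0 pass file names NUL-terminated, so names containing
    # spaces or quotes reach wc as single arguments.
    git ls-files -z -- "$@" | xargs -0 wc -l
}
```

For example, tracked_wc '*.cpp' '*.h' lists only C++ sources and headers. Binary files that match the pathspecs are still counted.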
Samaveda answered 27/1, 2011 at 22:11 Comment(26)
This double-counts when you have symbolic links in your repository. Maybe that's not a concern, though.Cestar
I guess this is trivial, but how about including only source code files (e.g. *.cpp)? We have some binary files committed :)Acetyl
Stick grep cpp | in there before the xargs, then.Samaveda
I'd like to mention that the latter (git ls-files | xargs wc -l) works in the GitHub install of git within Windows PowerShell.Waneta
Use git ls-files -z | xargs -0 wc -l if you have files with spaces in the name.Berkman
This will also include images. One JPEG image in my repository apparently has 15176 lines of text.Babe
For future use you can place it in your ~/.gitconfig as an alias: count = ! git ls-files | xargs wc -l. You can then call it via git count.Cicisbeo
For what it's worth, the -l is a lowercase L, not the number one.Starrstarred
For including/excluding certain files use: git ls-files | grep -P ".*(hpp|cpp)" | xargs wc -l where the grep part is any perl regex you want!Dhoti
If you were interested in just .java files you can use git ls-files | grep "\.java$" | xargs wc -lSecurity
Counts "lines" in bin files (png/gif/etc)... :(Ornithischian
'xargs' is not recognized as an internal or external command, operable program or batch file.Unquote
@Imray That error is from a Windows command prompt, this question was tagged as bash, which is a *nix environment. Try using Cygwin, or check out cloc: sourceforge.net/projects/clocForthwith
Tried this command on Mac and got "xargs: wc: Argument list too long" error. Is it because the git repo is too big?Clapp
@shi, that could be, yes. Check the xargs man page to limit the number of arguments passed.Samaveda
The command is git ls-files | grep -e "\.py$" | xargs wc -l on Macs if you want to find the lines of code of Python files. Don't use -P; for patterns it is -e.Rebec
@CarlNorum Does this calculation show the total line count across all branches? If so, how do we get the number of lines for a specific branch only, say master?Zamia
git ls-files | grep -vE "(png|jpg|ico)" | xargs wc -l -- there's an example of excluding various file types you don't want; we are counting lines after all. This was tested on mac and ubuntu.Murder
git ls-files | sed 's/ /\\ /g' | grep -E "\.*(swift$|mm$)" | xargs wc -l Using sed to escape files or paths that have spaces in them.Ambi
doesn't work when there are single quotes in a file nameGurdwara
I'm pretty sure that this is wrong. Anybody who knows more about this can correct me, but surely this lists the file names in the repository but actually counts the lines in the checked-out version of those files. So if the files have changed size, the total will be wrong.Enswathe
I used git ls-files | grep -v "json" | xargs wc -l to ignore json filesHeintz
This should give you the total for java files: git ls-files | grep "\.java$" | xargs cat | wc -lAlcine
Surprised no one mentioned that skipping the cat makes it prone to exceeding the maximum number of command-line parameters. In the cat case, xargs can simply execute cat again for the remainder. For wc -l, it will give erroneous output.Palmira
Doesn't work on Windows. I'm getting a 'xargs' is not recognized as an internal or external command, operable program or batch file. on Windows.Earmark
git ls-files '**.h' '**.cpp' | .... Taken from https://mcmap.net/q/54275/-can-i-use-git-to-search-for-matching-filenames-in-a-repository.Armyworm
G
456

If you want this count because you want to get an idea of the project’s scope, you may prefer the output of CLOC (“Count Lines of Code”), which gives you a breakdown of significant and insignificant lines of code by language.

cloc $(git ls-files)

(This line is equivalent to git ls-files | xargs cloc. It uses sh’s $() command substitution feature.)

Sample output:

      20 text files.
      20 unique files.                              
       6 files ignored.

http://cloc.sourceforge.net v 1.62  T=0.22 s (62.5 files/s, 2771.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Javascript                       2             13            111            309
JSON                             3              0              0             58
HTML                             2              7             12             50
Handlebars                       2              0              0             37
CoffeeScript                     4              1              4             12
SASS                             1              1              1              5
-------------------------------------------------------------------------------
SUM:                            14             22            128            471
-------------------------------------------------------------------------------

You will have to install CLOC first. You can probably install cloc with your package manager – for example, brew install cloc with Homebrew.

cloc $(git ls-files) is often an improvement over cloc .. For example, the above sample output with git ls-files reports 471 lines of code. For the same project, cloc . reports a whopping 456,279 lines (and takes six minutes to run), because it searches the dependencies in the Git-ignored node_modules folder.

Guglielma answered 12/3, 2015 at 16:32 Comment(12)
CLOC ignores some languages, such as TypeScript.Parhelion
@MarceloCamargo at this moment TypeScript is supportedNephrolith
For the beginner, better to execute "cloc DIRECTORY_WHERE_YOUR_GIT_IN" to calculate lines.Clapp
The full description is here : github.com/AlDanial/cloc and the binaries are here : github.com/AlDanial/cloc/releases/tag/v1.70Adrenaline
@RoryO'Kane How do we know which files were ignored in the process? Could some code files be among them?Zamia
@KasunSiyambalapitiya You can find the answers to such questions in CLOC’s documentation. As CLOC’s README says, passing --ignored=FILE will “save names of ignored files and the reason they were ignored to FILE”.Corporation
Just a side note, this doesn't count all the lines, it excludes empty lines and lines consisting of only comments.Destructor
You can just use cloc --vcs git these days, which avoids some edge cases with badly named files (or too many of them).Presber
@Loovjo It's written right there if you read carefully, blank, comment and code.Handicraftsman
Does this leak the code? I mean the GitHub credentials and all.Slowmoving
@MadhuNair Of course not. cloc counts lines of files in a local directory, without ever accessing the network. It doesn’t even know whether the code came from GitHub or not.Corporation
Thanks, this is a helpful tool! Beware, though, that cloc does not exclude auto-generated files like JavaScript's package-lock.json. These will have to be subtracted if you want an estimate of how much work went into a piece of software.Pumphrey
C
422
git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904

This shows the differences from the empty tree to your current working tree, which happens to count all lines in your current working tree.

To get the numbers in your current working tree, do this:

git diff --shortstat `git hash-object -t tree /dev/null`

It will give you a string like 1770 files changed, 166776 insertions(+).
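
If only the number is needed, the summary string can be reduced to its insertion count. This is a rough sketch, assuming a POSIX-like shell with awk; the function name text_line_count is hypothetical. It asks git for the empty tree ID instead of hard-coding it.

```shell
# Hypothetical helper: number of lines in tracked, non-binary files.
text_line_count() {
    # Ask git for the empty tree ID rather than hard-coding it.
    empty_tree=$(git hash-object -t tree /dev/null) || return 1
    # --shortstat prints something like
    #   " 1770 files changed, 166776 insertions(+)"
    # so the fourth field is the insertion count.
    git diff --shortstat "$empty_tree" |
        awk '{ n = $4 } END { print n + 0 }'
}
```

Run it from inside the repository. Files that git treats as binary add nothing to the count.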

Cestar answered 27/1, 2011 at 22:51 Comment(14)
BTW, you can get that hash by running git hash-object -t tree /dev/null.Cestar
And even more succinct: git diff --stat `git hash-object -t tree /dev/null`Catheterize
This is the better solution since it does not count binary files like archives or images, which are counted in the version above!Lohner
+1 I like this solution better as binaries don't get counted. Also we are really just interested in the last line of the git diff output: git diff --stat `git hash-object -t tree /dev/null` | tail -1Prothrombin
Is there any way of not counting lines just containing whitespace?Tia
@CameronMartin git diff -wCestar
Instead, use git diff --shortstat `git hash-object -t tree /dev/null` to get the last line; tail isn't needed.Thurstan
@ChandlerLee It is the object ID of the empty tree, git hash-object -t tree /dev/null. Even if the empty tree never appears in a commit in your repository's history, Git is hard-coded to recognize it; look for EMPTY_TREE_SHA1 in the source code.Cestar
@Cestar : What does git diff -w do? I mean what is -w for ?Kamala
@Cestar I only found EMPTY_TREE_SHA1_HEXCamber
just to remember the hash ;-) use SHA1("tree 0\0") = 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (\0 is NUL character)Mcdowell
@Kamala -w means Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none. see the doc [git-scm.com/docs/git-diff]Zamia
@Catheterize git diff --stat `git hash-object -t tree /dev/null` I understand that ` ` is used to run git commands inside git commands, but can you guide me to a resource to learn about other constructs of that kind? I can't find any by searching.Zamia
@Cestar Does the above code count all the lines of code in all the branches that exist in the repo? If so, what is the option to get only the lines of code in the master branch?Zamia
A
77

The best solution, to me anyway, is buried in the comments of @ephemient's answer. I am just pulling it up here so that it doesn't go unnoticed. The credit for this should go to @FRoZeN (and @ephemient).

git diff --shortstat `git hash-object -t tree /dev/null`

returns the total of files and lines in the working directory of a repo, without any additional noise. As a bonus, only the source code is counted - binary files are excluded from the tally.

The command above works on Linux and OS X. The cross-platform version of it is

git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904

That works on Windows, too.

For the record, the options for excluding blank lines,

  • -w/--ignore-all-space,
  • -b/--ignore-space-change,
  • --ignore-blank-lines,
  • --ignore-space-at-eol

don't have any effect when used with --shortstat. Blank lines are counted.
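
The hard-coded hash in the cross-platform command above is not arbitrary: it is the SHA-1 of the header of an empty tree object, as one of the comments on the original answer points out. A small sketch to check this by hand, assuming either sha1sum or shasum is installed; the function name empty_tree_sha1 is made up for illustration.

```shell
# Hypothetical helper: recompute the well-known empty-tree ID by hand.
empty_tree_sha1() {
    # A git object ID is the SHA-1 of "<type> <size>", a NUL byte and
    # the content; an empty tree has size 0 and no content.
    if command -v sha1sum >/dev/null 2>&1; then
        printf 'tree 0\0' | sha1sum
    else
        printf 'tree 0\0' | shasum -a 1
    fi | cut -d ' ' -f 1
}
```

Calling empty_tree_sha1 should print the same ID that the command above uses.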

Actualize answered 4/3, 2015 at 15:39 Comment(2)
git mktree </dev/null or true|git mktree or git mktree <&- or :|git mktree for the keystroke-counters among us :-) - a spare empty tree floating around the repo isn't going to hurt anything.Lakenyalaker
For people wondering what is that hash out of the blue : #9765953Falco
F
75

I've encountered batching problems with git ls-files | xargs wc -l when dealing with large numbers of files, where the line counts will get chunked out into multiple total lines.

Taking a tip from question Why does the wc utility generate multiple lines with "total"?, I've found the following command to bypass the issue:

wc -l $(git ls-files)

Or if you want to only examine some files, e.g. code:

wc -l $(git ls-files | grep '\.cs$')
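
Command substitution still splits names on whitespace and can hit the argument-length limit on very large repositories. An alternative is to let xargs batch the files and then fold the per-batch totals into one. This is only a sketch, assuming a POSIX-like shell, awk, and an xargs with -0; the name wc_with_grand_total is hypothetical.

```shell
# Hypothetical helper: per-file counts plus one grand total, even when
# xargs has to split the file list into several wc invocations.
wc_with_grand_total() {
    # LC_ALL=C keeps the summary row spelled "total" in every locale.
    git ls-files -z -- "$@" |
        LC_ALL=C xargs -0 wc -l |
        awk '
            # Skip the per-batch summary rows printed by wc.
            # Caveat: a tracked file literally named "total" in the
            # current directory is skipped as well.
            NF == 2 && $2 == "total" { next }
            { sum += $1; print }
            END { print sum + 0, "grand total" }
        '
}
```

Optional arguments are passed to git ls-files as pathspecs, so wc_with_grand_total '*.cs' restricts the count to C# files.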

Function answered 30/7, 2013 at 6:3 Comment(8)
This is great but it seems to fail for paths which contain white spaces. Is there a way to solve that?Petal
Had trouble with grep '.*\.m' picking up binary files like .mp3, .mp4. Had more success with using the find command to list code files wc -l $(git ls-files | find *.m *.h)Murraymurre
@LeaHayes this is one way: wc -l --files0-from=<(git ls-files -z). The <(COMMAND) syntax returns the name of a file whose contents are the result of COMMAND.Efferent
@Efferent Thanks, but I am getting an error when I try that command 'cannot make pipe for process substitution: Function not implemented wc: unrecognized option --files0-from='. Any ideas?Petal
@LeaHayes What OS / terminal are you using? More importantly, what version of wc are you using? GNU wc works for me. You could try downloading that to get this working.Efferent
@Efferent the version which is included with the bash shell that is distributed with SourceTree for Windows. "wc (GNU textutils) 2.0".Petal
@LeaHayes I came up with this script which I think would work for you:
```
#!/bin/bash
results=$(git ls-files | xargs -d '\n' wc -l)
let grand_total=0
for x in $(echo "$results" | egrep '[[:digit:]]+ total$'); do
    let grand_total+=$(echo "$x" | awk '{print $1}')
done
echo "${results}"
echo "grand total: ${grand_total}"
```
Efferent
the -n switch with xargs can be used to increase the maximum number of lines within a chunkHilversum
A
29

This works as of cloc 1.68:

cloc --vcs=git

Adsorbate answered 11/5, 2017 at 19:31 Comment(2)
--vcs didn't work for me, maybe it was removed. cloc . while at the git repo did work, OTOH.Avulsion
--vcs=git worked for me on version v1.90 =) But yes I ran it at the root, it's just an option to tell cloc what it can ignoreSelhorst
S
25

I use the following:

git grep ^ | wc -l

This searches all files versioned by git for the regex ^, which represents the beginning of a line, so this command gives the total number of lines!
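
The same idea can be extended to skip binary files and to accept pathspecs. This is a minimal sketch, assuming a POSIX-like shell with awk; the function name tracked_text_lines is made up for illustration.

```shell
# Hypothetical helper: total lines in tracked text files.
tracked_text_lines() {
    # -I skips binary files; -c prints "path:count" for each file.
    # The count is always the last colon-separated field, even when
    # the path itself contains a colon.
    git grep -I -c '^' -- "$@" |
        awk -F: '{ sum += $NF } END { print sum + 0 }'
}
```

For example, tracked_text_lines '*.py' counts only Python files.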

Sway answered 11/1, 2017 at 6:46 Comment(2)
This is concise and doesn't require any new software, and gives a fast count of textual lines (which is all the question really asks for). But it isn't a precise measure of executable code. It counts blank lines and comment lines, which are ignored by most of the purpose-built tools. (As an experiment I ran this on a small repo of utility code. git grep method: 5322; sloccount: 2942; cloc: 3251)Sympathin
@PaulBissex very true! Total lines is often what I want, but I've seen others modify this to git grep . | wc -l to only match lines containing at least one characterSway
Z
15

I was playing around with cmder (http://gooseberrycreative.com/cmder/) and I wanted to count the lines of HTML, CSS, Java and JavaScript. While some of the answers above worked, the "or" pattern in grep didn't. I found here (https://unix.stackexchange.com/questions/37313/how-do-i-grep-for-multiple-patterns) that I had to escape it.

So this is what I use now:

git ls-files | grep "\.\(html\|css\|js\|java\)$" | xargs wc -l

Zymogenesis answered 22/7, 2015 at 1:0 Comment(1)
This seemed to respond with chunks for me. Using your grep in combination with Justin Aquadro's solution resulted well for me. wc -l $(git ls-files | grep "\(.html\|.css\|.js\|.php\|.json\|.sh\)$")Nava
G
5

I did this:

git ls-files | xargs file | grep "ASCII" | cut -d : -f 1 | xargs wc -l

This works if you count all text files in the repository as the files of interest. If some are considered documentation, etc., an exclusion filter can be added.

Gotthelf answered 21/11, 2015 at 20:54 Comment(0)
A
5

Try:

find . -type f -name '*.*' -exec wc -l {} + 

on the directory/directories in question

Averir answered 15/6, 2018 at 12:37 Comment(0)
P
5

If you want to get the number of lines from a certain author, try the following code:

git ls-files "*.java" | xargs -I{} git blame {} | grep "${your_name}" | wc -l
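
Grepping the default blame output also matches lines whose content happens to contain the name. A somewhat stricter variant reads the machine-readable blame format. This is only a sketch, assuming a POSIX-like shell, an xargs with -0, and text files that are committed; the function name lines_by_author is hypothetical.

```shell
# Hypothetical helper: count lines whose last change is attributed to
# the given author. Usage: lines_by_author 'Full Name' [pathspec...]
lines_by_author() {
    [ "$#" -ge 1 ] || return 2
    blame_author=$1
    shift
    # --line-porcelain repeats the full header for every line, including
    # an "author <name>" row. Content rows start with a tab, so an exact
    # match on the whole row (-x) cannot hit file content.
    git ls-files -z -- "$@" |
        xargs -0 -n 1 git blame --line-porcelain -- |
        grep -c -x -F "author $blame_author"
}
```

For example, lines_by_author 'Jane Doe' '*.java' restricts the count to Java files; the name must match the author name recorded in the commits exactly.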
Peckham answered 11/6, 2020 at 3:28 Comment(0)
C
3

This tool on GitHub, https://github.com/flosse/sloc, gives more descriptive output. It creates stats of your source code:

  • physical lines
  • lines of code (source)
  • lines with comments
  • single-line comments
  • lines with block comments
  • lines mixed up with source and comments
  • empty lines
Coaly answered 4/1, 2016 at 8:0 Comment(0)
S
3

Depending on whether or not you want to include binary files, there are two solutions.

  1. git grep --cached -al '' | xargs -P 4 cat | wc -l
  2. git grep --cached -Il '' | xargs -P 4 cat | wc -l

    "xargs -P 4" means it can read the files using four parallel processes. This can be really helpful if you are scanning very large repositories. Depending on capacity of the machine you may increase number of processes.

    -a, process binary files as text (include binary files)
    -l, show only file names instead of matching lines; the empty pattern '' matches every line, so every non-empty file is listed
    -I, don't match the pattern in binary files (exclude binary files)
    --cached, search the content registered in the index instead of the work tree (includes staged but uncommitted changes)

Scary answered 4/3, 2020 at 16:41 Comment(0)
B
3

If you want to find the total number of non-empty lines, you could use AWK:

git ls-files | xargs cat | awk '/[^[:space:]]/{x++} END{print "Total number of non-empty lines:", x+0}'

This uses regex to count the lines containing a non-whitespace character.

Bradstreet answered 4/7, 2020 at 20:24 Comment(0)
C
3

The answer by Carl Norum assumes that no file names contain spaces. xargs splits its input on whitespace (spaces, tabs and newlines), so such names are broken into several arguments. The solution is to terminate each file name with a NUL byte:

 git ls-files -z | xargs -0 cat | wc -l
Coverley answered 18/11, 2020 at 11:20 Comment(0)
O
2
: | git mktree | git diff --shortstat --stdin

Or:

git ls-tree @ | sed '1i\\' | git mktree --batch | xargs | git diff-tree --shortstat --stdin
Origen answered 4/1, 2016 at 1:25 Comment(0)
G
1

From a Windows 11 terminal:

wsl.exe /bin/bash -c "git ls-files .| xargs wc -mwl"

where . is the current directory, which should be your Git repository.

Output:

Lines count | Word count | Character count

Galimatias answered 17/8, 2023 at 22:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.