Combine `cloc` with `git blame`

cloc enables one to count the number of lines of code stored in a directory per language per type (blank, comment, or code).

git blame enables one to see which lines of a file were last changed by whom.

I'm looking for a way to combine both so that one gets a three-dimensional matrix that lists the lines of code per type, per language, per user.

Are there elegant built-in ways to do this, or should one "scrape" the blame output for each user (by running grep after git blame) and run cloc on the result to calculate the table for each user?


EDIT:

Naive approach (based on the comment of @Jubobs):

  1. First generate a blame file for each file in the directory (not necessarily explicitly, it could stay in memory).
  2. Run grep with something like grep "^[^(]*([^)]*)" to capture the list of all users, and reduce it to the unique names with sort and uniq.
  3. For each user: generate a shadow copy of the folder, filtered with grep "^[^(]*($user)" such that only the lines of that user remain.
  4. Run cloc on the shadow copy.
  5. Repeat steps 3 and 4 for each user, store the results and output them together.

This is more or less how to generate the desired output. But as one can see, this approach does a lot of copying (or at least storing in memory): every file is filtered once per user, whereas the per-user line counts could be computed in a single pass over each file's blame output.
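
The single-pass idea can be sketched roughly as follows. This is only an illustration, not a built-in feature of either tool: it assumes the output of git blame --line-porcelain (one "author-mail" header per line, with the source line itself prefixed by a tab), it uses the file extension as a stand-in for the language, and it only separates blank from non-blank lines, because telling comments from code needs per-language rules like the ones cloc has. The function names classify_blame and blame_matrix are made up for this example.

```shell
#!/bin/bash
# Sketch only: classify_blame and blame_matrix are hypothetical helper names.

# Read `git blame --line-porcelain` output on stdin and print one
# "ext,email,kind" record per source line. $1 is the label used as the
# language column (here: the file extension).
classify_blame() {
  awk -v ext="$1" '
    # Header line naming the author of the next source line.
    /^author-mail / { mail = $2; gsub(/[<>]/, "", mail); next }
    # Source lines are the only lines that start with a tab.
    /^\t/ {
      kind = ($0 ~ /^[ \t\r]*$/) ? "blank" : "nonblank"
      print ext "," mail "," kind
    }
  '
}

# Blame every tracked file once and aggregate to "ext,email,kind,count".
# Assumption: files without a dot in their path are labelled by the whole path.
blame_matrix() {
  git ls-files -z | while IFS= read -r -d '' f; do
    git blame --line-porcelain -- "$f" | classify_blame "${f##*.}"
  done | LC_ALL=C sort | uniq -c |
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 "," n }'
}
```

Running blame_matrix inside a repository would print rows such as extension, email, blank or nonblank, and the number of lines. Each file is blamed exactly once, and no shadow copies are made.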


Desired output:

something like:

+--------+--------------------------------+--------------------------------+
|User    | C#                             | XML                            |
+--------+-------+-------+---------+------+-------+-------+---------+------+
|        | files | blank | comment | code | files | blank | comment | code |
+--------+-------+-------+---------+------+-------+-------+---------+------+
| Foo    |    12 |    75 |     148 | 2711 |     2 |    42 |       0 |    0 |
| Bar    |   167 |  1795 |    1425 |    2 |    16 |     0 |     512 | 1678 |
+--------+-------+-------+---------+------+-------+-------+---------+------+
| Total  |   179 |  1870 |    1573 | 2713 |    18 |    42 |     512 | 1678 |
+--------+-------+-------+---------+------+-------+-------+---------+------+
Clove answered 25/8, 2014 at 11:18 Comment(11)
Could you edit your question to specify exactly what grep should retrieve from the output of git blame? - Selfreliance
@Jubobs: see updated answer, but I'm wondering if there is a better way to do this instead of copying, filtering, searching, etc. - Clove
@Jubobs: updated with expected output as well... - Clove
Not sure why this is getting downvoted... Downvoter, an explanation? - Selfreliance
Me neither, this is - in my opinion - a perfectly clear question, with example and not subjective, etc. - Clove
This would look so cool if you actually create the 3D graph - Aliphatic
BTW, you do realize that git blame just shows the last person to touch that line of code, even if someone else authored it? For example, if person 1 authors an entire file and then person 2 reformats the whole file (changes spaces to tabs or something like that), it may change most or all of the lines in the file and attribute them all to person 2. Just FYI, not sure this is important to you. - Cusack
@DavidN: I do realize that. Any heuristic will have its pros and cons. If you have a more robust metric, please share. - Clove
It does not split by language, but git fame does some statistics. - Sportscast
You always have excellent questions and answers, I will upvote those - Towhead
Sir @WillemVanOnsem, can you answer me here, #71324755? I am really stuck in Django views. - Fabian

This is an older question but it piqued my interest, so I started playing around with trying to solve it. This doesn't spit out a nice report, but it does put the data in a CSV with 3 columns: file extension, the email that git blame attributes the line to, and the number of lines attributed to that user for this file type. It also doesn't give the blank, comment, and code line counts like cloc does. If I have time I'll try getting all of that to work nicely, but I thought this might be a 'good enough' solution, or at least get you started in the right direction.

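#!/bin/bash

LIST_OF_GIT_FILES=/tmp/gitfiles.txt
GIT_BLAME_COMBINED_RESULTS=/tmp/git-blame.txt
OUTPUT=/tmp/git-blame-output.txt
SUMMARY=code-summary.csv

# Start clean; -f avoids an error when the file does not exist yet.
rm -f "$GIT_BLAME_COMBINED_RESULTS"
git ls-files > "$LIST_OF_GIT_FILES"
# Blame every tracked file: -e prints the author email, -f prints the file name.
while IFS= read -r p; do
  git blame -e -f -- "$p" >> "$GIT_BLAME_COMBINED_RESULTS"
done < "$LIST_OF_GIT_FILES"
# Keep "file,email" per blamed line, then reduce the file name to its extension.
# Note: paths that contain spaces are split incorrectly by this step.
awk -F ' ' '{print $2 "," $3}' "$GIT_BLAME_COMBINED_RESULTS" | tr -d '(<>' | awk -F ',' '{n = split($1, a, "."); print a[n] "," $2}' > "$OUTPUT"
# Count identical "extension,email" pairs and write "extension,email,count".
sort "$OUTPUT" | uniq -c | sort -n | awk -F ' ' '{print $2 "," $1}' | sort > "$SUMMARY"

# Remove the temporary files.
rm -f "$GIT_BLAME_COMBINED_RESULTS"
rm -f "$LIST_OF_GIT_FILES"
rm -f "$OUTPUT"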
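
To get from that CSV a bit closer to the table in the question, something like the following might work. It is a sketch under the assumption that the input rows have the form extension,email,count as described above; pivot_summary is a name invented for this example, and the output is tab-separated with one row per user and one column per extension (missing combinations are printed as 0).

```shell
#!/bin/bash
# Sketch only: pivot_summary is a hypothetical helper name.
# Reads "extension,email,count" rows (stdin or files given as arguments)
# and prints a tab-separated table: one row per user, one column per extension.
pivot_summary() {
  awk -F ',' '
    {
      # Remember extensions and users in order of first appearance.
      if (!($1 in exts))  { exts[$1] = 1;  ext_order[++ne] = $1 }
      if (!($2 in users)) { users[$2] = 1; user_order[++nu] = $2 }
      cell[$2, $1] += $3
    }
    END {
      printf "user"
      for (i = 1; i <= ne; i++) printf "\t%s", ext_order[i]
      printf "\n"
      for (u = 1; u <= nu; u++) {
        printf "%s", user_order[u]
        for (i = 1; i <= ne; i++) printf "\t%d", cell[user_order[u], ext_order[i]]
        printf "\n"
      }
    }
  ' "$@"
}
```

For example, pivot_summary code-summary.csv prints the table for the file written by the script above; piping the result through column -t lines the columns up where that tool is available.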
Romina answered 9/6, 2016 at 12:46 Comment(0)
