BASH: Count identical lines

I have a file that contains:

VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoiceMailConfig60CharsTest
VoicemailDefaultTypeTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest

How do I replace the duplicate lines with counts:

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

I'm placing each line and its count into an associative array. I tried using 'read' inside a 'while' statement, but the array gets lost. Here's my attempt:

unset line
tests=$(cat file.log)
echo "$tests" | 
    while read l; do 
        if [ "$l" == "${line}" ]; then
            let cnt++;
        else
            echo "${line} (${cnt})"
            line=${l}
            cnt=1
        fi
        export run_suites
    done
Lithometeor answered 31/10, 2017 at 18:33 Comment(3)
You're WAY off. See unix.stackexchange.com/questions/169716/… and google UUOC. Also never use the letter l as a variable name as it looks far too much like the number 1 and so obfuscates your code.Retrenchment
It's pretty rude not to select an answer; please do so or state a reason the answers are not good enough.Soft
I do not know how to select an answer.Lithometeor

Assuming the formatting of the output doesn't exactly have to match

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

you can just use

sort <input_file> | uniq -c

If you need the output format to match what you posted, you can use

awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' <input_file>
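
One caveat: awk's for (ind in duplicates) loop visits the keys in an unspecified order. If you also want the lines sorted by name, the simplest option (not shown in the answer itself) is to pipe the result through sort:

awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' <input_file> | sort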

Edit: Posted just after anubhava's answer... but leaving (unless people suggest I delete) because of the addition of the sort command.

Libreville answered 31/10, 2017 at 18:44 Comment(1)
I'd leave it; I had the same thought about my own answer, you having beat me by 12 seconds.Ribose

You can use this simple awk script to get counts:

awk '{freq[$1]++} END{for (i in freq) print i, "(" freq[i] ")"}' file

VoiceMailConfig60CharsTest (1)
VoicemailSettingsFromMessageModeScreenTest (2)
VoiceMailIconSelectableTest (5)
VoicemailButtonTest (5)
VoicemailDefaultTypeTest (1)
VoicemailSettingsTest (7)

If you want to maintain the order of appearance in the input, then use:

awk '!freq[$1]++{order[++k]=$1} END{
    for (i=1; i<=k; i++) print order[i], "(" freq[order[i]] ")"}' file

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)
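
The !freq[$1]++ test is what preserves the order: it is true only the first time a key is seen, so order[] records each key at its first appearance. The same idiom on its own is a handy de-duplicator; a quick illustration (not part of the original answer):

$ printf 'a\nb\na\n' | awk '!seen[$1]++'
a
b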
Bucci answered 31/10, 2017 at 18:37 Comment(1)
Thanks for the good tip Ed. I forgot it is a builtin function in gnu-awkBucci

If you don't care about that exact output format, just use sort and uniq:

$ sort file.log | uniq -c
5 VoicemailButtonTest
1 VoiceMailConfig60CharsTest
1 VoicemailDefaultTypeTest
5 VoiceMailIconSelectableTest
2 VoicemailSettingsFromMessageModeScreenTest
7 VoicemailSettingsTest

sort, of course, is unnecessary if the file is already sorted as in your question. If it isn't sorted, uniq -c will still work, but it only considers a line to be a duplicate if it is identical to the immediately preceding line:

$ printf 'a\nb\na' | uniq -c
1 a
1 b
1 a
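
If you do want the name (count) layout from this pipeline, a small awk step on top of uniq -c swaps the two fields into that shape (a sketch, not part of the original answer):

$ sort file.log | uniq -c | awk '{print $2, "(" $1 ")"}'
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)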
Ribose answered 31/10, 2017 at 18:44 Comment(0)

Without awk, this keeps the order of the keys based on first appearance and doesn't require sorted or grouped input:

cat -n file    |     # add line numbers for order
sort -k2       |     # sort based on keys, ignoring line no
uniq -f1 -c    |     # count keys, ignoring line no
sort -k2,2n    |     # sort by line no to recover initial order
sed -r 's/(\S+)\s+(\S+)\s+(\S+)/\3 (\1)/'     # format output
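
One small wrinkle: uniq -c (at least GNU uniq) right-justifies its counts, padding them with leading spaces, and because the sed expression is not anchored, that padding survives at the start of each output line. Anchoring the expression consumes it too (a minor variant of the answer's final step):

sed -r 's/^\s*(\S+)\s+(\S+)\s+(\S+)/\3 (\1)/'     # also strips the leading padding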
Delindadelineate answered 31/10, 2017 at 19:36 Comment(0)
$ awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}' file
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

The above retains your input order and stores almost nothing in memory. It doesn't care whether your input is sorted; it just relies on all duplicate keys occurring contiguously in your input file, as you showed in your example.
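
For example, a quick check (not from the original answer) of the contiguity requirement, with a duplicate key that is not contiguous:

$ printf 'a\nb\na\n' | awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}'
a (1)
b (1)
a (1)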

Retrenchment answered 31/10, 2017 at 18:58 Comment(0)

With a bash associative array:

unset tab
declare -A tab                      # associative array: line -> count
while IFS= read -r line; do
  (( tab[$line]++ ))                # count every occurrence of each line
done < infile
for i in "${!tab[@]}"; do           # loop over the distinct lines
  echo "$i (${tab[$i]})"
done | sort
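
Note that the file is redirected into the loop (done < infile) rather than piped into it. That is why the array survives here while it "gets lost" in the question's attempt: each command in a pipeline runs in a subshell, so an array built inside cmd | while read ... is thrown away when the subshell exits. A minimal demonstration (assuming default bash options, i.e. lastpipe not enabled; the names 'lost' and 'kept' are just for illustration):

# Piped version: the while loop runs in a subshell, so 'lost' stays empty.
declare -A lost
printf '%s\n' a a b |
while IFS= read -r line; do
  (( lost[$line]++ ))
done
echo "piped:      ${#lost[@]} keys"     # prints 0

# Redirected version: the loop runs in the current shell, so the array survives.
declare -A kept
while IFS= read -r line; do
  (( kept[$line]++ ))
done < <(printf '%s\n' a a b)           # process substitution; 'done < file' works the same way
echo "redirected: ${#kept[@]} keys"     # prints 2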
Relent answered 31/10, 2017 at 23:45 Comment(0)
