Shortest command to calculate the sum of a column of output on Unix?
E

11

51

I'm sure there is a quick and easy way to calculate the sum of a column of values on Unix systems (using something like awk or xargs perhaps), but writing a shell script to parse the rows line by line is the only thing that comes to mind at the moment.

For example, what's the simplest way to modify the command below to compute and display the total for the SEGSZ column (70300)?

ipcs -mb | head -6
IPC status from /dev/kmem as of Mon Nov 17 08:58:17 2008
T         ID     KEY        MODE        OWNER     GROUP      SEGSZ
Shared Memory:
m          0 0x411c322e --rw-rw-rw-      root      root        348
m          1 0x4e0c0002 --rw-rw-rw-      root      root      61760
m          2 0x412013f5 --rw-rw-rw-      root      root       8192
Effective answered 17/11, 2008 at 15:5 Comment(0)
B
89
ipcs -mb | tail -n +4 | awk '{ sum += $7 } END { print sum }'

Or without tail:

ipcs -mb | awk 'NR > 3 { sum += $7 } END { print sum }'

Using awk with bc to get arbitrarily long results (credit to Jouni K.):

ipcs -mb | awk 'NR > 3 { print $7 }' | paste -sd+ | bc
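To try the awk approach without a live ipcs, a minimal sketch can replay the sample output from the question. The function name sum_segsz and the here-document are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical helper: sum column 7, skipping the three header lines.
sum_segsz() {
    awk 'NR > 3 { sum += $7 } END { print sum }'
}

# Sample data copied from the question, standing in for `ipcs -mb`.
sum_segsz <<'EOF'
IPC status from /dev/kmem as of Mon Nov 17 08:58:17 2008
T         ID     KEY        MODE        OWNER     GROUP      SEGSZ
Shared Memory:
m          0 0x411c322e --rw-rw-rw-      root      root        348
m          1 0x4e0c0002 --rw-rw-rw-      root      root      61760
m          2 0x412013f5 --rw-rw-rw-      root      root       8192
EOF
```

With this sample the sketch prints 70300, the total asked for in the question.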
Barayon answered 17/11, 2008 at 15:14 Comment(6)
Thanks, very helpful! Running that command, I get this result: 6.59246e+08. Any way to force awk to display the exact value (rather than scientific notation)? – Rojas
Andrew, there is a printf function for awk: gnu.org/software/gawk/manual/gawk.html#Printf – Barayon
printf "%d\n", sum should do it. (not %f i guess. dunno why i thought it's a floating point :p) – Barayon
Also if you know it's always the last field but don't want to count fields (or if the number of fields varies) you can use print $NF. – Annabelleannabergite
that is disturbing! but very cool... I [almost] want to yank my proposal :D – Mistook
the paste part is wrong on your 3rd solution. It should be paste -sd+ - (you forgot the ' -' at the end), which makes the complete command ipcs -mb | awk 'NR > 3 { print $7 }' | paste -sd+ - | bc – Politick
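The formatting point from the comments can be sketched as follows: an explicit printf format keeps awk from falling back to scientific notation for large sums. The name sum_exact is hypothetical, and %.0f is my own conservative choice in place of the %d suggested above; either should work in gawk.

```shell
# Hypothetical helper: sum the last field of each line and print it exactly.
sum_exact() {
    # $NF is the last field; %.0f prints the sum with no exponent and no decimals.
    awk '{ sum += $NF } END { printf "%.0f\n", sum }'
}

# Two made-up values whose sum is large enough to trigger the problem.
printf '400000000\n259246123\n' | sum_exact
```

This prints 659246123 rather than something like 6.59246e+08. Note that awk still uses floating point internally, so exactness is only guaranteed while the sum fits in a double's integer range.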
A
13

I would try to construct a calculation string and feed it to bc as follows:

  1. grep the lines that contain the numbers
  2. sed away all characters before (and after) the number on each line
  3. xargs the result (to get a string of numbers separated by blanks)
  4. translate (tr) the blanks to '+' characters
  5. feed the resulting expression to bc

ipcs -mb | grep -w '^m ' | sed 's/^.*\s//' | xargs | tr ' ' + | bc

Looks like this is slightly longer than the awk solution, but for everyone who can't read (and understand) the odd awk code this may be easier to grasp... :-)

If bc is not installed you can use double parentheses in step 5 above to calculate the result:

  • echo $(( $(ipcs -mb | grep -w '^m ' | sed 's/^.*\s//' | xargs | tr ' ' +) )) or
  • SUM=$(( $(ipcs -mb | grep -w '^m ' | sed 's/^.*\s//' | xargs | tr ' ' +) )) or
  • (( SUM=$(ipcs -mb | grep -w '^m ' | sed 's/^.*\s//' | xargs | tr ' ' +) ))

The spacing after and before the double parentheses is optional.
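The bc-free variant can be tried on canned data with a sketch like the one below. The sample variable holds lines copied from the question, the variable names are hypothetical, and [[:space:]] replaces \s on the assumption that not every sed supports the latter.

```shell
# Hypothetical sample lines standing in for the output of `ipcs -mb`.
sample='Shared Memory:
m          0 0x411c322e --rw-rw-rw-      root      root        348
m          1 0x4e0c0002 --rw-rw-rw-      root      root      61760
m          2 0x412013f5 --rw-rw-rw-      root      root       8192'

# Build the string "348+61760+8192" from the last field of each "m" line.
expr_str=$(printf '%s\n' "$sample" | grep '^m ' | sed 's/^.*[[:space:]]//' | xargs | tr ' ' '+')

# Let the shell evaluate the expression instead of bc.
SUM=$(( $expr_str ))
echo "$expr_str = $SUM"
```

This prints 348+61760+8192 = 70300. Unlike bc, shell arithmetic is limited to fixed-width integers (typically 64-bit), so very large totals still call for bc.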

Aldarcy answered 2/6, 2010 at 10:50 Comment(0)
O
4

I have a utility script which simply adds up all columns. It's usually easy enough to grab the one you want from the one-line output. As a bonus, some SI-suffixes are recognized.

#!/usr/bin/awk -f
# Sum up numerical values by column (white-space separated)
#
# Usage:  $0 [file ...]
#
# stern, 1999-2005

{
    for(i = 1; i <= NF; ++i) {
        scale = 1
        if ($i ~ /[kK]$/) { scale = 1000 }
        if ($i ~ /[mM]$/) { scale = 1000*1000 }
        if ($i ~ /[gG]$/) { scale = 1000*1000*1000 }
        col[i] += scale * $i;
    }
    if (NF > maxnf) maxnf = NF;
}

END {
    for(i = 1; i <= maxnf; ++i) { printf " %.10g", col[i] }
    print "";
}

Example with custom field separator:

$ head /etc/passwd | addcol -F:
0 0 45 39 0 0 0
Overlong answered 3/2, 2009 at 9:28 Comment(1)
# Usage: $0 [file ...] <- There is no "-F"... Can you clarify the usage? What flags are supported? – Rojas
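On the usage question: because the script runs via #!/usr/bin/awk -f, extra arguments are handed to awk itself, so as far as I can tell -F is awk's own field-separator option rather than something the script defines. A reduced sketch of the same idea, with the separator passed explicitly (sum_columns and the sample data are hypothetical, and the G suffix is omitted for brevity):

```shell
# Hypothetical wrapper: sum every column, using $1 as the field separator.
sum_columns() {
    awk -F"$1" '
    {
        for (i = 1; i <= NF; ++i) {
            scale = 1
            if ($i ~ /[kK]$/) scale = 1000          # "1k" counts as 1000
            if ($i ~ /[mM]$/) scale = 1000 * 1000   # "5M" counts as 5000000
            col[i] += scale * $i
        }
        if (NF > maxnf) maxnf = NF
    }
    END {
        for (i = 1; i <= maxnf; ++i) printf " %.10g", col[i]
        print ""
    }'
}

# Colon-separated sample input, in the spirit of /etc/passwd.
printf 'a:1k:10\nb:2:5M\n' | sum_columns :
```

This prints " 0 1002 5000010": non-numeric fields count as 0, and suffixed values are scaled before being added.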
O
3

I know this question is somewhat dated, but I can't see "my" answer here, so I decided to post nonetheless. I'd go with a combination of

  • tail (to get the lines you need)
  • tr (to squeeze multiple consecutive spaces into one)
  • cut (to get only the needed column)
  • paste (to concatenate each line with a + sign)
  • bc (to do the actual calculation)

ipcs doesn't produce any output on my system, so I'll just demo it with df:

# df
Filesystem     1K-blocks    Used Available Use% Mounted on
rootfs          33027952 4037420  27312812  13% /
udev               10240       0     10240   0% /dev
tmpfs             102108     108    102000   1% /run
/dev/xvda1      33027952 4037420  27312812  13% /
tmpfs               5120       0      5120   0% /run/lock
tmpfs             204200       0    204200   0% /run/shm
/dev/xvda1      33027952 4037420  27312812  13% /var/www/clients/client1/web1/log
/dev/xvda1      33027952 4037420  27312812  13% /var/www/clients/client1/web2/log
/dev/xvda1      33027952 4037420  27312812  13% /var/www/clients/client1/web3/log
/dev/xvda1      33027952 4037420  27312812  13% /var/www/clients/client1/web4/log
/dev/xvda1      33027952 4037420  27312812  13% /var/www/clients/client2/web5/log
/dev/xvda1      33027952 4037420  27312812  13% /var/www/clients/client2/web6/log
# df | tail -n +2 | tr -s ' ' | cut -d ' ' -f 2 | paste -s -d+ | bc
264545284

I know doing this particular calculation on my system doesn't really make sense, but it shows the concept.

All of the pieces of this solution have been shown in the other answers, but never in that combination.
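The same pipeline can be exercised on canned input with a small sketch. The helper name column_expr and the three-row sample are assumptions for illustration; paste is given an explicit - operand, which some non-GNU versions require in order to read standard input.

```shell
# Hypothetical helper: turn column 2 of df-style input into "a+b+c".
column_expr() {
    tail -n +2 | tr -s ' ' | cut -d ' ' -f 2 | paste -s -d+ -
}

# First rows of the df output above, used as sample data.
sample='Filesystem     1K-blocks    Used Available Use% Mounted on
rootfs          33027952 4037420  27312812  13% /
udev               10240       0     10240   0% /dev
tmpfs             102108     108    102000   1% /run'

# Show the expression that would be handed to bc.
printf '%s\n' "$sample" | column_expr

# If bc is installed, evaluate the expression as well.
if command -v bc >/dev/null 2>&1; then
    printf '%s\n' "$sample" | column_expr | bc
fi
```

The first command prints 33027952+10240+102108, and bc turns that into 33140300. Lines that start with a space would shift the field numbers seen by cut, which is one reason the awk answers are more forgiving.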

Octant answered 16/2, 2015 at 22:42 Comment(0)
O
2

Python Solution

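#!/usr/bin/env python
# Sum the SEGSZ column (7th field) of saved `ipcs -mb` output.
total = 0
with open("the_file") as text:
    for line in text:
        data = line.split()
        # Skip blank lines and the three header lines.
        if not data or data[0] in ('T', 'Shared', 'IPC'):
            continue
        print(line.rstrip())
        total += int(data[6])
print(total)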

Most Linux distros have Python.

If you want to process stdin as part of a pipeline, use

import sys
total = 0
for line in sys.stdin:
   ...etc...

If you want to assume that there are always three header lines:

import sys
total = 0
for line in sys.stdin.readlines()[3:]:
    total += int(line.split()[6])
print(total)

One-liner:

import sys; print(sum(int(line.split()[6]) for line in sys.stdin.readlines()[3:]))
Oaf answered 17/11, 2008 at 15:14 Comment(0)
M
1

You could start by running the data through cut - which would at least trim the columns down.

You should then be able to pipe that into grep, stripping-out non-numerics.

Then ... well, then I'm not sure. It might be possible to pipe that to bc. If not, it could certainly be handed to a shell script to add each item.

If you used tr to change the newlines (\n) to spaces (' '), and piped that through xargs into a script that loops until there are no more inputs, adding each one, you may have an answer.

So, something akin to the following:

cat <whatever> | cut -d$'\t' -f7 | grep -v <appropriate-character-class> | tr '\n' ' ' | xargs script-that-adds-arguments

I may have the cut flags slightly wrong - but man is your friend :)
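The "script-that-adds-arguments" placeholder could look something like this minimal sketch. The function name add_args is hypothetical, and the arguments are assumed to be plain integers.

```shell
# Hypothetical stand-in for "script-that-adds-arguments".
add_args() {
    total=0
    for n in "$@"; do
        total=$(( total + n ))   # shell integer arithmetic
    done
    echo "$total"
}

# The three SEGSZ values from the question.
add_args 348 61760 8192
```

This prints 70300. Saved as an executable script, the loop body would work the same way behind xargs, although xargs may split very long input across several invocations and so produce several partial sums.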

Mistook answered 17/11, 2008 at 15:13 Comment(0)
B
1

You could look it up in any online awk reference:

ipcs | awk '
BEGIN { sum = 0 }
/0x000000/ { sum = sum + $2 }
END {print sum}'
Bleeder answered 17/11, 2008 at 15:28 Comment(0)
A
0

Thanks for the Python one-liner above! It helped me easily check the used space on my drive. Here is a mixed shell/Python one-liner that does this: it sums the used space on the device /dev/sda in megabytes. It took me some time to work out, so maybe someone else will find it useful too.

df -h -B 1M | grep dev/sda | tr -s ' ' | cut -d' ' -f3 | python -c "import sys; print(sum(int(num) for num in sys.stdin.readlines()))"

or more Python / less shell:

df -h -B 1M | python -c "import sys; print(sum(int(l.split()[2]) for l in sys.stdin.readlines() if '/dev/sda' in l))"

Thanks again!

Auberbach answered 13/3, 2009 at 15:14 Comment(0)
V
0

To sum values in a column you can use GNU datamash. Since the first three lines do not contain values you want to sum up, we skip them with tail -n +4, which starts the output at line 4.

ipcs -mb | tail -n +4 | datamash -W sum 7

The -W option treats one or more whitespace characters as the field delimiter.

Vinous answered 26/10, 2018 at 13:39 Comment(0)
B
0

If you have specific, multiple columns you want to sum, you can use:

input_command | awk '{s1+=$1;s2+=$2;s3+=$3;s4+=$4;s5+=$5}END{print s1,s2,s3,s4,s5}'

which will work if you want to sum columns 1–5.
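For a variable number of columns, the same idea can be written with an array and a loop. This is a sketch under my own naming (sum_first_cols, with the column count passed as the first argument), not part of the original answer.

```shell
# Hypothetical helper: sum the first N columns, where N is given as $1.
sum_first_cols() {
    awk -v n="$1" '
        { for (i = 1; i <= n; i++) s[i] += $i }
        END {
            # s[i] + 0 forces a numeric 0 for columns that never appeared.
            for (i = 1; i <= n; i++)
                printf "%s%s", s[i] + 0, (i < n ? OFS : ORS)
        }'
}

# Two made-up rows with three columns each.
printf '1 2 3\n4 5 6\n' | sum_first_cols 3
```

This prints 5 7 9, one total per column.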

Betrothal answered 21/12, 2018 at 12:55 Comment(0)
C
0

Inconceivable that Perl hasn't been exemplified yet!

See perldoc perlrun for -a (which implies -n). And perldoc perlvar to learn about $. and friends.

$ df |perl -aE'$.<2or$u+=$F[2]}{say"Used: $u"'
Used: 129016836

And if you really want to go crazy:

$ df -h |perl -anE'$|=1;
  BEGIN{%M=(""=>1,k=>1e3,K=>2**10,M=>2**20,G=>2**30,T=>2**40);%D=reverse%M}
  print;
  if($.<2){@V=map length(),/\s*+[^a-z]\S*(?:\s+[a-z]+)*/g;next} # parse header
  ($w=($_==$#V)+length($F[$_])-$V[$_])>0 and do{$V[$_]+=$w;$_<$#V and $V[$_+1]-=$w} for 0..$#F; # optimize column widths
  $S[$_]+=($F[$_]=~/^(\d+(?:[.]\d*)?)([kKMGT])?$/aa?$1*$M{($D||=$2)&&$2}:-Inf)for 0..$#F; # scale numeric values
}{ # show results
  say join("",map+("-"x($V[$_]-1)).($S[$_]<0?"^":"+"),0..$#V);
  $V[$_]+=$V[$_-1]for 1..$#V;
  if($D){for$s(@S){@s=sort{$b<=>$a}grep{$_<$s}keys%D and$s=sprintf"%.1f%s",$s/$s[0],$D{$s[0]}}}
  say sprintf+("%s%*s"x@S),map{((!$p||($_>0 and length($S[$_])>=($w=($V[$_]-$V[$_-1])))?(($q?"\n":(($p=$q=1)&&"")),$V[$_]):("",0+$w)),$S[$_])}grep{$S[$_]!=-Inf}0..$#S;
'
Crammer answered 22/8, 2020 at 14:42 Comment(0)
