UPDATE: gnu-parallel, benchmarking the pre-made file over --pipe-part:
(parallel --pipe-part --argfile "${DT}/temptestpipepartinput.txt" | gpaste )
Exactly like the command above: 61.57s user 76.92s system 424% cpu 32.609 total
-j 2  : 27.883 total
-j 4  : 21.850 total
-j 6  : 21.221 total  <-- minimum (didn't check 5 or 7)
-j 8  : 25.133 total
-j 10 : 30.734 total
-j 12 : 36.279 total
Using the pre-made file:
mawk 1.9.9.6 :: 6.953 secs using its own file I/O, and 7.128 secs piped in.
perl 5.36.1 :: 8.786 secs using its own file I/O, and 8.925 secs piped in.
python3.11.5 :: here's the strange beast: apparently summing via int(_) instead of float(_) carries a 17.98% slowdown penalty:
8.468 secs :: python3 -c 'import sys; print(int(sum((float(_) for _ in sys.stdin))))'
9.991 secs :: python3 -c 'import sys; print(int(sum(( int(_) for _ in sys.stdin))))'
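To check the int()-versus-float() gap without the full 100-million-row file, a minimal sketch along these lines could be used. The helper names `sum_as_float` and `sum_as_int` and the row count of 200,000 are illustrative assumptions, not part of the benchmark above, and the ratio between the two timings will vary with machine and Python version.

```python
import io
import timeit

def sum_as_float(lines):
    """Parse each line with float(), then truncate the total to int."""
    return int(sum(float(s) for s in lines))

def sum_as_int(lines):
    """Parse each line with int() and sum with exact integer arithmetic."""
    return sum(int(s) for s in lines)

# Small in-memory stand-in for the piped input: "1\n2\n...\n200000\n"
# (the row count is an assumption chosen to keep the run short).
N = 200_000
lines = io.StringIO("".join(f"{i}\n" for i in range(1, N + 1))).readlines()

# Time both variants over the same lines; only the relative gap matters.
t_float = timeit.timeit(lambda: sum_as_float(lines), number=5)
t_int = timeit.timeit(lambda: sum_as_int(lines), number=5)
print(f"float(): {t_float:.3f}s   int(): {t_int:.3f}s")
```

Both variants return the same total here because every partial sum stays well below 2^53, so the float path loses no precision.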
Side note: this set of integers created a file with perfect digit uniformity in the stats from gnu-wc:
99,999,999 888,888,888 888,888,888
A perfect chain of eight 9s for the row count, and a chain of nine 8s for the byte count. The digits-only count after backing out all the \n (newlines):
788,888,889
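The wc figures can be re-derived arithmetically rather than by generating the file. The sketch below counts decimal digits per width band (1-9, 10-99, and so on); the function name `digit_count_upto` is a hypothetical helper introduced for illustration.

```python
def digit_count_upto(n):
    """Total number of decimal digits needed to write every integer 1..n."""
    total = 0
    width = 1   # number of digits in the current band
    low = 1     # smallest integer with `width` digits
    while low <= n:
        high = min(n, low * 10 - 1)          # largest integer in this band
        total += (high - low + 1) * width
        low *= 10
        width += 1
    return total

rows = 99_999_999
digits = digit_count_upto(rows)
bytes_total = digits + rows                  # one "\n" per row
print(f"{rows:,} {digits:,} {bytes_total:,}")
# prints: 99,999,999 788,888,889 888,888,888
```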
In awk, just emitting a 2nd column with the cumulative sum takes far less syntax than saving the total for the END block:
jot 20 61111111889 - 799973766543 |
mawk '$2=_+=$1' # skips rows while the running sum is zero (0)
gawk '($2=_+=$1)_' # no rows left behind
61111111889 61111111889
861084878432 922195990321
1661058644975 2583254635296
2461032411518 5044287046814
3261006178061 8305293224875
4060979944604 12366273169479
4860953711147 17227226880626
5660927477690 22888154358316
6460901244233 29349055602549
7260875010776 36609930613325
8060848777319 44670779390644
8860822543862 53531601934506
9660796310405 63192398244911
10460770076948 73653168321859
11260743843491 84913912165350
12060717610034 96974629775384
12860691376577 109835321151961
13660665143120 123495986295081
14460638909663 137956625204744
15260612676206 153217237880950
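For comparison with the python3 runs in this post, the same two-column output can be sketched with `itertools.accumulate`. The variables `first`, `step` and `count` simply restate the jot arguments; this is an illustrative equivalent, not something that was benchmarked here.

```python
from itertools import accumulate

# Same sequence as: jot 20 61111111889 - 799973766543
first, step, count = 61111111889, 799973766543, 20
values = [first + i * step for i in range(count)]

# accumulate() yields the running total after each element.
running = list(accumulate(values))

for value, total in zip(values, running):
    print(value, total)
```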
For all practical purposes, perl5, python3 and mawk2 are tied for speed when summing from 1 to 99,999,999:
( echo '99999999' | mawk2 '$++NF = (__=+$++_)*++__/++_' )
99999999 4999999950000000
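The mawk2 one-liner above computes the expected total with the closed form n(n+1)/2 instead of looping. A readable sketch of the same check follows; the function name `triangular` is an assumption made for illustration.

```python
def triangular(n):
    """Closed-form sum of 1..n, i.e. n(n+1)/2, in exact integer arithmetic."""
    return n * (n + 1) // 2

# Cross-check the formula against a brute-force sum on a small range.
assert triangular(1000) == sum(range(1, 1001))

print(99_999_999, triangular(99_999_999))
# prints: 99999999 4999999950000000
```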
(All input digits were re-generated on the fly and piped in to eliminate any potential cache access advantage):
in0: 847MiB 0:00:10 [81.1MiB/s] [81.1MiB/s] [ <=> ]
1 4999999950000000
(python3 -c 'import sys; print(int(sum((float(_) for _ in sys.stdin))))')
19.14s user 0.55s system 188% cpu 10.473 total
gcat -b 0.00s user 0.00s system 0% cpu 10.473 total
in0: 847MiB 0:00:10 [81.0MiB/s] [81.0MiB/s] [ <=> ]
1 4999999950000000
(perl536 -nle '$sum += $_ } END { print $sum')
19.37s user 0.55s system 190% cpu 10.472 total
gcat -b 0.00s user 0.00s system 0% cpu 10.472 total
in0: 847MiB 0:00:10 [81.1MiB/s] [81.1MiB/s] [ <=>]
1 4999999950000000
(mawk1996 '{ _+=$__ } END { print _ }')
17.51s user 0.57s system 172% cpu 10.463 total
gcat -b 0.00s user 0.00s system 0% cpu 10.463 total
However, once you eliminate the pipe and hashing speed factors and have each one generate and sum the numbers internally, perl5.36 is some 52% slower:
( time (
mawk2 'BEGIN { for(___=_-=_=__=((_+=++_)+(_*=_+_))^_; ++_<__;)___+=_
print ___ }'
) | gcat -b ) | lgp3 ;
( time (
perl5 -e '$y = $x = 0; $z = 10**8; while(++$x < $z) { $y += $x } print $y'
) | gcat -b ) | lgp3 ;
1 4999999950000000
( mawk2 ; ) 1.97s user 0.01s system 99% cpu 1.981 total
gcat -b 0.00s user 0.00s system 0% cpu 1.979 total
( perl5 -e '$y = $x = 0; $z = 10**8; while(++$x < $z) { $y += $x } print $y'; 2.98s user 0.03s system 99% cpu 3.015 total
gcat -b 0.00s user 0.00s system 0% cpu 3.014 total
1 4999999950000000
As for gnu-parallel, it's more than half an order of magnitude slower: 36 concurrent jobs with 5,000,000 rows per job and a very generous 100 MB upper cap on block size, running on an M1 Max with 64 GB RAM, and it still took nearly 53 seconds compared to about 10.5 secs for the other 3.
( time ( mawk2 'BEGIN { for(_-=_=__=((_+=++_)+(_*=_+_))^_; ++_ < __; ) print _ }' |
pvE0 |
parallel --block 100M -N 5000000 -j 36 --pipe "gpaste -sd+ - | bc" | gpaste -sd+ - | bc
) | gcat -b ) | lgp3 | lgp3 -1;
in0: 847MiB 0:00:47 [17.8MiB/s] [17.8MiB/s] [ <=> ]
1 4999999950000000
0.00s user 0.00s system 0% cpu 52.895 total
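The pipeline above sums in two stages: each block of rows is reduced to a partial sum, then the partial sums are added together. A sequential sketch of that structure is shown below; it does not spawn processes, so it says nothing about gnu-parallel's overhead, and the names `block_sums` and `two_stage_sum` as well as the block size are illustrative assumptions.

```python
from itertools import islice

def block_sums(numbers, rows_per_block):
    """Yield one partial sum per block of at most rows_per_block items."""
    it = iter(numbers)
    while True:
        block = list(islice(it, rows_per_block))
        if not block:        # input exhausted
            return
        yield sum(block)

def two_stage_sum(numbers, rows_per_block):
    """Stage 1: per-block sums. Stage 2: sum of those partial sums."""
    return sum(block_sums(numbers, rows_per_block))

# Scaled-down stand-in for 1..99,999,999 split into 5,000,000-row blocks.
print(two_stage_sum(range(1, 1_000_001), 50_000))
# prints: 500000500000
```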
======================
Reference code for massively loop-unrolled summations (this variant adds 510 numbers per while() loop round, which matches the 196079 iterations in the profile):
( gawk -p- -be "${DT}/temptestpipepartinput.txt"; )
8.50s user 1.46s system 99% cpu 9.970 total
1 4999999950000000
2 # gawk profile, created Sat Oct 21 04:25:20 2023
3 # BEGIN rule(s)
4 BEGIN {
5 1 CONVFMT="%.250g"
6 1 FS=RS
7 1 RS="^$"
8 }
9 # END rule(s)
10 END {
11 1 print ______()
12 }
13 # Functions, listed alphabetically
14 1 function ______(_, __, ___)
15 {
16 1 ___=(__=_=_<_)+NF
17 196079 while (_<___)
18 __ += $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
19 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
20 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
21 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
22 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
23 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
24 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
25 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
26 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
27 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
28 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
29 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
30 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
31 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
32 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
33 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
34 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
35 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
36 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
37 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
38 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
39 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
40 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
41 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
42 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
43 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
44 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
45 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
46 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
47 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
48 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
49 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
50 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
51 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
52 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
53 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
54 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
55 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
56 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
57 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
58 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
59 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
60 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
61 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
62 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
63 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
64 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
65 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
66 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
67 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
68 + $++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_+$++_
69 + $++_+$++_
71 return __
73 }
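The gawk function above treats the whole file as one record with one field per line, then adds several hundred fields per loop iteration; awk returns an empty (zero) value for fields past NF, so overshooting the end is harmless there. A much smaller sketch of the same unrolling idea follows, with an unroll width of 8 chosen purely for readability and a hypothetical function name `unrolled_sum`. No speed claim is made for this version.

```python
def unrolled_sum(values):
    """Sum a list of ints, adding 8 elements per loop iteration."""
    total = 0
    i = 0
    full = len(values) - len(values) % 8   # largest multiple of 8 <= len
    while i < full:
        total += (values[i] + values[i + 1] + values[i + 2] + values[i + 3]
                  + values[i + 4] + values[i + 5] + values[i + 6] + values[i + 7])
        i += 8
    # Unlike awk fields, Python lists raise IndexError past the end,
    # so the leftover (fewer than 8) elements are added one at a time.
    for v in values[full:]:
        total += v
    return total

data = list(range(1, 100_001))
print(unrolled_sum(data))
# prints: 5000050000
```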
awk and bc). These all finished adding a million numbers up in less than 10 seconds. Take a look at those and see how it can be done in pure shell. – Gotcher