What's an easy way to read a random line from a file in a shell script?
You can use shuf:

shuf -n 1 $FILE
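For instance, a quick usage sketch (assuming GNU shuf and a wordlist at /usr/share/dict/words; -n picks that many distinct lines):

shuf -n 1 /usr/share/dict/words   # one random line
shuf -n 3 /usr/share/dict/words   # three random lines, no repeats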
There is also a utility called rl. In Debian it's in the randomize-lines package, and it does exactly what you want, though it's not available in all distros. On its home page it actually recommends the use of shuf instead (which didn't exist when rl was created, I believe). shuf is part of the GNU coreutils; rl is not.

rl -c 1 $FILE
Good shuf tip; it's built in on Fedora. – Unbiased
Does rl have any advantages? shuf seems to work perfectly! – Tannatannage
sort -R is definitely going to make one wait a lot if dealing with considerably huge files (80 million lines), whereas shuf -n acts quite instantaneously. – Aconcagua
On macOS, you can get coreutils from Homebrew; the binary might be called gshuf instead of shuf. – Harmon
Or randomize-lines on OS X: brew install randomize-lines; rl -c 1 $FILE – Selfeducated
The randomize-lines page states: "Users are recommended to use the shuf command instead which should be available by default. This package may be considered deprecated." Therefore, shuf appears preferable. – Ryon
shuf is part of GNU Coreutils and therefore won't necessarily be available (by default) on *BSD systems (or Mac?). @Tracker1's perl one-liner below is more portable (and by my tests, slightly faster). – Ryon
Note that shuf and rl make permutations of lines, not random draws. I.e., if you want to draw k random lines, you will want to run shuf -n 1 k times. This will draw from N^k possibilities instead of N!/(N-k)! possibilities, where N is the total number of lines. E.g., to get 7 random lines from wordlist.txt: for n in {1..7}; do shuf -n1 wordlist.txt; done – Kaolin
You can also shuf inline input: shuf -n 1 <(echo -e "heads\ntails") will randomly pick "heads" or "tails". Or just pipe to it: echo -e "heads\ntails" | shuf -n 1 – Simplism

Another alternative:
head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1
${RANDOM} only ranges over 0..32767, so for longer files you can combine two draws, e.g. (${RANDOM} << 15) + ${RANDOM}. This significantly reduces the bias and allows it to work for files containing up to 1 billion lines. – Underwater
+ and | are interchangeable here, since ${RANDOM} is 0..32767 by definition. – Underwater
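To make that concrete, here is a sketch of the widened version (assuming bash and the file path in $FILE; the modulo step still carries a small bias when the line count doesn't divide 2^30):

lines=$(wc -l < "$FILE")
# combine two 15-bit draws into a 30-bit random value, then map to 1..lines
r=$(( ( (RANDOM << 15) | RANDOM ) % lines + 1 ))
head -n "$r" "$FILE" | tail -1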
sort --random-sort $FILE | head -n 1
(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)
This didn't work with sort on any of my systems (CentOS 5.5, Mac OS 10.7.2). Also, useless use of cat: it could be reduced to sort --random-sort < $FILE | head -n 1 – Affluent
sort -R <<< $'1\n1\n2' | head -1 is as likely to return 1 as 2, because sort -R sorts duplicate lines together. The same applies to sort -Ru, because it removes duplicate lines. – Yeseniayeshiva
This sorts the whole file before piping it to head. shuf selects random lines from the file instead, and is much faster for me. – Neckpiece
sort --random-sort $FILE | head would be best, as it allows sort to access the file directly, possibly enabling efficient parallel sorting. – Internationalist
The --random-sort and -R options are specific to GNU sort (so they won't work with BSD or Mac OS sort). GNU sort learned those flags in 2005, so you need GNU coreutils 6.0 or newer (e.g. CentOS 6). – Ingamar
shuf reads the whole file into memory. sort may work even if the file does not fit in memory. – Aretino

This is simple.
cat file.txt | shuf -n 1
Granted this is just a tad slower than the "shuf -n 1 file.txt" on its own.
-n 1 specifies 1 line, and you can change it to more than 1. shuf can be used for other things too; I just piped ps aux and grep with it to randomly kill processes partially matching a name. – Absolute

perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:
perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file
This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
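The same single-pass reservoir idea can be sketched in plain bash (an illustration only, assuming the input path is in $FILE; RANDOM % n is only approximately uniform, and a bash read loop is slow on big files):

n=0
while IFS= read -r line; do
  n=$((n + 1))
  # keep the current line with probability 1/n
  if (( RANDOM % n == 0 )); then
    keep=$line
  fi
done < "$FILE"
printf '%s\n' "$keep"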
I benchmarked this against shuf. The perl code is very slightly faster (8% faster by user time, 24% faster by system time), though anecdotally I've found the perl code "seems" less random (I wrote a jukebox using it). – Ryon
shuf stores the whole input file in memory, which is a horrible idea, while this code only stores one line, so the limit of this code is a line count of INT_MAX (2^31 or 2^63 depending on your arch), assuming any of its selected potential lines fits in memory. – Ryon
Ryon awk 'BEGIN{srand()}{rand()*NR<1&&l=$0}END{print l}' file
or some_input | awk 'BEGIN{srand()}{rand()*NR<1&&l=$0}END{print l}'
–
Lindsey using a bash script:
#!/bin/bash
# replace with the file to read
FILE=tmp.txt
# count the number of lines
NUM=$(wc -l < ${FILE})
# generate a random number in the range 1-NUM
X=$(( ${RANDOM} % ${NUM} + 1 ))
# extract the X-th line
sed -n ${X}p ${FILE}
The curly braces around $FILE are superfluous here. I recommend using lowercase or mixed-case variable names to avoid potential name collisions with shell or environment variables. – Ramshackle
wc - l shouldn't have a space. – Yeseniayeshiva

Single bash line:
sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt
Slight problem: duplicate filename.
wc -l < test.txt avoids having to pipe to cut. – Minoan

Here's a simple Python script that will do the job:
import random, sys
lines = open(sys.argv[1]).readlines()
print(lines[random.randrange(len(lines))])
Usage:
python randline.py file_to_get_random_line_from
A variation that prints all the lines in random order:
import random, sys
lines = open(sys.argv[1]).readlines()
for i in range(len(lines)):
    rand = random.randint(0, len(lines) - 1)
    print(lines.pop(rand), end='')
– Channa
Using random.randint(0, len(lines)) may lead to an IndexError. You could use print(random.choice(list(open(sys.argv[1])))). There is also a memory-efficient reservoir sampling algorithm. – Aretino
With the pythonpy tool, -l assigns incoming lines to a list, l, and py auto-imports stdlib modules, so you can do cat $FILE | py -l "random.choice(l)". Try it: python -m this | py -l "random.choice(l)" ... erm actually just py this | py -l "random.choice(l)" ;) – Pulque

Another way, using awk:
awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name
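An equivalent form that passes the line number in with -v instead of splicing it into an unquoted awk program, as a sketch (still bash-dependent, since $RANDOM is capped at 32767):

awk -v n="$(( RANDOM % $(wc -l < file.name) + 1 ))" 'NR == n' file.name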
This needs bash, though ($RANDOM is a bashism). Here is a pure awk (mawk) method using the same logic as @Tracker1's cited perlfaq5 code above: awk 'rand() * NR < 1 { line = $0 } END { print line }' file.name (wow, it's even shorter than the perl code!) – Ryon
This has to read the whole file (wc) in order to get a line count, then must read (part of) the file again (awk) to get the content of the given random line number. I/O will be far more expensive than getting a random number. My code reads the file once only. The issue with awk's rand() is that it seeds based on seconds, so you'll get duplicates if you run it consecutively too fast. – Ryon

A solution that also works on MacOSX, and should also work on Linux(?):
N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < $file)) $file

Where:
N is the number of random lines you want.
NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2 --> save the line numbers written in file1 and then print the corresponding lines in file2.
jot -r $N 1 $(wc -l < $file) --> draw N numbers randomly (-r) in the range (1, number_of_lines_in_file) with jot. The process substitution <() makes it look like a file for the interpreter, so it plays the role of file1 in the previous example.
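As a usage sketch (the paths are illustrative; note that jot -r draws with replacement, and duplicate line numbers collapse into the awk array, so you may occasionally get fewer than N lines):

file=/usr/share/dict/words
N=3
awk 'NR==FNR {lineN[$1]; next} (FNR in lineN)' <(jot -r "$N" 1 "$(wc -l < "$file")") "$file"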
Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:
sed -n $(awk 'END {srand(); r=rand()*NR; if (r<NR) {sub(/\..*/,"",r); r++;}; print r}' FILENAME)p FILENAME
(This works even if FILENAME is empty, in which case no line is emitted.)
One possible advantage of this approach is that it only calls rand() once.
As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:
awk 'rand() * NR < 1 { line = $0 } END { print line }' FILENAME
(A simple proof of correctness can be given by induction: line i replaces the kept line with probability 1/i and then survives each later line j with probability (j-1)/j, so the chance that line i is the final survivor is (1/i) * (i/(i+1)) * ... * ((N-1)/N) = 1/N.)
Caveat about rand()
"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."
-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html
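Given that caveat, one common workaround is to supply a seed yourself; a sketch (assuming bash, mixing $RANDOM into the epoch seconds so that back-to-back runs within the same second still differ):

awk -v seed="$(( $(date +%s) + RANDOM ))" 'BEGIN { srand(seed) } rand() * NR < 1 { line = $0 } END { print line }' FILENAME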
A pure-bash approach that builds a random index digit by digit, retrying until it falls within the file's line count:

#!/bin/bash
# read the file (one entry per line) into an array
IFS=$'\n' wordsArray=($(<$1))
numWords=${#wordsArray[@]}
# number of digits in the line count
sizeOfNumWords=${#numWords}

while true
do
    ranNumStr=""
    # generate one random digit per digit of the line count
    for ((i=0; i<sizeOfNumWords; i++))
    do
        ranNumStr="$ranNumStr$(( RANDOM % 10 ))"
    done
    # valid array indices are 0..numWords-1
    if [ "$ranNumStr" -lt "$numWords" ]
    then
        break
    fi
done

# strip leading zeros so the index isn't parsed as octal
noLeadZeroStr=$((10#$ranNumStr))
echo ${wordsArray[$noLeadZeroStr]}
Here is what I discovered, since my Mac OS doesn't ship all the easy answers. I used the jot command to generate a number, since the $RANDOM variable solutions seemed not to be very random in my testing. When testing my solution, I got a wide variance in the output.

RANDOM1=`jot -r 1 1 235886`
# range of jot (1 235886) found from an earlier wc -w /usr/share/dict/web2
echo $RANDOM1
head -n $RANDOM1 /usr/share/dict/web2 | tail -n 1

The echo of the variable is just to show the generated random number.
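The same approach without hard-coding the line count, as a sketch (assuming BSD jot, as shipped with macOS):

file=/usr/share/dict/web2
n=$(jot -r 1 1 "$(wc -l < "$file")")
head -n "$n" "$file" | tail -n 1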