What's an easy way to read a random line from a file?
L

13

270

What's an easy way to read a random line from a file in a shell script?

Loosetongued answered 15/1, 2009 at 19:1 Comment(3)
Is each line padded to a fixed length?Florencia
no, each line has variable number of charactersLoosetongued
large file: #29103089Busterbustle
K
395

You can use shuf:

shuf -n 1 $FILE

There is also a utility called rl that does exactly what you want. On Debian it's in the randomize-lines package, though it's not available in all distros. Its home page actually recommends using shuf instead (which didn't exist when rl was created, I believe). shuf is part of GNU coreutils; rl is not.

rl -c 1 $FILE
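
If you need more than one line, note that shuf -n k draws k distinct lines (a sample without replacement); for independent draws with replacement, call shuf -n 1 in a loop. A minimal sketch, assuming GNU coreutils shuf (gshuf on macOS via Homebrew's coreutils) and a throwaway /tmp/coin.txt:

```shell
# Sketch: one draw vs. three independent draws (with replacement)
printf 'heads\ntails\n' > /tmp/coin.txt
shuf -n 1 /tmp/coin.txt                            # one random line
for i in 1 2 3; do shuf -n 1 /tmp/coin.txt; done   # three draws, repeats possible
```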
Kurtzig answered 15/1, 2009 at 19:30 Comment(17)
i really like that shuf approach!Cobos
Thanks for the shuf tip, it's built-in in Fedora.Unbiased
Does this rl have any advantages? shuf seems to work perfectly!Tannatannage
shuf is great as a drop-in replacement for head command, good to knowCoquelicot
And also, sort -R is definitely going to make one wait a lot if dealing with considerably huge files (80M lines), whereas shuf -n acts almost instantaneously.Aconcagua
You can get shuf on OS X by installing coreutils from Homebrew. Might be called gshuf instead of shuf.Harmon
Similarly, you can use randomize-lines on OS X by brew install randomize-lines; rl -c 1 $FILESelfeducated
@Rubens: the same questionAretino
@J.F.Sebastian: the same answerAconcagua
@ThomasAhle, the Debian package summary for rl's randomize-lines package states "Users are recommended to use the shuf command instead which should be available by default. This package may be considered deprecated." Therefore, shuf appears preferable.Ryon
Note that shuf is part of GNU Coreutils and therefore won't necessarily be available (by default) on *BSD systems (or Mac?). @Tracker1's perl one-liner below is more portable (and by my tests, is slightly faster).Ryon
Why is this answer in the bottom though it has the most upvotes?Nicks
@Nicks are you sorting by age?Twodimensional
This is a cool command! Yet another wheel I've reinvented not knowing it already exists in my flavor of Unix! Thank you!Feucht
though this is not suitable for huge files... I'm getting a 'shuf: read error: Cannot allocate memory' on a 70GB fileHelse
This is a great answer. I would just like to point out that in case more than 1 line is needed, shuf and rl make permutations of lines, not random draws. I.e. if you want to draw k random lines, you will want to run shuf -n 1 k times. This will draw from N^k possibilities instead of N!/(N-k)! possibilities, where N is the total number of lines. E.g., get 7 random lines from wordlist.txt: for n in {1..7}; do shuf -n1 wordlist.txt; doneKaolin
you can use process substitution if you don't want to give shuf a file: shuf -n 1 <(echo -e "heads\ntails") will randomly pick "heads" or "tails". Or just pipe to it: echo -e "heads\ntails" | shuf -n 1Simplism
H
74

Another alternative:

head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1
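
Since $RANDOM only yields 15 bits (0..32767), lines past line 32767 can never be chosen this way. A hedged sketch of one workaround, combining two draws into a 30-bit value (the /tmp/bigfile.txt name is just illustrative):

```shell
# $RANDOM is 15 bits; shifting one draw and OR-ing in a second gives 30 bits,
# enough for files of up to ~1 billion lines
seq 1 100000 > /tmp/bigfile.txt
R=$(( (RANDOM << 15) | RANDOM ))
head -n $(( R % $(wc -l < /tmp/bigfile.txt) + 1 )) /tmp/bigfile.txt | tail -n 1
```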
Hardman answered 16/1, 2009 at 8:54 Comment(6)
${RANDOM} only generates numbers less than 32768, so don't use this for large files (for example the English dictionary).Exiguous
This does not give you the precise same probability for every line, due to the modulo operation. This does barely matter if the file length is << 32768 (and not at all if it divides that number), but maybe worth noting.Organzine
You can extend this to 30-bit random numbers by using (${RANDOM} << 15) + ${RANDOM}. This significantly reduces the bias and allows it to work for files containing up to 1 billion lines.Underwater
@nneonneo: Very cool trick, though according to this link it should be OR'ing the ${RANDOM}'s instead of PLUS'ing https://mcmap.net/q/95272/-random-number-from-a-range-in-a-bash-scriptWitte
+ and | are the same since ${RANDOM} is 0..32767 by definition.Underwater
There's a heavy performance penalty to this, since it needs to count lines to be sure it's reading to the right point.Callicrates
S
74
sort --random-sort $FILE | head -n 1

(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)

Scene answered 10/11, 2010 at 12:28 Comment(8)
+1 I like it, but you may need a very recent sort, didn't work on any of my systems (CentOS 5.5, Mac OS 10.7.2). Also, useless use of cat, could be reduced to sort --random-sort < $FILE | head -n 1Affluent
sort -R <<< $'1\n1\n2' | head -1 is as likely to return 1 and 2, because sort -R sorts duplicate lines together. The same applies to sort -Ru, because it removes duplicate lines.Yeseniayeshiva
This is relatively slow, since the whole file needs to get shuffled by sort before piping it to head. shuf selects random lines from the file, instead and is much faster for me.Neckpiece
@SteveKehlet while we're at it, sort --random-sort $FILE | head would be best, as it allows it to access the file directly, possibly enabling efficient parallel sortingInternationalist
@Internationalist Good improvement!Affluent
The --random-sort and -R options are specific to GNU sort (so they won't work with BSD or Mac OS sort). GNU sort learned those flags in 2005 so you need GNU coreutils 6.0 or newer (eg CentOS 6).Ingamar
from Wikipedia: "this is not a full random shuffle because it will sort identical lines together"Wessels
@Bengt: nothing is written until shuf reads the whole file into memory. sort may work even if the file does not fit in memory.Aretino
I
31

This is simple.

cat file.txt | shuf -n 1

Granted, this is just a tad slower than "shuf -n 1 file.txt" on its own.

Irretrievable answered 23/5, 2016 at 7:1 Comment(1)
Best answer. I didn't know about this command. Note that -n 1 specifies 1 line, and you can change it to more than 1. shuf can be used for other things too; I just piped ps aux and grep with it to randomly kill processes partially matching a name.Absolute
F
20

perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:

perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file

This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
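
The same one-pass idea generalizes to selecting k lines (Knuth's Algorithm R). A sketch in awk, where k=3 and the seq input are just illustrative stand-ins:

```shell
# Reservoir sampling: keep the first k lines, then replace a random
# reservoir slot with probability k/NR for each later line
seq 1 1000 | awk -v k=3 'BEGIN { srand() }
  NR <= k { res[NR] = $0; next }        # fill the reservoir
  { j = int(rand() * NR) + 1            # j uniform in 1..NR
    if (j <= k) res[j] = $0 }           # replace with probability k/NR
  END { for (i = 1; i <= k; i++) print res[i] }'
```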

Florencia answered 15/1, 2009 at 19:6 Comment(6)
Just for the purposes of inclusion (in case the referred site goes down), here's the code that Tracker1 pointed to: "cat filename | perl -e 'while (<>) { push(@_,$_); } print @_[rand()*@_];';"Urbai
This is a useless use of cat. Here's a slight modification of the code found in perlfaq5 (and courtesy of the Camel book): perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' filenameQuantitative
err... the linked site, that isGathering
I just benchmarked an N-lines version of this code against shuf. The perl code is very slightly faster (8% faster by user time, 24% faster by system time), though anecdotally I've found the perl code "seems" less random (I wrote a jukebox using it).Ryon
More food for thought: shuf stores the whole input file in memory, which is a horrible idea, while this code only stores one line, so the limit of this code is a line count of INT_MAX (2^31 or 2^63 depending on your arch), assuming any of its selected potential lines fits in memory.Ryon
here's the awk equivalent. either of these answers (perl or awk) are better than the accepted for - portability, speed, and ability to manage huge files easily. awk 'BEGIN{srand()}{rand()*NR<1&&l=$0}END{print l}' file or some_input | awk 'BEGIN{srand()}{rand()*NR<1&&l=$0}END{print l}'Lindsey
A
11

Using a bash script:

#!/bin/bash
# replace with file to read
FILE=tmp.txt
# count number of lines
NUM=$(wc -l < ${FILE})
# generate random number in range 1-NUM
let "X = ${RANDOM} % ${NUM} + 1"
# extract X-th line
sed -n ${X}p ${FILE}
Adlai answered 15/1, 2009 at 19:12 Comment(10)
Random can be 0, sed needs 1 for the first line. sed -n 0p returns error.Shani
mhm - how about $1 for "tmp.txt" and $2 for NUM ?Gwenn
but even with the bug worth a point, as it does not need perl or python and is as efficient as you can get (reading the file exactly twice but not into memory - so it would work even with huge files).Gwenn
@asalamon74: thanks @blabla999: if we make a function out of it, ok for $1, but why not computing NUM?Adlai
Changing the sed line to: head -${X} ${FILE} | tail -1 should do itNitrobenzene
useless use of cat detected, wc happily takes files directlySensuous
@Hasturkun: beware - the output of wc depends on whether it reads stdin or a file name off its command line. Granted, 'wc -l < $FILE' would be OK; using 'wc -l $FILE' (no redirection) would be a bug.Fisticuffs
@Sensuous & J.Leffler: the cat was meant to avoid wc printing the file name. Fixed with the 'wc -l < $FILE' suggestion, thanksAdlai
The variable names should be quoted, especially $FILE. The curly braces are superfluous here. I recommend using lowercase or mixed-case variable names to avoid potential name collisions with shell or environment variables.Ramshackle
If a file has 32769 or more lines, the last ones are never selected. wc - l shouldn't have a space.Yeseniayeshiva
S
4

Single bash line:

sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt

Slight problem: duplicate filename.
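
The cut can be dropped by redirecting the file into wc, which then prints only the count. A sketch with a throwaway /tmp/test_lines.txt (still limited to the first 32767 lines by $RANDOM's range):

```shell
printf 'a\nb\nc\n' > /tmp/test_lines.txt
sed -n "$((RANDOM % $(wc -l < /tmp/test_lines.txt) + 1))p" /tmp/test_lines.txt
```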

Shani answered 15/1, 2009 at 19:17 Comment(2)
slighter problem. performing this on /usr/share/dict/words tends to favor words starting with "A". Playing with it, I'm at about 90% "A" words to 10% "B" words. None starting with numbers yet, which make up the head of the file.Phosphatase
wc -l < test.txt avoids having to pipe to cut.Minoan
H
3

Here's a simple Python script that will do the job:

import random, sys
lines = open(sys.argv[1]).readlines()
print(lines[random.randrange(len(lines))])

Usage:

python randline.py file_to_get_random_line_from
Hairy answered 15/1, 2009 at 19:7 Comment(6)
This doesn't quite work. It stops after a single line. To make it work, I did this: import random, sys lines = open(sys.argv[1]).readlines() for i in range(len(lines)): rand = random.randint(0, len(lines)-1) print lines.pop(rand),Channa
Stupid comment system with crappy formatting. Didn't formatting in comments work once upon a time?Channa
randint is inclusive therefore len(lines) may lead to IndexError. You could use print(random.choice(list(open(sys.argv[1])))). There is also memory efficient reservoir sampling algorithm.Aretino
Quite space hungry; consider a 3TB file.Masquerade
@MichaelCampbell: reservoir sampling algorithm that I've mentioned above may work with 3TB file (if line size is limited).Aretino
Using py is nice. -l assigns incoming lines to a list, l. py auto-imports stdlib modules. so you can do cat $FILE | py -l "random.choice(l)". Try it: python -m this | py -l "random.choice(l)" ... erm actually just py this | py -l "random.choice(l)" ;)Pulque
E
2

Another way using 'awk'

awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name
Earthlight answered 4/9, 2013 at 6:43 Comment(2)
That uses awk and bash ($RANDOM is a bashism). Here is a pure awk (mawk) method using the same logic as @Tracker1's cited perlfaq5 code above: awk 'rand() * NR < 1 { line = $0 } END { print line }' file.name (wow, it's even shorter than the perl code!)Ryon
That code must read the file (wc) in order to get a line count, then must read (part of) the file again (awk) to get the content of the given random line number. I/O will be far more expensive than getting a random number. My code reads the file once only. The issue with awk's rand() is that it seeds based on seconds, so you'll get duplicates if you run it consecutively too fast.Ryon
M
1

A solution that also works on MacOSX, and should also work on Linux(?):

N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < $file)) $file 

Where:

  • N is the number of random lines you want

  • NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2 --> save line numbers written in file1 and then print corresponding line in file2

  • jot -r $N 1 $(wc -l < $file) --> draw N numbers randomly (-r) in range (1, number_of_line_in_file) with jot. The process substitution <() will make it look like a file for the interpreter, so file1 in previous example.
Moldavia answered 17/8, 2015 at 9:10 Comment(0)
C
0

Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:

sed -n $(awk 'END {srand(); r=rand()*NR; if (r<NR) {sub(/\..*/,"",r); r++;}; print r}' FILENAME)p FILENAME

(This works even if FILENAME is empty, in which case no line is emitted.)

One possible advantage of this approach is that it only calls rand() once.

As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:

awk 'rand() * NR < 1 { line = $0 } END { print line }' FILENAME

(A simple proof of correctness can be given based on induction.)

Caveat about rand()

"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."

-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html

Camenae answered 14/12, 2015 at 21:43 Comment(1)
See the comment I posted a year before this answer, which has a simpler awk solution that doesn't require sed. Also note my caveat about awk's random number generator, which seeds at whole seconds.Ryon
L
0
#!/bin/bash

IFS=$'\n' wordsArray=($(<$1))

numWords=${#wordsArray[@]}
sizeOfNumWords=${#numWords}

while true
do
    ranNumStr=""
    for ((i=0; i<sizeOfNumWords; i++))
    do
        # one random decimal digit per digit of the line count
        ranNumStr="$ranNumStr$((RANDOM % 10))"
    done
    # 10# strips leading zeros; -lt keeps the index in range 0..numWords-1
    if [ $((10#$ranNumStr)) -lt $numWords ]
    then
        break
    fi
done

echo "${wordsArray[$((10#$ranNumStr))]}"
Lucey answered 15/6, 2017 at 13:0 Comment(1)
Since $RANDOM generates numbers less than the number of words in /usr/share/dict/words, which has 235886 (on my Mac anyway), I just generate 6 separate random numbers between 0 and 9 and string them together. Then I make sure that number is less than 235886. Then remove leading zeros to index the words that I stored in the array. Since each word is its own line this could easily be used for any file to randomly pick a line.Lucey
M
0

Here is what I discovered, since my Mac OS doesn't have all the easy answers. I used the jot command to generate a number, since the $RANDOM variable solutions seemed not to be very random in my testing. When testing my solution, I saw a wide variance in the output of the solutions provided.

RANDOM1=`jot -r 1 1 235886`
# range of jot (1 235886) found from earlier wc -w /usr/share/dict/web2
echo $RANDOM1
head -n $RANDOM1 /usr/share/dict/web2 | tail -n 1

The echo of the variable is to get a visual of the generated random number.

Mildred answered 23/8, 2017 at 7:41 Comment(0)
