I am processing a number of large text files, i.e. converting them all from one format to another. There are some small differences in the original formats of the files, but, with a bit of pre-processing in a few cases, they are mostly being converted successfully by a bash shell script I have created.
So far so good, but one thing is puzzling me. At one point the script sets a variable called $iterations, so that it knows how many times to perform a particular for-loop. This value is determined by the number of empty lines in a temporary file that is created by the script.
Thus, the original version of my script contained the line:
iterations=$(cat tempfile | grep '^$' | wc -l)
This has worked fine so far with all but one of the text files. For that one file, $iterations was not set correctly: it got a value of '1' even though there appeared to be more than 20,000 empty lines in tempfile.
However, having discovered grep -c, I changed the line to:
iterations=$(cat tempfile | grep -c '^$')
and the script suddenly worked, i.e. $iterations was set correctly.
Can anyone explain why the two versions produce different results? And why would the first version work on some files and not others? Is there some upper limit value above which wc -l defaults to 1? The file which wouldn't work with the first version is one of the largest, but not the largest in the set (which converted correctly the first time).
grep -c '^$' produces output different than grep '^$' | wc -l? – Eleneeleni

wc, would cat tempfile | grep '^$' | hexdump -C | head produce anything interesting? – Patricio

printf 'foo\nbar\n\x00\n\n\n\n' | { cat > /tmp/file; grep -c '^$' < /tmp/file; grep '^$' < /tmp/file | wc -l; }
Dmitri's got it. With a null character, wc produces 1, while grep -c counts 4. – Eleneeleni

grep is printing Binary file (standard input) matches, and wc is counting that line! – Eleneeleni