How does "wc -w < file.txt" work?

3

5

I was trying to get only the number of words in a file using wc. wc -w file.txt gives me that plus the file name. I don't want the file name. So, I saw that wc -w < file.txt works.

I don't understand how this command works. I cannot even add a comment below the answer where I saw this command.

Why does it not print the file name in the case of wc -w < file.txt?

Specs answered 5/8, 2013 at 21:47 Comment(0)
5

wc -w will output the word count and file name for each of its arguments. So the command wc -w myfile.txt will give you something like:

42 myfile.txt

However, when wc doesn't know the file name, it simply outputs the count. You can hide the file name from it by using input redirection, since wc is one of those commands that reads standard input if you don't explicitly name a file. This can be done with:

wc -w <myfile.txt

or:

cat myfile.txt | wc -w

although the latter is inefficient in this case, since it starts an extra cat process. A pipe is better suited to input that is more complex than the contents of a single file.
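To compare the three forms side by side, here is a minimal sketch. It assumes a POSIX-style shell with mktemp available; the sample file and the count_words helper are made up for illustration. Some wc implementations pad the count with leading spaces, so the helper strips whitespace.

```shell
# Throwaway file with three words (the path comes from mktemp, not from the question).
tmpfile=$(mktemp)
printf 'foo bar baz\n' > "$tmpfile"

# Named operand: wc knows the name and prints it after the count.
wc -w "$tmpfile"

# Redirection: the shell opens the file as stdin, so wc prints only the count.
wc -w < "$tmpfile"

# Pipe: same count, at the cost of an extra cat process.
cat "$tmpfile" | wc -w

# Hypothetical helper: print just the number, with any padding stripped.
count_words() {
    wc -w < "$1" | tr -d '[:space:]'
    echo
}

count_words "$tmpfile"   # prints 3
rm -f "$tmpfile"
```

The helper is handy in scripts, for example n=$(count_words file.txt), because the result is a bare number regardless of how the local wc pads its output.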

wc doesn't know the file name in wc -w <myfile.txt because the shell opens the file and attaches it to the standard input of the wc process. The shell also never passes <myfile.txt along in the wc program's command-line arguments. Many programs like wc have code along the lines of:

FILE *infile;
/* argv[0] is the program name, so argc < 2 means no file operand was given */
if (argc < 2)
    infile = stdin;
else
    infile = fopen(argv[1], "r");

to select either standard input or open a known file name. In the former case, no name is available to the program.
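You can check from the shell itself that the redirection never reaches the argument list. The sketch below uses a hypothetical show_args function as a stand-in for a program's main(); note that the shell's $# does not count the program name, unlike argc in C.

```shell
# Hypothetical stand-in for a program: report the arguments it received.
show_args() {
    echo "argument count: $#"
    for arg in "$@"; do
        echo "argument: $arg"
    done
}

tmpfile=$(mktemp)
printf 'foo bar\n' > "$tmpfile"

# The file name is an ordinary argument here, so the function sees -w and the path.
show_args -w "$tmpfile"

# The shell consumes the redirection itself; only -w is passed along.
show_args -w < "$tmpfile"

rm -f "$tmpfile"
```

The second call reports a single argument, -w, which is the situation wc is in when it falls back to reading standard input.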

Rollway answered 5/8, 2013 at 21:55 Comment(3)
Please see my edit. Sorry that I did not explain my problem clearly. I don't understand why wc does not know the filename. It is being told to read from file using < file.txt, right? - Specs
@blasto no, the < tells your shell to connect stdin to the file before starting wc; wc doesn't know where its stdin is coming from, because that was done by the shell beforehand. - Kaenel
The shell does not "open the file and send the contents". The shell arranges it so that the file is the stdin of the process. "sends the contents" implies a data copy that does not happen. - Gemmell
4

This works because when you use <, the shell connects "file.txt" to the standard input of wc.

Also, note that wc -l will give you the number of lines in file.txt. If you really want the number of words, irrespective of which line they are on, you should use wc -w instead.

When you used wc -w file.txt, it gave you the number of words inside "file.txt" and the file name because wc accepts multiple files as input, so it prints the file name after each word count to show which file has that many words. For example, suppose you have these two files with their respective contents:

twoword.txt: foo bar

threeword.txt: foo bar test

then wc -w twoword.txt threeword.txt would give you:

2 twoword.txt
3 threeword.txt
5 total

thus showing how many words there are in each file and then summing them for the total.
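The two-file example can be reproduced with the sketch below. It assumes mktemp -d is available and works in a scratch directory; total_words is a hypothetical helper, not part of wc. Column padding in the per-file listing differs between wc implementations.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d)
printf 'foo bar\n' > "$workdir/twoword.txt"
printf 'foo bar test\n' > "$workdir/threeword.txt"

# One line per file plus a final total line.
(cd "$workdir" && wc -w twoword.txt threeword.txt)

# Hypothetical helper: only the combined count, with no names and no total label.
# Here cat is useful, because it joins several files into one stream for wc.
total_words() {
    cat "$@" | wc -w | tr -d '[:space:]'
    echo
}

total_words "$workdir/twoword.txt" "$workdir/threeword.txt"   # prints 5
rm -rf "$workdir"
```

Because wc reads a single anonymous stream in total_words, it has no names to print and emits one number.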

Delorsedelos answered 5/8, 2013 at 21:49 Comment(7)
Sorry to be a nitpick, but you're not telling wc anything, you're telling the shell to take the contents of file.txt and shove it in stdin (i.e. file descriptor 0) of wc. - Guimar
Please see my edit. Sorry that I did not explain my problem clearly. - Specs
@Guimar - that is what i was looking for. I don't know these internals of unix. I only try to make scripts to do tasks. Consider putting your comment as an answer. - Specs
@blasto have a look at the other answers, they address exactly this topic. - Guimar
Yeah @mvds, technically you are correct. I'll edit that part. - Delorsedelos
@Guimar Not quite. The shell is not 'shov[ing]' data to stdin. The shell is duping fd 0 of the child so that the file is stdin. There is no additional data copy or movement of the data in the file. The shell does not "pipe" the data to stdin. The shell makes it so that stdin of the process is the file. - Gemmell
@WilliamPursell I'd say it would dup2 the fd? Anyway, I suspect this is implementation defined, as long as the child process reads the file data from fd 0 one way or another? - Guimar
3

In your example, when the shell creates a process for wc, it opens file.txt on file descriptor 0, which is stdin. Read up on what stdin, stdout and stderr are here: Confused about stdin, stdout, and stderr

The man page for wc defines the behavior with no file parameter.

wc is reading the contents of file.txt from stdin and printing the number of words. It doesn't print the file name because it doesn't know what the file name is when it reads from stdin.
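One way to see what standard input actually is in each case is to test its file type. This is a rough sketch with a hypothetical kind_of_stdin function; it assumes either bash, whose test builtin treats /dev/stdin as file descriptor 0, or a system such as Linux where /dev/stdin exists.

```shell
# Hypothetical helper: report what kind of object file descriptor 0 refers to.
kind_of_stdin() {
    if [ -f /dev/stdin ]; then
        echo "regular file"
    elif [ -p /dev/stdin ]; then
        echo "pipe"
    else
        echo "something else"
    fi
}

tmpfile=$(mktemp)
printf 'foo bar\n' > "$tmpfile"

# With redirection, stdin is the file itself, opened by the shell.
kind_of_stdin < "$tmpfile"        # prints: regular file

# With a pipe, stdin is a pipe fed by cat, which copies the data.
cat "$tmpfile" | kind_of_stdin    # prints: pipe

rm -f "$tmpfile"
```

In both cases the process is handed an already-open descriptor and no file name, which is why wc has nothing to print besides the count.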

Plait answered 5/8, 2013 at 21:51 Comment(1)
The contents of file.txt are not "sent to stdin". The file is stdin of the process. "Sent to" implies a data copy that does not happen. - Gemmell
