Useless use of cat?
T

9

153

This is probably in many FAQs - instead of using:

cat file | command

(which is called useless use of cat), the correct way is supposed to be:

command < file

In the 2nd, "correct" way, the OS does not have to spawn an extra process.
Despite knowing that, I continued to use useless cat for 2 reasons.

  1. More aesthetic: I like it when data moves uniformly from left to right. And it is easier to replace cat with something else (gzcat, echo, ...), add a 2nd file, or insert a new filter (pv, mbuffer, grep ...).

  2. I "felt" that it might be faster in some cases. Faster because there are 2 processes: the 1st (cat) does the reading and the 2nd does whatever. They can run in parallel, which sometimes means faster execution.

Is my logic correct (for the 2nd reason)?

Teenyweeny answered 29/7, 2012 at 15:46 Comment(9)
cat is an identity pipe. It only streams its input to its output. If the second program in the chain can take its input from the same argument you pass to cat (or from the standard input, if you pass no argument), then cat is absolutely useless and only results in an additional process being forked and an additional pipe being created.Marine
@FrédéricHamidi when cat has no arguments or its argument is -, it's an identity pipe. When it has more than one non-dash filename argument it becomes something more than an identity pipe, though, and begins to serve a real purpose.Decurion
@kojiro, true, concatenation, but still some programs behave the same way (head, tail, grep). Maybe I should have said arguments, plural :)Marine
The formerly popular link to partmaps.org is unfortunately dead. The content is now at porkmail.org/era/unix/award.html Catchpole
Related: What is the general consensus on “Useless use of cat”?Haywoodhayyim
See also: unix.stackexchange.com/q/511827/20336 Neither
I observe that if you want to show rightward dataflow (reason 1) you can do so by putting the file redirection before the command, as in <file command1 | command2, though there would be disagreement about the aesthetics.Epode
Doesn't use of cat in this case allow for left to right reading, rather than right to left. Which is typically more common in programming and therefore to be preferred (subjectively, of course).Mcmath
But then the argument for cat is to the right of the command, too. If you really prefer left to right, <file grep should be your preferred syntax.Catchpole
S
124

I was not aware of the award until today when some rookie tried to pin the UUOC on me for one of my answers. It was a cat file.txt | grep foo | cut ... | cut .... I gave him a piece of my mind, and only after doing so visited the link he gave me referring to the origins of the award and the practice of doing so. Further searching led me to this question. Somewhat unfortunately despite conscious consideration, none of the answers included my rationale.

I had not meant to be defensive in responding to him. After all, in my younger years, I would have written the command as grep foo file.txt | cut ... | cut ... because whenever you do the frequent single greps you learn the placement of the file argument and it is ready knowledge that the first is the pattern and the later ones are file names.

It was a conscious choice to use cat when I answered the question, partly because of a reason of "good taste" (in the words of Linus Torvalds) but chiefly for a compelling reason of function.

The latter reason is more important so I will put it out first. When I offer a pipeline as a solution I expect it to be reusable. It is quite likely that a pipeline would be added at the end of or spliced into another pipeline. In that case having a file argument to grep screws up reusability, and quite possibly does so silently, without an error message, if the file argument exists. I.e., grep foo xyz | grep bar xyz | wc will give you how many lines in xyz contain bar, while you are expecting the number of lines that contain both foo and bar. Having to change arguments to a command in a pipeline before using it is prone to errors. Add to it the possibility of silent failures and it becomes a particularly insidious practice.
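
The silent failure is easy to reproduce with a throwaway file. A minimal sketch; the file contents are made up for illustration and the file is created in a temporary directory:

```shell
# Create a scratch file named xyz in a temporary directory (illustrative data).
dir=$(mktemp -d)
printf '%s\n' 'foo only' 'bar only' 'foo and bar' > "$dir/xyz"

# Spliced pipeline with a leftover file argument: the second grep ignores
# its standard input and rereads xyz, so this counts lines containing bar.
wrong=$(grep foo "$dir/xyz" | grep bar "$dir/xyz" | wc -l)

# Pipeline fed by cat: every stage reads only standard input, so this
# counts lines containing both foo and bar.
right=$(cat "$dir/xyz" | grep foo | grep bar | wc -l)

echo "with file argument: $wrong"
echo "with cat:           $right"
rm -r "$dir"
```

With this data the first count is 2 and the second is 1, and neither command prints a warning.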

The former reason is not unimportant either since a lot of "good taste" merely is an intuitive subconscious rationale for things like the silent failures above that you cannot think of right at the moment when some person in need of education says "but isn't that cat useless".

However, I will try to also make conscious the former "good taste" reason I mentioned. That reason has to do with the orthogonal design spirit of Unix. grep does not cut and ls does not grep. Therefore at the very least grep foo file1 file2 file3 goes against the design spirit. The orthogonal way of doing it is cat file1 file2 file3 | grep foo. Now, grep foo file1 is merely a special case of grep foo file1 file2 file3, and if you do not treat it the same you are at least using up brain clock cycles trying to avoid the useless cat award.

That leads us to the argument that grep foo file1 file2 file3 is concatenating, and cat concatenates, so it is proper to cat file1 file2 file3; but because cat is not concatenating in cat file1 | grep foo, we are violating the spirit of both cat and the almighty Unix. Well, if that were the case, then Unix would need a different command to read the contents of one file and spit it to stdout (not paginate it or anything, just a pure spit to stdout). So you would have the situation where you say cat file1 file2 or you say dog file1, and conscientiously remember to avoid cat file1 to avoid getting the award, while also avoiding dog file1 file2 since hopefully the design of dog would throw an error if multiple files are specified.

Hopefully, at this point, you sympathize with the Unix designers for not including a separate command to spit a file to stdout, while also naming cat for concatenate rather than giving it some other name. <edit> removed incorrect comments on <, in fact, < is an efficient no-copy facility to spit a file to stdout which you can position at the beginning of a pipeline so the Unix designers did include something specifically for this </edit>

The next question is: why is it important to have commands that merely spit a file or the concatenation of several files to stdout, without any further processing? One reason is to avoid requiring every single Unix command that operates on standard input to know how to parse at least one command-line file argument and use it as input if it exists. The second reason is to spare users from having to (a) remember where the filename arguments go and (b) watch out for the silent pipeline bug mentioned above.

That brings us to why grep does have the extra logic. The rationale is to allow user-fluency for commands that are used frequently and on a stand-alone basis (rather than as a pipeline). It is a slight compromise of orthogonality for a significant gain in usability. Not all commands should be designed this way and commands that are not frequently used should completely avoid the extra logic of file arguments (remember extra logic leads to unnecessary fragility (the possibility of a bug)). The exception is to allow file arguments like in the case of grep. (By the way, note that ls has a completely different reason to not just accept but pretty much require file arguments)

Finally, what could have been done better is if such exceptional commands as grep (but not necessarily ls) generate an error if the standard input is also available when file arguments are specified.

Sr answered 18/5, 2013 at 0:4 Comment(14)
Note that when grep is invoked with multiple file names, it prefixes the found lines with the name of the file it was found in (unless you turn that behaviour off). It can also report the line numbers in the individual files. If you only use cat to feed grep, you lose the file names, and the line numbers are continuous over all files, not per file. Thus there are reasons for having grep handle multiple files itself that cat cannot handle. The single file and zero file cases are simply special cases of the general multi-file use of grep.Thankful
As noted in the answer by kojiro, it is perfectly possible and legal to start the pipeline with < file command1 .... Although the conventional position for the I/O redirection operators is after the command name and its arguments, that is only the convention and not a mandatory placement. The < does have to precede the file name. So, there's a close to perfect symmetry between >output and <input redirections: <input command1 -opt 1 | command2 -o | command3 >output.Thankful
good points. i have been uncharitable to < given its positionability and efficiency. and it does seem that commands that take multiple file arguments do so for reasons other than fluency. for example wc when given multiple arguments lists counts for each file. i suspect a future kernel will allow identity pipes in kernel space (link below), so cat single_file could eventually avoid copies. i still cannot find the cat useless by any stretch. if you are using grep with line numbers your pipeline is not appendable to any other pipeline. if it is vanilla grep, cat wins IMO.Sr
cs.vu.nl/~herbertb/papers/osreview2008-2.pdf - example of ongoing performance related workSr
I think one reason why people throw the UUoC stone (including me) is to primarily educate. Sometimes people do process gigabytes huge textfiles in which case minimizing pipes (UUoC, collapsing sequential greps into one, a.s.o.) is crucial and often it can be safely assumed based on the question that the OP really just doesn't know that small tweaks might have huge performance impacts. I fully agree with your point about brain cycles and that's why I find myself using cat regularly even when not needed. But it's important to know that it's not needed.Cuyler
Please understand; I am in no sense saying that cat is useless. It is not that cat is useless; it is that a particular construct does not need the use of cat. If you like, note that it is UUoC (Useless Use of cat), and not UoUC (Use of Useless cat). There are many occasions when cat is the correct tool to use; I have no problem with it being used when it is the correct tool to use (and, indeed, mention a case in my answer).Thankful
@JonathanLeffler neither did i imply you said that. our disagreement is indeed about the specific use of cat prior to a pipeline. it is probably an agree to disagree situation because the uselessness is SOLELY for performance reasons. i am not trivializing the performance implications. if it were java i would agree. but it is a scripting environment and it is unfair to disproportionately call out useless uses just for cat when the inefficiency equally applies to say: <file grep foo | grep bar and other constructs. i. e. "if i had a nickel for every optimizable pipeline..."Sr
there is the case for a unix-calculus on the lines of relational-calculus where you declaratively specify your pipeline whichever way on the lines of SQL and the script-engine optimizes the script expression and executes it mostly in kernel space. (analogy is not that much of a stretch if you consider the impact of sort in a pipeline). the Haskell lazy evaluation folks might also have some insights on how to best implement this in a less disk-oriented environment. i wonder how do they collapse their pipelines.Sr
@randomstring I hear you, but I think it really depends on the use case. When used on the command line one additional cat in the pipe might not be a big deal depending on the data, but when used as a programming environment it can be absolutely necessary to implement these performance critical things; especially when dealing with bash which, performance-wise, is like a rectangularly-shaped wheel (compared to ksh anyway. I am talking up to 10x slower here - no kidding). You do want to optimize your forks (and not just that) when dealing with larger scripts or huge loops.Cuyler
@AdrianFrühwirth mostly agree. as a c++/java developer i have only written unix pipelines for one-off or for performance-moot problems. sysadmins probably do the opposite. in the non-performance context there is a lego-block mapping of concepts in my head to the pipeline. forget cat consider the positioning of a grep. a very selective grep early in the pipeline boosts performance as does a permissive grep at the end. but the concept structure in your brain and that of a subsequent reader has a different optimal structure. when to sacrifice the cognitive map? when amdahl's law says so?Sr
Useless use of "cat"? If a cat command starting a pipe contributes to getting the job done, it's hardly useless in my book. Same goes for "grep stuff file.txt | wc -l" because it also gets the job done (and I have one less grep flag to keep track of, saving cognitive cycles I can use for other things). Performance? Sure, performance might be less, but before one runs into situations where the difference matters, the motto "premature optimization is the root of all evil" has some merit. Processes ARE cheap (and are meant to be lego blocks). Brain cycles OTOH aren't.Drift
@AdrianFrühwirth "small tweaks might have huge performance impacts" You cannot simply write that without backing that up with numbers. My numbers are here: oletange.blogspot.com/2013/10/useless-use-of-cat.html They say that there is no benefit for low to medium throughput, but there is a benefit for high throughput.Setzer
@Sr "[...] commands should [...] alert the user if there is a possibility of a silent bug." I think that is hard to do: cat file | (grep foo $myfile; grep bar) I really do not want a warning from the first grep. This is very realistic if the (...) is a function.Setzer
@OleTange I never said that avoiding a single UUoC will have a huge performance impact. What I was saying (in context) was that each pipe (fork) comes with a performance cost and something like cat | grep | cut | sort will always be "much" slower than e.g. a single execution of awk doing all the logic, sometimes magnitudes slower. In my book it's not about avoiding a single cat (which your numbers show doesn't make a huge difference, depending on throughput - thanks for that, btw) but writing smart (as well as fast and readable) code.Cuyler
D
73

Nope!

First of all, it doesn't matter where in a command the redirection happens. So if you like your redirection to the left of your command, that's fine:

< somefile command

is the same as

command < somefile
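
A quick way to convince yourself that the two spellings are equivalent. This is only a sketch; wc -l and the scratch file stand in for any command and input:

```shell
# Both spellings open the file on standard input before wc starts;
# only the position of the redirection differs.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

before=$(< "$tmp" wc -l)
after=$(wc -l < "$tmp")

echo "redirect first: $before"
echo "redirect last:  $after"
rm -f "$tmp"
```

Both commands report the same count, and neither prints a file name, because wc only ever sees standard input.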

Second, there are n + 1 processes and a subshell happening when you use a pipe. It is most decidedly slower. In some cases n would've been zero (for example, when you're redirecting to a shell builtin), so by using cat you're adding a new process entirely unnecessarily.

As a generalization, whenever you find yourself using a pipe it's worth taking 30 seconds to see if you can eliminate it. (But probably not worth taking much longer than 30 seconds.) Here are some examples where pipes and processes are frequently used unnecessarily:

for word in $(cat somefile); … # for word in $(<somefile); … (or better yet, while read < somefile)

grep something | awk stuff; # awk '/something/ stuff' (similar for sed)

echo something | command; # command <<< something (although echo would be necessary for pure POSIX)
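
Two of these rewrites, spelled out as a runnable sketch. The file contents and the awk program are invented for illustration; only POSIX features are used:

```shell
tmp=$(mktemp)
printf 'alpha 1\nbeta 2\nalpha 3\n' > "$tmp"

# Instead of: for line in $(cat "$tmp") -- which would also split on spaces.
# The loop reads the file directly and runs in the current shell.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < "$tmp"

# Instead of: grep alpha "$tmp" | awk '{ sum += $2 } END { print sum }'
# awk can do the pattern match itself.
sum=$(awk '/alpha/ { sum += $2 } END { print sum }' "$tmp")

echo "lines: $count, sum of alpha values: $sum"
rm -f "$tmp"
```

This prints 3 lines and a sum of 4, with one external process (awk) instead of three.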

Feel free to edit to add more examples.

Decurion answered 29/7, 2012 at 15:49 Comment(7)
Well, the speed increase won't be much.Sublittoral
placing the "< somefile" before "command" technically gives you left to right, but it makes for ambiguous reading because there is no syntactic demarcation: < cat grep dog is a contrived example to show that you can't easily tell between the input file, the command that receives the input, and the arguments to the command.Sr
The rule of thumb I've adopted for deciding where the STDIN redirect goes is to do whatever minimizes the appearance of ambiguity/ potential for surprise. Dogmatically saying it goes before brings up necromancer's issue, but dogmatically saying it goes after can do the same thing. Consider: stdout=$(foo bar -exec baz <qux | ENV=VAR quux). Q. Does <qux apply to foo, or to baz, which is -exec'd by foo? A. It applies to foo, but can appear ambiguous. Putting <qux before foo in this case is more clear, albeit less common, and is analogous to the trailing ENV=VAR quux.Jessikajessup
@necromancer, <"cat" grep dog is easier to read, there. (I'm usually pro-whitespace, but this particular case is very much an exception).Dacoity
@CharlesDuffy yes, but this is a peripheral aspect. as mentioned in other comment, it is probably two different groups of people prioritizing two different things. That's a somewhat charitable take on it though, because Unix pipelines are rare in production, where database and network efficiency is the focus. Anyway, good to know all the nuances. I didn't know about the <"foo"' prefix syntax; always thought redirection went at end :-)Sr
@Decurion "It is most decidedly slower." You cannot write that without backing that up with numbers. My numbers are here: oletange.blogspot.com/2013/10/useless-use-of-cat.html (and they show it is only slower when you have high throughput) Where are yours?Setzer
"Dogmatically saying it goes before brings up necromancer's issue" that was pure gold, I'm terrified of saying it goes before now.Ataghan
I
55

In defense of cat:

Yes,

   < input process > output 

or

   process < input > output 

is more efficient, but many invocations don't have performance issues, so you don't care.

ergonomic reasons:

We are used to reading from left to right, so a command like

    cat infile | process1 | process2 > outfile

is trivial to understand.

    process1 < infile | process2 > outfile

has to jump over process1, and then read left to right. This can be remedied by:

    < infile process1 | process2 > outfile

but that looks, somehow, as if there were an arrow pointing to the left, where nothing is. More confusing, and looking like fancy quoting, is:

    process1 > outfile < infile

and generating scripts is often an iterative process,

    cat file 
    cat file | process1
    cat file | process1 | process2 
    cat file | process1 | process2 > outfile

where you see your progress stepwise, while

    < file 

does not even work. Simple ways are less error prone, and ergonomic command catenation is simple with cat.

Another point is that most people were exposed to > and < as comparison operators long before using a computer, and, when using a computer as programmers, are far more often exposed to them as such.

And comparing two operands with < and > is contra commutative, which means

(a > b) == (b < a)

I remember the first time using < for input redirection, I feared

a.sh < file 

could mean the same as

file > a.sh

and somehow overwrite my a.sh script. Maybe this is an issue for many beginners.

rare differences

wc -c journal.txt
15666 journal.txt
cat journal.txt | wc -c 
15666

The latter can be used in calculations directly.

factor $(cat journal.txt | wc -c)

Of course the < can be used here too, instead of a file parameter:

< journal.txt wc -c 
15666
wc -c < journal.txt
15666
    

but who cares - 15k?

If I occasionally ran into issues, I would surely change my habit of invoking cat.

When using very large or many, many files, avoiding cat is fine. To most questions the use of cat is orthogonal, off topic, not an issue.

Starting these useless "useless use of cat" discussions on every second shell topic is only annoying and boring. Get a life and wait for your minute of fame, when dealing with performance questions.

Inconvenience answered 13/3, 2018 at 10:46 Comment(2)
+11111 .. As the author of the currently accepted answer, I highly recommend this delightful complement. The specific examples elucidate my often abstract and wordy arguments, and the laugh you get from the author's early trepidation of file > a.sh is alone worth the time reading this :) Thanks for sharing!Sr
In this invocation, cat file | wc -c, wc needs to read stdin until EOF, counting bytes. But in this one, wc -c < file, it just stats stdin, finds out it's a regular file and prints st_size instead of reading any input. For a large file the difference in performance would be clearly visible.Artis
E
32

I disagree with most instances of the excessively smug UUOC Award because, when teaching someone else, cat is a convenient place-holder for any command or crusty complicated pipeline of commands that produce output suitable for the problem or task being discussed.

This is especially true on sites like Stack Overflow, ServerFault, Unix & Linux or any of the SE sites.

If someone specifically asks about optimisation, or if you feel like adding extra information about it then, great, talk about how using cat is inefficient. But don't berate people because they chose to aim for simplicity and ease-of-understanding in their examples rather than look-at-me-how-cool-am-i! complexity.

In short, because cat isn't always cat.

Also because most people who enjoy going around awarding UUOCs do it because they're more concerned with showing off about how 'clever' they are than they are about helping or teaching people. In reality, they demonstrate that they're probably just another newbie who has found a tiny stick to beat their peers with.


Update

Here's another UUOC that I posted in an answer at https://unix.stackexchange.com/a/301194/7696:

sqlq() {
  local filter
  filter='cat'

  # very primitive, use getopts for real option handling.
  if [ "$1" == "--delete-blank-lines" ] ; then
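    # no inner quotes: quotes stored in a variable are not re-parsed when
    # $filter is expanded, so "^$" would be matched literally, quotes included
    filter='grep -v ^$'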
    shift
  fi

  # each arg is piped into sqlplus as a separate command
  printf "%s\n" "$@" | sqlplus -S sss/eee@sid | $filter
}

UUOC pedants would say that that's a UUOC because it's easily possible to make $filter default to the empty string and have the if statement do filter='| grep -v "^$"' but IMO, by not embedding the pipe character in $filter, this "useless" cat serves the extremely useful purpose of self-documenting the fact that $filter on the printf line isn't just another argument to sqlplus, it's an optional user-selectable output filter.

If there's any need to have multiple optional output filters, the option processing could just append | whatever to $filter as often as needed - one extra cat in the pipeline isn't going to hurt anything or cause any noticeable loss of performance.
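
Storing a command with quoted arguments in a plain string is fragile, because quotes inside a variable are not re-parsed when the variable is expanded. One way to keep the cat-as-default idea while avoiding that is to store the name of a filter function instead. A sketch with hypothetical names: produce stands in for the sqlplus call, and query and delete_blank_lines are illustrative, not part of the original answer:

```shell
# Hypothetical stand-in for the real producer (sqlplus in the answer above).
produce() { printf 'row1\n\nrow2\n'; }

# Named filter: drop empty lines.
delete_blank_lines() { grep -v '^$'; }

query() {
    # Default filter is the identity: cat copies input to output unchanged.
    filter=cat
    if [ "$1" = "--delete-blank-lines" ]; then
        filter=delete_blank_lines
        shift
    fi
    produce | "$filter"
}

query
query --delete-blank-lines
```

The first call prints all three lines including the empty one; the second prints only row1 and row2. The pipeline itself never changes shape, which is the point of defaulting to cat.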

Exile answered 21/9, 2015 at 6:57 Comment(1)
As an aside -- == inside [ ] isn't specified by POSIX, and not all implementations accept it. The standardized operator is just =.Dacoity
T
31

With the UUoC version, cat has to read the file into memory, then write it out to the pipe, and the command has to read the data from the pipe, so the kernel has to copy the whole file three times whereas in the redirected case, the kernel only has to copy the file once. It is quicker to do something once than to do it three times.

Using:

cat "$@" | command

is a wholly different and not necessarily useless use of cat. It is still useless if the command is a standard filter that accepts zero or more filename arguments and processes them in turn. Consider the tr command: it is a pure filter that ignores or rejects filename arguments. To feed multiple files to it, you have to use cat as shown. (Of course, there's a separate discussion that the design of tr is not very good; there's no real reason it could not have been designed as a standard filter.) This might also be valid if you want the command to treat all the input as a single file rather than as multiple separate files, even if the command would accept multiple separate files: for example, wc is such a command.
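
Both cases can be sketched in a few lines. The function name upcase and the file contents are made up for illustration:

```shell
# upcase FILE...: tr reads only standard input, so cat does the work of
# concatenating the named files (or passing stdin through if none are given).
upcase() {
    cat "$@" | tr '[:lower:]' '[:upper:]'
}

dir=$(mktemp -d)
printf 'hello\n' > "$dir/a"
printf 'world\n' > "$dir/b"

result=$(upcase "$dir/a" "$dir/b")
echo "$result"

# wc given two names reports each file plus a total; through cat it
# sees one stream and prints a single count.
combined=$(cat "$dir/a" "$dir/b" | wc -l)
echo "combined lines: $combined"
rm -r "$dir"
```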

It is the cat single-file case that is unconditionally useless.

Thankful answered 29/7, 2012 at 16:27 Comment(0)
C
19

As someone who regularly points out this and a number of other shell programming antipatterns, I feel obliged to, belatedly, weigh in.

Shell script is very much a copy/paste language. For most people who write shell scripts, they are not in it to learn the language; it's just an obstacle they have to overcome in order to continue to do things in the language(s) they are actually somewhat familiar with.

In that context, I see it as disruptive and potentially even destructive to propagate various shell scripting anti-patterns. The code that someone finds on Stack Overflow should ideally be possible to copy/paste into their environment with minimal changes, and incomplete understanding.

Among the many shell scripting resources on the net, Stack Overflow is unusual in that users can help shape the quality of the site by editing the questions and answers on the site. However, code edits can be problematic because it's easy to make changes which were not intended by the code author. Hence, we tend to leave comments to suggest changes to the code.

The UUCA and related antipattern comments are not just for the authors of the code we comment on; they are as much a caveat emptor to help readers of the site become aware of problems in the code they find here.

We can't hope to achieve a situation where no answers on Stack Overflow recommend useless cats (or unquoted variables, or chmod 777, or a large variety of other antipattern plagues), but we can at least help educate the user who is about to copy/paste this code into the innermost tight loop of their script which executes millions of times.

As far as technical reasons go, the traditional wisdom is that we should try to minimize the number of external processes; this continues to hold as a good general guidance when writing shell scripts.

Catchpole answered 11/12, 2017 at 5:28 Comment(3)
Also that for large files, piping through cat is a lot of extra context switches and memory bandwidth (and pollution of L3 cache from extra copies of data in cat's read buffer, and the pipe buffers). Especially on a big multi-core machine (like many hosting setups) cache / memory bandwidth is a shared resource.Arzola
@PeterCordes Please post your measurements, so we can see if it really matters in practice. My experience is that it normally does not matter: oletange.blogspot.com/2013/10/useless-use-of-cat.htmlSetzer
Your own blog shows a 50% slowdown for high-throughput, and you aren't even looking at the impact on total throughput (if you had stuff keeping the other cores busy). If I get around to it, I might run your tests while x264 or x265 are encoding a video using all cores, and see how much it slows down the video encoding. bzip2 and gzip compression are both very slow compared to the amount of overhead cat adds to that alone (with the machine otherwise idle). It's hard to read your tables (line wrap in the middle of a number?). sys time increases a lot, but still small vs. user or real?Arzola
T
18

An additional problem is that the pipe can silently mask a subshell. For this example, I'll replace cat with echo, but the same problem exists.

echo "foo" | while read line; do
    x=$line
done

echo "$x"

You might expect x to contain foo, but it doesn't. The x you set was in a subshell spawned to execute the while loop. x in the shell that started the pipeline has an unrelated value, or is not set at all.

In bash4, you can configure some shell options so that the last command of a pipeline executes in the same shell as the one that starts the pipeline, but then you might try this

echo "foo" | while read line; do
    x=$line
done | awk '...'

and x is once again local to the while's subshell.
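
When the data comes from a file, redirecting into the loop avoids the pipe and therefore the subshell. A minimal sketch using only POSIX features; the scratch file stands in for whatever cat would have read:

```shell
# Pipe version: in shells that run every pipeline stage in a subshell
# (bash by default, dash), the assignment is lost when the loop ends.
x=unset
echo "foo" | while read -r line; do
    x=$line
done
piped=$x

# Redirect version: the loop runs in the current shell, so y survives.
tmp=$(mktemp)
echo "foo" > "$tmp"
y=unset
while read -r line; do
    y=$line
done < "$tmp"
rm -f "$tmp"

echo "after pipe:     $piped"
echo "after redirect: $y"
```

The second line of output is always foo; the first depends on the shell, which is exactly the portability trap described above.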

Tipper answered 29/7, 2012 at 15:56 Comment(2)
In strictly POSIX shells this can be a tricky problem because you don't have here strings or process substitutions to avoid the pipe. BashFAQ 24 has some useful solutions even in that case.Decurion
In some shells, the illustrated pipe doesn't create a subshell. Examples include Korn and Z. They also support process substitution and here strings. Of course they're not strictly POSIX. Bash 4 has shopt -s lastpipe to avoid creating the subshell.Vincenzovincible
S
16

I often use cat file | myprogram in examples. Sometimes I am accused of Useless use of cat (http://porkmail.org/era/unix/award.html). I disagree for the following reasons:

  • It is easy to understand what is going on.

    When reading a UNIX command you expect a command followed by arguments followed by redirection. It is possible to put the redirection anywhere but it is rarely seen - thus people will have a harder time reading the example. I believe

    cat foo | program1 -o option -b option | program2
    

    is easier to read than

    program1 -o option -b option < foo | program2
    

    If you move the redirection to the start you are confusing people who are not used to this syntax:

    < foo program1 -o option -b option | program2
    

    and examples should be easy to understand.

  • It is easy to change.

    If you know the program can read from cat, you can normally assume it can read the output from any program that outputs to STDOUT, and thus you can adapt it for your own needs and get predictable results.

  • It stresses that the program does not fail, if STDIN is not a file.

    It is not safe to assume that if program1 < foo works then cat foo | program1 will also work. However, it is safe to assume the opposite. This program works if STDIN is a file, but fails if the input is a pipe, because it uses seek:

    # works
    < foo perl -e 'seek(STDIN,1,1) || die;print <STDIN>'
    
    # fails
    cat foo | perl -e 'seek(STDIN,1,1) || die;print <STDIN>'
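
The difference between the two invocations is the kind of object the program finds on standard input. A sketch that makes it visible, assuming a system that provides /dev/stdin (Linux and most modern Unix systems do); describe_stdin is a hypothetical helper name:

```shell
# describe_stdin: report what kind of object standard input is.
# Hypothetical helper; relies on /dev/stdin existing on this system.
# Uses the test operators -p (FIFO/pipe), -f (regular file), -t (terminal).
describe_stdin() {
    if [ -p /dev/stdin ]; then
        echo pipe
    elif [ -f /dev/stdin ]; then
        echo file
    elif [ -t 0 ]; then
        echo terminal
    else
        echo other
    fi
}

tmp=$(mktemp)
echo data > "$tmp"
from_pipe=$(cat "$tmp" | describe_stdin)
from_file=$(describe_stdin < "$tmp")
echo "cat file | program : stdin is a $from_pipe"
echo "program < file     : stdin is a $from_file"
rm -f "$tmp"
```

A program that only ever sees a pipe cannot rely on seeking, which is why an example written with cat exercises the more restrictive case.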
    

Performance cost

There is a cost to the additional cat. To give an idea of how much, I ran a few tests to simulate baseline (cat), low throughput (bzip2), medium throughput (gzip), and high throughput (grep).

cat $ISO | cat
< $ISO cat
cat $ISO | bzip2
< $ISO bzip2
cat $ISO | gzip
< $ISO gzip
cat $ISO | grep no_such_string
< $ISO grep no_such_string

The tests were run on a low end system (0.6 GHz) and an ordinary laptop (2.2 GHz). They were run 10 times on each system and the best timing was chosen to mimic the optimal situation for each test. The $ISO was ubuntu-11.04-desktop-i386.iso. (Prettier tables here: http://oletange.blogspot.com/2013/10/useless-use-of-cat.html)

CPU                         0.6 GHz ARM
Command                     cat $ISO |                 < $ISO                     Diff                  Diff (pct)
Throughput \ Time (ms)         User    Sys     Real       User    Sys     Real    User    Sys   Real    User  Sys  Real
Baseline (cat)                   55  14453    33090         23   6937    33126      32   7516    -36     239  208    99
Low (bzip2)                 1945148  16094  1973754    1941727   5664  1959982    3420  10430  13772     100  284   100
Medium (gzip)                413914  13383   431812     407016   5477   416760    6898   7906  15052     101  244   103
High (grep no_such_string)    80656  15133    99049      79180   4336    86885    1476  10797  12164     101  349   114

CPU                         Core i7 2.2 GHz
Command                     cat $ISO |             < $ISO                 Diff               Diff (pct)
Throughput \ Time (ms)        User  Sys    Real      User  Sys    Real    User  Sys  Real    User  Sys  Real
Baseline (cat)                   0  356     215         1   84      88       0  272   127       0  423   244
Low (bzip2)                 136184  896  136765    136728  160  137131    -545  736  -366      99  560    99
Medium (gzip)                26564  788   26791     26332  108   26492     232  680   298     100  729   101
High (grep no_such_string)     264  392     483       216   84     304      48  308   179     122  466   158

The results show that for low and medium throughput the cost in real time is on the order of 1%. This is well within the uncertainty of the measurements, so in practice there is no difference.

For high throughput the difference is bigger and there is a clear difference between the two.

That leads to the conclusion: You should use < instead of cat | if:

  • the complexity of the processing is similar to a simple grep
  • performance matters more than readability.

Otherwise it does not matter whether you use < or cat |.

And thus you should give a UUoC award if and only if:

  • you can measure a significant difference in the performance (publish your measurements when you give the award)
  • performance matters more than readability.
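
A measurement of this kind can be sketched roughly as follows. This is not the script used for the tables above, just a minimal illustration: it assumes GNU date (for %N nanoseconds), and FILE, RUNS and best_of are made-up names. Each run also pays for starting an extra sh, which is the same cost in both variants.

```shell
#!/bin/sh
# Sketch only: best-of-N wall-clock timing of "cat FILE | cmd" versus "cmd < FILE".
# Assumes GNU date (%N); FILE, RUNS and best_of are illustrative names.

RUNS=${RUNS:-10}

# Generate sample input if no FILE was supplied in the environment.
if [ -z "${FILE:-}" ]; then
    FILE=$(mktemp)
    trap 'rm -f "$FILE"' EXIT
    seq 1 200000 > "$FILE"
fi
export FILE

# best_of N 'shell command': run the command N times, print the smallest
# elapsed wall-clock time in milliseconds.
best_of() {
    n=$1 cmd=$2 best=
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s%N)
        sh -c "$cmd" > /dev/null 2>&1
        end=$(date +%s%N)
        ms=$(( (${end} - ${start}) / 1000000 ))
        if [ -z "$best" ] || [ "$ms" -lt "$best" ]; then best=$ms; fi
        i=$((i + 1))
    done
    echo "$best"
}

with_cat=$(best_of "$RUNS" 'cat "$FILE" | grep no_such_string')
with_redirect=$(best_of "$RUNS" 'grep no_such_string < "$FILE"')

echo "cat FILE | grep : ${with_cat} ms"
echo "grep < FILE     : ${with_redirect} ms"
```

To try it on a real file, something like FILE=/path/to/some.iso RUNS=10 sh bench.sh should work, where bench.sh is whatever name the sketch was saved under.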
Setzer answered 5/1, 2019 at 10:19 Comment(0)
E
-3

I think that using a pipe (the traditional way) is a bit faster; on my box I used the strace command to see what's going on:

Without pipe:

toc@UnixServer:~$ strace wc -l < wrong_output.c
execve("/usr/bin/wc", ["wc", "-l"], [/* 18 vars */]) = 0
brk(0)                                  = 0x8b50000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77ad000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=29107, ...}) = 0
mmap2(NULL, 29107, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb77a5000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\222\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1552584, ...}) = 0
mmap2(NULL, 1563160, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7627000
mmap2(0xb779f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x178) = 0xb779f000
mmap2(0xb77a2000, 10776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb77a2000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7626000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb76268d0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb779f000, 8192, PROT_READ)   = 0
mprotect(0x804f000, 4096, PROT_READ)    = 0
mprotect(0xb77ce000, 4096, PROT_READ)   = 0
munmap(0xb77a5000, 29107)               = 0
brk(0)                                  = 0x8b50000
brk(0x8b71000)                          = 0x8b71000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=5540198, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7426000
mmap2(NULL, 1507328, PROT_READ, MAP_PRIVATE, 3, 0x2a8) = 0xb72b6000
close(3)                                = 0
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2570, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77ac000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2570
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0xb77ac000, 4096)                = 0
open("/usr/share/locale/fr_FR.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr_FR.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr_FR/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr_FR.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr_FR.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr_FR/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr/LC_MESSAGES/coreutils.mo", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=316721, ...}) = 0
mmap2(NULL, 316721, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7268000
close(3)                                = 0
open("/usr/lib/i386-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=26064, ...}) = 0
mmap2(NULL, 26064, PROT_READ, MAP_SHARED, 3, 0) = 0xb7261000
close(3)                                = 0
read(0, "#include<stdio.h>\n\nint main(int "..., 16384) = 180
read(0, "", 16384)                      = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7260000
write(1, "13\n", 313
)                     = 3
close(0)                                = 0
close(1)                                = 0
munmap(0xb7260000, 4096)                = 0
close(2)                                = 0
exit_group(0)                           = ?

And with pipe:

toc@UnixServer:~$ strace cat wrong_output.c | wc -l
execve("/bin/cat", ["cat", "wrong_output.c"], [/* 18 vars */]) = 0
brk(0)                                  = 0xa017000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb774b000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=29107, ...}) = 0
mmap2(NULL, 29107, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7743000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\222\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1552584, ...}) = 0
mmap2(NULL, 1563160, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb75c5000
mmap2(0xb773d000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x178) = 0xb773d000
mmap2(0xb7740000, 10776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7740000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75c4000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb75c48d0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb773d000, 8192, PROT_READ)   = 0
mprotect(0x8051000, 4096, PROT_READ)    = 0
mprotect(0xb776c000, 4096, PROT_READ)   = 0
munmap(0xb7743000, 29107)               = 0
brk(0)                                  = 0xa017000
brk(0xa038000)                          = 0xa038000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=5540198, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb73c4000
mmap2(NULL, 1507328, PROT_READ, MAP_PRIVATE, 3, 0x2a8) = 0xb7254000
close(3)                                = 0
fstat64(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
open("wrong_output.c", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0664, st_size=180, ...}) = 0
read(3, "#include<stdio.h>\n\nint main(int "..., 32768) = 180
write(1, "#include<stdio.h>\n\nint main(int "..., 180) = 180
read(3, "", 32768)                      = 0
close(3)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
13

You can do further testing with strace and the time command, using more and longer-running commands, for proper benchmarking.
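
Note that the two traces above cover only the first process on each command line. A rough sketch of a more even comparison runs each variant under its own sh and lets strace follow the children. It assumes strace is installed and allowed to trace; SRC, OUT and trace_counts are made-up names, and the sample file content is arbitrary.

```shell
#!/bin/sh
# Sketch only: count system calls for the entire command line in both variants.
# Assumes strace is installed and permitted to trace;
# SRC, OUT and trace_counts are illustrative names.

SRC=$(mktemp)
OUT=$(mktemp -d)
trap 'rm -rf "$SRC" "$OUT"' EXIT
export SRC
printf '#include <stdio.h>\n\nint main(void) { return 0; }\n' > "$SRC"

# trace_counts 'shell command' summary-file
# -f follows child processes, -c produces a per-syscall count summary,
# -o writes that summary to a file instead of stderr.
trace_counts() {
    strace -f -c -o "$2" sh -c "$1" > /dev/null 2>&1
}

if command -v strace > /dev/null 2>&1; then
    trace_counts 'wc -l < "$SRC"' "$OUT/redirect"
    trace_counts 'cat "$SRC" | wc -l' "$OUT/pipe"
    for f in "$OUT/redirect" "$OUT/pipe"; do
        # The last line of the -c summary is the "total" row.
        if [ -s "$f" ]; then echo "$f:"; tail -n 1 "$f"; fi
    done
else
    echo "strace not found; nothing traced" >&2
fi
```

Syscall counts alone say little about elapsed time, so this is best read alongside a timing run rather than instead of one.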

Eyeleteer answered 29/7, 2012 at 18:38 Comment(7)
I don't understand what you mean by (the traditional way) using pipe, or why you think this strace shows that it's faster – the strace isn't tracing the wc -l execution in the second case. It only traces the first command of the pipeline here.Decurion
@Decurion: by traditional way I mean the most used way (I think we use pipes more than redirection). I can't confirm whether it's faster or not; in my trace I saw more system calls for redirection. You can use a C program and a loop to see which one consumes more time. If you're interested we can put it here :)Eyeleteer
As this is currently given, the strace output doesn't actually show the syscalls associated with the mkfifo() overhead or the redirection overhead. As such, it's utterly irrelevant to determining actual performance delta; all you're comparing is the difference between strace wc -l and strace cat -- the pipeline-vs-redirection difference has no impact whatsoever on any of the content pasted in this answer.Dacoity
An apples-to-apples comparison would put strace -f sh -c 'wc -l < wrong_output.c' alongside strace -f sh -c 'cat wrong_output.c | wc -l'.Dacoity
A simple test along those lines is vaguely inconclusive -- more wall clock time without a pipeline but less user+sys time so more efficient from a system perspective. (I took out the strace because that is not what we want to measure.) My test ran wc -l on a 13,000 line file 1000 times in a Bash for n in {1..1000} loop.Catchpole
Here are results from ideone.com, which currently are clearly in favor of without cat: ideone.com/2w1W42#stderrCatchpole
@CharlesDuffy: mkfifo creates a named pipe. An anonymous pipe is set up with pipe(2) and then forking, and having the parent and child close different ends of the pipe. But yes, this answer is total nonsense, and didn't even try to count the system calls or use strace -O to measure overhead, or -r to timestamp each call relative to the last...Arzola
