Many people use one-liners and scripts containing code along the lines of
cat "$MYFILE"| command1 | command2 >"$OUTPUT"
This cat is often called a "useless use of cat" because it requires starting an extra process (often /usr/bin/cat), which could be avoided if the command had been written as
<"$MYFILE" command1 | command2 >"$OUTPUT"
because then the shell only needs to start command1 and simply point its stdin at the given file.
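As a minimal sketch of the equivalence described above, the following compares the cat form with the two redirection forms. Here sort and head merely stand in for command1 and command2, and the temporary file is made-up sample data, not something from the original scripts.

```shell
#!/bin/sh
# Hypothetical sample data standing in for "$MYFILE"
MYFILE=$(mktemp)
printf 'b\na\nc\n' >"$MYFILE"

with_cat=$(cat "$MYFILE" | sort | head -n 1)     # extra cat process feeds a pipe
redirect_first=$(<"$MYFILE" sort | head -n 1)    # redirection written before the command
redirect_last=$(sort <"$MYFILE" | head -n 1)     # redirection written after the command
rm -f "$MYFILE"

# All three forms produce the same result for this input
printf '%s %s %s\n' "$with_cat" "$redirect_first" "$redirect_last"   # a a a
```

A redirection may appear anywhere in a simple command, so the leading form keeps the left-to-right reading order of the cat version.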
Why doesn't the shell do this conversion automatically? I feel that the "useless use of cat" syntax is easier to read, and the shell should have enough information to get rid of the useless cat automatically. The cat utility is defined in the POSIX standard, so the shell should be allowed to implement it internally instead of using a binary in the path. The shell could even implement only the exactly-one-argument version and fall back to the binary in the path otherwise.
( 6 months ago )
"Useless use of cat" is more about how you write your code than about what actually runs when you execute the script. It's a sort of design anti-pattern: a way of going about something that could probably be done more efficiently. It reflects a failure to understand how best to combine the given tools to create a new tool. I'd argue that stringing several sed and/or awk commands together in a pipeline could also be said to be a symptom of this same anti-pattern.
Fixing instances of "useless use of cat" in a script is primarily a matter of fixing the source code of the script manually. A tool such as ShellCheck can help with this by pointing them out:
$ cat script.sh
#!/bin/sh
cat file | cat
$ shellcheck script.sh
In script.sh line 2:
cat file | cat
    ^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
Getting the shell to do this automatically would be difficult due to the nature of shell scripts. The way a script executes depends on the environment inherited from its parent process, and on the specific implementation of the available external commands.
The shell does not necessarily know what cat is. It could potentially be any command from anywhere in your $PATH, or a function.
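A minimal sketch of why the shell cannot assume what cat means: below, a function named cat (a made-up example, not taken from any real script) shadows the external utility and changes what passes through the pipeline.

```shell
#!/bin/sh
# Hypothetical illustration: shadow cat with a function that upper-cases its input.
cat() {
    # 'command cat' bypasses the function and runs the real cat utility
    command cat "$@" | tr '[:lower:]' '[:upper:]'
}

shadowed_output=$(printf 'hello\n' | cat)
printf '%s\n' "$shadowed_output"   # HELLO
```

If the shell silently replaced "cat file | cmd" with "cmd <file" here, the upper-casing would be lost, so the rewrite would change the script's behaviour.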
If cat were a built-in command (which it may be in some shells), the shell could reorganise the pipeline, since it would know the semantics of its own cat. Before doing so, it would additionally have to make assumptions about the next command in the pipeline, the one that follows the original cat.
Note that reading from standard input behaves slightly differently depending on whether it is connected to a pipe or to a file. A pipe is not seekable, so the next command in the pipeline may behave differently if the pipeline were rearranged: it may detect whether its input is seekable and choose a different strategy accordingly.
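The difference is visible to the receiving command. As a rough sketch, the helper below (describe_stdin is a hypothetical name) reports what its standard input is connected to. It assumes a system that provides /dev/stdin, as Linux, the BSDs and macOS do.

```shell
#!/bin/sh
# Hypothetical helper: report what kind of object standard input is connected to.
# Assumes /dev/stdin exists on this system.
describe_stdin() {
    if [ -p /dev/stdin ]; then
        echo 'pipe'
    elif [ -f /dev/stdin ]; then
        echo 'regular file'
    else
        echo 'other'
    fi
}

tmpfile=$(mktemp)
printf 'some data\n' >"$tmpfile"

via_cat=$(cat "$tmpfile" | describe_stdin)      # stdin is the pipe from cat
via_redirect=$(describe_stdin <"$tmpfile")      # stdin is the file itself
rm -f "$tmpfile"

printf 'cat file | cmd : %s\n' "$via_cat"
printf 'cmd <file      : %s\n' "$via_redirect"
```

A command that makes this kind of check can legitimately pick a different code path in each case, which is why the two pipelines are not guaranteed to be interchangeable.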