Advanced usage¶
So far, we have only covered the simplest aspects of what the shell can do. But it can be used for far more than this, and can even be used as a full-blown scripting language capable of running complex applications. While this level of mastery is probably unnecessary for most users, there are a few advanced topics that are very useful and worth covering in more detail, even in an introductory document such as this. The interested reader is referred to more complete guides, such as this one.
Redirection¶
The standard output of commands, normally intended to be displayed on the
terminal, can be redirected to a file if needed, using the >
symbol.
For example, assuming that:
$ ls
docs html LICENSE README.md
The file listing can be redictered to a file, called listing.txt
in the
example below:
$ ls > listing.txt
This creates the file specified, and the output normally shown by ls
is not
visible on the terminal. It has however been stored in the listing.txt
file, as we can verify with cat
:
$ cat listing.txt
docs
html
LICENSE
README.md
This can also work in append mode, where the output of the command is appended to the file, rather than overwriting its entire contents.
For example:
$ app1 input output -options > log.txt
$ app2 arg1 arg2 >> log.txt
will create the log.txt
file in the first line, and record any output from
the app1
command. The second line will then append its output to the log
file.
Likewise, we can redirect the standard input to feed in the contents of a
file as input, rather than typing it in, using the <
symbol.
For example:
$ sort < myfile.txt
will feed the contents of myfile.txt
to the sort
command’s standard
input.
Pipes¶
This is a special type of redirection, where the standard output of one
command can be fed directly into the standard input of another, using the
|
symbol. Both commands run concurrently, with the second command able to
process the output of the first as soon as it is provided. This can be
incredibly useful to build compound commands.
For example:
$ grep ERROR log.txt | sort | uniq
ERROR: error type one
ERROR: input file not found
ERROR: something bad happened
uses the grep
command to find all lines in log.txt
that contain the
character string ERROR
, then feeds those lines (which would normally be
displayed on the terminal) via the pipe as input for the sort
command. This
sorts the lines in alphabetical order, and feed its output to the uniq
command, which remove duplicates. The outcome of the full pipeline is a list of
all unique error messages logged in the log.txt
file.
Another particularly useful example is to capture the output from a command
expected to produce a lot of output, and browse through it at a more suitable
pace rather than seeing it fly past on the terminal. This can be done using the
command less
(a paginator):
$ complex_process -verbose | less
This ability to quickly implememt otherwise non-trivial functionality is one of
the great strengths of the command-line. Unix is full of little tools like
grep
, sort
and uniq
that are designed to operate on text and to be
daisy-chained in this manner.
Conditional execution¶
While BASH provide its own if
statement for more complex situations, it
also offers a simple construct to allow execution of one command based on the
success or failure of another, using the &&
and ||
operators
respectively.
For example:
$ myapp args -options || echo "myapp failed to run!" >> log.txt
will record the fact that the myapp
command has failed to the log.txt
file.
On the other hand:
$ stage1 -options inputdata/ tmpdata/ && stage2 tmpdata/ outputdata/
will only run the stage2
command if the stage1
executable has completed
successfully (useful if the data produced by the first command is to be
processed by the next one).
Variables¶
It is often useful to store information in variables. For instance, you might
want to use a long and complicated filename often, and rather than typing it in
every time you need it, you could use a variable. Variables are assigned using
the =
symbol (beware: no spaces around it), and retrieved (dereferenced)
using the $
symbol.
For example:
$ logfile=/some/complicated/location/myapp/logs/run1.txt
$ myapp input intermediate > $logfile
$ otherapp intermediate output >> $logfile
...
The variable logfile
is set to the filename of the logfile, and the output
of all subsequent commands is then redirected to that file (see above).
Iterating with for loops¶
It is often required to perform the same command for a number of files. This
can be achieved simply and effectively with a for
loop, like this:
$ for item in logs/run*.txt; do grep OUTPUT $item; done
This will find all lines that contain the token OUTPUT
in the logfiles
stored in the logs/
folder that match the filename run*.txt
, and print
them on the terminal.
What actually happens here is that a variable item
is used to store each
token listed after the in
keywords (until the end of line or ;
symbol),
and the command(s) between the do
and done
keywords are then executed
for each token. The current value of the token can then be retrieved within the
loop by dereferencing it like any other variable, using the $
symbol.
Note that the above does not need to be all on the same line. In practice, lines
can be broken wherever the ;
was used in the example above:
$ for item in logs/run*.txt
> do
> grep OUTPUT $item
> done
Parameter substitution¶
There are certain operations that can be performed on variables at the point
where they are being dereferenced. Of these, the most useful are probably the
ability to strip a suffix or prefix. This is done using a syntax like
${var#prefix}
or ${var%suffix}
. This is most useful in scripts and when
combined with for
loops.
For example:
$ for data in *.dat
> do
> process $data ${data%.dat}.out > ${data%.dat}.log
> done
will run the process
command on all files in the current folder that end
with the .dat
suffix, and pass as second argument the same filename with
the .dat
suffix stripped and replaced with the .out
suffix. The output
of each command will individually be stored in log files, each with the
.log
suffix. If the current folder contained the files:
$ ls
backup/ final.dat original.dat parameters.txt trial2.dat
Then the commands actually run will be:
$ process final.dat final.out > final.log
$ process original.dat original.out > original.log
$ process trial2.dat trial2.out > trial2.log
There are many other types of parameter substitutions possible, see the relevant documentation for details.