5.2. for

The most obvious enhancement to make to the previous script is the ability to report on multiple files instead of just one. Tests like -e and -d take only single arguments, so we need a way of calling the code once for each file given on the command line.

The way to do this—indeed, the way to do many things with bash—is with a looping construct. The simplest and most widely applicable of the shell's looping constructs is the for loop. We'll use for to enhance fileinfo soon.

The for loop allows you to repeat a section of code a fixed number of times. During each time through the code (known as an iteration), a special variable called a loop variable is set to a different value; this way each iteration can do something slightly different.

The for loop is somewhat, but not entirely, similar to its counterparts in conventional languages like C and Pascal. The chief difference is that the shell's standard for loop doesn't let you specify a number of times to iterate or a range of values over which to iterate; instead, it only lets you give a fixed list of values. In other words, you can't write the equivalent of a Pascal-style loop such as for x := 1 to 10 do ..., which executes its statements 10 times.

However, the for loop is ideal for working with arguments on the command line and with sets of files (e.g., all files in a given directory). We'll look at an example of each of these. But first, the syntax: the for construct has the form for name [in list]; do statements that can use $name; done.

The list is a list of names. (If in list is omitted, the list defaults to "$@", i.e., the quoted list of command-line arguments, but we'll always supply the in list for the sake of clarity.) In our solutions to the following task, we'll show two simple ways to specify lists.
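As a minimal sketch of both forms, the first loop below supplies an explicit list, and the second omits in list so that it iterates over "$@". The function name show_args and the sample words are assumptions made up for this illustration.

```shell
#!/bin/bash
# Explicit list: the loop variable takes each word in turn.
for fruit in apple banana cherry
do
    echo "fruit: $fruit"
done

# Omitted "in list": the loop iterates over "$@", here the function's arguments.
# show_args is a hypothetical helper used only for this illustration.
show_args ()
{
    local arg
    for arg
    do
        echo "arg: $arg"
    done
}

show_args one "two words" three
```

Because the default list is the quoted "$@", an argument containing spaces, such as "two words", stays together as a single value.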

Task 5-2

Task 4-4 used pattern matching and substitution to list the directories in PATH, one to a line. Unfortunately, old versions of bash don't have that particular pattern operator. Write a general shell script, listpath, that prints each directory in PATH, one per line. In addition, have it print out information about each directory, such as the permissions and the modification times.
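One way to write listpath is sketched below. Wrapping the loop in a function, declaring IFS and dir with local so that the change to IFS does not leak into the rest of the shell, and accepting an optional path list as $1 are all assumptions made for this sketch.

```shell
#!/bin/bash
# listpath sketch: print each directory of a colon-separated list in long format.
# $1 is an optional path list (an assumption for easy testing); default is $PATH.
listpath ()
{
    local IFS=:          # split on colons only, and only inside this function
    local dir
    for dir in ${1:-$PATH}    # deliberately unquoted so that it is split on IFS
    do
        ls -ld "$dir"
    done
}

listpath
```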

This sets IFS to a colon, which is the separator used in PATH. The for loop then sets dir to each of the colon-delimited fields in PATH in turn. ls prints the directory name and associated information: the -l option specifies the "long" format, and -d tells ls to show only the directory itself and not its contents.

In using this you might see an error generated by ls saying, for example, ls: /usr/TeX/bin: No such file or directory. It indicates that a directory in PATH doesn't exist. We can modify the listpath script to check the PATH variable for nonexistent directories by adding some of the tests we saw earlier:
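A sketch of the modified script follows. As before, the function wrapper, the local declarations, and the optional path-list argument are assumptions made for this illustration, and the wording of the messages is our own.

```shell
#!/bin/bash
# listpath sketch with checks: report missing or non-directory entries.
# $1 is an optional path list (an assumption for easy testing); default is $PATH.
listpath ()
{
    local IFS=:
    local dir
    for dir in ${1:-$PATH}
    do
        # An empty field (from :: or a leading colon) means the current directory.
        if [ -z "$dir" ]; then dir=.; fi

        if ! [ -e "$dir" ]; then
            echo "$dir doesn't exist"
        elif ! [ -d "$dir" ]; then
            echo "$dir isn't a directory"
        else
            ls -ld "$dir"
        fi
    done
}

listpath
```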

This time, as the script loops, we first check to see if the length of $dir is zero (caused by an empty field in PATH, such as ::). If it is, we set dir to the current directory. We then check whether the directory exists; if it doesn't, we print an appropriate message. Otherwise, we check whether the file is actually a directory; if it isn't, we say so.

The foregoing illustrated a simple use of for, but it's much more common to use for to iterate through a list of command-line arguments. To show this, we can enhance the fileinfo script above to accept multiple arguments. First, we write a bit of "wrapper" code that does the iteration:
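The wrapper might look like the following sketch. The finfo shown here is a much-reduced, hypothetical stand-in for the function developed earlier, included only so that the example runs on its own; wrapping the loop in a function named fileinfo is likewise an assumption.

```shell
#!/bin/bash
# Hypothetical stand-in for finfo: reports only existence and directory status.
finfo ()
{
    if [ ! -e "$1" ]; then
        echo "file $1 does not exist."
    elif [ -d "$1" ]; then
        echo "$1 is a directory."
    else
        echo "$1 is not a directory."
    fi
}

# The wrapper: call finfo once for each argument.
fileinfo ()
{
    local filename
    for filename in "$@"
    do
        finfo "$filename"
        echo                # blank line between reports
    done
}

fileinfo "$@"
```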

The complete script consists of the above function followed by the for loop code. The function definition has to go first: the shell executes a script from top to bottom, so finfo must already be defined when the loop calls it.

The fileinfo script works as follows: in the for statement, "$@" is a list of all positional parameters. For each argument, the body of the loop is run with filename set to that argument. In other words, the function finfo is called once for each value of $filename as its first argument ($1). The call to echo after the call to finfo merely prints a blank line between sets of information about each file.

Given a directory with the same files as the earlier example, typing fileinfo * would print the same kind of report for every file in the directory, each followed by a blank line.

Task 5-3

It is possible to print out all of the directories below a given one by using the -R option of ls. Unfortunately, this doesn't give much idea about the directory structure because it prints all the files and directories line by line. Write a script that performs a recursive directory listing and produces output that gives an idea of the structure for a small number of subdirectories.

Each column represents a directory level. Entries below and to the right of an entry are files and directories under that directory. Files are just listed with no entries to their right. This example shows that the directory adventure and the file lewis.carroll are in the current directory; the directories aaiw and ttlg, and the file biog are under adventure, etc. To make life simple, we'll use TABs to line the columns up and ignore any "bleed over" of filenames from one column into an adjacent one.

We need to be able to traverse the directory hierarchy. To do this easily we'll use a programming technique known as recursion. Recursion is simply referencing something from itself; in our case, calling a piece of code from itself. For example, consider this script, tracedir, in your home directory:
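A sketch of such a script is shown below. So that the example runs on its own, it writes tracedir into a scratch directory instead of your home directory and re-invokes itself through bash "$0" instead of a fixed path; both choices, and the tiny test tree, are assumptions made for this illustration.

```shell
#!/bin/bash
# Write the tracedir sketch into a scratch directory (an assumption for this demo).
workdir=$(mktemp -d)
cat > "$workdir/tracedir" <<'EOF'
#!/bin/bash
file=$1
echo "$file"

if [ -d "$file" ]; then
    cd "$file"
    bash "$0" $(ls)     # a new shell runs the same script on the contents
    cd ..
fi
EOF

# A tiny tree to try it on: adventure/aaiw/dodo
mkdir -p "$workdir/adventure/aaiw"
touch "$workdir/adventure/aaiw/dodo"

# Invoke with an absolute path so that "$0" stays valid after each cd.
(cd "$workdir" && bash "$workdir/tracedir" adventure)
```

Note that only $1 is ever examined, so each invocation follows just the first entry that ls returns.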

First we copy and print the first argument. Then we test to see if it is a directory. If it is, we cd to it and call the script again with an argument of the files in that directory. This script is recursive; when the first argument is a directory, a new shell is invoked and a new script is run on the new directory. The old script waits until the new script returns, then the old script executes a cd back up one level and exits. This happens in each invocation of the tracedir script. The recursion will stop only when the first argument isn't a directory.

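Running this on the directory structure listed above with the argument adventure prints adventure, aaiw, and dodo, one per line: each invocation looks only at its first argument, so only the first entry of each directory is followed.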

This script has a few problems, but it is the basis for the solution to this task. One major problem with the script is that it is very inefficient. Each time the script is called, a new shell is created. We can improve on this by making the script into a function, because (as you probably remember from Chapter 4) functions are part of the shell they are started from. We also need a way to set up the TAB spacing. The easiest way is to have an initializing script or function and call the recursive routine from that. Let's look at this routine.

First, we set up a variable to hold the TAB character for the echo command (Chapter 7 explains all of the options and formatting commands you can use with echo). Then we loop through each argument supplied to the function and print it out. If it is a directory, we call our recursive routine, supplying the list of files with ls. We have introduced a new command at this point: command. command is a shell built-in that disables function and alias look-up. In this case, it is used to make sure that the ls command is one from your command search path, PATH, and not a function (for further information on command see Chapter 7). After it's all over, we clean up by unsetting the variables we have used.

Each time it is called, recdir loops through the files it is given as arguments. For each one it prints the filename and then, if the file is a directory, calls itself with arguments set to the contents of the directory. There are two details that have to be taken care of: the number of TABs to use, and the pathname of the "current" directory in the recursion.

Each time we go down a level in the directory hierarchy we want to add a TAB character, so we append a TAB to the variable tab every time we enter recdir. Likewise, when we exit recdir we are moving up a directory level, so we remove the TAB when we leave the function. Initially, tab is not set, so the first time recdir is called, tab will be set to one TAB. If we recurse into a lower directory, recdir will be called again and another TAB will be appended. Remember that tab is a global variable, so it will grow and shrink in TABs for every entry and exit of recdir. The -e option to echo tells it to recognize escaped formatting characters, in our case the TAB character, \t.

In this version of the recursive routine we haven't used cd to move between directories. That means that an ls of a directory will have to be supplied with a relative path to files further down in the hierarchy. To do this, we need to keep track of the directory we are currently examining. The initialization routine sets the variable thisfile to the directory name each time a directory is found while looping. This variable is then used in the recursive routine to keep the relative pathname of the current file being examined. On each iteration of the loop, thisfile has the current filename appended to it, and at the end of the loop the filename is removed.
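The two routines described above can be sketched as follows. The name recls for the initialization routine, the exact set of variables that is unset at the end, and the small test tree are assumptions; like the description, the sketch relies on global variables and on ls output being split into words, so filenames containing whitespace are not handled.

```shell
#!/bin/bash
# Recursive routine: print each entry indented by the current depth.
recdir ()
{
    tab=$tab$singletab                  # one more TAB for this level

    for file in "$@"; do
        echo -e "$tab$file"
        thisfile=$thisfile/$file        # relative path of the current entry

        if [ -d "$thisfile" ]; then
            recdir $(command ls "$thisfile")
        fi

        thisfile=${thisfile%/*}         # drop the entry name again
    done

    tab=${tab%\\t}                      # back up one level: remove one TAB
}

# Initialization routine (hypothetical name): set up the TAB and start the recursion.
recls ()
{
    singletab="\t"

    for tryfile in "$@"; do
        echo "$tryfile"
        if [ -d "$tryfile" ]; then
            thisfile=$tryfile
            recdir $(command ls "$tryfile")
        fi
    done

    unset singletab tab thisfile
}

# Try it on a small scratch tree.
demo=$(mktemp -d)
mkdir -p "$demo/adventure/aaiw"
touch "$demo/adventure/aaiw/dodo" "$demo/adventure/biog" "$demo/lewis.carroll"
(cd "$demo" && recls adventure lewis.carroll)
```

The variable tab holds literal \t sequences, not TAB characters, which is why echo needs -e and why the pattern \\t removes exactly one level on the way out.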

You might like to think of ways to modify the behavior and improve the output of this code. Here are some programming challenges:

In the current version, there is no way to determine if biog is a file or a directory. An empty directory looks no different to a file in the listing. Change the output so it appends a / to each directory name when it displays it.
Modify the code so that it only recurses down a maximum of eight subdirectories (which is about the maximum before the lines overflow the right-hand side of the screen). Hint: think about how TABs have been implemented.

Change the output so it includes dashed lines and adds a blank line after each directory, thus:

.
|
|-------adventure
|       |
|       |-------aaiw
|       |       |
|       |       |-------dodo
|       |       |-------duchess
|       |       |-------hatter
|       |       |-------march_hare
|       |       |-------queen
|       |       |-------tarts
|       |
|       |-------biog
...

Hint: you need at least two other variables that contain the characters "|" and "-".

At the start of this section we pointed out that the for loop in its standard form can't iterate over a specified range of values, as is possible in most programming languages. bash 2.0 introduced a new style of for loop that caters for this task: the arithmetic for loop. We'll come back to it in the next chapter when we look at arithmetic operations.
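As a brief preview, an arithmetic for loop has an initialization, a test, and an increment, much as in C. The function name count_to is an assumption made up for this sketch.

```shell
#!/bin/bash
# count_to is a hypothetical helper: print the integers from 1 to $1.
count_to ()
{
    local i
    for (( i = 1; i <= $1; i++ ))
    do
        echo "$i"
    done
}

count_to 10
```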