
4.2. Shell Variables

bash derives much of its programming functionality from shell variables. We've already seen the basics of variables. To recap briefly: they are named places to store data, usually in the form of character strings, and their values can be obtained by preceding their names with dollar signs ($). Certain variables, called environment variables, are conventionally named in all capital letters, and their values are made known (with the export statement) to subprocesses.

If you are a programmer, you already know that just about every major programming language uses variables in some way; in fact, an important way of characterizing differences between languages is comparing their facilities for variables.

The chief difference between bash's variable schema and those of conventional languages is that bash's places heavy emphasis on character strings. (Thus it has more in common with a special-purpose language like SNOBOL than a general-purpose one like Pascal.) This is also true of the Bourne shell and the C shell, but bash goes beyond them by having additional mechanisms for handling integers explicitly.

4.2.1. Positional Parameters

As we have already seen, you can define values for variables with statements of the form varname=value, e.g.:

$ hatter=mad

$ echo "$hatter"

mad

The shell predefines some environment variables when you log in. There are other built-in variables that are vital to shell programming. We will look at a few of them now and save the others for later.

The most important special, built-in variables are called positional parameters. These hold the command-line arguments to scripts when they are invoked. Positional parameters have the names 1, 2, 3, etc., meaning that their values are denoted by $1, $2, $3, etc. There is also a positional parameter 0, whose value is the name of the script (i.e., the command typed in to invoke it).

Two special variables contain all of the positional parameters (except positional parameter 0): * and @. The difference between them is subtle but important, and it's apparent only when they are within double quotes.

"$*" is a single string that consists of all of the positional parameters, separated by the first character in the value of the shell variable IFS (internal field separator). By default IFS contains a space, TAB, and NEWLINE, so the separator is normally a space. On the other hand, "$@" is equal to "$1" "$2"... "$N", where N is the number of positional parameters. That is, it's equal to N separate double-quoted strings, which are separated by spaces. If there are no positional parameters, "$@" expands to nothing. We'll explore the ramifications of this difference in a little while.

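The variable # holds the number of positional parameters (as a character string). All of these variables are "read-only," meaning that you can't assign new values to them within scripts with a statement of the form varname=value. (The set and shift built-ins can still change the positional parameters as a group.)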

For example, assume that you have the following simple shell script:

echo "alice: $@"

echo "$0: $1 $2 $3 $4"

echo "$# arguments"

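Assume further that the script is called alice. Then if you type alice in wonderland, you will see the following output (in practice $0 often includes the path by which the shell found the script, such as ./alice; the plain name is shown here for simplicity):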

alice: in wonderland

alice: in wonderland

2 arguments

In this case, $3 and $4 are unset, which means that the shell will substitute the empty (or null) string for them.[3]

[3] Unless the option nounset is turned on, in which case the shell will return an error message.
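
The behavior described above, including the nounset case in the footnote, can be tried with a short sketch. The function show_args is a hypothetical stand-in for the alice script (functions receive positional parameters much as scripts do), and the subshell around the nounset test is there only so that the error does not end the enclosing shell.

```shell
# Hypothetical helper: prints its first four positional parameters in brackets.
# (The name() form used here is equivalent to "function name" in bash.)
show_args() {
    echo "[$1] [$2] [$3] [$4]"
}

# $3 and $4 are unset, so the empty string is substituted for them.
show_args in wonderland        # [in] [wonderland] [] []

# With nounset turned on, the same reference is an error. The parentheses
# run the test in a subshell, which exits with a non-zero status.
( set -o nounset; show_args in wonderland ) 2>/dev/null \
    || echo "nounset: unset parameter rejected"
```

The brackets make the empty substitutions visible; without them, the missing arguments would show up only as trailing spaces.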

4.2.1.1. Positional parameters in functions

Shell functions use positional parameters and special variables like * and # in exactly the same way as shell scripts do. If you wanted to define alice as a function, you could put the following in your .bash_profile or environment file:

function alice

{

    echo "alice: $*"

    echo "$0: $1 $2 $3 $4"

    echo "$# arguments"

}

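You will get nearly the same result if you type alice in wonderland. The one difference is the second line: inside a function, $0 is not the function's name, so when the function is run from your login shell that line begins with the name of the shell (for example, bash or -bash) rather than alice.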

Typically, several shell functions are defined within a single shell script. Therefore each function will need to handle its own arguments, which in turn means that each function needs to keep track of positional parameters separately. Sure enough, each function has its own copies of these variables (even though functions don't run in their own subshells, as scripts do); we say that such variables are local to the function.

However, other variables defined within functions are not local (they are global), meaning that their values are known throughout the entire shell script. For example, assume that you have a shell script called ascript that contains this:

function afunc

{

  echo in function: $0 $1 $2

  var1="in function"

  echo var1: $var1

}

     

var1="outside function"

echo var1: $var1

echo $0: $1 $2

afunc funcarg1 funcarg2

echo var1: $var1

echo $0: $1 $2

If you invoke this script by typing ascript arg1 arg2, you will see this output:

var1: outside function

ascript: arg1 arg2

in function: ascript funcarg1 funcarg2

var1: in function

var1: in function

ascript: arg1 arg2

In other words, the function afunc changes the value of the variable var1 from "outside function" to "in function," and that change is known outside the function, while $1 and $2 have different values in the function and the main script. Notice that $0 doesn't change because the function executes in the environment of the shell script and $0 takes the name of the script. Figure 4-2 shows the scope of each variable graphically.

Figure 4-2. Functions have their own positional parameters


4.2.2. Local Variables in Functions

A local statement inside a function definition makes the variables it names local to that function. The ability to define variables that are local to "subprogram" units (procedures, functions, subroutines, etc.) is necessary for writing large programs, because it helps keep subprograms independent of the main program and of each other.

Here is the function from our last example with the variable var1 made local:

function afunc

{

  local var1

  echo in function: $0 $1 $2

     

  var1="in function"

  echo var1: $var1

}

Now the result of running ascript arg1 arg2 is:

var1: outside function

ascript: arg1 arg2

in function: ascript funcarg1 funcarg2

var1: in function

var1: outside function

ascript: arg1 arg2

Figure 4-3 shows the scope of each variable in our new script. Note that afunc now has its own, local copy of var1, although the original var1 would still be used by any other functions that ascript invokes.

Figure 4-3. Functions can have local variables
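
As a further illustration of why local variables keep functions independent of their callers, here is a minimal sketch. The functions count_global and count_local are hypothetical; both count their arguments in a variable named n, and they use a for loop and $(( )) arithmetic, which this section does not cover.

```shell
# Hypothetical helpers: each counts its arguments in a variable named n.
count_global() {
    n=0                          # no local statement: this is the caller's n
    for arg in "$@"; do n=$((n + 1)); done
    echo "$n"
}

count_local() {
    local n=0 arg                # n and arg exist only during this call
    for arg in "$@"; do n=$((n + 1)); done
    echo "$n"
}

n=100
count_local a b c        # prints 3
echo "n is still $n"     # n is still 100
count_global a b c       # prints 3
echo "n is now $n"       # n is now 3
```

Only count_global disturbs the script's own n, which is the kind of accidental coupling that a local statement avoids.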


4.2.3. Quoting with $@ and $*

Now that we have this background, let's take a closer look at "$@" and "$*". These variables are two of the shell's greatest idiosyncrasies, so we'll discuss some of the most common sources of confusion.

  • Why are the elements of "$*" separated by the first character of IFS instead of just spaces? To give you output flexibility. As a simple example, let's say you want to print a list of positional parameters separated by commas. This script would do it:

    IFS=,
    
    echo "$*"

    Changing IFS in a script is risky, but it's probably OK as long as nothing else in the script depends on it. If this script were called arglist, then the command arglist alice dormouse hatter would produce the output alice,dormouse,hatter. Chapter 5 and Chapter 10 contain other examples of changing IFS.

  • Why does "$@" act like N separate double-quoted strings? To allow you to use them again as separate values. For example, say you want to call a function within your script with the same list of positional parameters, like this:

    function countargs
    
    {
    
        echo "$# args."
    
    }

    Assume your script is called with the same arguments as arglist above. Then if it contains the command countargs "$*", the function will print 1 args. But if the command is countargs "$@", the function will print 3 args.
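
Both points can be checked with a short sketch. countargs is the function from the text; join_commas, pass_star, and pass_at are hypothetical helpers introduced here. Declaring IFS local inside join_commas is one way, assumed here, to keep the change from affecting the rest of the script.

```shell
# countargs is taken from the text. (The name() form used here is
# equivalent to "function name" in bash.)
countargs() {
    echo "$# args."
}

# Hypothetical helper: join all arguments with commas. The local statement
# confines the IFS change to this function.
join_commas() {
    local IFS=,
    echo "$*"
}

# Hypothetical wrappers that forward their arguments in the two ways.
pass_star() { countargs "$*"; }    # forwards one joined string
pass_at()   { countargs "$@"; }    # forwards each argument separately

join_commas alice dormouse hatter        # alice,dormouse,hatter
pass_star alice dormouse "mad hatter"    # 1 args.
pass_at alice dormouse "mad hatter"      # 3 args.
```

Note that "$@" also keeps "mad hatter" intact as a single argument, even though it contains a space.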

4.2.4. More on Variable Syntax

Before we show the many things you can do with shell variables, we have to point out a simplification we have been making: the syntax of $varname for taking the value of a variable is actually the simple form of the more general syntax, ${varname}.

Why two syntaxes? For one thing, the more general syntax is necessary if your code refers to more than nine positional parameters: you must use ${10} for the tenth instead of $10, which the shell reads as $1 followed by the character 0. Aside from that, consider the following case where you would like to place an underscore after your user ID:

echo $UID_

The shell will try to use UID_ as the name of the variable. Unless, by chance, $UID_ already exists, this won't print anything (the value being null or the empty string, ""). To obtain the desired result, you need to enclose the shell variable in curly brackets:

echo ${UID}_

It is safe to omit the curly brackets ({}) if the variable name is followed by a character that isn't a letter, digit, or underscore.
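
The cases above can be seen side by side in a minimal sketch. The function tenth and the variable user are hypothetical names chosen for illustration; user stands in for UID so that the example can assign its own value.

```shell
# Hypothetical function: shows why the tenth parameter needs braces.
tenth() {
    echo "without braces: $10"     # read as ${1} followed by a literal 0
    echo "with braces:    ${10}"
}

tenth a b c d e f g h i j
# without braces: a0
# with braces:    j

user=alice                 # hypothetical variable, used here instead of UID
echo "${user}_backup"      # alice_backup
echo "$user_backup"        # empty line: the shell looks up user_backup
echo "$user-backup"        # alice-backup, since "-" cannot be part of a name
```

The last line shows the safe case: a hyphen ends the variable name on its own, so the braces are optional there.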
