
6.4. Arrays

The pushd and popd functions use a string variable to hold a list of directories and manipulate the list with the string pattern-matching operators. Although this is quite efficient for adding or retrieving items at the beginning or end of the string, it becomes cumbersome when attempting to access items that are anywhere else, e.g., obtaining item N with the getNdirs function. It would be nice to be able to specify the number, or index, of the item and retrieve it. Arrays allow us to do this.[15]

[15] Support for arrays is not available in versions of bash prior to 2.0.

An array is like a series of slots that hold values. Each slot is known as an element, and each element can be accessed via a numerical index. An array element can contain a string or a number, and you can use it just like any other variable. The indices for arrays start at 0 and continue up to a very large number.[16] So, for example, the fifth element of array names would be names[4]. Indices can be any valid arithmetic expression that evaluates to a number greater than or equal to 0.

[16] Actually, up to 599147937791. That's almost six hundred billion, so yes, it's pretty large.

There are several ways to assign values to arrays. The most straightforward way is with an assignment, just like any other variable:

names[2]=alice
names[0]=hatter
names[1]=duchess

This assigns hatter to element 0, duchess to element 1, and alice to element 2 of the array names.

Another way to assign values is with a compound assignment:

names=([2]=alice [0]=hatter [1]=duchess)

This is equivalent to the first example and is convenient for initializing an array with a set of values. Notice that we didn't have to specify the indices in numerical order. In fact, we don't even have to supply the indices if we reorder our values slightly:

names=(hatter duchess alice)

bash automatically assigns the values to consecutive elements starting at 0. If we provide an index at some point in the compound assignment, the values get assigned consecutively from that point on, so:

names=(hatter [5]=duchess alice)

assigns hatter to element 0, duchess to element 5, and alice to element 6.
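As a quick check of the assignment forms described above, the following sketch builds the same three-element array in each of the three ways and compares the results. The array names a, b, c, and sparse are made up for this illustration.

```bash
#!/usr/bin/env bash

# Three ways of building the same array.
a[2]=alice
a[0]=hatter
a[1]=duchess

b=([2]=alice [0]=hatter [1]=duchess)
c=(hatter duchess alice)

# "${x[*]}" joins the elements in index order, so the strings can be compared.
if [ "${a[*]}" = "${b[*]}" ] && [ "${b[*]}" = "${c[*]}" ]; then
    echo "all three arrays hold: ${a[*]}"
fi

# An explicit index part-way through restarts the numbering from that point.
sparse=(hatter [5]=duchess alice)
echo "element 6 is ${sparse[6]}"    # alice
```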

An array is created automatically by any assignment of these forms. To explicitly create an empty array, you can use the -a option to declare. Any attributes that you set for the array with declare (e.g., the read-only attribute) apply to the entire array. For example, the statement declare -ar names would create a read-only array called names. Every element of the array would be read-only.
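A minimal sketch of both uses of declare follows. The names empty and frozen are hypothetical, and the attempted write is run in a subshell so that the resulting error cannot interrupt the rest of the script.

```bash
#!/usr/bin/env bash

# Explicitly create an empty array.
declare -a empty
echo "empty has ${#empty[@]} elements"

# Create a read-only array; the attribute covers every element.
declare -ar frozen=(hatter duchess)

# Try to change an element inside a subshell and check whether it failed.
if ! ( frozen[0]=alice ) 2>/dev/null; then
    echo "frozen is read-only; element 0 is still ${frozen[0]}"
fi
```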

An element in an array may be referenced with the syntax ${array[i]}. So, from our last example above, the statement echo ${names[5]} would print the string "duchess". If no index is supplied, array element 0 is assumed.
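The following short sketch shows the reference forms just described, including an index written as an arithmetic expression. The variable n is introduced only for the illustration.

```bash
#!/usr/bin/env bash

names=(hatter [5]=duchess alice)
n=5

echo "${names[5]}"        # duchess
echo "${names[n + 1]}"    # alice: the index is evaluated as arithmetic
echo "$names"             # hatter: no index means element 0
```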

You can also use the special indices @ and *. These return all of the values in the array and work in the same way as for the positional parameters; when the array reference is within double quotes, using * expands the reference to one word consisting of all the values in the array separated by the first character of the IFS variable, while @ expands the values in the array to separate words. When unquoted, both of them expand the values of the array to separate words. Just as with positional parameters, this is useful for iterating through the values with a for loop:

for i in "${names[@]}"; do
    echo "$i"
done
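The difference between @ and * is easiest to see by counting the words each form produces. This sketch uses a hypothetical helper function, count_args, and an element containing a space.

```bash
#!/usr/bin/env bash

names=("mad hatter" duchess alice)

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

count_args "${names[@]}"    # 3: one word per element
count_args "${names[*]}"    # 1: all elements joined into a single word
count_args ${names[@]}      # 4: unquoted, so "mad hatter" is split in two

# With *, the separator is the first character of IFS.
# The change is made in a subshell so that IFS is untouched afterwards.
( IFS=:; echo "${names[*]}" )    # mad hatter:duchess:alice
```

Quoting "${names[@]}" is usually the form you want in a for loop, because it keeps each element intact even when it contains spaces.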

Any array elements which are unassigned don't exist; they default to null strings if you explicitly reference them. Therefore, the previous looping example will print out only the assigned elements in the array names. If there were three values at indices 1, 45, and 1005, only those three values would be printed.

If you want to know which indices currently have values in an array, you can use ${!array[@]}. In the last example this would return 1 45 1005.[17]

[17] This is not available in versions of bash prior to 3.0.
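Here is a small sketch of a sparse array using the indices 1, 45, and 1005 mentioned above. The array name sparse and its values are invented for the example, and it assumes bash 3.0 or later for ${!array[@]}.

```bash
#!/usr/bin/env bash

sparse=([1]=one [45]=forty-five [1005]=thousand-five)

# Loop over the indices that actually hold values.
for i in "${!sparse[@]}"; do
    echo "$i -> ${sparse[i]}"
done

# An unassigned element expands to the null string.
echo "element 2 is '${sparse[2]}'"
```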

A useful operator that you can use with arrays is #, the length operator that we saw in Chapter 4. To find out the length of any element in the array, you can use ${#array[i]}. Similarly, to find out how many values there are in the array, use * or @ as the index. So, for names=(hatter [5]=duchess alice), ${#names[5]} has the value 7, and ${#names[@]} has the value 3.
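The example in the text can be checked directly. Note that the element count is not the same thing as the highest index plus one.

```bash
#!/usr/bin/env bash

names=(hatter [5]=duchess alice)

echo "${#names[5]}"    # 7: length of "duchess"
echo "${#names[@]}"    # 3: number of assigned elements
echo "${#names}"       # 6: length of element 0, "hatter"
```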

Reassigning to an existing array with a compound assignment replaces the old array with the new one. All of the old values are lost, even if they were at different indices from the new elements. For example, if we reassigned names to be ([100]=tweedledee tweedledum), the values hatter, duchess, and alice would disappear.
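A brief sketch of this replacement, listing the indices before and after (this assumes bash 3.0 or later for ${!names[@]}):

```bash
#!/usr/bin/env bash

names=(hatter [5]=duchess alice)
echo "before: indices ${!names[*]}"    # 0 5 6

# The compound assignment discards every old element.
names=([100]=tweedledee tweedledum)
echo "after:  indices ${!names[*]}"    # 100 101
echo "after:  values  ${names[*]}"     # tweedledee tweedledum
```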

You can destroy any element or the entire array by using the unset built-in. If you specify an index, that particular element will be unset. unset names[100], for instance, would remove the value at index 100 (tweedledee in the example above). However, unlike assignment, if you don't specify an index the entire array is unset, not just element 0. You can explicitly specify unsetting the entire array by using * or @ as the index.
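The following sketch removes one element and then the whole array. Quoting the argument to unset is a precaution added here, not something the text requires: it stops the shell from treating the brackets as a filename pattern.

```bash
#!/usr/bin/env bash

names=([100]=tweedledee tweedledum)

# Remove a single element.
unset 'names[100]'
echo "remaining: ${names[*]}"        # tweedledum

# Remove the entire array.
unset names
echo "elements left: ${#names[@]}"   # 0
```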

Let's now look at a simple example that uses arrays to match user IDs to account names on the system. The code takes a user ID as an argument and prints the name of the account plus the number of accounts currently on the system:

for i in $(cut -f 1,3 -d: /etc/passwd) ; do
   array[${i#*:}]=${i%:*}
done

echo "User ID $1 is ${array[$1]}."
echo "There are currently ${#array[@]} user accounts on the system."

We use cut to create a list from fields 1 and 3 in the /etc/passwd file. Field 1 is the account name and field 3 is the user ID for the account. The script loops through this list using the user ID as an index for each array element and assigns each account name to that element. The script then uses the supplied argument as an index into the array, prints out the value at that index, and prints the number of existing array values.
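To see how the two pattern-matching operators split each item, here is a sketch of the same loop run over a few made-up name:ID pairs instead of the real /etc/passwd. The accounts listed and the fixed ID 1000 are assumptions for the illustration.

```bash
#!/usr/bin/env bash

# Hypothetical "name:uid" pairs standing in for the output of cut.
pairs=(root:0 daemon:1 alice:1000)

for i in "${pairs[@]}"; do
    # ${i#*:} strips everything up to the colon, leaving the user ID.
    # ${i%:*} strips the colon and everything after it, leaving the name.
    array[${i#*:}]=${i%:*}
done

echo "User ID 1000 is ${array[1000]}."
echo "There are currently ${#array[@]} user accounts on the system."
```

Because only three indices are assigned, ${#array[@]} reports 3 even though the highest index is 1000.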

We'll now look at combining our knowledge of arrays with arithmetic for loops in the next task:

Task 6-3

Write a selection sort script that takes numbers in an array and sorts them.


Selection sort is a common algorithm for sorting a set of elements. While it isn't the quickest sorting algorithm available, it is easy to understand and implement.

It works by selecting the smallest element in the set and moving it to the head of the set. It then repeats the process for the remainder of the set until the end of the set is reached.

For example, to sort the set 21543 it would start at 2 and then move down the set. 1 is less than 2 (and the other elements) so 1 is moved to the start: 12543. Then looking at 2 and moving down the list it finds nothing less than 2 so it moves to the next element, 5. Moving down the list 4 is less than 5, but 3 is less than 4, so 3 is moved: 12354. The next element is 5, and 4 is less than this so 4 is moved: 12345. Five is the last element so the sort is finished.

The code for this is as follows:

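values=(39 5 36 12 9 3 2 30 4 18 22 1 28 25)
numvalues=${#values[@]}

for (( i=0; i < numvalues; i++ )); do
  # Assume the current "head" holds the lowest remaining value
  lowest=$i

  # Search the rest of the array for a lower value
  for (( j=i; j < numvalues; j++ )); do
    if [ ${values[j]} -le ${values[$lowest]} ]; then
      lowest=$j
    fi
  done

  # Swap the head element with the lowest one found
  temp=${values[i]}
  values[i]=${values[lowest]}
  values[lowest]=$temp
done

# Print the sorted values, separated by tabs
for (( i=0; i < numvalues; i++ )); do
  echo -ne "${values[$i]}\t"
done

echo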

At the start of the script we set up an array of randomly ordered values and a variable to hold the number of array elements as a convenience.

The outer i for loop is for looping over the entire array and pointing to the current "head" (where we put any value we need to swap). The variable lowest is set to this index.

The inner j loop is for looping over the remainder of the array. It compares each remaining element with the value at lowest; if the element is less than or equal to that value, lowest is set to the index of that element.

Once the inner loop is finished, the values of the "head" element (at index i) and the element at index lowest are swapped by using a temporary variable, temp.

On completing the outer loop, the script prints out the sorted array elements.
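To watch the algorithm at work, the sketch below runs the same loops on the five-element set from the earlier example and prints the array after each pass of the outer loop. It is a variation rather than the script itself: it uses the (( )) arithmetic test with a strict less-than comparison and starts the inner loop at i+1. Because elements are swapped rather than shifted along, the intermediate states may not match the hand-worked description step for step.

```bash
#!/usr/bin/env bash

values=(2 1 5 4 3)
numvalues=${#values[@]}

for (( i=0; i < numvalues; i++ )); do
  lowest=$i

  # Find the index of the smallest value in the unsorted remainder.
  for (( j=i+1; j < numvalues; j++ )); do
    if (( values[j] < values[lowest] )); then
      lowest=$j
    fi
  done

  # Swap it into the head position.
  temp=${values[i]}
  values[i]=${values[lowest]}
  values[lowest]=$temp

  echo "after pass $i: ${values[*]}"
done
```

The array is fully sorted after the third pass (pass 2); the remaining passes find nothing left to move.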

Note that some of bash's built-in shell variables are arrays: DIRSTACK functions as a stack for the pushd and popd built-ins, BASH_VERSINFO is an array of version information for the current instance of the shell, and PIPESTATUS is an array of exit status values for the commands in the last foreground pipeline that was executed.
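These built-in arrays can be read like any other array. The sketch below is only an illustration; the array name status and the choice of / as the directory to push are arbitrary.

```bash
#!/usr/bin/env bash

# Element 0 of BASH_VERSINFO is the major version number.
echo "bash major version: ${BASH_VERSINFO[0]}"

# PIPESTATUS holds one exit status per command in the last pipeline.
true | false | true
status=("${PIPESTATUS[@]}")    # copy at once; the next command overwrites it
echo "pipeline exit statuses: ${status[*]}"    # 0 1 0

# DIRSTACK gains an entry each time pushd is used.
pushd / > /dev/null
echo "directories on the stack: ${#DIRSTACK[@]}"
popd > /dev/null
```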

We'll see a further use of arrays when we build a bash debugger in Chapter 9.

To end this chapter, here are some problems relating to what we've just covered:

  1. Improve the account ID script so that it checks whether the argument is a number. Also, add a test to print an appropriate message if the user ID doesn't exist.

  2. Make the script print out the user's full name (field 5) as well. Hint: this isn't as easy as it sounds. A full name can have spaces in it, causing the for loop to iterate on each part of the name.

  3. As mentioned earlier, the built-in versions of pushd and popd use an array to implement the stack. Change the pushd, popd, and getNdirs code that we developed in this chapter so that it uses arrays.

  4. Change the selection sort in the last task into a bubble sort. A bubble sort works by iterating over the list comparing pairs of elements and swapping them if they are in incorrect order. It then repeats the process from the start of the list and continues until the list is traversed with no swaps.
