1.6. Files

Although arguments to commands aren't always files, files are the most important types of "things" on any UNIX system. A file can contain any kind of information, and indeed there are different types of files. Three types are by far the most important:

Regular files: Also called text files; these contain readable characters. For example, this book was created from several regular files that contain the text of the book plus human-readable formatting instructions to the troff word processor.
Executable files: Also called programs; these are invoked as commands. Some can't be read by humans; others—the shell scripts that we'll examine in this book—are just special text files. The shell itself is a (non-human-readable) executable file called bash.
Directories: These are like folders that contain other files—possibly other directories (called subdirectories).

1.6.1. Directories

Let's review the most important concepts about directories. The fact that directories can contain other directories leads to a hierarchical structure, more popularly known as a tree, for all files on a UNIX system.

Figure 1-1 shows part of a typical directory tree; rectangles are directories and ovals are regular files.

Figure 1-2. A tree of directories and files

The top of the tree is a directory called root that has no name on the system.^[6] All files can be named by expressing their location on the system relative to root; such names are built by listing all of the directory names (in order from root), separated by slashes (/), followed by the file's name. This way of naming files is called a full (or absolute) pathname.

^[6] Most UNIX tutorials say that root has the name /. We stand by this alternative explanation because it is more logically consistent with the rest of the UNIX filename conventions.

For example, say there is a file called aaiw that is in the directory book, which is in the directory cam, which is in the directory home, which is in the root directory. This file's full pathname is /home/cam/book/aaiw.

1.6.1.1 The working directory

Of course, it's annoying to have to use full pathnames whenever you need to specify a file. So there is also the concept of the working directory (sometimes called the current directory), which is the directory you are "in" at any given time. If you give a pathname with no leading slash, then the location of the file is worked out relative to the working directory. Such pathnames are called relative pathnames; you'll use them much more often than full pathnames.

When you log in to the system, your working directory is initially set to a special directory called your home (or login) directory. System administrators often set up the system so that everyone's home directory name is the same as their login name, and all home directories are contained in a common directory under root.

For example, /home/cam is a typical home directory. If this is your working directory and you give the command lp memo, then the system looks for the file memo in /home/cam. If you have a directory called hatter in your home directory, and it contains the file teatime, then you can print it with the command lp hatter/teatime.

1.6.1.2 Tilde notation

As you can well imagine, home directories occur often in pathnames. Although many systems are organized so that all home directories have a common parent (such as /home or /users), you should not rely on that being the case, nor should you even have to know the absolute pathname of someone's home directory.

Therefore, bash has a way of abbreviating home directories: just precede the name of the user with a tilde (~). For example, you could refer to the file story in user alice's home directory as ~alice/story. This is an absolute pathname, so it doesn't matter what your working directory is when you use it. If alice's home directory has a subdirectory called adventure and the file is in there instead, you can use ~alice/adventure/story as its name.

Even more convenient, a tilde by itself refers to your own home directory. You can refer to a file called notes in your home directory as ~/notes (note the difference between that and ~notes, which the shell would try to interpret as user notes's home directory). If notes is in your adventure subdirectory, then you can call it ~/adventure/notes. This notation is handiest when your working directory is not in your home directory tree, e.g., when it's some system directory like /tmp.

1.6.1.3 Changing working directories

If you want to change your working directory, use the command cd. If you don't remember your working directory, the command pwd tells the shell to print it.

cd takes as an argument the name of the directory you want to become your working directory. It can be relative to your current directory, it can contain a tilde, or it can be absolute (starting with a slash). If you omit the argument, cd changes to your home directory (i.e., it's the same as cd ~ ).

Table 1-1 gives some sample cd commands. Each command assumes that your working directory is /home/cam just before the command is executed, and that your directory structure looks like Figure 1-1.

Table 1-1. Sample cd commands

Command

New working directory

cd book

/home/cam/book

cd book/wonderland

/home/cam/book/wonderland

cd ~/book/wonderland

/home/cam/book/wonderland

cd /usr/lib

/usr/lib

cd ..

/home

cd ../gryphon

/home/gryphon

cd ~gryphon

/home/gryphon

The first four are straightforward. The next two use a special directory called .. (two dots), which means "parent of this directory." Every directory has one of these; it's a universal way to get to the directory above the current one in the hierarchy—which is called the parent directory.^[7]

^[7] Each directory also has the special directory . (single dot), which just means "this directory." Thus, cd . effectively does nothing. Both . and .. are actually special hidden files in each directory that point to the directory itself and to its parent directory, respectively. root is its own parent.

Another feature of bash's cd command is the form cd -, which changes to whatever directory you were in before the current one. For example, if you start out in /usr/lib, type cd without an argument to go to your home directory, and then type cd -, you will be back in /usr/lib.

1.6.2. Filenames, Wildcards, and Pathname Expansion

Sometimes you need to run a command on more than one file at a time. The most common example of such a command is ls, which lists information about files. In its simplest form, without options or arguments, it lists the names of all files in the working directory except special hidden files, whose names begin with a dot (.).

If you give ls filename arguments, it will list those files—which is sort of silly: if your current directory has the files duchess and queen in it and you type ls duchess queen, the system will simply print those filenames.

Actually, ls is more often used with options that tell it to list information about the files, like the -l (long) option, which tells ls to list the file's owner, size, time of last modification, and other information, or -a (all), which also lists the hidden files described above. But sometimes you want to verify the existence of a certain group of files without having to know all of their names; for example, if you use a text editor, you might want to see which files in your current directory have names that end in .txt.

Filenames are so important in UNIX that the shell provides a built-in way to specify the pattern of a set of filenames without having to know all of the names themselves. You can use special characters, called wildcards, in filenames to turn them into patterns. Table 1-2 lists the basic wildcards.

Table 1-2. Basic wildcards

Wildcard

Matches

?

Any single character

*

Any string of characters

[set]

Any character in set

[!set]

Any character not in set

The ? wildcard matches any single character, so that if your directory contains the files program.c, program.log, and program.o, then the expression program.? matches program.c and program.o but not program.log.

The asterisk (*) is more powerful and far more widely used; it matches any string of characters. The expression program.* will match all three files in the previous paragraph; text editor users can use the expression *.txt to match their input files.^[8]

^[8] MS-DOS and VAX/VMS users should note that there is nothing special about the dot (.) in UNIX filenames (aside from the leading dot, which "hides" the file); it's just another character. For example, ls * lists all files in the current directory; you don't need *.* as you do on other systems. Indeed, ls *.* won't list all the files—only those that have at least one dot in the middle of the name.

Table 1-3 should help demonstrate how the asterisk works. Assume that you have the files bob, darlene, dave, ed, frank, and fred in your working directory.

Table 1-3. Using the * wildcard

Expression

Yields

fr*

frank fred

*ed

ed fred

b*

bob

*e*

darlene dave ed fred

*r*

darlene frank fred

*

bob darlene dave ed frank fred

d*e

darlene dave

g*

g*

Notice that * can stand for nothing: both *ed and *e* match ed. Also notice that the last example shows what the shell does if it can't match anything: it just leaves the string with the wildcard untouched.

The remaining wildcard is the set construct. A set is a list of characters (e.g., abc), an inclusive range (e.g., a-z), or some combination of the two. If you want the dash character to be part of a list, just list it first or last. Table 1-4 should explain things more clearly.

Table 1-4. Using the set construct wildcards

Expression

Matches

[abc]

a, b, or c

[.,;]

Period, comma, or semicolon

[-_]

Dash or underscore

[a-c]

a, b, or c

[a-z]

All lowercase letters

[!0-9]

All non-digits

[0-9!]

All digits and exclamation point

[a-zA-Z]

All lower- and uppercase letters

[a-zA-Z0-9_-]

All letters, all digits, underscore, and dash

In the original wildcard example, program.[co] and program.[a-z] both match program.c and program.o, but not program.log.

An exclamation point after the left bracket lets you "negate" a set. For example, [!.;] matches any character except period and semicolon; [!a-zA-Z] matches any character that isn't a letter. To match ! itself, place it after the first character in the set, or precede it with a backslash, as in [\!].

The range notation is handy, but you shouldn't make too many assumptions about what characters are included in a range. It's safe to use a range for uppercase letters, lowercase letters, digits, or any subranges thereof (e.g., [f-q], [2-6]). Don't use ranges on punctuation characters or mixed-case letters: e.g., [a-Z] and [A-z] should not be trusted to include all of the letters and nothing more. The problem is that such ranges are not entirely portable between different types of computers.^[9]

^[9] Specifically, ranges depend on the character encoding scheme your computer uses (normally ASCII, but IBM mainframes use EBCDIC) and the character set used by the current locale (ranges in languages other than English may not give expected results).

The process of matching expressions containing wildcards to filenames is called wildcard expansion or globbing. This is just one of several steps the shell takes when reading and processing a command line; another that we have already seen is tilde expansion, where tildes are replaced with home directories where applicable. We'll see others in later chapters, and the full details of the process are enumerated in Chapter 7.

However, it's important to be aware that the commands that you run only see the results of wildcard expansion. That is, they just see a list of arguments, and they have no knowledge of how those arguments came into being. For example, if you type ls fr* and your files are as on the previous page, then the shell expands the command line to ls fred frank and invokes the command ls with arguments fred and frank. If you type ls g*, then (because there is no match) ls will be given the literal string g* and will complain with the error message, g*: No such file or directory.^[10]

^[10] This is different from the C shell's wildcard mechanism, which prints an error message and doesn't execute the command at all.

Here is an example that should help make things clearer. Suppose you are a C programmer. This means that you deal with files whose names end in .c (programs, also known as source files), .h (header files for programs), and .o (object code files that aren't human-readable), as well as other files. Let's say you want to list all source, object, and header files in your working directory. The command ls *.[cho] does the trick. The shell expands *.[cho] to all files whose names end in a period followed by a c, h, or o and passes the resulting list to ls as arguments. In other words, ls will see the filenames just as if they were all typed in individually—but notice that we required no knowledge of the actual filenames whatsoever! We let the wildcards do the work.

The wildcard examples that we have seen so far are actually part of a more general concept called pathname expansion. Just as it is possible to use wildcards in the current directory, they can also be used as part of a pathname. For example, if you wanted to list all of the files in the directories /usr and /usr2, you could type ls /usr*. If you were only interested in the files beginning with the letters b and e in these directories, you could type ls /usr*/[be]* to list them.

1.6.3. Brace Expansion

A concept closely related to pathname expansion is brace expansion. Whereas pathname expansion wildcards will expand to files and directories that exist, brace expansion expands to an arbitrary string of a given form: an optional preamble, followed by comma-separated strings between braces, and followed by an optional postscript. If you type echo b{ed,olt,ar}s, you'll see the words beds, bolts, and bars printed. Each instance of a string inside the braces is combined with the preamble b and the postscript s. Notice that these are not filenames—the strings produced are independent of filenames. It is also possible to nest the braces, as in b{ar{d,n,k},ed}s. This will result in the expansion bards, barns, barks, and beds.

You can also use a slightly different type of brace expansion for creating a sequence of letters or numbers. If you type echo {2..5} you'll see this expands to 2 3 4 5. Typing echo {d..h} results in the expansion d e f g h.^[11]

^[11] This form of brace expansion is not available in bash prior to Version 3.0.

Brace expansion can also be used with wildcard expansions. In the example from the previous section where we listed the source, object, and header files in the working directory, we could have used ls *.{c,h,o}.^[12]

^[12] This differs slightly from C shell brace expansion. bash requires at least one unquoted comma to perform an expansion; otherwise, the word is left unchanged, e.g., b{o}lt remains as b{o}lt.

< Day Day Up >