Beginner's Guide to Unix Shell Programming





This is from http://www.umr.edu/~mchilds/

Regular Expressions and Tools of the Trade

Unix Shell Programming

Passing Arguments to Shell Programs

Conditional Branching

Looping

Reading Data

Your Environment

More on Parameters

REGULAR EXPRESSIONS AND TOOLS OF THE TRADE


  • Regular Expressions

    • Provide a convenient and consistent way of specifying patterns to be matched
    • Match Any Character: The Period (.)
      • a period in a regular expression matches any single character, no matter what it is
    • Matching the Beginning of a Line: ^
      • when the caret character (^) is used as the first character in a regular expression, it matches the beginning of the line
    • Matching the End of a Line: $
      • the dollar sign ($) is used to match the end of a line
    • Matching Special Characters
      • in general, to match any of the characters that have a special meaning in forming regular expressions, you must precede the character by a backslash (\) to remove that special meaning
    • Matching a Choice of Characters: The [...] Construct
      • the characters [ and ] can be used in a regular expression to specify that one of the enclosed characters is to be matched
      • a range of characters can be specified within the brackets by separating the starting and ending characters with a dash (-)
    • Special Characters in the Replacement String
      • in general, regular expression special characters are only meaningful in the search string and have no special meaning when they appear in the replacement string
    • Match Inverting Within the [...] Construct: ^
      • if a caret (^) appears as the first character after the left bracket ([), then the sense of the match is inverted
    • Matching Zero or More Characters: *
      • in regular expressions, the asterisk is used to match zero or more occurrences of the preceding character in the regular expression
    • Matching a Precise Number of Characters: \{...\}
      • the construct \{min,max\}, where min specifies the minimum number of occurrences of the preceding regular expression to be matched and max specifies the maximum, lets you specify a precise number of characters to be matched
      • if only one number is enclosed then that number specifies that the preceding regular expression must be matched exactly that many times
      • if a single number is enclosed in the braces, followed immediately by a comma, then at least that many occurrences of the previous regular expression must be matched
    • Saving Matched Characters: \(...\)
      • by enclosing characters inside backslashed parentheses, it is possible to capture those characters that are matched and store them in registers numbered 1 through 9
      • to retrieve the characters in a particular register, use the construct \n, where n is from 1 to 9
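A quick way to try each construct is to run grep and sed against a small file. The sketch below assumes a POSIX grep and sed using basic regular expressions; the sample lines and the temporary file name are made up, and the C locale is set so that a range such as [A-Z] covers only ASCII capitals. Output is captured with $(...), which works like the back quotes described later in this guide.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL          # plain ASCII ranges

# Sample lines to match against (the file name is arbitrary)
sample=${TMPDIR:-/tmp}/regex_demo.$$
printf '%s\n' cat cot scatter 'Cat 42' abc.def 'x 123 y' > "$sample"

# .        any single character: c.t matches cat, cot and scatter
dot=$(grep -c 'c.t' "$sample")
# ^ and $  anchor the match: only the line that is exactly "cat"
exact=$(grep -c '^cat$' "$sample")
# \.       the backslash makes the period match only a literal period
literal=$(grep -c 'c\.d' "$sample")
# [A-Z]    one character from a range: lines starting with a capital
capital=$(grep -c '^[A-Z]' "$sample")
# [^c]     inverted match: lines whose first character is not c
not_c=$(grep -c '^[^c]' "$sample")
# a*       zero or more a's between c and t: matches cat and scatter
star=$(grep -c 'ca*t' "$sample")
# \{3\}    exactly three digits in a row
digits=$(grep -c '[0-9]\{3\}' "$sample")
# \(...\)  saves matches in registers 1 and 2; \2 \1 swaps the words
swapped=$(echo 'Cat 42' | sed 's/\(.*\) \(.*\)/\2 \1/')

rm -f "$sample"
echo "$dot $exact $literal $capital $not_c $star $digits $swapped"
```

Note that grep -c counts matching lines, not matches, so a line is counted once however many times the pattern occurs in it.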

  • The cut Command

    • Command is useful when you need to extract various fields of data from a data file or the output of a command
    • General Form: cut -c chars file
      • chars specifies what characters you want to extract from each line of a file
      • if file is not specified, cut reads its input from standard input
    • The -d and -f Options
      • General Form: cut -d dchar -f fields file
        • dchar is the character that delimits each field of the data
        • fields specifies the fields to be extracted from the file
        • if -f is used and -d is not, then the tab character is used as a delimiter
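Here is a small sketch of both forms of cut, run on a made-up /etc/passwd-style record. Note that when several fields are selected with -f, cut keeps the delimiter between them in its output.

```shell
#!/bin/sh
# A colon-delimited record (made-up data in /etc/passwd style)
line='root:x:0:0:Super User:/root:/bin/sh'

# -c: extract characters 1 through 4
chars=$(echo "$line" | cut -c1-4)
# -d and -f: fields 1 and 7, using : as the delimiter
fields=$(echo "$line" | cut -d: -f1,7)
# -f without -d: fields are taken to be separated by tabs
tabbed=$(printf 'a\tb\tc\n' | cut -f2)

echo "$chars"
echo "$fields"
echo "$tabbed"
```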

  • The paste Command

    • Sort of the inverse of cut; instead of breaking lines apart, it puts them together
    • General Form: paste files
      • corresponding lines from each file in files are pasted together to form single lines which are then written to standard output
      • the dash character (-) can be used in files to specify that input is from standard input
    • The -d Option
      • use if you don't want fields separated by tab characters
      • General Form: paste -d chars files
        • chars is one or more characters that will be used to separate the lines that are pasted together
        • the first character in chars will be used to separate lines from the first and second files, the second character in chars to separate the second and third, etc.
        • if there are more files than chars, then paste wraps around the list of characters
        • it's safest to enclose the delimiter characters in single quotes ('chars')
    • The -s Option
      • tells paste to paste together lines from the same file
      • if just one file is specified, then the effect is to merge all lines from the file together, separated by tabs or by the delimiter characters specified by the -d option
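The following sketch exercises paste with two tiny files in a scratch directory; the file names and contents are invented for illustration.

```shell
#!/bin/sh
# Two small input files in a scratch directory (names are arbitrary)
dir=${TMPDIR:-/tmp}/paste_demo.$$
mkdir "$dir" || exit 1
printf '%s\n' alice bob carol > "$dir/names"
printf '%s\n' 555-1234 555-9876 555-0000 > "$dir/phones"

# Corresponding lines joined with a tab (keep just the first line)
tabbed=$(paste "$dir/names" "$dir/phones" | head -n 1)
# -d: separate the pasted lines with a colon instead of a tab
colon=$(paste -d':' "$dir/names" "$dir/phones" | head -n 1)
# -s: merge every line of one file into a single comma-separated line
merged=$(paste -s -d',' "$dir/names")
# -: the first "file" is standard input
numbered=$(printf '%s\n' 1 2 3 | paste -d' ' - "$dir/names" | tail -n 1)

rm -r "$dir"
printf '%s\n' "$tabbed" "$colon" "$merged" "$numbered"
```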

  • The sed Command

    • sed is a program used for editing data. It stands for stream editor. Unlike ed, it cannot be used interactively.
    • General Form: sed command file
      • command is an ed-style command that is applied to each line of the specified file; if no file is specified, standard input is assumed
    • The -n Option
      • tells sed that you don't want it to print any lines unless explicitly told to do so
    • Deleting Lines
      • used to delete entire lines of text
      • by specifying a line number or a range of numbers you can delete specific lines from the input
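A minimal sketch of the sed commands mentioned above, run on five lines of made-up input. It also uses the p flag on the s command together with -n, so that only lines where a substitution happened are printed; paste -s is used just to join the result onto one line.

```shell
#!/bin/sh
# Five lines of sample input
text=$(printf '%s\n' one two three four five)

# -n with the p command: print only line 2
second=$(echo "$text" | sed -n '2p')
# d with a range: delete lines 2 through 4
kept=$(echo "$text" | sed '2,4d' | paste -s -d' ' -)
# d with a pattern: delete every line containing "o"
no_o=$(echo "$text" | sed '/o/d' | paste -s -d' ' -)
# s with the p flag plus -n: print only the lines that were changed
changed=$(echo "$text" | sed -n 's/o/O/gp' | paste -s -d' ' -)

printf '%s\n' "$second" "$kept" "$no_o" "$changed"
```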

  • The tr Command

    • The tr filter is used to translate characters from standard input.
    • General Form: tr from-chars to-chars
      • from-chars and to-chars are one or more single characters
      • any character in from-chars encountered on the input will be translated into the corresponding character in to-chars
      • result of translation is written to standard output
      • you can also give the octal representation of a character in the form \nnn, where nnn is the octal representation of the character
    • The -s Option
      • use to squeeze out multiple occurrences of characters in to-chars
      • if more than one consecutive occurrence of a character specified in to-chars occurs after the translation is made, the characters will be replaced by a single character
    • The -d Option
      • use to delete single characters from a stream of input
      • General Form: tr -d from-chars
        • any single character in from-chars will be deleted from standard input
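The sketch below shows translation, octal escapes, squeezing and deletion. It sets the C locale so that a-z and A-Z are plain ASCII ranges; \072 is the octal code for a colon and \012 for a newline.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL        # plain ASCII ranges

# Translate lowercase letters into uppercase
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')
# Octal escapes: turn each colon (\072) into a newline (\012), then count lines
parts=$(echo 'a:b:c' | tr '\072' '\012' | sed -n '$=')
# -s: runs of spaces become a single space
squeezed=$(echo 'a   b    c' | tr -s ' ' ' ')
# -d: delete every space
deleted=$(echo 'a b c' | tr -d ' ')

printf '%s\n' "$upper" "$parts" "$squeezed" "$deleted"
```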

  • The grep Command

    • Allows you to search one or more files for particular character patterns.
    • General Form: grep pattern files
      • every line of each file that contains pattern is displayed at the terminal
      • if more than one file is specified to grep, then each line is also immediately preceded by the name of the file
      • it's generally a good idea to enclose your grep pattern inside a pair of single quotes
      • grep takes its input from standard input if no file name is specified
      • grep allows you to specify your pattern using regular expressions
    • The -v Option
      • use when you're interested in finding lines that don't contain a specified pattern
    • The -l Option
      • use when you're only interested in knowing the names of the files which contain the specified pattern
    • The -n Option
      • use to precede each matching line with its line number in the file
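Here is a sketch of grep's output in each mode, using two made-up files in a scratch directory. The script changes into that directory so the file names grep prints are short.

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/grep_demo.$$
mkdir "$dir" || exit 1
cd "$dir" || exit 1
printf '%s\n' 'apple pie' 'banana split' 'cherry tart' > menu
printf '%s\n' 'plain bread' 'pie crust' > pantry

# One file: matching lines are printed as they are
one=$(grep 'split' menu)
# Several files: each matching line is preceded by the file name
several=$(grep 'pie' menu pantry | paste -s -d',' -)
# -v: lines that do NOT contain the pattern
without=$(grep -v 'pie' menu | paste -s -d',' -)
# -l: only the names of the files containing the pattern
files=$(grep -l 'bread' menu pantry)
# -n: each match preceded by its line number
numbered=$(grep -n 'tart' menu)

cd / && rm -r "$dir"
printf '%s\n' "$one" "$several" "$without" "$files" "$numbered"
```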

  • The sort Command

    • By default, sort takes each line of the specified input file and sorts it into ascending order
    • The -u Option
      • tells sort to eliminate duplicate lines from the output
    • The -r Option
      • use to reverse the order of the sort
    • The -o Option
      • use to specify an output file
      • list the name of the output file right after the -o option
      • usually used when you want to sort the lines in a file and have the sorted data replace the original
    • The -n Option
      • specifies that the first field on a line is to be considered numeric
    • Skipping Fields
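      • older versions of sort let you skip fields with the +num option, where num is the number of fields to skip; POSIX sort uses -k instead (for example, -k2 starts the sort key at the second field, like +1)
    • The -t Option
      • use to tell sort that the field delimiter character is something other than the space or tab character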
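A minimal sketch of the sort options above on made-up name:score records. It uses -k2, the POSIX way to start the sort key at the second field, rather than the older +1 form, and sets the C locale so the ordering is predictable.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
# name:score records (made-up data), including one duplicate line
data=$(printf '%s\n' 'carol:30' 'alice:5' 'bob:100' 'alice:5')

# -u: sort and drop duplicate lines
unique=$(echo "$data" | sort -u | paste -s -d' ' -)
# -r: reverse the order
reverse=$(echo "$data" | sort -u -r | paste -s -d' ' -)
# -t: and -k2 with -n: compare the second field as a number
numeric=$(echo "$data" | sort -t: -k2 -n | paste -s -d' ' -)

# -o: write the sorted data back over the input file
file=${TMPDIR:-/tmp}/sort_demo.$$
echo "$data" > "$file"
sort -o "$file" "$file"
first=$(head -n 1 "$file")
rm -f "$file"

printf '%s\n' "$unique" "$reverse" "$numeric" "$first"
```

Without -n, the second field would be compared as text, and 100 would sort before 30 and 5.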

  • The uniq Command

    • The uniq command is useful when you need to find duplicate lines in a file.
    • General Form: uniq in_file out_file
      • in this format, uniq will copy in_file to out_file, removing any duplicate lines in the process
      • uniq's definition of duplicate lines is consecutive lines that match exactly, so the input is usually sorted first
    • The -d Option
      • tells uniq to write only the duplicated lines to out_file
    • The -c Option
      • removes duplicate lines and precedes each output line with a count of the number of times the line occurred in the input
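This sketch shows why input is normally sorted before uniq. It reads from standard input instead of naming in_file and out_file, and the uniq -c padding is stripped with tr only to make the result easy to compare.

```shell
#!/bin/sh
data=$(printf '%s\n' b a b b c)

# uniq alone removes only adjacent repeats
adjacent=$(echo "$data" | uniq | paste -s -d' ' -)
# sorting first brings every copy of a line together
all=$(echo "$data" | sort | uniq | paste -s -d' ' -)
# -d: only the lines that were duplicated
dups=$(echo "$data" | sort | uniq -d)
# -c: each line preceded by its count (padding spaces removed)
counts=$(echo "$data" | sort | uniq -c | tr -d ' ' | paste -s -d',' -)

printf '%s\n' "$adjacent" "$all" "$dups" "$counts"
```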

UNIX SHELL PROGRAMMING


  • Executing Files

    • To execute a script file, type the name of the program at the prompt (if its directory is not in your PATH, give a path such as ./program)
    • Remember that you must have both read and execute permissions for a file before you can execute it

  • Comments

    • Whenever the shell encounters the special character # at the start of a word, it takes whatever characters follow the # to the end of the line as comments
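The sketch below writes a tiny script named hello (a made-up name) into a scratch directory, gives it read and execute permission, and runs it with ./hello. It assumes the temporary directory allows programs to be executed. The last line of the script shows that a # inside a word does not start a comment.

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/exec_demo.$$
mkdir "$dir" || exit 1
cat > "$dir/hello" <<'EOF'
#!/bin/sh
# This whole line is a comment
echo hello # and so is everything after this #
echo not#a#comment
EOF

# Read and execute permission for the owner, then run it by name
chmod u+rx "$dir/hello"
output=$(cd "$dir" && ./hello | paste -s -d' ' -)
rm -r "$dir"
echo "$output"
```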

  • Variables

    • A shell variable begins with an alphabetic or underscore character and is followed by zero or more alphanumeric or underscore characters
    • To store a value inside a shell variable, you simply write the name of the variable, followed by the equals sign, followed immediately by the value you want to store in the variable (that is, variable=value)
    • Spaces are not permitted on either side of the equals sign
    • The shell has no concept of data types
    • Variables are not declared before they're used

  • Displaying the Value of Variables

    • The echo command is used to display the value that is stored inside a shell variable
    • The dollar sign ($) is a special character in the shell. If a valid variable name follows the dollar sign, then the shell takes this as an indication that the value in the variable is to be used

  • The Null Value

    • A variable that contains no value is said to contain the null value
    • To set a variable to the null value, you simply assign no value to the variable, or you can list adjacent pairs of quotes ('' or "") after the =

  • File Name Substitution and Variables

    • The shell does not perform file name substitution when assigning values to variables

  • The ${variable} Construct

    • Use when the variable name is immediately followed by characters that could otherwise be read as part of the name (for example, ${file}X)
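A short sketch of the variable rules above. The variable names are arbitrary; file2024 is deliberately never assigned, to show what happens when the braces are left out.

```shell
#!/bin/sh
count=10                # no spaces are allowed around the =
greeting='hi there'     # quotes keep the embedded space
empty=                  # the null value ('' or "" would do the same)
pattern=*               # stored literally: no file name substitution here

file=report
plain=$file2024         # looks up a different variable, file2024 (null)
braced=${file}2024      # braces end the name: report2024

echo "$count|$greeting|$empty|$pattern|$plain|$braced"
```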

  • Quotes

    • The Single Quote
      • use to keep characters that are otherwise separated by whitespace characters together
      • when the shell sees the first single quote, it ignores any otherwise special characters that follow until it sees the closing quote
      • the shell removes the quotes from the command line and does not pass them to the program
      • quotes are also needed when assigning values containing whitespace or special characters to shell variables
    • The Double Quote
      • ignores all enclosed characters except dollar signs ($), back quotes (`), and backslashes (\)
      • variable name substitution is done by the shell inside double quotes
      • file name substitution is not done inside double quotes
      • double quotes can be used to hide single quotes from the shell, and vice versa
    • The Backslash
      • the backslash quotes the single character that immediately follows it
      • General Form: \c
        • where c is the character you want to quote
      • any special meaning normally attached to the character is removed
    • The Back Quote
      • purpose is to tell the shell to execute the enclosed command and to insert the standard output from the command at that point on the command line
      • General Form: `command`
        • where command is the name of the command to be executed and whose output is to be inserted at that point
      • you are not restricted to a single command inside back quotes
      • the shell does file name substitution after it substitutes the output from the back-quoted commands
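The following sketch compares the four kinds of quoting on one variable; the values are made up.

```shell
#!/bin/sh
name=world

single='$name stays literal'     # no substitution inside single quotes
double="hello $name"             # variable substitution inside double quotes
mixed="it's fine"                # double quotes hide a single quote
escaped=\$name                   # backslash quotes the single character $
star="*"                         # no file name substitution in double quotes
shout=`echo hello | tr 'a-z' 'A-Z'`   # back quotes insert the command's output

echo "$single|$double|$mixed|$escaped|$star|$shout"
```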

  • Arithmetic on Shell Variables

    • A Unix command called expr evaluates an expression given to it on the command line
    • Each operator and operand given to expr must be a separate argument
    • The usual arithmetic operators (+, -, *, /, %) are recognized by expr
    • Remember to backslash or quote operators that are special to the shell, such as * (for example, expr 3 \* 4)
    • expr only evaluates integer arithmetic expressions
    • Use the : operator with expr to match characters in the first operand against a regular expression given as the second argument; by default it returns the number of characters matched
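A small sketch of expr in the back-quote style used in this guide. Each operand and operator is a separate argument, and * is backslashed so the shell passes it through to expr.

```shell
#!/bin/sh
i=7
sum=`expr $i + 1`                   # 8
product=`expr $i \* 3`              # * must be protected from the shell: 21
quotient=`expr $i / 2`              # integer division only: 3
remainder=`expr $i % 2`             # 1
length=`expr "hello" : '.*'`        # : returns the number of characters matched: 5
letters=`expr "abc123" : '[a-z]*'`  # only the leading letters match: 3

echo "$sum $product $quotient $remainder $length $letters"
```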

PASSING ARGUMENTS TO SHELL PROGRAMS

  • Positional Parameters

    • Whenever you execute a shell program, the shell automatically stores the first argument in the special shell variable 1 (referenced as $1), the second argument in the variable 2 ($2), and so on; these special variables are known as positional parameters and are assigned after the shell has done its normal command line processing.

  • The $# Variable

    • Whenever you execute a shell program, the special shell variable $# gets set to the number of arguments that were typed on the command line

  • The $* Variable

    • The special variable $* references all of the arguments passed to the program
    • Often useful in programs that take an indeterminate or variable number of arguments

  • The shift Command

    • The shift command allows you to effectively left shift your positional parameters
    • When shift is executed, $# is automatically decremented by one
    • You can shift more than one place by adding a count after shift (for example, shift 3)
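To see $#, $1, $* and shift together, the sketch below writes a small script to a temporary file (the name is arbitrary) and runs it with sh and five arguments.

```shell
#!/bin/sh
script=${TMPDIR:-/tmp}/args_demo.$$
cat > "$script" <<'EOF'
echo "count=$# first=$1 all=$*"
shift
echo "count=$# first=$1"
shift 2
echo "count=$# first=$1"
EOF

output=$(sh "$script" a b c d e)
rm -f "$script"
echo "$output"
```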

CONDITIONAL BRANCHING


  • The if Statement

    • The if statement enables you to test a condition and then change the flow of program execution based upon the result of the test.
    • General Form:
          if command1
          then
                command
                command
                ...
          fi

      • where command1 is executed and its exit status is tested; if the exit status is zero, then the commands that follow between the then and the fi are executed; otherwise they are skipped

  • Exit Status

    • Whenever any program completes execution under the UNIX system, it returns an exit status back to the system
    • The exit status is a number that usually indicates whether a program successfully ran or not
    • An exit status of zero is used to indicate that a program succeeded and nonzero to indicate failure
    • Failures can be caused by invalid arguments passed to the program, or by an error condition that is detected by the program
    • grep returns an exit status of zero if it finds the specified pattern in at least one of the files

  • The $? Variable

    • Automatically set to the exit status of the last command executed

  • The /dev/null Device

    • Redirect output to /dev/null if you don't want to see it; anything written there is discarded
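A minimal sketch tying together if, grep's exit status, $? and /dev/null. The sample file and its records are made up.

```shell
#!/bin/sh
file=${TMPDIR:-/tmp}/if_demo.$$
printf '%s\n' 'root:x:0:0' 'guest:x:1000:1000' > "$file"

# grep exits with zero when the pattern is found; its output is discarded
found=no
if grep '^guest:' "$file" > /dev/null
then
        found=yes
fi

grep '^nobody:' "$file" > /dev/null
status=$?            # nonzero, because no line matched
rm -f "$file"
echo "found=$found status=$status"
```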

  • The test Command

    • Often used for testing one or more conditions in an if command
    • General Form: test expression
      • where expression represents the condition that you're testing
    • test evaluates expression, and if the result is true, it returns an exit status of zero; otherwise the result is false, and it returns a nonzero exit status
    • Alternate Form: [ expression ]
      • spaces must appear after the [ and before the ]
    • test must see all operands as arguments, meaning that they must be delimited by whitespace

  • test String Operators

                         Table of test String Operators
    ----------------------+---------------------------------------------------
           Operator       | Returns TRUE (exit status of zero) if
    ----------------------+---------------------------------------------------
       string1 = string2  | string1 is identical to string2
       string1 != string2 | string1 is not identical to string2
       string             | string is not null
       -n string          | string is not null
       -z string          | string is null (and string must be seen by test)
    ----------------------+---------------------------------------------------
    

  • test Integer Operators

                        Table of test Integer Operators
    ----------------------+---------------------------------------------------
           Operator       | Returns TRUE (exit status of zero) if
    ----------------------+---------------------------------------------------
        int1 -eq int2     | int1 is equal to int2
        int1 -ge int2     | int1 is greater than or equal to int2
        int1 -gt int2     | int1 is greater than int2
        int1 -le int2     | int1 is less than or equal to int2
        int1 -lt int2     | int1 is less than int2
        int1 -ne int2     | int1 is not equal to int2
    ----------------------+---------------------------------------------------
    

  • test File Operators

    • Each file operator is unary in nature, meaning they expect a single argument to follow
                         Table of test File Operators
    -----------------------+---------------------------------------------------
           Operator        | Returns TRUE (exit status of zero) if
    -----------------------+---------------------------------------------------
          -d file          | file is a directory
          -f file          | file is an ordinary file
          -r file          | file is readable by the process
          -s file          | file has nonzero length
          -w file          | file is writeable by the process
          -x file          | file is executable
    -----------------------+---------------------------------------------------
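The sketch below uses one operator from each table. Variables are double-quoted so test always sees an argument even when a value is null, and the root directory / is used only because it is certain to be a directory.

```shell
#!/bin/sh
name=
count=12
dir=/

results=
# String operators
if [ -z "$name" ]; then results="$results null"; fi
if [ "$count" != 12 ]; then results="$results differ"; fi
# Integer operators (test and [ ] are two forms of the same command)
if [ "$count" -gt 10 ]; then results="$results big"; fi
if test "$count" -le 12; then results="$results le"; fi
# File operators
if [ -d "$dir" ]; then results="$results dir"; fi
if [ -f "$dir" ]; then results="$results file"; fi

echo "$results"
```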
    

  • The Logical Negation Operator (!)

    • The unary logical negation operator (!) can be placed in front of any other test expression to negate the result of the evaluation of that expression

  • The Logical AND Operator (-a)

    • The operator -a performs a logical AND of two expressions and returns true only if the two joined expressions are both true
    • The -a operator has a lower precedence than the integer, file, and string operators

  • The Logical OR Operator (-o)

    • Forms a logical OR of two expressions, returning true if either expression is true; -o has a lower precedence than -a

  • Parentheses

    • You can use parentheses in a test expression to alter the order of evaluation
    • Make sure the parentheses are quoted or backslashed to remove their special meaning
    • Spaces must surround the parentheses
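A sketch of !, -a, -o and backslashed parentheses inside a single test. Newer scripts often write separate [ ] commands joined with && and || instead, but the form below follows this section.

```shell
#!/bin/sh
x=5
answer=
# -a: true only if both expressions are true
if [ "$x" -gt 0 -a "$x" -lt 10 ]; then answer="$answer in-range"; fi
# !: negates the expression that follows
if [ ! "$x" -eq 7 ]; then answer="$answer not-seven"; fi
# -o inside backslashed parentheses, each surrounded by spaces
if [ \( "$x" -eq 1 -o "$x" -eq 5 \) -a "$x" -ne 0 ]; then answer="$answer one-or-five"; fi

echo "$answer"
```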

  • The else Construct

    • General Form:
          if command_t
          then
                command
                command
                . . .
          else
                command
                command
                . . .
          fi
      
    • Executes the else block if the exit status of command_t is non-zero

  • The exit Command

    • Enables you to immediately terminate execution of your shell program
    • General Form: exit n
      • where n is the exit status that you want returned
      • if n is not specified, then the exit status used is that of the last command executed before the exit

  • The elif Construct

    • Use in place of nested else...if blocks
    • General Form:
          if command1
          then
                command
                command
                ...
          elif command2
          then
                command
                command
                ...
          ...
          elif command_n
          then
                command
                command
                ...
          else
                command
                command
                ...
          fi
      
    • Note that only one fi is used with the elif construct
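Here is a sketch of else, elif and exit in one small script, written to a temporary file and run with sh. The script name and its usage message are made up.

```shell
#!/bin/sh
script=${TMPDIR:-/tmp}/classify.$$
cat > "$script" <<'EOF'
if [ $# -ne 1 ]
then
        echo "usage: classify number" >&2
        exit 2
fi

if [ "$1" -lt 0 ]
then
        echo negative
elif [ "$1" -eq 0 ]
then
        echo zero
else
        echo positive
fi
EOF

neg=$(sh "$script" -4)
zero=$(sh "$script" 0)
pos=$(sh "$script" 9)
sh "$script" 2> /dev/null      # no argument: the script exits with status 2
status=$?
rm -f "$script"
echo "$neg $zero $pos $status"
```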

  • The case Command

    • Allows you to compare a single value against other values and to execute one or more commands when a match is found
    • General Form:
          case value in
          pat1)   command
                  command
                  ...
                  command;;
          pat2)   command
                  command
                  ...
                  command;;
          ...
          patn)   command
                  command
                  ...
                  command;;
          esac

    • The word value is successively compared against the values pat1, pat2, ..., patn until a match is found
    • Execution of the case is terminated once a double semicolon is reached
    • If a match is not found then none of the commands are executed
    • Special Pattern Matching Characters
      • the shell lets you use the same special characters for specifying the patterns in a case as you can with file name substitution
      • you can use ?, *, or [...]
    • Logical OR in the case Construct
      • the symbol | has the effect of a logical OR when used between two patterns
      • General Form: pat1 | pat2
      • specifies that either pat1 or pat2 is to be matched
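The following sketch classifies a few made-up words with case, using ranges, the | symbol and a final * pattern as a catch-all. The C locale is set so that [a-z] and [A-Z] stay ASCII ranges, and the ? in the word list is quoted so the shell does not treat it as a file name pattern.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
result=
for ch in a Z 7 '?' hello
do
        case "$ch" in
        [a-z])          kind=lower;;
        [A-Z])          kind=upper;;
        [0-9])          kind=digit;;
        '?' | '*')      kind=wildcard;;   # | acts as a logical OR
        *)              kind=other;;      # * matches anything left over
        esac
        result="$result $kind"
done
echo "$result"
```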

  • The Null Command

    • purpose is to do nothing
    • General Form: :
    • use wherever the shell's syntax requires a command but you don't want anything to happen (for example, an empty then clause)

  • The && and || Constructs

    • These constructs enable you to execute a command based on whether or not the previous command succeeds or fails
    • General Form: command1 && command2
      • execute command2 if command1 succeeds
    • General Form: command1 || command2
      • execute command2 if command1 fails
    • Can be used like logical operators in if statements
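A minimal sketch of the null command and of && and ||. The tests on / are used only because their results are predictable on any Unix system.

```shell
#!/bin/sh
log=
# : does nothing, but fills a then clause where a command is required
if [ -d / ]
then
        :
else
        log="$log no-root"
fi

# && runs the second command only if the first succeeds
[ -d / ] && log="$log and-ran"
# || runs the second command only if the first fails
[ -f / ] || log="$log or-ran"
# in an if: both tests must succeed
if [ -d / ] && [ -r / ]
then
        log="$log both"
fi
echo "$log"
```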

LOOPING


  • The for Command

    • Used to execute a set of commands once for each word in a list
    • General Form:
          for var in word1 word2 ... wordn
          do
                command
                command
                ...
          done

      • the commands enclosed between the do and the done form what's known as the body of the loop
      • when the loop is executed, the first word, word1, is assigned to var
      • the next time, word2 is substituted for var and so on
      • the shell permits file name substitution in the list of words in the for loop
    • The $@ Variable
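      • "$@" is like $*, except that it expands to each argument as a separate word, so arguments containing spaces (quoted on the command line) stay intact
      • always enclose $@ in double quotes to get this behavior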
    • The for Without the List
      • the shell automatically sequences through all arguments in the command line if no list is specified for the for loop
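The sketch below compares a for loop over an explicit list with loops over "$@", $* and no list at all. It uses set -- (covered under the set command) to fake two command-line arguments, one of which contains a space.

```shell
#!/bin/sh
words=
for w in red green blue
do
        words="$words<$w>"
done

set -- 'new york' paris        # pretend command-line arguments
quoted=0
for arg in "$@"; do quoted=`expr $quoted + 1`; done      # keeps 'new york' whole
unquoted=0
for arg in $*; do unquoted=`expr $unquoted + 1`; done    # splits it in two
bare=0
for arg                         # no list: same as in "$@"
do
        bare=`expr $bare + 1`
done
echo "$words $quoted $unquoted $bare"
```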

  • The while Command

    • General Form:
          while command_t
          do
                command
                command
                ...
          done

      • command_t is executed and its exit status tested; if it's zero, then the commands between the do and the done execute; then command_t is executed again and its exit status tested, and so on
    • Often used in conjunction with the shift command to process a variable number of command line arguments

  • The until Command

    • Similar to the while command, only it continues to execute so long as the command that follows the until returns a nonzero exit status
    • General Form:
          until command_t
          do
                command
                command
                ...
          done
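A sketch of while with shift to walk through a variable number of arguments (faked with set --), and of until looping for as long as its test fails.

```shell
#!/bin/sh
set -- -v file1 file2          # pretend these were the command-line arguments
seen=
while [ $# -ne 0 ]
do
        seen="$seen[$1]"
        shift
done

# until: keep doubling n as long as n is not greater than 16
n=1
until [ $n -gt 16 ]
do
        n=`expr $n \* 2`
done
echo "$seen $n"
```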
      

  • Breaking Out of a Loop

    • To make an immediate exit from a loop, use the break command
    • If the break command is used in the form break n, then the n innermost loops are immediately exited

  • Skipping the Remaining Commands in a Loop

    • The continue command causes the remaining commands in a loop to be skipped
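    • If the continue command is used in the form continue n, then the commands remaining in the n innermost loops are skipped, and execution continues with the next iteration of the nth enclosing loop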
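This sketch shows continue and break in a single loop, then break 2 leaving two nested loops at once.

```shell
#!/bin/sh
out=
for i in 1 2 3 4 5
do
        [ $i -eq 2 ] && continue     # skip the rest of the body for 2
        [ $i -eq 4 ] && break        # leave the loop entirely at 4
        out="$out $i"
done

pairs=
for a in x y
do
        for b in 1 2
        do
                [ "$a$b" = y1 ] && break 2   # exits both loops
                pairs="$pairs $a$b"
        done
done
echo "$out |$pairs"
```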

  • Executing a Loop in the Background

    • An entire loop can be sent to the background for execution by placing an ampersand (&) after the done

  • I/O Redirection on a Loop

    • You can redirect the I/O of a loop by placing the redirection after the done
    • Input redirected into the loop applies to all commands in the loop that read their data from standard input
    • Output redirected from the loop applies to all commands in the loop that write to standard output
    • You can override redirection of the entire loop's input or output by explicitly redirecting the input and/or output of commands inside the loop
    • To force input or output of a command to come from or go to the terminal, use the fact that /dev/tty always refers to your terminal
    • You can also redirect standard error by appending 2>file after the done
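A sketch of redirecting a whole loop's input, output and standard error after the done, overriding one command inside it, and running a second loop in the background. It uses /dev/null rather than /dev/tty so it also works without a terminal, and wait (a standard shell builtin not covered in this guide) to let the background loop finish before its output is read.

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/loopio.$$
mkdir "$dir" || exit 1
printf '%s\n' apple banana cherry > "$dir/in"

while read fruit
do
        echo "fruit: $fruit"                 # goes to the loop's output file
        echo "checked $fruit" >&2            # goes to the loop's error file
        echo "seen $fruit" > /dev/null       # overrides the loop's output
done < "$dir/in" > "$dir/out" 2> "$dir/err"

# & after done runs the whole loop in the background
for n in 1 2 3
do
        echo $n
done > "$dir/bg" &
wait

lines=`sed -n '$=' "$dir/out"`
first=`head -n 1 "$dir/out"`
errlines=`sed -n '$=' "$dir/err"`
bg=`paste -s -d' ' "$dir/bg"`
rm -r "$dir"
echo "$lines | $first | $errlines | $bg"
```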

  • The getopts Command

    • A built-in shell command that exists for the express purpose of processing command-line arguments
    • General Form: getopts options variable
      • designed to be executed inside a loop
      • examines the next command line argument and determines if it is a valid option by checking to see if the argument begins with a minus sign and is followed by any single letter contained inside options; if it is a valid option, getopts stores the matching option letter inside variable and returns a zero exit status
      • if the letter that follows the minus sign is not listed in options, getopts stores a question mark inside variable, returns a zero exit status, and sends a message to standard error
      • if there are no more arguments left on the command line or if the next argument doesn't begin with a minus sign, getopts returns a nonzero exit status
    • To indicate to getopts that an option takes a following argument, you write a colon character after the option letter on the getopts command line; if getopts doesn't find an argument after an option that requires one, it will store a question mark inside the specified variable and will write an error message to standard error; otherwise the actual argument is stored in a variable known as OPTARG
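A minimal getopts loop, assuming a hypothetical command that accepts -v and -o file. The command line is faked with set --. After the loop, the standard variable OPTIND (not covered above) holds the index of the first argument that getopts did not consume, so shifting by OPTIND - 1 leaves only the file names.

```shell
#!/bin/sh
set -- -v -o out.txt data.txt        # pretend command line
verbose=no
outfile=
while getopts vo: opt
do
        case "$opt" in
        v)      verbose=yes;;
        o)      outfile=$OPTARG;;    # the colon after o means it takes an argument
        \?)     echo "usage: [-v] [-o file] files" >&2; exit 2;;
        esac
done
shift `expr $OPTIND - 1`             # drop the options getopts has processed
rest=$*
echo "$verbose $outfile $rest"
```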

READING DATA


  • The read Command

    • General Form: read variable(s)
      • when this command is executed, the shell reads a line from standard input and assigns the first word read to the first variable listed in variable(s), the second word to the second variable listed in variable(s), and so on
      • if there are more words on the line than there are variables listed, then the excess words get assigned to the last variable
      • read always returns an exit status of zero unless an end of file condition is detected in the input
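The sketch below reads a made-up file line by line. The loop ends when read reaches end of file and returns a nonzero status, and any words beyond the second end up together in the last variable.

```shell
#!/bin/sh
file=${TMPDIR:-/tmp}/read_demo.$$
printf '%s\n' 'ada lovelace 1815 London' 'alan turing 1912' > "$file"

result=
while read first last rest
do
        result="$result[$first|$last|$rest]"
done < "$file"
rm -f "$file"
echo "$result"
```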

  • Special echo Escape Characters

                    Characters Specially Interpreted by echo
    ---------------------------------------------------------------------------
      Character  | Prints
    ---------------------------------------------------------------------------
         \b      | backspace
         \c      | the line without a terminating newline
         \f      | formfeed
         \n      | newline
         \r      | carriage return
         \t      | tab character
         \\      | backslash character
        \nnn     | the character whose ASCII value is nnn, where nnn is
                 |     a one to three digit octal number that starts with zero
    ---------------------------------------------------------------------------
    

YOUR ENVIRONMENT



  • Subshells

    • A subshell is an entirely new shell that is executed by your login shell in order to run the desired program
    • A subshell has no knowledge of local variables that were assigned values by the login shell
    • A subshell cannot change the value of a variable in the parent shell

  • Exporting Variables - The export Command

    • Makes the value of a variable known to a subshell
    • General Form: export variable(s)
      • where variable(s) is the list of variable names that you want exported
    • Once a variable is exported, it remains exported to all subshells that are subsequently executed
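A short sketch of local versus exported variables, using sh -c to start a subshell. The variable names are arbitrary.

```shell
#!/bin/sh
plain=local-only
shared=exported
export shared

# The subshell sees only the exported variable
seen=$(sh -c 'echo "[$plain][$shared]"')

# A subshell cannot change the parent's copy
sh -c 'shared=changed'
echo "$seen $shared"
```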

  • PS1 and PS2

    • PS1 contains your command prompt
    • PS2 contains your secondary command prompt

  • The . Command

    • General Form: . file
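      • reads and executes the commands in file in the current shell rather than in a subshell, so variables that file assigns remain set afterward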

  • The exec Command

    • You can use the exec command to replace the current program with a new one
    • General Form: exec program
      • where program is the name of the program to be executed
    • You can redirect standard input by using exec < file; any commands that subsequently read data from standard input will read from file; the same can be done for standard output with exec > file
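The sketch below contrasts running a file with sh and with the . command, then shows exec in both roles. The exec < file step runs inside $(...) so that only that subshell's standard input is replaced; the file names are made up.

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/dot_demo.$$
mkdir "$dir" || exit 1
echo 'setting=from-file' > "$dir/vars"
printf '%s\n' one two > "$dir/data"

sh "$dir/vars"         # runs in a subshell: this shell is unaffected
before=$setting
. "$dir/vars"          # runs in the current shell: setting is now defined
after=$setting

# exec < file: later reads take their input from file
lines=$(exec < "$dir/data"; read a; read b; echo "$a $b")
# exec program: echo replaces the shell, so the second command never runs
replaced=$(sh -c 'exec echo replaced; echo never reached')

rm -r "$dir"
echo "[$before] [$after] $lines $replaced"
```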

  • The (...) and {...;} Constructs

    • Use to group commands together
    • Use (...) to execute the commands in a subshell and {...;} to execute them in the current shell
    • Useful for sending a group of commands to the background to be executed in order

  • Another Way to Pass Variables to a Subshell

    • To send the value of a variable to a subshell, you can precede the name of the command with the assignment of as many variables as you want
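A sketch of grouping with (...) and {...;}, and of passing a variable to a single command by placing the assignment in front of it. Note that {...;} needs spaces after the { and a semicolon before the }.

```shell
#!/bin/sh
x=outer
( x=inner )              # subshell: x is unchanged afterward
after_paren=$x
{ x=inner; }             # current shell: x really changes
after_brace=$x

# A single pipe or redirection applies to the whole group
both=$( { echo a; echo b; } | paste -s -d' ' - )

# The assignment applies only to the sh command that follows it
passed=$(greeting=hi sh -c 'echo $greeting')
echo "$after_paren $after_brace [$both] $passed"
```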

MORE ON PARAMETERS


  • Parameter Substitution

    • ${parameter}
      • if there's a potential conflict caused by the characters that follow the parameter name, then you can enclose the name inside curly brackets
    • ${parameter:-value}
      • the construct says to substitute the value of parameter if it's not null, and to substitute value otherwise
    • ${parameter:=value}
      • substitutes the value of parameter if it's not null; otherwise assigns value to parameter and substitutes it
    • ${parameter:?value}
      • if parameter is not null, the shell substitutes its value; otherwise, the shell writes value to standard error and then exits
    • ${parameter:+value}
      • substitutes value if parameter is not null; otherwise it substitutes nothing
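The sketch below tries each substitution form on an unset, a null and a non-null variable. The ${parameter:?value} case runs inside ( ) so that only the subshell exits; its message is discarded and its nonzero status checked instead.

```shell
#!/bin/sh
unset name
empty=
full=value

a=${name:-default}      # unset: "default" is substituted
b=${full:-default}      # not null: its own value is substituted
c=${empty:=assigned}    # null: "assigned" is substituted AND stored in empty
d=${full:+alternate}    # not null: "alternate" is substituted
e=${name:+alternate}    # unset: nothing is substituted

( : ${name:?name is required} ) 2> /dev/null
status=$?               # nonzero: the subshell exited

echo "$a $b $c $empty $d [$e] $status"
```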

  • The $0 Variable

    • When you execute a shell program, the shell automatically stores the name of the program inside the special variable $0

  • The set Command

    • A dual-purpose command used both to set various shell options as well as to reassign the positional parameters
    • The -x Option
      • this option turns on trace mode in the shell
      • after the set -x command is executed, all subsequently executed commands will be printed by the shell, after file name, variable, and command substitution and I/O redirection have been performed
      • you can turn trace off at any time by simply executing set with the +x option
    • set with no arguments
      • if you don't give any arguments to set, you'll get an alphabetized list of all of the variables that exist in your environment
    • Using set to reassign positional parameters
      • the only way that positional parameters can be changed is with the shift or set commands
      • if words are given as arguments to set on the command line, then the positional parameters $1, $2, ... will be assigned to these words
    • The -- Option
      • the -- option tells set not to interpret any subsequent arguments on the command line as options
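A sketch of $0 and of set in its two roles, using a script named setdemo (a made-up name) run as sh ./setdemo. The trace produced by set -x goes to standard error, which is saved to a file so it can be checked separately from the normal output.

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/set_demo.$$
mkdir "$dir" || exit 1
cat > "$dir/setdemo" <<'EOF'
echo "name: $0"
set one two three          # the words become $1, $2 and $3
echo "args: $# $2"
set -- -x                  # after --, -x is a word, not the trace option
echo "args: $# $1"
set -x                     # trace on: commands are echoed to standard error
echo traced
set +x                     # trace off
EOF

output=$(cd "$dir" && sh ./setdemo 2> trace)
trace=$(cat "$dir/trace")
rm -r "$dir"
echo "$output"
```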