OS Lab 5 - Bash Scripting

From WikiLabs
Revision as of 31 October 2025, 14:58, by Vserbu

Objectives

Upon completion of this lab, you will be able to:

  • Explain how the kernel executes scripts via the shebang mechanism and how bash interprets script contents.
  • Distinguish between shell builtins and external programs, understanding why certain commands must be builtins.
  • Write scripts using proper variable quoting, recognizing that commands are fundamentally lists of strings.
  • Use exit status to control script flow with conditionals and logical operators.
  • Redirect file descriptors to control input and output streams.
  • Implement loops with while and for, incorporating command substitution.
  • Define functions with proper variable scoping and work with arrays.
  • Apply defensive scripting practices using set -euo pipefail.

Introduction

In previous labs, we explored processes, job control, and the permission model. These concepts form the foundation for understanding how programs execute and interact with the system. We now turn to bash scripting, which allows us to automate tasks and build tools without writing compiled programs.

This lab emphasizes underlying mechanisms rather than syntax alone. We will examine how the kernel interprets shebangs, why certain commands must be shell builtins, and how bash's command model treats everything as lists of strings. Understanding these foundations prepares you for the next lab, where we'll explore signals and inter-process communication using bash scripts rather than C code.

Prerequisites

System Requirements

A running instance of the course-provided Linux virtual machine with SSH or direct terminal access.

Required Packages

All necessary utilities (bash, seq, grep, wc) are pre-installed. No additional packages are required.

Knowledge Prerequisites

You should be familiar with:

  • Basic command-line navigation and file operations
  • Process concepts from Lab 3 (PIDs, exit status)
  • File permissions and the execute bit from Lab 4

Bash Scripting

Shebangs and Script Execution

The Kernel's Role

When you execute a file like ./script.sh, the kernel first checks the execute permission bit. If set, the kernel examines the file's first bytes. If they are #! (the "shebang"), the kernel reads the rest of that line as a path to an interpreter and re-executes the file using that interpreter.

For example, a script beginning with #!/bin/bash causes the kernel to execute /bin/bash ./script.sh. This mechanism is general-purpose: the kernel doesn't care whether the interpreter is bash, Python, or any other program.

Requirements for script execution:

  • Execute bit must be set (chmod +x script.sh)
  • File must begin with a valid shebang pointing to an interpreter
  • The interpreter named in the shebang must itself be executable (usually a compiled binary)

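Without the execute bit, the kernel's permission check fails. Without a shebang, the kernel does not recognize the file as an executable format and the exec call fails; what happens next depends on the calling program. Many shells then fall back to running the file as a shell script themselves, so the result depends on which shell you launched it from.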
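
The shebang mechanism can be observed with an interpreter that is not a shell. The following is a minimal sketch, assuming /bin/cat exists (true on typical Linux systems) and that the temporary directory permits execution; the variable names are illustrative. Because the shebang names cat, the kernel effectively runs /bin/cat with the file's path as its argument, so the file is printed rather than executed as commands.

```bash
# Create a file whose shebang names cat instead of a shell.
demo=$(mktemp)
printf '%s\n' '#!/bin/cat' 'echo "this line is printed, not executed"' > "$demo"
chmod +x "$demo"

# The two bytes the kernel inspects before choosing an interpreter.
magic=$(head -c 2 "$demo")
echo "magic bytes: $magic"

# The kernel runs /bin/cat with the file's path as its argument,
# so cat prints the file's contents verbatim.
"$demo"

rm -f "$demo"
```

If the direct execution fails with "Permission denied" even after chmod, the temporary directory is probably mounted with the noexec option; try a directory under your home instead.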

The Shell's Role

Once the kernel launches bash with your script as an argument, bash reads the script line by line, parsing each line according to its syntax rules. This involves expanding variables, performing word splitting, executing commands, and capturing results.

The fundamental principle: every command is a list of strings. When bash executes echo Hello World, it constructs a list of three strings: "echo", "Hello", and "World". The first string identifies the command; remaining strings are arguments. This model applies universally to builtins, external programs, and functions.
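
To make the list-of-strings model visible, the sketch below uses a small helper function, count_args (a hypothetical name chosen for this illustration), that reports how many argument strings it received.

```bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

n_two=$(count_args Hello World)          # two arguments: "Hello" and "World"
n_one=$(count_args "Hello World")        # quotes make it a single argument
n_spaces=$(count_args Hello     World)   # extra spaces only separate words

echo "$n_two $n_one $n_spaces"   # prints: 2 1 2
```

The command name itself is not counted by $#, so the full list for the first call has three strings, as described above.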

Builtins vs External Programs

External Programs and PATH

When bash encounters a command that isn't a builtin, it searches the directories in the PATH environment variable for an executable with that name. When found, bash forks a child process and executes the program. Commands like ls, grep, and wc run in separate processes with their own address space.

Use which to locate external programs:

which ls
# Output: /usr/bin/ls

Shell Builtins

Builtins are commands built directly into bash. Use type to identify them:

type cd
# Output: cd is a shell builtin

Builtins exist for two reasons:

1. Process State Modification

Some commands must modify the shell's own process state. Consider cd: if it were an external program, it would run in a child process, change that child's directory, then exit. The parent shell's directory would remain unchanged because child process state doesn't propagate to parents. For cd to work, it must execute within the shell's own process, making it a necessary builtin.

Other state-modifying builtins include ulimit (resource limits) and umask (permission mask). In the next lab, we'll see export (environment variables) and trap (signal handlers).
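
The isolation between parent and child can be checked directly. In this sketch (variable names are illustrative), a child bash process changes its own working directory, while the parent's directory stays the same.

```bash
before=$PWD

# A child bash process changes its own directory and reports it...
child_dir=$(bash -c 'cd / && pwd')

# ...but the parent shell's directory is untouched.
after=$PWD

echo "child was in:  $child_dir"
echo "parent before: $before"
echo "parent after:  $after"
```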

2. Performance

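Commands like true, false, echo, and [ (test) are builtins for performance, avoiding process creation overhead. On most Linux systems they also exist as external programs, but the builtin versions are faster. Note that [[ is slightly different: bash reports it as a shell keyword rather than a builtin, which lets bash parse its contents with special rules (for example, variables inside it are not word-split).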
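
The classification can be queried from a script with type -t, which prints a single word per name. This sketch assumes it runs as a script, where aliases are not expanded; in an interactive shell, ls is often reported as an alias instead. Note that bash classifies [[ as a keyword, not as a builtin.

```bash
# type -t prints a one-word classification for each name.
kind_cd=$(type -t cd)        # builtin
kind_echo=$(type -t echo)    # builtin
kind_test=$(type -t '[[')    # keyword
kind_ls=$(type -t ls)        # file (an external program found via PATH)

echo "cd=$kind_cd echo=$kind_echo [[=$kind_test ls=$kind_ls"

# type -P forces a PATH search and prints the executable's location.
type -P ls
```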

Exercise A: Your First Script and the Shebang

This exercise demonstrates the kernel's shebang mechanism and the importance of the execute bit.

Steps:

  1. Create a script /tmp/lab5_hello.sh containing:

    #!/bin/bash
    echo "Hello from bash!"
    
  2. Check the file's permissions with ls -l. Note that the execute bit is not set.

  3. Attempt to execute the script directly: /tmp/lab5_hello.sh. This should fail with "Permission denied".

  4. Make the script executable with chmod +x and verify with ls -l.

  5. Execute the script directly. It should now succeed.

  6. Execute the script by passing it to bash: bash /tmp/lab5_hello.sh. This form works even when the execute bit is not set, because bash only needs to read the file.

  7. Create a second script without a shebang, make it executable, and attempt to run it. Observe what happens; the result depends on the shell you launch it from.

  8. Clean up both test scripts.

Deliverable A: Provide ls -l output before and after setting the execute bit, the "Permission denied" error, and the successful execution output.

Exercise B: Builtins vs External Commands

This exercise demonstrates why certain commands must be builtins.

Steps:

  1. Use type to identify whether cd, echo, ls, [[, and grep are builtins, shell keywords, or external commands.
  2. Use which to locate external programs. Try which cd and note that it doesn't find the builtin.
  3. Create /tmp/lab5_cd_test.sh that prints the current directory, changes to /tmp, and prints the directory again.
  4. Make the script executable and run it. Observe that the script successfully changes to /tmp.
  5. Check your shell's current directory with pwd. Note that your shell is still in the original directory.
  6. Test cd directly in your shell to confirm it works when run as a builtin.
  7. Clean up the test script.

Deliverable B: Provide the type command outputs, the script output followed by your shell's pwd output.

Variables and Quoting

The List-of-Strings Model

Variables in bash are expanded before commands execute. How they're expanded determines the final command's argument list.

Consider:

greeting="Hello World"
echo $greeting

Bash expands $greeting to "Hello World", then performs word splitting, breaking the result into separate strings at whitespace. The final command list has three strings: "echo", "Hello", and "World", so echo receives two arguments.

This works fine for echo, but consider:

filename="my document.txt"
ls $filename

Word splitting breaks "my document.txt" into "my" and "document.txt". The ls command receives two arguments and tries to list two separate files, which fails.

Quoting Fixes This

Double quotes prevent word splitting:

ls "$filename"

Now ls receives one argument: "my document.txt". The expansion happens, but the result stays as a single string.

Single quotes prevent all expansion:

echo '$name'    # Outputs: $name (literal)
echo "$name"    # Outputs: the value of $name

Best practice: Always quote variables unless you specifically need word splitting. Write "$variable" not $variable. This makes scripts robust when variables contain spaces or special characters.
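
The effect of each quoting style on the argument list can be measured. The sketch below reuses the filename from the example above and a hypothetical helper, count_args, that prints its argument count.

```bash
count_args() { echo "$#"; }   # hypothetical helper: prints its argument count

filename="my document.txt"

unquoted=$(count_args $filename)     # word splitting: "my" and "document.txt"
quoted=$(count_args "$filename")     # one argument: "my document.txt"
literal=$(echo '$filename')          # single quotes: no expansion at all

echo "unquoted=$unquoted quoted=$quoted literal=$literal"
# prints: unquoted=2 quoted=1 literal=$filename
```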

Exercise C: Quoting and Word Splitting

This exercise demonstrates why quoting variables is essential.

Steps:

  1. Create a file named /tmp/my test file.txt (with spaces).
  2. Store the filename in a variable and try to list it without quotes: ls $myfile. This will fail.
  3. Try again with proper quoting: ls "$myfile". This succeeds.
  4. Test echo with a variable containing multiple spaces, both with and without quotes.
  5. Create /tmp/lab5_quote_test.sh that demonstrates variable quoting and array expansion both correctly and incorrectly. The script should show:
    • A variable with spaces, echoed with and without quotes
    • An array containing an element with spaces, iterated both ways
  6. Make the script executable and run it. Observe how the "wrong way" processes four items instead of three.
  7. Clean up the test files.

Deliverable C: Provide the error from unquoted ls, the success from quoted ls, complete script output.

Exit Status and Conditionals

Every command returns an exit status: a number from 0 to 255. By convention, 0 means success; non-zero means failure. Bash stores the last command's exit status in $?:

ls /tmp
echo $?    # Outputs: 0 (success)

ls /nonexistent
echo $?    # Outputs: non-zero (failure)

This exit status is bash's fundamental true/false mechanism. An if statement executes a command and checks if it succeeded:

if grep -q "ERROR" /var/log/syslog; then
    echo "Errors found"
fi

If grep finds "ERROR", it returns 0, and the then block executes.
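
The same logic can be tried without depending on a system log. This sketch writes a small temporary file (its contents are made up for the illustration) and records grep's exit status for a pattern that matches and one that does not.

```bash
log=$(mktemp)
printf '%s\n' 'INFO started' 'ERROR disk full' > "$log"

grep -q "ERROR" "$log"
found_status=$?          # 0: a matching line exists

grep -q "FATAL" "$log"
missing_status=$?        # 1: no line matched

if grep -q "ERROR" "$log"; then
    verdict="Errors found"
else
    verdict="No errors"
fi

echo "$found_status $missing_status $verdict"   # prints: 0 1 Errors found
rm -f "$log"
```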

The [[ Command

The [[ keyword performs tests and returns an exit status. Despite its unusual syntax, it behaves like any other command in this respect:

[[ -f "file.txt" ]]    # Returns 0 if file.txt exists and is a regular file, 1 otherwise

Common operators:

  • -f file: True if regular file exists
  • -d dir: True if directory exists
  • -e path: True if path exists
  • string1 == string2: True if strings are equal (quote the right-hand side; unquoted, it is treated as a glob pattern)
  • num1 -lt num2: True if num1 less than num2

Example:

if [[ -f "config.txt" ]]; then
    echo "Config file found"
fi
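
Two details of [[ are easy to get wrong: numeric operators such as -lt differ from the string operator <, and the right-hand side of == is a glob pattern unless it is quoted. The sketch below records the exit status of each test; the variable names are illustrative.

```bash
[[ 7 -lt 10 ]];          numeric=$?   # 0: 7 is numerically less than 10
[[ 7 < 10 ]];            lexical=$?   # 1: as strings, "7" sorts after "10"

name="report.txt"
[[ $name == *.txt ]];    pattern=$?   # 0: unquoted right side is a glob pattern
[[ $name == "*.txt" ]];  exact=$?     # 1: quoted right side is compared literally

[[ -d / ]];              is_dir=$?    # 0: / exists and is a directory

echo "$numeric $lexical $pattern $exact $is_dir"   # prints: 0 1 0 1 0
```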

Logical Operators

&& (AND): Runs next command only if previous succeeded (returned 0):

mkdir /tmp/newdir && cd /tmp/newdir

|| (OR): Runs next command only if previous failed (returned non-zero):

cd /tmp/important || echo "Failed to change directory"

Explicit error ignoring:

rm -f /tmp/maybe-exists.txt || true

The || true pattern documents that you're intentionally ignoring potential errors.
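
The three patterns can be exercised together in a scratch directory. This is a sketch under the assumption that mktemp -d is available (it is on the course VM's GNU userland); the directory and variable names are illustrative.

```bash
workdir=$(mktemp -d)

# && runs the right side only when the left side succeeded.
mkdir "$workdir/newdir" && created="yes"

# A second mkdir fails (the directory exists), so || runs the fallback.
mkdir "$workdir/newdir" 2>/dev/null || fallback="mkdir failed"

# || true forces the overall status to 0 regardless of the left side.
rmdir "$workdir/missing" 2>/dev/null || true
final_status=$?

echo "$created / $fallback / $final_status"   # prints: yes / mkdir failed / 0
rm -rf "$workdir"
```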

Exercise D: Exit Status and Conditionals

This exercise explores exit status and conditional logic.

Steps:

  1. Execute ls /tmp and check $?. It should be 0 (success).
  2. Execute ls /nonexistent (redirect stderr to /dev/null) and check $?. It should be non-zero.
  3. Test [[ -f /etc/passwd ]] and [[ -f /nonexistent ]], checking $? after each.
  4. Create /tmp/lab5_conditionals.sh that demonstrates:
    • File existence tests with if
    • The && operator for success chaining
    • The || operator for failure handling
    • The || true pattern for explicit error ignoring
    • Using command exit status directly in if (e.g., with grep)
    • Numeric and string comparisons with [[
  5. Make the script executable and run it. Observe the output from each test.
  6. Clean up the script.

Deliverable D: Provide exit status values from steps 1-3, complete script output.

Output Redirection and File Descriptors

Recall from Lab 3 that every process has file descriptors (FDs):

  • FD 0: Standard input (stdin)
  • FD 1: Standard output (stdout)
  • FD 2: Standard error (stderr)

Redirection allows you to control where these streams go.

Basic Redirection

Redirect stdout to a file:

echo "output" > file.txt

Redirect stderr:

ls /nonexistent 2> error.txt

Redirect both:

command &> combined.txt

Or more explicitly:

command > output.txt 2>&1

Order matters! 2>&1 must come after > output.txt to redirect stderr to where stdout is pointing.
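
The effect of the ordering can be demonstrated with a helper that writes one line to each stream. In this sketch, emit is a hypothetical function defined only for the illustration. Bash processes redirections left to right, so in the reversed form stderr is duplicated from stdout before stdout is moved to the file.

```bash
emit() {                       # hypothetical helper writing to both streams
    echo "to stdout"
    echo "to stderr" >&2
}

out=$(mktemp)

# Correct order: stdout goes to the file, then stderr is pointed at the same place.
emit > "$out" 2>&1
both=$(cat "$out")

# Reversed order: stderr is pointed at the current stdout first (here, the
# command substitution's pipe), and only then is stdout moved to the file.
escaped=$(emit 2>&1 > "$out")
only_stdout=$(cat "$out")

printf 'file, correct order:\n%s\n' "$both"
printf 'file, reversed order:\n%s\n' "$only_stdout"
printf 'escaped the file: %s\n' "$escaped"
rm -f "$out"
```

With the correct order the file holds both lines; with the reversed order it holds only "to stdout", and "to stderr" ends up wherever stdout pointed originally.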

Pipes

The pipe operator | connects one command's stdout to another's stdin:

cat /var/log/syslog | grep "ERROR" | wc -l

Data flows left to right. The kernel creates an anonymous pipe, connecting the write end to the first command's stdout and the read end to the second command's stdin.

Input Redirection

Redirect stdin from a file:

wc -l < file.txt

This differs from wc -l file.txt: with <, bash opens the file and connects it to stdin before running wc.
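
The difference is visible in wc's output: when wc opens the file itself it prints the file name, and when it reads from stdin it has no name to print. The sketch below also includes a short pipeline; the file contents are made up for the illustration, and the arithmetic expansion $(( )) is used only to strip any padding some wc implementations add.

```bash
data=$(mktemp)
printf '%s\n' 'ok' 'ERROR one' 'ERROR two' > "$data"

# Pipeline: grep's stdout becomes wc's stdin.
error_count=$(grep "ERROR" "$data" | wc -l)

# wc opens the file itself, so it knows (and prints) the name.
with_name=$(wc -l "$data")

# bash opens the file on FD 0; wc only sees an anonymous stream.
from_stdin=$(wc -l < "$data")

echo "errors:     $((error_count))"
echo "with name:  $with_name"
echo "from stdin: $((from_stdin))"
rm -f "$data"
```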

Loops

The for Loop

Iterates over a list of values:

for item in value1 value2 value3; do
    echo "$item"
done

Common sources for the list:

Filename expansion (globbing):

for file in *.txt; do
    echo "Processing $file"
done

Command substitution:

for i in $(seq 1 10); do
    echo "Number $i"
done

The while Loop

Executes while a command succeeds:

counter=1
while [[ $counter -le 5 ]]; do
    echo "Iteration $counter"
    counter=$((counter + 1))
done

Reading files line-by-line:

while IFS= read -r line; do    # IFS= preserves leading/trailing whitespace
    echo "Line: $line"
done < file.txt

The read command returns 0 while reading lines successfully, and non-zero at end-of-file.
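
A common refinement of this loop, shown here as a sketch rather than a requirement, is to prefix read with IFS= so that leading and trailing whitespace on each line is preserved. The input file and variable names are illustrative.

```bash
input=$(mktemp)
printf '%s\n' 'first line' '  indented line' 'last line' > "$input"

count=0
second=""
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line; do
    count=$((count + 1))
    if [[ $count -eq 2 ]]; then
        second=$line
    fi
done < "$input"

echo "read $count lines; second line is: '$second'"
rm -f "$input"
```

The loop ends when read hits end-of-file and returns non-zero, exactly like any other failing command in a while condition.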

Exercise E: File Descriptor Redirection

This exercise explores redirection, connecting to Lab 3's file descriptor concepts.

Steps:

  1. Redirect stdout to a file with >, then display the file contents.
  2. Redirect stderr to a file with 2> by trying to list a nonexistent directory.
  3. Redirect both stdout and stderr to the same file using &>.
  4. Demonstrate the 2>&1 pattern with explicit redirection order.
  5. Use a pipe to chain cat /etc/passwd, grep "root", and cut to extract the username.
  6. Create /tmp/lab5_redirect.sh that demonstrates:
    • Output redirection with > and >>
    • Stderr capture with 2>
    • Combined stream redirection with &>
    • A pipeline example
    • Input redirection with <
  7. Make the script executable and run it.
  8. Clean up all test files.

Deliverable E: Provide the captured stderr contents and complete script output.

Command Substitution

Command substitution captures a command's stdout for use elsewhere:

current_date=$(date +%Y-%m-%d)
echo "Today is $current_date"

Bash executes the command in $(), captures its output, and substitutes it in place.

Common in loops:

for i in $(seq 1 10); do
    echo "$i"
done

The seq output (numbers 1-10, one per line) is captured, word-split, and becomes the loop's value list.
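
Two properties of command substitution are worth checking by hand: the captured output is word-split when used unquoted in a for list, and trailing newlines are removed from the captured text. A minimal sketch, with illustrative variable names:

```bash
# Sum the numbers produced by seq; each line of output becomes one loop value.
total=0
for i in $(seq 1 4); do
    total=$((total + i))
done

# Trailing newlines are removed from captured output.
captured=$(printf 'text\n\n\n')

echo "total=$total length=${#captured}"   # prints: total=10 length=4
```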

Functions

Functions group commands for reuse:

greet() {
    echo "Hello, $1"
}

greet "Alice"    # Outputs: Hello, Alice

Arguments: Access via $1, $2, etc. "$@" (quoted) expands to all arguments, each as a separate string; $# gives the count.

Return values: Functions return exit status (0-255) via return. For other values, write to stdout and capture with command substitution:

get_uppercase() {
    echo "$1" | tr '[:lower:]' '[:upper:]'
}

result=$(get_uppercase "hello")
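
The two kinds of result can be used side by side. In this sketch, is_even and double are hypothetical functions written for the illustration: the first communicates through its exit status, the second through stdout.

```bash
# Status-style result: return sets the exit status (0 means success/true).
is_even() {
    if (( $1 % 2 == 0 )); then
        return 0
    fi
    return 1
}

# Data-style result: write to stdout and let the caller capture it.
double() {
    echo $(( $1 * 2 ))
}

if is_even 4; then parity="even"; else parity="odd"; fi
is_even 7; seven_status=$?
doubled=$(double 21)

echo "$parity $seven_status $doubled"   # prints: even 1 42
```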

Variable Scope

By default, variables are global. Use local for function-local variables:

calculate() {
    local temp=$1
    local result=$((temp * 2))
    echo "$result"
}

Without local, these variables would affect the global scope.
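
The difference is easy to observe when the functions are called directly (not inside $( ), which runs in a subshell and would hide the effect). The function and variable names below are illustrative.

```bash
counter=0                  # global

contained() {
    local counter=5        # shadows the global inside this function only
    seen_inside=$counter   # deliberately global, to report what the function saw
}

leaky() {
    counter=99             # no local: overwrites the global variable
}

contained
after_contained=$counter   # still 0
leaky
after_leaky=$counter       # now 99

echo "$seen_inside $after_contained $after_leaky"   # prints: 5 0 99
```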

Arrays

Arrays are collections of strings:

files=("file1.txt" "file 2.txt" "file3.txt")

Access elements:

echo "${files[0]}"           # First element
echo "${files[@]}"           # All elements
echo "${#files[@]}"          # Number of elements

Critical syntax: Use "${array[@]}" (with quotes) to preserve each element as a separate string:

for file in "${files[@]}"; do
    echo "Processing: $file"
done

This correctly processes three files, including "file 2.txt" as one item.

Without quotes:

for file in ${files[@]}; do    # WRONG
    echo "Processing: $file"
done

Word splitting breaks "file 2.txt" into "file" and "2.txt", processing four items instead of three.
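
Counting the loop iterations makes the difference concrete. This sketch uses the same files array as above and deliberately includes the unquoted form for comparison.

```bash
files=("file1.txt" "file 2.txt" "file3.txt")

quoted_count=0
for file in "${files[@]}"; do
    quoted_count=$((quoted_count + 1))
done

unquoted_count=0
for file in ${files[@]}; do        # deliberately wrong, for comparison
    unquoted_count=$((unquoted_count + 1))
done

echo "elements: ${#files[@]}, quoted: $quoted_count, unquoted: $unquoted_count"
# prints: elements: 3, quoted: 3, unquoted: 4
```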

Defensive Scripting: set -euo pipefail

Bash scripts, by default, continue after errors. This can cause silent failures that are hard to debug. These options make scripts safer:

set -e (Exit on Error)

Exit immediately if a command returns non-zero (commands used as an if or while condition, or on the left side of && or ||, are exempt):

set -e

cp important.txt backup.txt
process backup.txt    # Won't run if cp failed

To allow specific failures:

rm -f /tmp/file.txt || true
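
The behavior can be explored safely by confining set -e to a subshell, so the outer shell keeps running. In this sketch, plain and exempt are hypothetical functions whose bodies are written in ( ) so that each call runs in its own subshell. The second function shows two contexts in which a failing command does not trigger the exit.

```bash
plain() (            # body in ( ) runs in a subshell
    set -e
    echo "before"
    false            # non-zero status: the subshell exits here
    echo "after"     # never reached
)

exempt() (
    set -e
    if false; then echo "unreachable"; fi   # a failing if condition is exempt
    false || echo "handled"                 # the left side of || is exempt
    echo "after"
)

out_plain=$(plain);   status_plain=$?
out_exempt=$(exempt); status_exempt=$?

echo "plain:  output='$out_plain' status=$status_plain"
echo "exempt: status=$status_exempt"
echo "$out_exempt"
```

One consequence of the exemption rule: calling a function as part of an if condition or a || list disables set -e inside that function too, which is why the calls above are plain assignments.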

set -u (Error on Undefined Variables)

Treat undefined variables as errors:

set -u

echo "$undefined_var"    # Error: undefined_var: unbound variable

This catches typos and logic errors.
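
When a variable is legitimately optional, the ${name:-default} expansion supplies a fallback without tripping set -u. The sketch below runs both cases in subshells; the names maybe_unset, strict, and with_default are illustrative.

```bash
unset -v maybe_unset          # make sure the name really is undefined

strict() (
    set -u
    echo "value: $maybe_unset"   # unbound variable: the subshell aborts here
    echo "after"
)

with_default() (
    set -u
    echo "value: ${maybe_unset:-fallback}"   # :- supplies a default instead
)

out_strict=$(strict 2>/dev/null); status_strict=$?
out_default=$(with_default);      status_default=$?

echo "strict:  output='$out_strict' status=$status_strict"
echo "default: output='$out_default' status=$status_default"
```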

set -o pipefail (Pipeline Failure)

Normally, a pipeline's exit status is the last command's status:

grep "ERROR" missing.log | head -n 10
# Returns 0 (head succeeded) even though grep failed

With pipefail, the pipeline fails if any command fails:

set -o pipefail

grep "ERROR" missing.log | head -n 10
# Returns non-zero (grep failed)
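
The same comparison can be reproduced without a log file. In this sketch, false stands in for the failing grep and true for the succeeding head; each pipeline runs in a subshell so the pipefail setting does not leak into the rest of the session.

```bash
# Default: only the last command's status counts.
( false | true );                    default_status=$?

# With pipefail: a failure anywhere in the pipeline is reported.
( set -o pipefail; false | true );   pipefail_status=$?

echo "default=$default_status pipefail=$pipefail_status"
# prints: default=0 pipefail=1
```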

Recommended Practice

Start every script with:

#!/bin/bash
set -euo pipefail

These options turn many silent failures into immediate, visible errors, which makes debugging easier.

Scripting Challenges

Challenge 1: Log File Analyzer

Write /tmp/lab5_log_analyzer.sh that analyzes a log file.

Requirements:

  • Accept one argument: log file path
  • Print usage message if wrong number of arguments
  • Print error if file doesn't exist or isn't readable
  • Count lines containing "ERROR", "WARN", and "INFO"
  • Print summary report
  • Use: shebang, set -euo pipefail, proper quoting, if statements, grep, counting method

Test data:

cat > /tmp/lab5_test.log <<'EOF'
2024-01-15 10:00:00 INFO Application started
2024-01-15 10:05:23 INFO User login successful
2024-01-15 10:12:45 WARN Connection timeout, retrying
2024-01-15 10:15:00 ERROR Database connection failed
2024-01-15 10:15:30 INFO Connection restored
2024-01-15 10:20:00 ERROR Invalid configuration parameter
2024-01-15 10:25:00 WARN Disk space low
2024-01-15 10:30:00 INFO Backup completed successfully
EOF

Expected output:

Log Analysis Report for /tmp/lab5_test.log
==========================================
ERROR: 2
WARN:  2
INFO:  4

Test:

chmod +x /tmp/lab5_log_analyzer.sh
/tmp/lab5_log_analyzer.sh /tmp/lab5_test.log
/tmp/lab5_log_analyzer.sh
/tmp/lab5_log_analyzer.sh /nonexistent.log

Deliverable Challenge 1:

  • Complete script with comments
  • Output from test log
  • Output with no arguments (usage)
  • Output with nonexistent file (error)

Challenge 2: Batch File Processor

Write lab5_batch_processor.sh that processes multiple files.

Requirements:

  • Accept two arguments: directory path and operation ("count", "uppercase", or "list")
  • Print usage if wrong arguments or invalid operation
  • Define three functions (one per operation):
    • count: Count lines in each .txt file
    • uppercase: Create .upper.txt files with uppercase content
    • list: List .txt files with sizes
  • Store files in array, use loop to process
  • Must use: shebang, set -euo pipefail, functions with local, array with "${array[@]}", loop, case/if for operation selection

Test data:

mkdir -p /tmp/lab5_batch_test
echo -e "line 1\nline 2\nline 3" > /tmp/lab5_batch_test/file1.txt
echo -e "hello world\ntest file" > /tmp/lab5_batch_test/file2.txt
echo -e "single line" > /tmp/lab5_batch_test/file3.txt

Expected output for "count":

Processing files in /tmp/lab5_batch_test
Operation: count
========================================
file1.txt: 3 lines
file2.txt: 2 lines
file3.txt: 1 lines

Test:

chmod +x lab5_batch_processor.sh
./lab5_batch_processor.sh /tmp/lab5_batch_test count
./lab5_batch_processor.sh /tmp/lab5_batch_test uppercase
cat /tmp/lab5_batch_test/file1.upper.txt
./lab5_batch_processor.sh /tmp/lab5_batch_test list

Deliverable Challenge 2:

  • Complete script
  • Output from "count" operation
  • Output from "uppercase" operation + contents of one .upper.txt file
  • Output from "list" operation

Reference: Common Patterns

Quick reference for common bash patterns:

Script header:

#!/bin/bash
set -euo pipefail

Check arguments:

if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <arg>" >&2
    exit 1
fi

Test file types:

if [[ -f "$file" ]]; then
    echo "Regular file"
elif [[ -d "$file" ]]; then
    echo "Directory"
fi

Read file line-by-line:

while IFS= read -r line; do
    echo "$line"
done < "$filename"

Iterate over files:

for file in *.txt; do
    [[ -f "$file" ]] || continue
    echo "$file"
done

Command substitution:

date=$(date +%Y-%m-%d)
count=$(wc -l < "$file")

Arrays:

arr=("item1" "item 2" "item3")
for item in "${arr[@]}"; do
    echo "$item"
done

Functions:

process() {
    local file="$1"
    echo "Processing $file"
}

The tables below list common commands for file handling and text manipulation.

Common Commands Reference

File Operations

Command  Description                            Example
ls       List directory contents                ls -la /tmp
cd       Change directory                       cd /home/user
pwd      Print working directory                pwd
cp       Copy files/directories                 cp file.txt backup.txt
mv       Move/rename files                      mv old.txt new.txt
rm       Remove files                           rm file.txt
mkdir    Create directory                       mkdir -p /path/to/dir
rmdir    Remove empty directory                 rmdir olddir
touch    Create empty file or update timestamp  touch newfile.txt
ln       Create links                           ln -s target link
find     Search for files                       find /home -name "*.txt"
chmod    Change file permissions                chmod +x script.sh
chown    Change file owner                      chown user:group file.txt

Text Viewing and Editing

Command  Description                    Example
cat      Concatenate and display files  cat file.txt
less     View file with pagination      less largefile.txt
more     View file page by page         more file.txt
head     Display first lines of file    head -n 10 file.txt
tail     Display last lines of file     tail -n 20 file.txt
nano     Simple text editor             nano file.txt
vim      Advanced text editor           vim file.txt

Text Manipulation

Command  Description                            Example
grep     Search for patterns in text            grep "error" log.txt
sed      Stream editor for text transformation  sed 's/old/new/g' file.txt
awk      Pattern scanning and processing        awk '{print $1}' file.txt
cut      Extract sections from lines            cut -d: -f1 /etc/passwd
sort     Sort lines of text                     sort file.txt
uniq     Remove adjacent duplicate lines        uniq sorted.txt
tr       Translate or delete characters         tr '[:lower:]' '[:upper:]'
wc       Count lines, words, characters         wc -l file.txt
diff     Compare files line by line             diff file1.txt file2.txt
paste    Merge lines of files                   paste file1.txt file2.txt
join     Join lines based on common field       join file1.txt file2.txt

Text Processing Utilities

Command   Description                                Example
echo      Display text                               echo "Hello World"
printf    Formatted output                           printf "%s\n" "text"
xargs     Build and execute commands from input      xargs rm
tee       Read from stdin, write to stdout and files tee file.txt
column    Format output into columns                 column -t
expand    Convert tabs to spaces                     expand file.txt
unexpand  Convert spaces to tabs                     unexpand file.txt
fold      Wrap lines to specified width              fold -w 80 file.txt

File Information and Comparison

Command    Description                     Example
file       Determine file type             file document.pdf
stat       Display file status             stat file.txt
du         Disk usage                      du -sh /home/user
df         Disk free space                 df -h
md5sum     Calculate MD5 checksum          md5sum file.txt
sha256sum  Calculate SHA256 checksum       sha256sum file.txt
cmp        Compare two files byte by byte  cmp file1 file2

Compression and Archives

Command  Description            Example
tar      Archive files          tar -czf archive.tar.gz dir/
gzip     Compress files         gzip file.txt
gunzip   Decompress gzip files  gunzip file.txt.gz
zip      Create zip archives    zip archive.zip file1 file2
unzip    Extract zip archives   unzip archive.zip
bzip2    Compress with bzip2    bzip2 file.txt
xz       Compress with xz       xz file.txt

Text Search and Pattern Matching

Command  Description                           Example
grep     Search using basic regex              grep "pattern" file.txt
grep -E  Extended regex (same as egrep)        grep -E "pat1|pat2" file.txt
grep -F  Fixed strings (same as fgrep)         grep -F "literal" file.txt
grep -r  Recursive search                      grep -r "pattern" /path/
locate   Find files by name (uses database)    locate filename
which    Show full path of commands            which python
whereis  Locate binary, source, and man pages  whereis bash

Common Command Options

Frequently used flags across commands:

  • -r or -R: Recursive operation
  • -v: Verbose output or invert match (grep)
  • -f: Force operation
  • -i: Ignore case (grep, sort) or interactive (rm, mv)
  • -n: Show line numbers (grep, cat)
  • -a: All/append
  • -h: Human-readable output
  • -l: Long format or list files only

Deliverables and Assessment

Submit a single document (PDF or similar) containing screenshots of:

Exercise Deliverables:

  • Exercise A: Permission outputs, error message, successful output, explanation
  • Exercise B: type outputs, script + pwd output, explanation
  • Exercise C: Error/success outputs, script output, explanation
  • Exercise D: Exit status values, script output, explanation
  • Exercise E: Stderr contents, script output, FD explanations

Challenge Deliverables:

  • Challenge 1: Script code, test outputs (success, no args, nonexistent file)
  • Challenge 2: Script code, outputs for all three operations, .upper.txt contents

Additional Resources

This lab provides foundations for the next exercise on signals and inter-process communication. You'll use bash's trap builtin and named pipes without writing C code.

For further study: