The Missing Semester of CS Education, MIT - the Shell

Posted on 2023-07-02

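"The Missing Semester of Your CS Education" is an original course from MIT that introduces knowledge and tools that are very important yet rarely covered in a university CS curriculum, such as shell scripting, Vim, the command-line environment, Git, ssh, and so on. The course motivation puts it this way (~~a little embarrassing, since this describes me exactly~~):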

... Yet many of us utilize only a small fraction of those tools; we only know enough magical incantations by rote to get by, and blindly copy-paste commands from the Internet when we get stuck.

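Admittedly, ENGG1340 covered part of this material, but nowhere near enough. So working through this course is well worthwhile; besides, the workload is light, which makes it a nice snack during summer school.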

This article is a self-administered course note.

It will NOT cover any exam or assignment related content.

What is the Shell?

Computers today offer a variety of interfaces:

Graphical user interfaces (GUIs).
Voice interfaces.
AR/VR.

These are great for 80% of use-cases, but they are often restricted in what they allow you to do.

To take full advantage of the tools your computer provides, we have to go old-school and drop down to a textual interface: the shell. In this lecture, we will focus on the Bourne Again SHell ("bash").

missing:~$ echo hello
hello
missing:~$

The terminal session above consists of two parts:

prompt (missing:~$).

The main textual interface to the shell. It tells you that you are on the machine missing and that your current working directory is ~ (short for "home"). The $ tells you that you are NOT the root user.

command (echo hello).

Executes the program echo with the argument hello. The shell parses the command by splitting it on whitespace, and then runs the program indicated by the first word, supplying each subsequent word as an argument that the program can access.

How does the shell know where to find the built-in programs like date and echo?

The shell is asked to execute a command.
If the command doesn't match one of its programming keywords, it consults an environment variable called $PATH that lists which directories the shell should search for programs when it is given a command.

missing:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
missing:~$ which echo
/bin/echo
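
The lookup described above can be sketched in plain shell. This is only an illustration of the idea, not how bash is actually implemented: find_in_path is a hypothetical helper name, and real shells also consider builtins, functions, and aliases before searching $PATH.

```shell
# find_in_path is a hypothetical helper (not a real command) that mimics
# the $PATH search: try each directory in order, print the first match.
find_in_path() (
    name=$1
    IFS=:            # $PATH entries are separated by colons
    set -f           # do not glob-expand directory names
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0   # leaves only the subshell that the function body runs in
        fi
    done
    exit 1
)

find_in_path ls   # prints the full path of ls; the exact location varies by system
```

The function body is wrapped in parentheses so that the changes to IFS and the set -f option stay inside a subshell and do not leak into the calling shell.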

Navigating in the Shell

The path / is the root of the file system.

A path that starts with / is called an absolute path.
Any other path is a relative path, resolved relative to the current working directory.

The cd command accepts both absolute and relative paths as arguments.

/: root directory.
~: home directory.
.: current directory.
..: parent directory.
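
A short walk through these path forms, as a sketch run inside a throwaway directory. The names nav_root and nav_here are made up for the example, and it assumes mktemp returns an absolute path, which is the usual behavior.

```shell
# Build a throwaway directory tree so the demo does not touch real files.
nav_root=$(mktemp -d)
mkdir -p "$nav_root/a/b"

cd "$nav_root/a/b"   # absolute path: starts with /
cd ..                # relative path: parent directory, now in a
cd ./b               # relative path: . is the current directory, now in a/b
cd ../..             # two levels up, back at the top of the demo tree
nav_here=$(pwd -P)   # -P prints the physical path with symlinks resolved
echo "$nav_here"
```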

Connecting Programs

In the shell, programs have two primary "streams" associated with them: their input stream and output stream. When the program tries to read input, it reads from the input stream, and when it prints something, it prints to its output stream.

Normally, a program's standard input and output are both your terminal. That is, your keyboard as input and your screen as output. However, we can also rewire those streams.

command < file, command > file: rewire the input/output streams of a program to file.
command >> file: rewire the output stream of a program so that it appends to file.
command1 | command2: the use of pipes lets you "chain" programs such that the output of one is the input of another.
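
The operators above can be tried together in a small sketch. The file names and the io_dir and last_line variables are illustrative, and everything happens in a temporary directory.

```shell
io_dir=$(mktemp -d)

echo hello > "$io_dir/hello.txt"                  # > creates or overwrites the file
cat < "$io_dir/hello.txt" > "$io_dir/copy.txt"    # < feeds the file to cat's input
echo world >> "$io_dir/copy.txt"                  # >> appends instead of overwriting

# A pipe: the output of sort becomes the input of tail.
last_line=$(sort "$io_dir/copy.txt" | tail -n 1)
echo "$last_line"   # prints world
```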

Root User

On most Unix-like systems, one user is special: the "root" user. It is above (almost) all access restrictions, and can create, read, update, and delete any file in the system.

You will not usually log into your system as the root user though, since it's too easy to accidentally break something. Instead, you will be using the sudo command. As its name implies, it lets you "do" something as "su" (short for "super user", or "root").

One thing you need to be root to do is write to the sysfs file system mounted under /sys. sysfs exposes a number of kernel parameters as files, so that you can easily reconfigure the kernel on the fly without specialized tools.

For example, by writing a value into the brightness file under /sys/class/backlight, we can change the screen brightness. The first instinct might be to do something like:

$ sudo find -L /sys/class/backlight -maxdepth 2 -name '*brightness*'
/sys/class/backlight/thinkpad_screen/brightness
$ cd /sys/class/backlight/thinkpad_screen
$ sudo echo 3 > brightness
An error occurred while redirecting file 'brightness'
open: Permission denied

This error may come as a surprise. After all, we ran the command with sudo!

This is an important thing to know about the shell. Operations like |, >, and < are done by the shell, not by the individual program.

In the case above, the shell (which is authenticated just as your user) tries to open the brightness file for writing, before setting that as sudo echo's output, but is prevented from doing so since the shell does not run as root.

There are two solutions for this problem.

Use the sudo su command to get a shell as the super user. You will find that the $ in the prompt changes to # (super user). In this mode, simply run echo 3 > brightness.
Run echo 3 | sudo tee brightness. Since the tee program is the one to open the /sys file for writing, and it is running as root, the permissions will work out.

The tee program copies whatever it reads from STDIN to both STDOUT and a file.
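
The behavior of tee can be tried without root on an ordinary file. In this small sketch a temporary file from mktemp stands in for the brightness file; tee_file and tee_shown are illustrative names.

```shell
tee_file=$(mktemp)

# tee copies its standard input to the file AND to its standard output,
# so the value is both saved to disk and captured in the variable.
tee_shown=$(echo 3 | tee "$tee_file")

echo "stdout: $tee_shown"
echo "file:   $(cat "$tee_file")"
```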

Shell Scripting

So far we've seen how to execute commands in the shell and pipe them together. However, in many scenarios you will want to perform a series of commands and make use of control flow expressions like conditionals or loops.

Most shells have their own scripting language with variables, control flow, and its own syntax. What sets shell scripting apart from other scripting languages is that it is optimized for performing shell-related tasks.

Variables and Functions

foo=bar;
echo "$foo"
# prints bar
echo '$foo'
# prints $foo

Note that foo = bar will not work since it is interpreted as calling the foo program with arguments = and bar. In general, in shell scripts the space character will perform argument splitting.

# filename: mcd
mcd() {
    mkdir -p "$1"
    cd "$1"
}

Bash has functions that take arguments and can operate on them. The above is an example of a function that creates a directory and cds into it.

Here $1 is the first argument to the script/function. Bash uses a variety of special variables to refer to arguments, error codes, and other relevant variables.

$0 - Name of the script. (command)
$1 to $9 - Arguments to the script. $1 is the first argument and so on.
$@ - All the arguments.
$# - Number of arguments.
$? - Return code of the previous command.
$$ - Process identification code (PID) for the current script.
$_ - Last argument from the last command.
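
A few of these variables in action. This is a minimal sketch; show_args is a hypothetical function written only to inspect them.

```shell
# show_args is a hypothetical function used only to inspect the variables.
show_args() {
    echo "count: $#"
    echo "first: $1"
    for arg in "$@"; do      # "$@" keeps each argument as one word
        echo "arg: $arg"
    done
}

show_args one "two words"
# count: 2
# first: one
# arg: one
# arg: two words

# $? holds the exit code of the command that just ran (false always fails).
false || echo "false exited with code $?"   # prints: false exited with code 1
```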

Once a bash function is defined in a file, you can load it into the shell environment with the source command, so that you can later run it like any other program.

XXZ:~$ source mcd
XXZ:~$ mcd mcd_test
XXZ:~/mcd_test$

Return/Exit Code

Commands will often return output using STDOUT, errors through STDERR, and a Return Code to report errors in a more script-friendly manner. A value of 0 usually means everything went OK; anything different from 0 means an error occurred.

Exit codes can be used to conditionally execute commands using && (and operator) and || (or operator), both of which are short-circuiting operators. Commands can also be separated within the same line using a semicolon ;.

false || echo "Oops, fail"
# Oops, fail

true || echo "Will not be printed"
# 

true && echo "Things went well"
# Things went well

false && echo "Will not be printed"
#

false ; echo "This will always run"
# This will always run
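
Your own functions can report results through exit codes in the same way. A small sketch, where is_even is a hypothetical helper that is not part of the course material:

```shell
# is_even is a hypothetical helper: it reports its answer purely through
# its exit code (0 for success/true, non-zero for failure/false).
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]
}

if is_even 4; then
    echo "4 is even"
fi

is_even 7 && echo "Will not be printed"

is_even 7 || echo "7 is odd"
```

Because if, && and || all branch on the exit code, no output parsing is needed to make the decision.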

Substitution

Variable substitution. Whenever you place "$var", it will expand the variable var and substitute it in place as a string.
Command substitution. Whenever you place $(CMD), it will execute CMD, get the output of the command and substitute it in place.
Process substitution. <(CMD) will execute CMD and place the output in a temporary file and substitute the <() with that file's name. This is useful when commands expect values to be passed by file instead of by STDIN.

#!/bin/bash

echo "Starting program at $(date)" # Date will be substituted
echo "Running program $0 with $# arguments with pid $$"

for file in "$@"; do
    grep foobar "$file" > /dev/null 2> /dev/null
    # when pattern is not found, grep has exit status 1
    # we redirect STDOUT and STDERR to a null register since we do not care about them
    if [[ $? -ne 0 ]]; then
        echo "File $file does not have any foobar, adding one"
        echo "# foobar" >> "$file"
    fi
done

The above example iterates through the arguments we provide, greps each file for the string foobar, and appends it to the file as a comment if it is not found.

Shell Globbing

When launching scripts with similar arguments, we can use shell globbing to expand expressions by carrying out filename expansion.

Wildcards. Use ? and * to match one or any number of characters, respectively.
Curly braces {}. Whenever you have a common substring in a series of commands, you can use curly braces for bash to expand this automatically.

convert image.{png,jpg}
# will expand to
convert image.png image.jpg

cp /path/project/{foo,bar,baz}.sh /newpath
# will expand to
cp /path/project/foo.sh /path/project/bar.sh /path/project/baz.sh /newpath

mv *{.py,.sh} folder
# will move all *.py and *.sh files

mkdir foo bar
touch {foo,bar}/{a..h}
# This creates files foo/a, foo/b, ..., foo/h, bar/a, bar/b, ..., bar/h
touch foo/x bar/y
diff <(ls foo) <(ls bar)
# Outputs
# < x
# ---
# > y
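
The difference between ? and * is easiest to see on a few files with similar names. A sketch in a temporary directory; the file names and the glob_* variables are made up for the example.

```shell
glob_dir=$(mktemp -d)
touch "$glob_dir/foo1" "$glob_dir/foo2" "$glob_dir/foo10"

# Each $(...) runs in a subshell, so the cd does not affect the current shell.
glob_one=$(cd "$glob_dir" && echo foo?)    # ? matches exactly one character
glob_any=$(cd "$glob_dir" && echo foo*)    # * matches any number of characters

echo "$glob_one"   # foo1 foo2
echo "$glob_any"   # foo1 foo10 foo2
```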

Shebang

Scripts need not necessarily be written in bash to be called from the terminal. For instance, here's a simple Python script that outputs its arguments in reversed order.

#!/usr/local/bin/python
import sys
for arg in reversed(sys.argv[1:]):
    print(arg)

The shebang line at the top of the script tells the kernel to execute this script with a Python interpreter instead of as a shell command. It is good practice to write shebang lines using the env command, which resolves to wherever the command lives in the system and so increases the portability of the script.

To resolve the location, env makes use of the PATH environment variable. For this example the shebang line would look like #!/usr/bin/env python.

Differences between shell functions and scripts:

Functions have to be in the same language as the shell, while scripts can be written in any language. This is why including a shebang for scripts is important.
Functions are executed in the current shell environment whereas scripts execute in their own process. Therefore functions can modify environment variables, e.g. change your current directory, whereas scripts can't.
Functions are loaded once when their definition is read. Scripts are loaded every time they are executed.
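
The second difference can be observed directly. In this sketch sh -c stands in for running a script file, since both run in a child process; go_root and the fs_* variables are hypothetical names.

```shell
fs_start=$(pwd)

# A script runs in a child process: its cd ends when that process exits.
sh -c 'cd /'
fs_after_script=$(pwd)      # unchanged

# A function runs in the current shell: its cd persists.
go_root() { cd /; }         # go_root is a hypothetical function name
go_root
fs_after_function=$(pwd)    # now /

cd "$fs_start"              # return to where we started
echo "after script:   $fs_after_script"
echo "after function: $fs_after_function"
```

This is also why mcd above has to be a function loaded with source: as a standalone script, its cd would have no effect on the shell you ran it from.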

Shell Tools

ShellCheck (shellcheck), a tool that helps you find errors in sh/bash scripts.
Finding how to use commands.
- Command with -h or --help.
- Manual page command man.
- TLDR pages are a nifty solution that focuses on giving example use cases.
Finding files using the find command.
- recursively search for files matching some criteria.
- perform actions over files that match your query.
- A user-friendlier alternative fd.
Finding code.
- Command grep with flags -C, -v, -R.
- Alternative ack, ag and rg.
Finding shell commands.
- Command history will let you access your shell history. Ex. history | grep find will print commands that contain the substring "find".
- General-purpose fuzzy finder fzf.
- history-based autosuggestions.
Directory navigation.
- fasd and autojump rank directories and files by frecency, that is, by both frequency and recency.
- directory structure: tree, broot.
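
A brief sketch of find and grep from the list above, run against a temporary directory. The file names and variables are made up for the example, and the -l flag (print only the names of matching files) is an addition not mentioned in the list.

```shell
tool_dir=$(mktemp -d)
mkdir -p "$tool_dir/src"
echo 'print("foobar")' > "$tool_dir/src/a.py"
echo 'nothing here'    > "$tool_dir/src/b.txt"

# find: recursively list regular files whose name matches a pattern.
py_files=$(find "$tool_dir" -type f -name '*.py')

# grep -R: search directory contents recursively; -l prints only file names.
foobar_files=$(grep -R -l foobar "$tool_dir")

# grep -v: invert the match, keeping lines that do NOT contain the pattern.
non_matching=$(grep -v foobar "$tool_dir/src/b.txt")

echo "$py_files"
echo "$foobar_files"
echo "$non_matching"
```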

More detailed introduction.

Reference

References in the article are from corresponding course materials if not specified.

Course info:

MIT Open Learning. The Missing Semester of Your CS Education.

Course resource:

The Missing Semester of Your CS Education.
