CS170: Project 1 - Shell (20% of project score)

Project Goals

The goals of this project are:

to understand important Unix system calls
to develop a simple shell application

Administrative Information

This is an individual project. It is due on Thursday, April 18, 2019, 23:59:59 PST (no deadline extensions or late turn-ins).

Implement a basic shell

The goal of this project is to implement a basic shell. A shell is a command line interpreter that accepts input from the user and executes the commands that are given. Similar to well-known shells such as bash or tcsh, your shell must be able to execute commands, redirect the standard input or standard output of commands to files, pipe the output of commands to other commands, and put commands in the background.

When the shell is ready to accept commands, it must print the prompt "shell: " (without the quotes). At this point, the user can type commands. Commands are alphanumeric tokens (e.g., ls, ps, cat) that represent programs that should be executed. You should search for these programs in the directories determined by the PATH environment variable that is passed to your shell. Of course, commands can have arguments. Thus, tokens that follow a command (separated by white space) are treated as the arguments to this command (e.g., cat x indicates that the shell should invoke the cat program and pass x as its argument). When the shell has received a line of input, it typically waits until all commands have finished. Only then is a new prompt displayed (however, this behavior can be altered -- see below for details).

In addition to commands and their arguments, your shell must understand the following meta-characters: '<', '>', '|', and '&'.

The meta-character '<' must be followed by a token that represents a file name. It indicates that the command before this meta-character reads its input from this file (instead of stdin of the shell). Thus, the meta-character '<' must follow the name of a command. Note that there can be spaces between the command name and '<', and between '<' and the file name. However, this does not have to be the case. That is, both cat < file and cat<file are valid and have the same effect. Note that at most one meta-character '<' (i.e., one input redirection) can be given for a single command. That is, more than one of the meta-character '<' for a single command is an error.

The meta-character '>' must be followed by a token that represents a file name. It indicates that the command before this meta-character writes its output to this file (instead of stdout of the shell). Thus, the meta-character '>' must follow the name of a command. Again, there can be spaces between the command name and '>', and between '>' and the file name. However, this does not have to be the case. That is, both ls > file and ls>file are valid and have the same effect. Note that at most one meta-character '>' (i.e., one output redirection) can be given for a single command. That is, more than one of the meta-character '>' for a single command is an error.
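Because cat<file and cat < file must behave identically, the parser cannot simply split the line on white space. The following is a minimal sketch of one way to tokenize a line so that each meta-character becomes its own token. The function name tokenize and the bound MAX_TOKENS are assumptions made for illustration; only the 32-character token limit comes from this assignment.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 256    /* assumption: generous bound for a 512-character line */
#define MAX_TOKEN_LEN 32  /* from the assignment: tokens never exceed 32 characters */

/* Hypothetical helper: splits `line` into tokens, treating each of
 * '<', '>', '|' and '&' as a one-character token even when no white
 * space surrounds it. Returns the number of tokens stored, or -1 if
 * there are too many tokens or a token is too long. */
static int tokenize(const char *line, char tokens[][MAX_TOKEN_LEN + 1],
                    int max_tokens)
{
    const char *p = line;
    int count = 0;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {   /* skip separators */
            p++;
            continue;
        }
        if (count == max_tokens)
            return -1;

        if (strchr("<>|&", *p) != NULL) {
            /* A meta-character always forms a token of its own. */
            tokens[count][0] = *p++;
            tokens[count][1] = '\0';
        } else {
            /* Copy an ordinary word up to white space or a meta-character. */
            size_t len = 0;
            while (*p != '\0' && !isspace((unsigned char)*p) &&
                   strchr("<>|&", *p) == NULL) {
                if (len == MAX_TOKEN_LEN)
                    return -1;
                tokens[count][len++] = *p++;
            }
            tokens[count][len] = '\0';
        }
        count++;
    }
    return count;
}
```

With this sketch, both cat<file and cat < file produce the same three tokens: cat, <, and file.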

The meta-character '|' (i.e., pipe sign) allows multiple commands to be connected. When the shell encounters the '|' character, the output of the command before the pipe sign must be connected to the input of the command after the pipe sign. This requires that there is a valid command before and after the pipe. Also, note that there can be multiple pipe signs on the command line. For example, your shell has to be able to process an input such as cat f | sort | wc. With this command, the output of the cat command is redirected to the input of sort, which in turn sends its output to the input of the wc program. With regard to white spaces separating the meta-character from the commands, the same rules as above apply.
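A pipeline such as cat f | sort | wc can be wired up roughly as sketched below. This is an illustration under stated assumptions, not a reference solution: the name run_pipeline, the exit code 127 for a failed exec, and the choice to wait with wait(2) are all assumptions, and the sketch presumes that the shell has no other children (for example, background jobs) while it runs.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `n` commands connected by pipes and waits for
 * all of them. cmds[i] is a NULL-terminated argument array for command i.
 * Returns the exit status of the last command, or -1 on failure. */
static int run_pipeline(char *const *cmds[], int n)
{
    int prev_read = -1;   /* read end of the pipe fed by the previous command */
    pid_t last_pid = -1;
    int started = 0, failed = 0;

    for (int i = 0; i < n; i++) {
        int fds[2] = {-1, -1};
        /* Every command except the last writes into a fresh pipe. */
        if (i < n - 1 && pipe(fds) == -1) {
            perror("ERROR: pipe");
            failed = 1;
            break;
        }
        pid_t pid = fork();
        if (pid == -1) {
            perror("ERROR: fork");
            if (fds[0] != -1) { close(fds[0]); close(fds[1]); }
            failed = 1;
            break;
        }
        if (pid == 0) {                      /* child */
            if (prev_read != -1) {
                dup2(prev_read, STDIN_FILENO);
                close(prev_read);
            }
            if (fds[1] != -1) {
                dup2(fds[1], STDOUT_FILENO);
                close(fds[1]);
                close(fds[0]);               /* child never reads its own pipe */
            }
            execvp(cmds[i][0], cmds[i]);
            perror("ERROR: exec");           /* reached only if exec failed */
            _exit(127);
        }
        /* parent: drop its copies so readers eventually see EOF */
        started++;
        last_pid = pid;
        if (prev_read != -1) close(prev_read);
        if (fds[1] != -1) close(fds[1]);
        prev_read = fds[0];
    }
    if (prev_read != -1) close(prev_read);

    int result = -1;
    for (int i = 0; i < started; i++) {
        int status;
        pid_t done = wait(&status);
        if (done == last_pid && !failed && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    return result;
}
```

Closing the parent's copies of the pipe ends matters: if the parent keeps a write end open, the reading command never sees end of file and the pipeline hangs.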

The ampersand character '&' indicates that the command (or commands) of the shell input should be executed in the background and the shell immediately displays a prompt to wait for the next line (even though the commands on the previous line might not have exited yet). The '&' token may only appear as the last token of a line.

To simplify things, we only allow one '&' character to appear, and it has to be last on a command line. Also, only the first command on the input line can have its input redirected, and only the last can have its output redirected. Observe, however, that in case of a single command, we can apply both input and output redirection to this command (i.e., cat < x > y is valid, while cat f | cat < g is not).
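The placement rules above can be checked on the token list before anything is executed. The sketch below is one possible check, written under the assumption that the line has already been split into tokens with each meta-character as a separate token; the names is_meta and check_line are hypothetical, and blank lines are assumed to be filtered out by the caller.

```c
#include <string.h>

/* Returns 1 if the token is exactly one of the four meta-characters. */
static int is_meta(const char *t)
{
    return t[0] != '\0' && t[1] == '\0' && strchr("<>|&", t[0]) != NULL;
}

/* Hypothetical validator: returns 0 if the token sequence obeys the rules
 * of this assignment, -1 otherwise (including for an empty line). */
static int check_line(const char *const tokens[], int n)
{
    int words = 0;      /* command and argument tokens in the current command */
    int cmd_index = 0;  /* position of the current command in the pipeline */
    int have_in = 0;    /* input redirection already seen */
    int have_out = 0;   /* output redirection already seen */

    for (int i = 0; i < n; i++) {
        const char *t = tokens[i];
        if (!is_meta(t)) {
            words++;
            continue;
        }
        switch (t[0]) {
        case '&':   /* must be the last token and follow a command */
            if (i != n - 1 || words == 0) return -1;
            break;
        case '|':   /* needs a command before it; only the last command
                       of the line may redirect its output */
            if (words == 0 || have_out) return -1;
            words = 0;
            cmd_index++;
            break;
        case '<':   /* only the first command, at most once, file name next */
            if (words == 0 || cmd_index > 0 || have_in) return -1;
            if (i + 1 >= n || is_meta(tokens[i + 1])) return -1;
            have_in = 1;
            i++;    /* skip the file name */
            break;
        case '>':   /* at most once, file name next */
            if (words == 0 || have_out) return -1;
            if (i + 1 >= n || is_meta(tokens[i + 1])) return -1;
            have_out = 1;
            i++;    /* skip the file name */
            break;
        }
    }
    return words > 0 ? 0 : -1;   /* a trailing '|' leaves no command */
}
```

Under this sketch, cat < x > y is accepted, while cat f | cat < g is rejected because the input redirection belongs to the second command.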

In case of errors (e.g., invalid input, command not found, ...) your shell should display an error and wait for the next input. It should never simply die. Whenever you output an error (syntax, parsing, command not found, ...), you must prefix your error message with "ERROR: " (without the quotes), and this message cannot exceed a single line.

To facilitate automated grading, when you start your simple shell program with the argument '-n', then your shell must not output any command prompt (no "shell: "). Just read commands as usual.

To exit the shell, the user must type Ctrl-D (pressing the D button while holding control). This signals the end of input (EOF) to functions that wait for and read the user input.
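The prompt, the '-n' switch, and the end-of-input handling might fit together as in the following sketch. The function names prompt_enabled and read_loop and the callback parameter handle_line are assumptions for illustration; the callback is a placeholder for whatever parses and executes a line.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 512  /* from the assignment: lines never exceed 512 characters */

/* Returns 0 if "-n" was passed on the command line, 1 otherwise. */
static int prompt_enabled(int argc, char *const argv[])
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-n") == 0)
            return 0;
    return 1;
}

/* Hypothetical main loop: prints the prompt (if enabled), reads one line,
 * and hands it to `handle_line` until fgets reports end of input, which
 * is what Ctrl-D produces on a terminal. Returns the number of lines read. */
static int read_loop(FILE *in, FILE *out, int show_prompt,
                     void (*handle_line)(const char *))
{
    char line[MAX_LINE + 2];  /* room for the newline and the terminator */
    int count = 0;

    for (;;) {
        if (show_prompt) {
            fputs("shell: ", out);
            fflush(out);      /* the prompt has no newline, so flush it */
        }
        if (fgets(line, sizeof line, in) == NULL)
            break;            /* EOF or read error: leave the shell */
        count++;
        if (handle_line != NULL)
            handle_line(line);
    }
    return count;
}
```

A main function could then call read_loop(stdin, stdout, prompt_enabled(argc, argv), some_handler), where some_handler is the student's own line processor.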

You may assume that the maximum length of individual tokens never exceeds 32 characters, and that the maximum length of an input line never exceeds 512 characters.

Your shell must collect the exit codes of all processes that it spawns. That is, you are not allowed to leave behind zombie processes for commands that you started.

Your shell should use the fork(2) system call and execvp(3) (or another member of the exec family) to execute commands. It should also use waitpid(2) or wait(2) to wait for a program to complete execution (unless the program is in the background). The documentation for signals, and in particular SIGCHLD, may also be useful for collecting the status of background processes when they exit.
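One common way to avoid zombies from background commands is a SIGCHLD handler that reaps every child that has already exited. The sketch below shows that idea; the function names are hypothetical. Note one assumption with real consequences: because this handler reaps any child, a foreground waitpid(2) call elsewhere in the shell may find its child already collected and fail with ECHILD, so the two mechanisms have to be coordinated.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Signal handler: collects every child that has already terminated,
 * without blocking. Only async-signal-safe calls are used here. */
static void reap_children(int signo)
{
    (void)signo;
    int saved_errno = errno;   /* waitpid may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* keep reaping; several children may have exited at once */
    }
    errno = saved_errno;
}

/* Hypothetical setup function: installs the handler above.
 * Returns 0 on success, -1 on failure. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART: restart interrupted calls such as read(2).
     * SA_NOCLDSTOP: ignore children that are merely stopped. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

The loop inside the handler is needed because several children can exit while only one SIGCHLD is delivered.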

Your shell must be written in C/C++ and run under Linux. More specifically, it must compile without any warnings or errors and run on any CSIL machine.

Some hints:

A simple shell such as this needs a command-line parser to figure out what the user is trying to do. To read a line from the user, you may use fgets(3).
If a valid command has been entered, the shell should fork to create a new (child) process, and the child process should exec the command.
Before calling exec to begin execution, the child process may have to close stdin (file descriptor 0) or stdout (file descriptor 1), open the corresponding file or pipe (with open for files, and pipe for pipes), and use dup2(2) or dup to make it the appropriate file descriptor. After calling dup2, close the old file descriptor.
The main challenge of calling execvp is to build the argument list correctly. If you use execvp, remember that the first argument in the array is the name of the command itself, and the last argument must be a null pointer.
The easiest way to redirect input and output is to follow these steps in order: (a) open (or create) the input or output file (or pipe). (b) close the corresponding standard file descriptor (stdin or stdout). (c) use dup2 to make file descriptor 0 or 1 correspond to your newly opened file. (d) close the newly opened file (without closing the standard file descriptor).
When executing a command line that requires a pipe, the pipe must be created before forking the child processes. Also, if there are multiple pipes, the command(s) in the middle may have both input and output redirected to pipes. Finally, be sure the pipe is closed in the parent process, so that termination of the process writing to the pipe will automatically close the pipe and send an EOF (end of file) to the process reading the pipe.
Any pipe or file opened in the parent process may be closed as soon as the child is forked -- this will not affect the open file descriptor in the child.
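The hints on fork, exec, and redirection can be combined for a single foreground command roughly as follows. This is a sketch under several assumptions that the assignment does not specify: the helper names redirect_fd and run_command, the file mode 0644, truncating an existing output file, and the exit codes 126 and 127 for failures in the child. Because dup2 closes the target descriptor itself before reusing it, the explicit close of stdin or stdout is folded into that call here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for the child: opens `path` and makes it the new
 * `target_fd` (0 for stdin, 1 for stdout). Returns 0 on success, -1 on error. */
static int redirect_fd(const char *path, int flags, int target_fd)
{
    int fd = open(path, flags, 0644);       /* mode is used only with O_CREAT */
    if (fd == -1)
        return -1;
    if (fd != target_fd) {
        if (dup2(fd, target_fd) == -1) {    /* closes target_fd, then copies fd */
            close(fd);
            return -1;
        }
        close(fd);                          /* the duplicate stays open */
    }
    return 0;
}

/* Hypothetical helper: runs one command in the foreground. `argv` must start
 * with the command name and end with a null pointer. `in_path` and `out_path`
 * may be NULL. Returns the command's exit status, or -1 on failure. */
static int run_command(char *const argv[], const char *in_path,
                       const char *out_path)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("ERROR: fork");
        return -1;
    }
    if (pid == 0) {                         /* child */
        if (in_path != NULL &&
            redirect_fd(in_path, O_RDONLY, STDIN_FILENO) == -1) {
            perror("ERROR: cannot open input file");
            _exit(126);
        }
        if (out_path != NULL &&
            redirect_fd(out_path, O_WRONLY | O_CREAT | O_TRUNC,
                        STDOUT_FILENO) == -1) {
            perror("ERROR: cannot open output file");
            _exit(126);
        }
        execvp(argv[0], argv);              /* searches PATH for argv[0] */
        perror("ERROR: exec");              /* reached only if exec failed */
        _exit(127);
    }
    int status;                             /* parent: wait for the child */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit rather than exit after a failure so that buffered output inherited from the parent is not flushed a second time.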

Deliverables

Please follow the instructions below exactly!

We will use gradescope to manage your project submissions and to communicate the results back to you. As a first step, you have to sign up for gradescope and join the CS170 class. For this, you will need an entry code. This entry code has been posted as an announcement on Piazza. If you have not done so already, this would also be a great time to sign up for Piazza.
You will submit all files that are part of your project via the gradescope web interface. All files that you need to build your shell (sources, headers, makefile) must be included as part of your submission. But please do not include any object or executable files.
The name of the executable shell that we will call and test must be simple_shell, and the shell program must be written in C/C++. Note that we do ask for a makefile. That is, we will copy all your files into a directory, call make and expect that the executable simple_shell is built from your sources.
Do not forget that you must support the '-n' argument to suppress the output of the shell prompt for automated testing.
Gradescope does support built-in autograding, but, currently, we do not intend to use it. Instead, we will test your projects in our own environment. So, do not worry if you don't get immediate feedback or if the system tells you that the autograder is not running.
Your project must compile on a CSIL machine. If you worked on a Windows machine or your laptop at home, then make sure it still works on CSIL or modify it appropriately!
Include a short README with this project. Explain what you did in the README. If you had problems, tell us why and what.

Created by Christopher Kruegel (© 2008, using Apache Cocoon).