this page last updated: Thu Jan 30 08:13:05 PST 2025
In this class, which attempts to provide a similar experience as a learning tool, one would nonetheless like to use Linux to test one's OS functionality. That is, while the assignments will be graded using tests that are not revealed, it would be helpful if one could use Linux as an exemplar against which to test one's submission before it is graded.
To do so, the grading will have to accept correct Linux semantics (at the currently installed version running on the CSIL machines) as being "correct" so that you can "know" that your OS is producing correct behavior. In general, such a litmus test is probably impossible. For example, it is possible to design tests that depend on the exact number of cycles that are assigned to a process as part of its time slice. Even then, the Linux scheduler (which is far more complex than the process scheduler you must write) could perturb the results.
However, if the tests applied to your lab submissions eschew such timing-dependent checks, there is another issue we can address, but it requires an understanding of Linux character-level I/O.
Linux does support this property, as long as one does not assume that it means the file descriptor will behave in the same way each time it is used as a character stream. That is, the child can simply call read() and write(), but the behavior of these calls with respect to delivering data will differ depending on how the file descriptor was initialized.
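As a minimal sketch of what "how the file descriptor was initialized" means, the following hypothetical launcher runs the same child code twice: once letting the child inherit the terminal as file descriptor 0, and once with a file dup2()'d onto file descriptor 0 before the exec. The launcher itself is invented for illustration; only read-write-80 and testfile come from the examples below.

  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/wait.h>

  /* Illustrative launcher: run ./read-write-80 with fd 0 initialized two
     different ways.  Assumes read-write-80 and testfile are in the current
     directory. */
  int main()
  {
      /* First run: the child simply inherits the terminal as fd 0. */
      if (fork() == 0) {
          execl("./read-write-80", "read-write-80", (char *)0);
          perror("execl");
          return 1;
      }
      wait(NULL);

      /* Second run: fd 0 is initialized from testfile before the exec. */
      if (fork() == 0) {
          int fd = open("testfile", O_RDONLY);
          if (fd < 0) { perror("open"); return 1; }
          dup2(fd, 0);      /* the child's read(0, ...) now reads the file */
          close(fd);
          execl("./read-write-80", "read-write-80", (char *)0);
          perror("execl");
          return 1;
      }
      wait(NULL);
      return 0;
  }

The child's code is identical in both runs; only the setup of file descriptor 0 differs, and (as the examples below show) so does the behavior of its read() call.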
Let's take an example. Consider the following code for a program called read-write-80:
  #include <stdio.h>
  #include <unistd.h>

  int main()
  {
      char ch[80];
      int n;

      n = read(0, ch, 80);
      if (n < 0) {
          perror("read-write-80");
          return 1;
      }
      write(1, ch, n);
      return 0;
  }

If you compile and run this program from your login shell and then type the string
  aaa

where you type carriage-return after the third 'a', the program will echo the string "aaa\n" and exit to the shell.
  apodictic:test rich$ ./read-write-80
  aaa
  aaa
  apodictic:test rich$

Notice that it is not possible to type in a second string because the newline character after the end of the third 'a' caused the read to complete in the code and the write to then echo the string.
However, now create a file (say, called testfile) and in it put the following three strings
  aaa
  bbb
  ccc

where a newline terminates each line. Now try running the program and using the shell to redirect the file into the program's standard input device (file descriptor zero).
  apodictic:test rich$ ./read-write-80 < testfile
  aaa
  bbb
  ccc
  apodictic:test rich$

If the semantics were the same, you'd see the same output as before. That is, in the previous case, where you are connected to the program via your terminal window, the newline character tells the read() call to terminate. However, when you redirect a file into standard in, the newline characters do not cause the call to read() to complete. Instead, read() keeps going until it fills its buffer or reaches EOF, so the entire file (including the newline characters) is handed to write() in the program.
At this point, you have two legitimate questions to ask: why does Linux behave differently depending on whether standard in is your terminal or a redirected file, and which of these behaviors should your own OS implement?
As to the first question: when you are interacting with Linux via the terminal, the login process interposes a terminal "driver" called a tty between your keyboard and the process that is reading your keystrokes via the read() system call. This terminal driver is automatically configured by the login process and it interprets "special" characters (like ctrl characters) before they are delivered to the process calling read().
When you log in, the typical process that is started for you is running a shell (/bin/bash on most modern Linux systems) and it is reading from the tty that the login process set up. If you type
  stty -a

you will see the special character processing that the tty is interpreting. On csil.cs.ucsb.edu at the time of this writing, here is the output.
  speed 9600 baud; rows 25; columns 108; line = 0;
  intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
  eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z;
  rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
  -parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
  -ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon
  -ixoff -iuclc ixany imaxbel iutf8
  opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
  isig icanon iexten echo echoe -echok -echonl -noflsh -xcase -tostop -echoprt
  echoctl echoke -flusho -extproc

Consult the man page for some of the esoterica, but the important flags here are icanon and eof.
The icanon flag says that the terminal that login thinks you are using is a canonical terminal. When the terminal is a canonical terminal, then the various ctrl characters (including eof) are interpreted by the tty.
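If you want to see what icanon controls, you can turn it off yourself from a Linux program. The following sketch uses the standard termios(3) interface (the ICANON bit corresponds to the icanon flag in the stty output above); it is only for experimentation and has nothing to do with kos.

  #include <stdio.h>
  #include <unistd.h>
  #include <termios.h>

  /* Turn off canonical (line-at-a-time) processing on the tty, read once,
     then restore the original settings. */
  int main()
  {
      struct termios saved, raw;
      char ch[80];
      int n;

      if (tcgetattr(0, &saved) < 0) {
          perror("tcgetattr");          /* standard in is probably not a tty */
          return 1;
      }
      raw = saved;
      raw.c_lflag &= ~ICANON;           /* no line editing, no waiting for enter */
      raw.c_cc[VMIN] = 1;               /* read() may return after a single byte */
      raw.c_cc[VTIME] = 0;
      tcsetattr(0, TCSANOW, &raw);

      n = read(0, ch, 80);              /* completes as soon as you press a key */
      if (n > 0)
          write(1, ch, n);

      tcsetattr(0, TCSANOW, &saved);    /* put the terminal back the way it was */
      return 0;
  }

With icanon turned off, read() no longer waits for a newline (or interprets ctrl-D), which is a small taste of what your OS is responsible for when there is no tty driver in front of it.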
The eof character is, sadly, a bit of a misnomer. It really should be eot, which stands for "end of transmission." Back in the days when terminals really were terminals (and not ssh connections or window manager constructs) an eot character would tell the tty that there are no more keystrokes "in the pipeline" between the keyboard and the process.
Today, the (good?) people of the Linux community have adopted the convention that eot is eof so what this stty output says is that the tty will interpret ctrl-D as "end of transmission."
End of transmission means two different things, however, depending on whether the keystroke immediately before it is a "newline" character (generated by hitting the enter key) or not. The newline character is a line delimiter (set by bash -- see man bind) that tells the shell to "accept the line" (the accept-line function in a bind -p output). When you type
  some characters
  ctrl-D

the ctrl-D on a line by itself causes the tty to "close", signalling EOF. That is, when an "end of transmission" happens all by itself on a line, Linux interprets that event to mean that the input is closed and there will be no more input -- ever. That is, no more transmissions from the terminal will be expected (or accepted) once an eof character is sent by itself from a terminal (keyboard).
However.
If a ctrl-D does not appear on a line by itself, then it means "there are no more characters coming right now, but there may be more characters in the future." The effect is that the Linux tty driver will "flush" the current line, but (importantly) not signal EOF.
Note, also, that the ctrl-D is not delivered in either case. In the EOF case, the tty returns 0 on a read (and the ctrl-D is absorbed by the tty driver and not delivered). In the "line flush" case, the read simply completes, but no ctrl-D is delivered.
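If you want to see these two cases for yourself on a Linux machine, a small loop that reports what each read() call returns is enough. This is just an observational sketch, not part of any lab.

  #include <stdio.h>
  #include <unistd.h>

  /* Report the return value of each read() from standard in until EOF. */
  int main()
  {
      char buf[80];
      int n;

      while ((n = read(0, buf, 80)) > 0)
          printf("read() returned %d bytes\n", n);

      if (n < 0)
          perror("read");
      else
          printf("read() returned 0 -- EOF\n");
      return 0;
  }

Typing a few characters and then ctrl-D (without hitting enter) produces a short read whose count does not include the ctrl-D; typing ctrl-D on an empty line makes read() return 0 and the loop exits.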
As to the second question, you have a bit of a problem because kos does not include a tty driver. As a result, you have a choice: you can implement the same semantics as Linux uses when the input is a tty, or you can use non-tty semantics (discussed below). In the first case, if you were to cross-compile read-write-80 for kos and run it as
  ./kos -a 'test_execs/read-write-80'

and then to type
  aaa

your OS would echo 'aaa\n' and then exit (as Linux does when you type the characters 'aaa\n' on the keyboard with the program running in the foreground). However, if you were to then run kos and ask the shell to redirect the input to standard in as
  ./kos -a 'test_execs/read-write-80' < testfile

you get the same output as you did when you typed 'aaa\n' from the keyboard. That is, the string 'aaa\n' is echoed and the program exits (causing kos to halt), which is different from the Linux case, where the shell redirect sent three strings (separated by newlines) to the program and all three were echoed properly.
Alternatively, if you use the non-tty semantics for standard in, then the redirect works properly, but running
  ./kos -a 'test_execs/read-write-80'

will allow you to input characters from the keyboard but will hang until you type "^D", which is different from the tty semantics of Linux for the same program.
And then there are pipes.
In one of the labs you will be asked to implement Linux pipes. These too can be set up by a program as the standard in and standard out file descriptors for a child that will then simply call read() and write(). However, Linux pipes have slightly different semantics as well. In particular, when reading from a pipe, EOF indicates that the last writer of the pipe has closed its end and that no more data will be available. However, pipes are intended to allow processes to communicate freely while they are open. Thus, as the implementer of a pipe, you are faced with the following design decision:
If a process has written some data into a pipe, and a reader has called read() with a buffer size that is larger than the data in the pipe, when does read() return to user space?
If the writer and reader processes are not written so that they are coordinated, it is not possible for the reader to "know" how much buffer to use in a read() call to ensure that the writer fills it completely. Put another way, if the reader knew the writer was going to write 10 bytes at a time, the reader could always call read(pd,buf,10) and the OS could just wait for the buffer to become full each time before returning to user space.
However, if the writer just writes some data in an amount the reader cannot anticipate, then either the reader must be able to have data delivered before the buffer is full (i.e. a short read) or the reader can only read a character at a time, because the OS will only return to user space when the buffer is full.
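You can observe Linux's choice with a small demonstration: the parent below writes four bytes into a pipe and keeps its write end open for a moment, while the child asks read() for up to 80 bytes. The program and the amounts written are arbitrary; this is only a sketch of the behavior being described.

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/wait.h>

  int main()
  {
      int fd[2];
      char buf[80];
      int n;

      if (pipe(fd) < 0) { perror("pipe"); return 1; }

      if (fork() == 0) {
          /* Child: ask for far more than the parent will write.  If read()
             insisted on a full buffer it would never return here. */
          close(fd[1]);
          n = read(fd[0], buf, 80);
          printf("child: read() returned %d bytes\n", n);
          return 0;
      }

      /* Parent: write a small amount and keep the pipe open for a while. */
      close(fd[0]);
      write(fd[1], "aaa\n", 4);
      sleep(1);
      close(fd[1]);
      wait(NULL);
      return 0;
  }

On Linux the child typically reports 4 bytes -- a short read -- even though the pipe is still open at the time the read() completes.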
Furthermore, the newline character can't be used in a pipe to trigger the call to read() to return to user space. Pipes are intended to transfer both ASCII and binary data. If binary data is being transferred, then the byte corresponding to a newline character might be a legitimate element of the data stream. If the read() completes in this case, it is completing not because a line has ended but because some random byte matches the end-of-line character.
The solution to this dilemma is for the reader to implement the following logic: if the read() call begins reading data from the pipe and filling the user-space buffer and then discovers that there is no more data to read (but the pipe is still open, i.e., has not been closed), the data that is present in the pipe is delivered to user space and the read() call terminates with a "short read."
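In code, that logic might look something like the following user-space sketch. The kpipe structure and every name in it are invented here for illustration; they are not part of kos, and a real implementation would block the reader (rather than return -1) when the pipe is empty but still open.

  #include <stdio.h>
  #include <string.h>

  #define PIPE_BUF_SIZE 256

  struct kpipe {
      char buf[PIPE_BUF_SIZE];   /* circular buffer of bytes in flight */
      int  head;                 /* next byte to deliver               */
      int  count;                /* number of unread bytes in buf      */
      int  writers;              /* number of open write ends          */
  };

  /* Returns the number of bytes delivered, 0 on EOF, or -1 if the caller
     must wait (in a kernel, this is where the reader would block). */
  int kpipe_read(struct kpipe *p, char *dst, int n)
  {
      int delivered = 0;

      if (p->count == 0) {
          if (p->writers == 0)
              return 0;          /* empty and all writers gone: EOF          */
          return -1;             /* empty but still open: wait, this is not EOF */
      }

      /* Deliver whatever is present, up to n bytes -- a short read is fine. */
      while (delivered < n && p->count > 0) {
          dst[delivered++] = p->buf[p->head];
          p->head = (p->head + 1) % PIPE_BUF_SIZE;
          p->count--;
      }
      return delivered;
  }

  int main()
  {
      struct kpipe p = { .head = 0, .count = 0, .writers = 1 };
      char out[80];

      /* A writer has put 4 bytes in the pipe... */
      memcpy(p.buf, "aaa\n", 4);
      p.count = 4;

      /* ...the reader asks for 80 and gets a short read of 4. */
      printf("kpipe_read returned %d\n", kpipe_read(&p, out, 80));
      return 0;
  }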
Notice that if you only implement these semantics for your implementation of read() and you run the program
  ./kos -a 'test_execs/read-write-80'

and you try to type in
  aaa

the code prints a single 'a' character (and no newline) and then exits, causing kos to halt. That is, it doesn't even print all three 'a' characters. The output is identical when you run
  ./kos -a 'test_execs/read-write-80' < testfile

Only a single 'a' (and no newline) is echoed from the program before the OS halts.
Why?
Because when characters come in from either the tty or a file, they come in slowly, one at a time. The pipe logic sees the first 'a' but no other 'a' characters are waiting (since an interrupt must happen to announce the second 'a', and that will take a long time). Thus the read() call notices that there are no other characters present and it returns to user space, causing the call to complete and the write() call to print only the single character that was delivered.
The second problem you must solve is to differentiate between running kos with input typed at the keyboard and running kos with the shell redirecting input from a file. Here you need a way to tell the invocation of kos whether it should treat its input as a tty or as a file.
To enable this latter functionality, kos includes a '-t' flag. Running kos with this flag does two things (described below).
To use this feature, you would run
  ./kos -t -a 'test_execs/read-write-80'

when entering data from the keyboard, but
  ./kos -a 'test_execs/read-write-80' < testfile

(leaving off the -t) when using the shell to redirect a file into the OS.
Note that this flag doesn't solve the problem for you. Instead, it gives you a way to determine (the way Linux does) whether you should treat the input as a tty or as a file. Your code will need to query the IsTTY variable in its implementation of read(): when IsTTY is 1 it should detect and filter the end-of-line character (-2) to implement tty semantics, and when IsTTY is zero it should ignore end-of-line entirely (and treat -2 as a normal character).
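To make that decision concrete, here is a sketch of how the console side of read() might branch on IsTTY. Only IsTTY and the -2 end-of-line value come from the description above; console_next_value(), the scripted input, and everything else here are made-up stand-ins, not part of the lab's actual interfaces. (A user-level program on Linux can make the analogous check with isatty(0).)

  #include <stdio.h>

  int IsTTY = 1;                  /* in kos this would be set by the -t flag */

  /* Stand-in input source: "aaa", a newline, the -2 end-of-line marker,
     then "nothing waiting" (-1). */
  static int script[] = { 'a', 'a', 'a', '\n', -2, -1 };
  static int pos = 0;

  static int console_next_value(void)
  {
      return script[pos] == -1 ? -1 : script[pos++];
  }

  /* Sketch of the console read path. */
  int console_read(char *dst, int n)
  {
      int delivered = 0;
      int v;

      while (delivered < n) {
          v = console_next_value();
          if (v == -1)
              break;          /* nothing waiting: short read (a real OS would
                                 block here if nothing had been delivered yet) */
          if (v == -2) {
              if (IsTTY)
                  break;      /* tty semantics: end-of-line completes the read
                                 and the -2 is filtered out */
              dst[delivered++] = (char)v;   /* non-tty: -2 is just another byte */
              continue;
          }
          dst[delivered++] = (char)v;
      }
      return delivered;
  }

  int main()
  {
      char buf[80];
      printf("delivered %d bytes\n", console_read(buf, 80));
      /* prints 4 with IsTTY = 1 ("aaa\n" delivered, -2 filtered);
         5 with IsTTY = 0 (-2 delivered as a normal byte) */
      return 0;
  }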
For example, if you run
  ./kos -a 'test_execs/ksh'

and then create a pipe between cat and read-write-80
  ksh: test_execs/cat | test_execs/read-write-80
  aaa
  bbb
  ccc
  ctrl-D

you may only see part of the output. For example, in my OS the program prints
  aaa

and then finishes. Why? Because my scheduler allows cat to write 3 characters before read-write-80 is allowed to get the CPU. When it does, it gets the three characters and, finding no others in the pipe, the read() call made in read-write-80 completes.
Similarly, when I run
  kos -t -a 'test_execs/ksh'

and run the same test, my OS only prints the first 'a' before read-write-80 exits. The reason here is that in the TTY case, cat gets a character at a time and, because it has to wait for the console interrupt, read-write-80 gets the CPU much sooner. It only finds a single character and exits.
The moral here (if there is one) is that you can use Linux as a guide, but you'll need to understand (as always) what it is your OS is doing to understand whether it is doing the same things as Linux does. In this case, I'd need to emulate Linux CPU scheduling to get a precise replication of the Linux output from this test.