this page last updated: Tue 14 Sep 2021 09:02:33 AM PDT
Thus, you must test your OS extensively before it ships. While perfection is impossible to achieve, your goal is to exercise your OS so much more rigorously than any "normal" user would that the chance of a bug is confined to "corner cases" or "anomalous operational circumstances."
Please read the previous paragraph again before we move on to discuss autograding and Gradescope.
In this class, to make the lives of the TAs barely tolerable, we will use a web-based technology called Gradescope to collect and grade your programming assignments. Gradescope includes an autograding feature that will build your code and apply a fixed set of tests to it. It then compares the textual output of these tests to the textual output of a known-correct solution and displays the difference. Unfortunately, when they differ, Gradescope colors the test red and considers the test "failed," and when they are the same, it colors the test green and considers the test "passed."
While this circumstance sounds reasonable enough, it turns out to be the source of tremendous frustration, particularly if you do not understand the previous paragraphs.
After the deadline for each assignment, Gradescope will automatically stop accepting submissions (it will retain the last solution you submitted before the deadline). It will then switch to use a different set of tests and your grade will be determined by these final acceptance tests.
All submission tests are available in /cs/faculty/rich/cs170/test_execs.
The final acceptance tests will not be made public.
Thus the first source of frustration is illustrated by the following hypothetical scenario. CS170 Team A submits (usually repeatedly) potential solutions to a lab assignment until Gradescope reports all sanity checks as green. Team A then considers its solution both "done" and "correct" and it ceases to test and enhance its solution. The due date passes, Gradescope runs the acceptance tests, and the solution fails one or more of these tests. Team A knows that the next lab assignment depends on the correct function of this lab assignment and, so, wants access to the final acceptance tests to use them to debug their solution but the acceptance tests are not public.
For example, once you see the correct output, you could simply write a program that prints the correct output and submit that in place of your OS. Gradescope will run that program, compare its output to the correct output, and if your print statements are correct, record that your solution passes the test.
Less degenerately, most students can "code to the test," meaning that once the test is completely understood, they can write code that passes the test correctly but does not necessarily implement a fully correct solution.
In this class, you will need to write your own tests and run them against your submissions. Those tests must be extensive enough to "cover" the things that the unseen final acceptance tests cover. We will not test "tricky" corner cases or features that are undocumented in the Linux documentation. However, you will NOT be able to rely on the sanity checks performed by Gradescope to determine whether your solution is correct and complete.
That sounds deceptively simple, and it is, but perhaps not as simple as one might expect.
The first thing to realize is that every test you write for your OS (and every test we supply), you can compile and run on Linux itself. The lab preparation instructions and the KOS lecture notes describe how to build tests for your labs using an (elaborate) cross-compiler. If you simply build the same tests using the version of gcc that is installed on csil.cs.ucsb.edu you can run the test on csil.cs.ucsb.edu and compare the output to the output from your lab.
However.
There are a few caveats. The first is that Linux behaves differently when you type things in from the keyboard compared to when it reads input from a file. This page describes the issues in detail and also a way (if you are really serious about implementing Linux functionality) to make your solution perfectly congruent.
In this class, however, the limitations of Gradescope prevent us from testing Linux keyboard input. Thus you should test your codes using a shell redirect to give them input. Put another way, the TAs will not be launching your OS and typing inputs to it via the keyboard. Because Linux behaves differently (sometimes) when you use the keyboard for input, it is important that you test using file input as well.
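To see one of these differences for yourself on Linux, a small helper like the one below may be useful. It is only a sketch, and drain_fd is a made-up name, not part of the lab code. It reads everything from a file descriptor and counts how many read() calls actually delivered data. A Linux terminal in its default mode hands read() one line at a time, while input redirected from a file usually arrives in much larger pieces.

```c
#include <unistd.h>

/*
 * Hypothetical helper (not part of the lab code): read everything from fd.
 * Returns the number of read() calls that delivered data, or -1 on error.
 * The total number of bytes read is stored in *total.
 */
long drain_fd(int fd, long *total)
{
    char buf[4096];
    long calls = 0;
    ssize_t n;

    *total = 0;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        calls++;        /* one more read() that returned data */
        *total += n;    /* accumulate the byte count */
    }
    return (n < 0) ? -1 : calls;
}
```

If you call drain_fd(0, &total) from a main() that prints both numbers, you can run it once while typing a few lines at the keyboard and once with a shell redirect from any input file you like, and then compare the two call counts.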
Imagine that you have written a test code that looks like
    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>   /* fork(), getpid() */

    int main(void)
    {
        int pid;
        int my_pid;
        int i;

        /* after fork(), both the parent and the child continue from here */
        pid = fork();
        if (pid < 0) {
            exit(1);   /* fork failed */
        }

        my_pid = getpid();
        for (i = 0; i < 10; i++) {
            printf("pid: %d, i: %d\n", my_pid, i);
        }

        exit(0);
    }

Don't worry if, before Lab2, you don't understand this code exactly. What it does is to create two processes (via the fork() system call), and each one prints out its process id and a counter.
Here is the Linux output when I ran it early one morning before the quarter began
    pid: 9533, i: 0
    pid: 9533, i: 1
    pid: 9533, i: 2
    pid: 9533, i: 3
    pid: 9533, i: 4
    pid: 9533, i: 5
    pid: 9533, i: 6
    pid: 9533, i: 7
    pid: 9533, i: 8
    pid: 9533, i: 9
    pid: 9534, i: 0
    pid: 9534, i: 1
    pid: 9534, i: 2
    pid: 9534, i: 3
    pid: 9534, i: 4
    pid: 9534, i: 5
    pid: 9534, i: 6
    pid: 9534, i: 7
    pid: 9534, i: 8
    pid: 9534, i: 9

And here is the output from a working lab solution
    pid: 1, i: 0
    pid: 2, i: 0
    pid: 1, i: 1
    pid: 2, i: 1
    pid: 1, i: 2
    pid: 2, i: 2
    pid: 1, i: 3
    pid: 2, i: 3
    pid: 1, i: 4
    pid: 2, i: 4
    pid: 1, i: 5
    pid: 2, i: 5
    pid: 1, i: 6
    pid: 2, i: 6
    pid: 1, i: 7
    pid: 2, i: 7
    pid: 1, i: 8
    pid: 2, i: 8
    pid: 1, i: 9
    pid: 2, i: 9

If the lab solution is working, why are the outputs different?
There are two answers. The first has to do with asynchrony. Our lab solutions will use a different scheduler from the one that Linux uses to decide what to do "next" when the code does not define an explicit order. The C program does not define the order in which the two processes execute, so the OS is free to choose any legal order. In this example, Linux chose to run one process (pid: 9533) until it finished and then switched to the other process, but the lab solution interleaved them. Both orderings are correct. The second is visible in the output itself: the process ids differ (9533 and 9534 on Linux, 1 and 2 in the lab solution), because each OS assigns process ids in its own way.
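One way to compare such outputs without depending on the scheduler is to check only the properties that every legal ordering must have. The sketch below assumes the fork test shown earlier: exactly two processes print, and each one prints its counter from 0 to 9 in order. The function name fork_output_ok and the two constants are made up for this illustration. The checker takes the captured output as a string and ignores both the actual pid values and the way the lines are interleaved.

```c
#include <stdio.h>
#include <string.h>

#define NPROC 2    /* the test forks once, so two processes print */
#define NITER 10   /* each process prints i = 0..9 */

/*
 * Hypothetical checker: returns 1 if text is a legal output of the fork
 * test under any scheduling order, 0 otherwise.
 */
int fork_output_ok(const char *text)
{
    int pids[NPROC];
    int next[NPROC];   /* next counter value expected from each pid */
    int npids = 0;
    int k;
    const char *p = text;

    while (*p != '\0') {
        int pid, i, used = 0;

        /* each line must look exactly like "pid: <n>, i: <n>\n" */
        if (sscanf(p, "pid: %d, i: %d%n", &pid, &i, &used) != 2)
            return 0;
        if (p[used] != '\n')
            return 0;
        p += used + 1;

        /* find this pid, or remember it if it is new */
        for (k = 0; k < npids && pids[k] != pid; k++)
            ;
        if (k == npids) {
            if (npids == NPROC)
                return 0;          /* more processes than expected */
            pids[npids] = pid;
            next[npids] = 0;
            npids++;
        }

        /* within one process, the counter must go up by one each time */
        if (i != next[k])
            return 0;
        next[k]++;
    }

    if (npids != NPROC)
        return 0;
    for (k = 0; k < NPROC; k++)
        if (next[k] != NITER)
            return 0;              /* a process stopped early */
    return 1;
}
```

Both the Linux output and the lab output shown above pass this check, while an output in which one process skips a counter value or never runs does not.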
How can you tell, then, if your answer is correct? To do so, you will need to understand exactly what the test is doing and what the OS system call implements. In this case, you need to understand the following:
When you have generated enough tests that are consistent with Linux, you are reasonably sure that your solution is correct.
Ordinarily, the specific variable addresses that the compiler chooses are of little interest. However, in this class, it is often useful to print out and examine variable layouts, particularly on the stack. Worse, the "cook books" that many students choose to use to complete the lab assignments rely on an understanding of these variable layouts and the compiler's control over them.
As an example, consider the program argtest which prints the addresses of argc and argv on the stack.
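    #include <stdio.h>
    #include <stdlib.h>   /* exit() */
    #include <string.h>   /* strlen() */
    #include <unistd.h>   /* write() */

    /* Addresses are printed with %u, which assumes pointers fit in an unsigned int. */
    int main(int argc, char **argv, char **envp)
    {
        int i;
        char buf[256];

        sprintf(buf, "&argc is -->%u<--\n", (unsigned) &argc);
        write(1, buf, strlen(buf));
        sprintf(buf, "argc is -->%u<--\n", argc);
        write(1, buf, strlen(buf));
        sprintf(buf, "argv is -->%u<--\n", (unsigned) argv);
        write(1, buf, strlen(buf));
        sprintf(buf, "envp is -->%u<--\n", (unsigned) envp);
        write(1, buf, strlen(buf));

        /* print the address and the contents of each argument string */
        for (i = 0; i < argc; i++) {
            sprintf(buf, "argv[%d] is (%u) -->%s<--\n", i, (unsigned) argv[i], argv[i]);
            write(1, buf, strlen(buf));
        }

        if (envp != NULL) {
            sprintf(buf, "envp[0] is -->%s<--\n", envp[0]);
            write(1, buf, strlen(buf));
        }

        exit((int) &argc);
    }

Here is the output from a correct solution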
    &argc is -->1048472<--
    argc is -->4<--
    argv is -->1048520<--
    envp is -->0<--
    argv[0] is (1048556) -->argtest<--
    argv[1] is (1048551) -->Rex,<--
    argv[2] is (1048548) -->my<--
    argv[3] is (1048543) -->man!<--

and here is the output shown in one of the cook books
    &argc is -->1048472<--
    argc is -->4<--
    argv is -->1048520<--
    envp is -->0<--
    argv[0] is (1048540) -->argtest<--
    argv[1] is (1048548) -->Rex,<--
    argv[2] is (1048553) -->my<--
    argv[3] is (1048556) -->man!<--

Are they both correct?
Turns out that they are. The C compiler requires that the address of argc and the address of argv[0] be in certain locations on the stack, and that argv be an array that contains argc+1 elements (the last element being NULL). However, once it locates the address of argc and the address of argv[0] it doesn't much care where the values of argc and argv[0] are stored.
Worse, the order in memory where the strings are stored also doesn't matter. Each string must be stored contiguously and must be NULL terminated, and the first string ("argtest" in this example) must have its address in argv[0], the second string ("Rex,") must have its address in argv[1], and so on, but the strings themselves can be anywhere.
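Since the placement of the strings is free, a test of argument passing can check only the rules that do matter. The sketch below is one possible way to write that check. The function name argv_matches is made up for this illustration. It verifies that argv holds the expected strings in the expected order and that argv[argc] is NULL, and it never looks at where the strings live.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical checker: returns 1 if argv holds exactly the n expected
 * strings, in order, followed by a NULL entry; returns 0 otherwise.
 * The addresses of the strings themselves are deliberately ignored.
 */
int argv_matches(int argc, char **argv, const char *const *expected, int n)
{
    int i;

    if (argv == NULL || argc != n)
        return 0;
    for (i = 0; i < argc; i++) {
        if (argv[i] == NULL)
            return 0;                       /* every argument needs a string */
        if (strcmp(argv[i], expected[i]) != 0)
            return 0;                       /* wrong string in slot i */
    }
    return argv[argc] == NULL;              /* array holds argc+1 elements */
}
```

Both layouts shown above would pass this check, because each one puts the address of "argtest" in argv[0], the address of "Rex," in argv[1], and so on.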
The first solution and second solution, then, simply differ in the order in which they chose to lay out the strings. Stacks in Linux grow from high addresses down to low addresses. In solution 1, the first argument (the string pointed to by argv[0]) starts at a higher address (1048556 in the example) than the second argument (1048551 in the example), and so on, with each successive argument starting at a lower address than the one before it. If you count the characters and the NULLs you'll see that they are all next to each other in memory. Further, the address of argv[0] (listed as argv in the output) is lower than the lowest address shown for any string. Thus you can deduce that the strings are on the stack at higher addresses than argv and that the arguments are placed on the stack in reverse order relative to the addresses.
In the second example from the cook book, argc and argv are in the same place, but the solution puts the first argument at a lower address than the second argument, and so on. Thus in the second solution, the lower arguments occupy lower addresses on the stack (not higher ones as in the first solution).
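The counting described above can be done mechanically. The sketch below takes the addresses printed by argtest together with the argument strings and reports whether the strings are packed back to back, and in which direction. The function name packing_order and its return convention are made up for this illustration.

```c
#include <string.h>

/*
 * Hypothetical helper: addr[i] is the printed address of str[i].
 * Returns  1 if the strings are packed back to back with argv[0] lowest,
 *         -1 if they are packed back to back with argv[0] highest,
 *          0 otherwise (or if there are fewer than two strings).
 */
int packing_order(const unsigned long *addr, const char *const *str, int n)
{
    int i;
    int up = 1;
    int down = 1;

    if (n < 2)
        return 0;
    for (i = 0; i + 1 < n; i++) {
        /* ascending: string i, plus its terminator, ends where i+1 begins */
        if (addr[i] + strlen(str[i]) + 1 != addr[i + 1])
            up = 0;
        /* descending: string i+1, plus its terminator, ends where i begins */
        if (addr[i + 1] + strlen(str[i + 1]) + 1 != addr[i])
            down = 0;
    }
    if (up)
        return 1;
    return down ? -1 : 0;
}
```

For the first output, the addresses 1048556, 1048551, 1048548, and 1048543 give -1: for instance, "Rex," occupies 5 bytes including its terminator, and 1048551 + 5 = 1048556. For the cook book output, the addresses 1048540, 1048548, 1048553, and 1048556 give 1: "argtest" occupies 8 bytes, and 1048540 + 8 = 1048548.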