This is a demanding lab. Start early.

Step 1 -- Compile your code from the previous lab with simulator_lab2.h and the new main_lab2.o (use http://www.cs.ucsb.edu/~rich/class/cs170/labs/kos_mp/Makefile). Run it and it will fail, saying that the program you are trying to run is too big. This is because the User_Base and User_Limit registers have not been set. Set them in InitUserProcess() to zero and MemorySize respectively. Do this before the call to load_user_program() so that the machine knows where in memory to load the program. Recompile and rerun. It should now work as before.

Step 2 -- Now, have your program run hw2 -- this is a simple hello world program that uses the standard I/O (stdio) library. Run it with the -d e option. The first thing that happens is that you get an unknown system call #54 (which is SYS_ioctl). This is the ioctl call made by the standard I/O library. This is kind of irritating, but you have to deal with it. The call that the stdio library makes here is ioctl(1, JOS_TCGETP, termios), where termios is a (struct JOStermios *). That is, the third argument to the system call is a pointer into user space where the stdio library expects you (the kernel) to fill in a struct JOStermios data structure. You have to service this by calling ioctl_console_fill(x), where x is a kernel pointer to a user space address, and then return zero from the system call. If the ioctl call has any other arguments (i.e. a different first or second argument), return EINVAL. Do this. I forked off a kthread to deal with SYS_ioctl (like all other system calls) so that SysCallReturn() worked to return 0 to user space. What you'll get now is an unknown system call of type SYS_fstat. On to the next step.

Step 3 -- The stdio library calls fstat() mainly to get the buffer size of an open file descriptor. For fd 0, this should be one. For fd's 1 and 2, this should be something larger -- I did 256.
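The argument check from Step 2 and the buffer-size choice for fstat() can be sketched as two small helpers. This is only an illustration: DEMO_JOS_TCGETP is a made-up placeholder (the real JOS_TCGETP comes from the simulator headers), the helper names are hypothetical, and how you hand the error back through SysCallReturn() is up to your kernel.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical placeholder value; use the real JOS_TCGETP from the simulator. */
#define DEMO_JOS_TCGETP 0x1234

/* Returns 0 when the call is the one supported combination,
 * ioctl(1, JOS_TCGETP, ...), and EINVAL for anything else. */
int check_ioctl_args(int fd, int request)
{
    if (fd != 1 || request != DEMO_JOS_TCGETP)
        return EINVAL;
    return 0;
}

/* Buffer size to pass as blk_size to stat_buf_fill().
 * Returns -1 for descriptors this lab does not describe. */
int fstat_blk_size(int fd)
{
    if (fd == 0)
        return 1;      /* stdin: one byte at a time */
    if (fd == 1 || fd == 2)
        return 256;    /* stdout and stderr: something larger */
    return -1;         /* caller decides which errno to report */
}
```

Only when check_ioctl_args() returns 0 would the system call thread go on to call ioctl_console_fill().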
Anyway, you have to deal with this call and fill in the stat buf that is the second argument (see the man page for fstat for the arguments to the system call and the return value). You deal with this by calling stat_buf_fill(struct KOSstat *addr, int blk_size), where addr is the address of the stat buffer in user space (which the kernel can reference just as for the ioctl call) and blk_size is the buffer size. Again, I forked off a kthread to handle the SYS_fstat system call in exception.c where the system calls are dispatched. Test it -- you'll get an unknown system call #64 (SYS_getpagesize) -- getpagesize(). Implement the getpagesize() system call in the same way in exception.c and have the thread you fork off return the contents of the variable PageSize. Again, consult the man page for getpagesize() to see what arguments it takes.

Step 4 -- Now you'll get unknown system call #69 (SYS_sbrk) -- sbrk(). Read the man page for all the details of sbrk(), and implement it: the initial brk pointer should be the return value of load_user_program(), and you should never let the user get more space than you have allocated for them (i.e. with one program, the sbrk pointer should not be larger than MemorySize). The sbrk() value should be added to the PCB because the OS needs to know what it is for each process so it can return the right value. For kos, you only need to return the correct value on each call (you don't need to do the other things that sbrk does according to the man page). Hack this up and get it working. hw2 should work fine now.

Step 5 -- Now is a good time to put a little more memory management info into your PCB's. Specifically, put Base and Limit register values (as integers) in. These should be set in InitUserProcess(), and when you make memory checks (say in the implementation of the sbrk() call), you should make them against these. In my code, this required changes in my implementations of sbrk(), ioctl(), fstat(), read() and write().
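As one illustration of checking against the values stored in the PCB, here is a minimal sbrk() sketch. The struct and the name do_sbrk() are hypothetical, the PCB is trimmed to the two fields the example needs, and returning a negative errno on failure is an assumption modeled on the -EBADF convention used later in this lab.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down PCB: only the fields this sketch needs. */
struct pcb {
    int limit;  /* size of the process's partition, in bytes */
    int brk;    /* current break, as a user-space address */
};

/* Moves the break by increment bytes.
 * Returns the previous break on success, or -ENOMEM when the new
 * break would fall outside the range [0, limit]. */
int do_sbrk(struct pcb *p, int increment)
{
    int old_brk = p->brk;
    long new_brk = (long)old_brk + increment;

    assert(old_brk >= 0 && old_brk <= p->limit);
    if (new_brk < 0 || new_brk > p->limit)
        return -ENOMEM;

    p->brk = (int)new_brk;
    return old_brk;
}
```

The system call thread would pass the result to SysCallReturn(). A failed request leaves the stored break untouched, so the next call still returns the right value.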
I wrote a routine called ValidAddress() which checks an address against the base and limit in the context of the current process (i.e. the current PCB) and returns either TRUE or FALSE. Next, make sure that the User_Base and User_Limit registers are set properly before calling run_user_code() in the scheduler. That is (for the case coming up where there are multiple processes), you set User_Base and User_Limit for the machine from the PCB you are about to run.

Now here is the tricky part. At this point, you have implemented a number of system calls that write values into a user space buffer. For example, SYS_ioctl takes a kernel pointer to user space memory so it can fill in the memory with a struct JOStermios structure. This pointer has to be relative to the base set for the process. That is, from the perspective of the kernel, the user address is an offset from the base for that process. In every system call that writes user space memory, you need to make sure you are including the base in the kernel address.

Even trickier, the initial stack pointer value (used by MoveArgsToStack() and InitCRuntime() as described below) is a user space address. That is, unlike a kernel address, it is relative to zero. When we set it to MemorySize-12, we had set the base to zero, which says to use all of memory. However, if we are going to change the process to run in less memory, then the User_Limit value says what the largest user space address is. Thus you need to change the initial stack pointer to be the limit for the process - 12.

Lastly, you need to modify the calls to MoveArgsToStack() and InitCRuntime() (again in InitUserProcess()) so that the third argument is set to whatever value you have set for User_Base for the process. This way, the arguments will be set in the user partition you decide to allocate. Test it out by first setting the base to zero and the limit to MemorySize. Next, try setting base to 1024 and limit to MemorySize-2048.
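The address arithmetic in this step can be sketched as follows. It is a rough illustration, not the simulator's code: demo_main_memory stands in for the simulator's memory array, and valid_address(), user_to_kernel() and initial_stack_pointer() are hypothetical names. The sketch treats the limit as the size of the partition, so valid user addresses run from 0 up to limit - 1.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MEMORY_SIZE 4096

/* Stand-in for the simulator's memory array (an assumption of this sketch). */
static char demo_main_memory[DEMO_MEMORY_SIZE];

/* Hypothetical, trimmed-down PCB. */
struct pcb {
    int base;   /* where the partition starts in physical memory */
    int limit;  /* size of the partition, in bytes */
};

/* Returns 1 if user_addr lies inside the process's partition, else 0. */
int valid_address(const struct pcb *p, int user_addr)
{
    return user_addr >= 0 && user_addr < p->limit;
}

/* Turns a user-space address into a pointer the kernel can dereference
 * by adding the base. Returns NULL for an address outside the partition. */
char *user_to_kernel(const struct pcb *p, int user_addr)
{
    assert(p->base >= 0 && p->base + p->limit <= DEMO_MEMORY_SIZE);
    if (!valid_address(p, user_addr))
        return NULL;
    return demo_main_memory + p->base + user_addr;
}

/* The stack pointer is a user-space address, so it depends only on the limit. */
int initial_stack_pointer(const struct pcb *p)
{
    return p->limit - 12;
}
```

With base 1024 and limit 2048, user address 10 maps to byte 1034 of memory and the initial stack pointer is 2036, regardless of where the partition sits.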
It should work exactly the same in each case.

Step 6 -- Time to attack execve. Look at exec.c in the test_execs directory. As you see, you can use exec without fork or wait. We'll implement it using exec as the test program. The first thing we have to do is split InitUserProcess() into two parts. The first part will allocate a new PCB and initialize things like its limit/base fields and its registers. Next, it calls PerformExecve(PCB, fn, argv), where PCB is the new PCB, fn is a string with the name of your initial executable, and argv is the argv of this initial PCB. PerformExecve() returns an integer -- zero on success, and an errno if an error occurred. PerformExecve() loads the user program, sets the stack and the registers, and then returns zero. If there were any errors (e.g. the program didn't load), it returns the proper errno. When PerformExecve() returns to InitUserProcess(), it either exits because there was an error, or it puts the new PCB onto the ready queue and calls kt_exit(). Try this out and run one of your old programs as the initial program (e.g. argtest, cat, or hw).

Step 7 -- Now that you have the initial process using PerformExecve(), it is time to work on the execve system call. When you get the execve system call, fork off a thread to service it in exception.c. Check the man page for the execve() system call. It takes three arguments, but for this lab the third argument (envp) will always be NULL. The first thing that this thread must do is recognize/check the arguments. This is tricky. First, see if you can print out the file name (first argument). That's easy. Now, see if you can print out all of the argv's.

Step 8 -- Now, you have to copy the file name and the argv strings into KOS's memory. Why? Because you are going to be loading a program over the top of the user's memory, and you don't want to lose the file name and the argv strings. Do this copying (yes, you'll have to call malloc) and test.
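The copying in Step 8 might look something like the sketch below. It rests on several assumptions that you should check against your own kernel: user-space argv is a zero-terminated array of 32-bit user addresses stored in host byte order, mem is a kernel pointer to the start of the process's partition, and copy_user_argv() and free_argv_copy() are hypothetical names. It also leaves out the address validity checks that a real kernel needs before touching user memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Frees a vector built by copy_user_argv(). */
void free_argv_copy(char **argv)
{
    if (argv == NULL)
        return;
    for (size_t i = 0; argv[i] != NULL; i++)
        free(argv[i]);
    free(argv);
}

/* mem:       kernel pointer to the start of the process's partition.
 * argv_addr: user-space address of a 0-terminated array of 32-bit
 *            user-space string addresses.
 * Returns a malloc'd, NULL-terminated vector of malloc'd strings,
 * or NULL if an allocation fails. */
char **copy_user_argv(const char *mem, uint32_t argv_addr)
{
    size_t argc = 0;
    uint32_t entry;

    assert(mem != NULL);

    /* First pass: count the entries. memcpy avoids unaligned reads. */
    for (;;) {
        memcpy(&entry, mem + argv_addr + 4 * argc, sizeof entry);
        if (entry == 0)
            break;
        argc++;
    }

    char **copy = malloc((argc + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i <= argc; i++)
        copy[i] = NULL;   /* keeps cleanup safe after a partial failure */

    /* Second pass: duplicate each string into kernel memory. */
    for (size_t i = 0; i < argc; i++) {
        memcpy(&entry, mem + argv_addr + 4 * i, sizeof entry);
        const char *s = mem + entry;
        copy[i] = malloc(strlen(s) + 1);
        if (copy[i] == NULL) {
            free_argv_copy(copy);
            return NULL;
        }
        strcpy(copy[i], s);
    }
    return copy;
}
```

The file name can be copied the same way as a single string. Because the copies live in kernel memory, they survive when the new program is loaded over the user's partition.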

Step 9 -- Now you're ready to call PerformExecve() on your filename and argv pointer from the kthread you fork off in exception.c for SYS_execve. There is one particular trickiness that you must observe, however, and that has to do with your NextPC value. When you begin executing a new process, the simulator expects the PC value to be set to 0 and the NextPC value to be set to 4. If you implemented SysCallReturn() as I recommended in KOS Lab 1, then when you reset the registers, you will need to be careful *not* to set registers[NextPCReg] to 4. Why? SysCallReturn() loads registers[PCReg] from registers[NextPCReg]. If you set registers[NextPCReg] = 4 and then call SysCallReturn(), it will set registers[PCReg] = 4 and then run_user_code() with the PC = 4. But to *start* a program with execve() the initial PC value needs to be zero -- not 4.

You can fix this in the following way. Change SysCallReturn() to set registers[NextPCReg] = registers[PCReg] + 4 after you set registers[PCReg] = registers[NextPCReg]. Then, before you call PerformExecve(), make sure registers[NextPCReg] == 0, and it should work. If you don't handle this, it will look like it is sort of working: it will work for some programs but then crash in weird ways for others. So don't skip ahead until you understand this issue and have implemented it correctly.

The other issue is that you need to reset the stack pointer to point to the last 12 bytes of the memory space. You do this by setting registers[StackReg] = User_Limit - 12 for the User_Limit value of the process. Note, also, that you will need to call MoveArgsToStack() and InitCRuntime() for the process after you call load_user_program() so that you initialize the arguments in the new program's memory space. Recall that these functions read the stack pointer in the registers argument, so you will need to have reset the stack pointer before making these calls.
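The register bookkeeping described in this step can be modeled in a few lines. Treat this as a sketch under stated assumptions: the register indices are made-up placeholders (the real PCReg, NextPCReg and StackReg come from the simulator headers), placing the return value in register 2 is an assumption, and the real SysCallReturn() also has to put the PCB back on the ready queue.

```c
#include <assert.h>

/* Hypothetical register indices, for illustration only. */
enum {
    DEMO_RET_REG     = 2,
    DEMO_STACK_REG   = 29,
    DEMO_PC_REG      = 34,
    DEMO_NEXT_PC_REG = 35,
    DEMO_NUM_REGS    = 40
};

/* Models only the register updates of the modified SysCallReturn(). */
void syscall_return_regs(int regs[], int value)
{
    regs[DEMO_RET_REG] = value;
    regs[DEMO_PC_REG] = regs[DEMO_NEXT_PC_REG];
    regs[DEMO_NEXT_PC_REG] = regs[DEMO_PC_REG] + 4;
}

/* Register reset for a freshly exec'd program: everything cleared,
 * so NextPC is 0, and the stack pointer set from the limit. */
void reset_regs_for_exec(int regs[], int user_limit)
{
    assert(user_limit >= 12);
    for (int i = 0; i < DEMO_NUM_REGS; i++)
        regs[i] = 0;
    regs[DEMO_STACK_REG] = user_limit - 12;
}
```

After reset_regs_for_exec() followed by syscall_return_regs(), the PC is 0 and NextPC is 4, which is what the simulator expects for a new program. For an ordinary system call with NextPC at 104, the same routine leaves the PC at 104 and NextPC at 108.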
All of this means that you need to take a look at your implementation of PerformExecve() and make sure that the register values are set correctly. The stack pointer needs to be set before any call to MoveArgsToStack() or InitCRuntime(), the registers need to be cleared, and PCReg and NextPCReg need to be set appropriately.

If PerformExecve() returns with an error, call SysCallReturn() with that error. Be sure you restore the original set of registers that were in the PCB when SYS_execve was called in the error case, since you need to go back to the program that called execve() and report the error. Otherwise, everything worked, and you can call SysCallReturn() with zero (granted, you aren't really returning from a system call, but this will get the PCB onto the ready queue). In either case you need to call free() on anything that you malloc'd in step 8. Test this out on the exec program. When you're done, execve should be working!

Step 10 -- Now is a good time to put process id's into your PCB. Diverging from Linux, we're going to make process id's ints, which makes the implementation a little easier. We are going to try to reuse process ids as much as possible. Write a piece of code called "int get_new_pid()". It returns an unused process id. How does this work? I have a rb-tree of process id's that are in use. get_new_pid() starts curpid at 1 (there is no valid process id zero) and checks the tree to see if curpid is there. If so, it increments curpid and tries again until it gets a pid that's not there. Then it puts curpid into the tree and returns it. While you're at it, write destroy_pid(int pid), which removes the given pid from the tree. Test this out (figure out how, or be arrogant and don't test it). Your code should always assign the lowest available pid to a process.
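To make the "lowest available pid" rule concrete, here is a small sketch that swaps the rb-tree for a plain in-use flag array. That is a deliberate simplification, not the structure described above, and DEMO_MAX_PID is a made-up cap. The search logic is the same: start at 1 and take the first pid that is not in use.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_PID 64   /* hypothetical cap on the number of pids */

/* pid_in_use[p] is true while pid p is assigned. Index 0 is never used
 * because there is no valid process id zero. */
static bool pid_in_use[DEMO_MAX_PID + 1];

/* Returns the lowest unused pid and marks it used, or -1 if none is left. */
int get_new_pid(void)
{
    for (int pid = 1; pid <= DEMO_MAX_PID; pid++) {
        if (!pid_in_use[pid]) {
            pid_in_use[pid] = true;
            return pid;
        }
    }
    return -1;
}

/* Makes pid available for reuse. */
void destroy_pid(int pid)
{
    assert(pid >= 1 && pid <= DEMO_MAX_PID);
    pid_in_use[pid] = false;
}
```

One way to test it: allocate pids 1, 2 and 3, destroy 2, and check that the next call hands back 2 rather than 4.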
Note that you can implement this with a doubly-linked list as well, but you will need to scan the list each time you assign a pid to determine what the lowest available number is. You can also use the rb tree and scan for the lowest unused pid. It is your choice.

Step 11 -- Now, initialize the pid of your first process in InitUserProcess(), and implement the getpid() system call. It should be obvious that this will be a function that returns the current PCB's pid through a call to SysCallReturn(), i.e. it is separate from the function you designed in step 10 and it is forked off as a kthread in exception.c.

Step 12 -- Now, implement fork(). This is a tough one to do incrementally, but you should. You need to first write primitives to split up memory into 8 parts. This means that you can have eight processes running at any one time. Now, when you first start processing a fork call, check to see if you have room in memory, and if not, return EAGAIN. If it's ok, allocate a new PCB and initialize its fields -- limit and base will refer to a new part of memory. The registers should be copied from the calling process's PCB. It should get a new pid. Its memory should be a copy of the calling process's memory. That is, you need to copy the memory from the partition of the calling process to a new partition that the new process will use. Now, call SysCallReturn(newPCB, 0), and ignore the calling process. If this works, when you test it on the fork program you should get the following output (the pids might be different -- the pid of my first program is 1):

    mypid = 2. fork returned 0

Go back and change the SysCallReturn() call to SysCallReturn(origPCB, newPCB->pid). Now, the new process will be created, but lost. The output will be:

    mypid = 1. fork returned 2

For a simple fork program, see test_execs/fork.c.

Step 13 -- Now, before the SysCallReturn() call, call kt_fork(FinishFork, newPCB). And have FinishFork simply call SysCallReturn(PCB, 0).
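The memory and register side of fork() from Step 12 can be sketched as below. Everything here is a simplified stand-in: demo_main_memory replaces the simulator's memory, the PCB is trimmed down, the eight partitions are assumed to be equal in size, and alloc_partition(), free_partition() and fork_copy() are hypothetical names. Returning -EAGAIN as a negative errno is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_MEMORY_SIZE 8192
#define NUM_PARTITIONS   8
#define PARTITION_SIZE   (DEMO_MEMORY_SIZE / NUM_PARTITIONS)
#define DEMO_NUM_REGS    40

static char demo_main_memory[DEMO_MEMORY_SIZE];  /* stand-in for simulator memory */
static int partition_used[NUM_PARTITIONS];

/* Hypothetical, trimmed-down PCB. */
struct pcb {
    int pid;
    int base;
    int limit;
    int regs[DEMO_NUM_REGS];
};

/* Returns the base of a free partition, or -1 if all eight are taken. */
int alloc_partition(void)
{
    for (int i = 0; i < NUM_PARTITIONS; i++) {
        if (!partition_used[i]) {
            partition_used[i] = 1;
            return i * PARTITION_SIZE;
        }
    }
    return -1;
}

/* Releases a partition, e.g. when its process exits. */
void free_partition(int base)
{
    assert(base >= 0 && base < DEMO_MEMORY_SIZE && base % PARTITION_SIZE == 0);
    partition_used[base / PARTITION_SIZE] = 0;
}

/* Fills in the child PCB as a copy of the parent in a fresh partition.
 * Returns 0 on success or -EAGAIN when there is no room in memory. */
int fork_copy(const struct pcb *parent, struct pcb *child, int child_pid)
{
    int base = alloc_partition();
    if (base < 0)
        return -EAGAIN;

    child->pid = child_pid;
    child->base = base;
    child->limit = PARTITION_SIZE;
    memcpy(child->regs, parent->regs, sizeof child->regs);
    memcpy(demo_main_memory + child->base,
           demo_main_memory + parent->base, PARTITION_SIZE);
    return 0;
}
```

Once the copy succeeds, the two processes differ only in what SysCallReturn() hands them: the child's pid for the parent, and 0 for the child.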
Note that FinishFork() takes newPCB as an argument. That is, it uses the new PCB (the child) and returns 0 as if the child had made a system call of its own. When the scheduler gets called, there will be two processes on readyq. If everything works as in my code, you'll get the following output from the fork program:

    mypid = 1. fork returned 2
    m

In other words, the parent process returned and printed out its string. Then, just before it exited, the child process started printing out its string. But then the parent process exits, and SYSHalt() is called, shutting down the system. However, your fork() call works!

Step 14 -- Time to fix exit(). Instead of having it call SYSHalt(), you should have it kill the process: release the memory that it was consuming so that other processes may use it. Save the exit value in the PCB. However, don't deallocate the PCB yet, and don't free up the pid yet. When you're done, simply call kt_exit() so that the scheduler will take over. This creates a kind of zombie process (since the pid is not destroyed, and the PCB still exists), but since we haven't implemented wait(), the zombie will never go away. Test this on the fork program in test_execs. It should now run to completion and hang when it's done:

    mypid = 1. fork returned 2
    mypid = 2. fork returned 0

If you get the same statement twice

    mypid = 1. fork returned 2
    mypid = 1. fork returned 2

then you may not have your read and write functions set up to deal with the fact that you can have multiple processes in memory at the same time now. Fix this.

Step 15 -- Implement the getdtablesize() system call (using kt_fork() in exception.c) so that it returns 64.

Step 16 -- Implement the close() system call so that it returns -EBADF whenever it is called. We're going to ignore the close() system call until the next lab, but we have to deal with the calls that ksh will make.

Step 17 -- Implement wait() so that it calls SYSHalt().
Now, run ksh.cookbook-step17 (a hacked version of ksh for this step), and execute programs so that wait() is never called (i.e. do: "argtest x y z &", then "hw &", then "exec &"). Don't use any programs that read from stdin. These should work. That's pretty cool. Make sure you do this at least 8 times from the console so that you test reusing parts of main_memory. Now, try something where wait is called, like "hw" without the ampersand. It should halt. See if you can fork off the cpu program 8 times quickly (you may need to employ the help of your mouse for this) and get an error that you have no more processes to fork. Actually, I couldn't do this because we haven't dealt with the timer yet, and once the cpu program gets control of the cpu, it doesn't give it up. Drag.

Step 18 -- Now, we're going to implement wait(). This is slightly tricky. Read the man page for wait() before you begin. First, we need to implement getppid(). So, we need to have a parent field in our PCB. Note that this should point to the parent's PCB, and not contain the parent's PCB (you will need to modify fork to make this work for all processes except the first process). I have a sentinel PCB, to which I give the pid of 0. It never runs, but I treat it like the Init process. For our first process, we'll have its parent be this Init process. Get all of this working and get getppid() working. Don't worry about having a parent process die -- since we're not deleting PCB's upon exiting, everything will work. Try the getppid program from ksh.cookbook-step17 ("getppid &"). Its output should be something like:

    1 My pid is 3. My parents pid is 2
    3 My pid is 3. My parents pid is 2. Fork returned 4
    2 My pid is 4. My parents pid is 3. Fork returned 0
    3 My pid is 4. My parents pid is 3. Fork returned 0

Run it a few times and make sure your pids are right. Note that at that last line, the parent has already called exit.
Everything is still ok because you haven't deallocated the PCB yet.

Step 19 -- Add a semaphore called waiter_sem and a dllist called waiters to each process's PCB. Initialize the semaphore to zero (do this everywhere a PCB is created). When a process exits, it should call V() on its parent's waiter_sem, and put its PCB onto the parent's waiters list. Also, add a field to the PCB that saves the exit code passed to the exit() system call as an argument.

Step 20 -- Now, when a process calls wait(), it should call P() on its waiter_sem semaphore. When this unblocks, there is a child that is done. It should take the child off that dllist, free up its pid and PCB, and use the child's PCB (before freeing, of course) to fill in the return values to the wait call (see the man page for wait()). Test this -- you should now be able to run test_execs/ksh and have it wait for the programs it runs. I.e.:

    ksh> hw
    Hello world
    the write statement just returned 12
    ksh> hw
    Hello world
    the write statement just returned 12
    ksh> argtest a b c d
    argc is -->5<--
    argv is -->104388<--
    envp is -->0<--
    argv[0] is (104420) -->argtest<--
    argv[1] is (104418) -->a<--
    argv[2] is (104416) -->b<--
    argv[3] is (104414) -->c<--
    argv[4] is (104412) -->d<--
    ksh> hw &
    ksh>Hello world
    the write statement just returned 12
    hw
    Hello world
    the write statement just returned 12
    ksh>

Step 21 -- Note that this deals just fine with zombie processes. However, orphans are a problem. To deal with this, you should add a dllist or rb-tree to your PCB struct called "children". This holds all non-zombie children of the process in question. It should be keyed on the child's pid, and have the child's PCB as a value. Write the code that inserts the child into this list when fork is called, and that removes the child from the list when the child calls exit. Test it. Also, in this step, make sure you handle the case where a process calls wait() but it has no children (either alive or zombie).
Again, see the man page for wait() to figure out how to handle this case. Test it.

Step 22 -- Ok, now when a process dies, it needs to make all of its children switch parentage to the Init process. Do that for all the non-zombie children (these are the ones in the children list -- the zombies are in the waiters dllist) -- take them off the children list, switch their parent pointers, and put them into the children list for the Init process. Test this. One way to test this is to run the getppid program. The child process should become a zombie. Make sure it is inherited by Init:

    1 My pid is 3. My parents pid is 2
    3 My pid is 3. My parents pid is 2. Fork returned 4
    2 My pid is 4. My parents pid is 3. Fork returned 0
    ksh:3 My pid is 4. My parents pid is 0. Fork returned 0

Note the ksh prompt came back when the parent exited.

Step 23 -- When a child of Init becomes a zombie, it never gets cleaned up. The easiest fix is to have the child check the parent when it exits. If the parent is Init, it frees itself. Test it -- first exit ksh. Init should clean that up (you may have to resort to printf statements to test this). Next, fire it up again and call getppid. Make sure that Init is cleaning up the orphans.

Notice that, technically, your OS never exits once you implement this step, since you have created an Init process and it never dies. That is, your OS will call noop() because it sees that Init exists but has nothing to do. To fix this, make sure that the first process you run is added to the children list for Init and that any orphans get added to the children list when they are orphaned. Then, in the scheduler, if the children list for Init is empty, there are no more processes in your system and you can halt. Think through this step carefully.
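The list manipulation in Steps 22 and 23 can be sketched as follows. For brevity the children "list" here is a singly-linked chain through a next_sibling field rather than the dllist or rb-tree suggested above, and the struct and function names are hypothetical. Zombie and waiters handling is left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down PCB. */
struct pcb {
    int pid;
    struct pcb *parent;
    struct pcb *children;      /* head of the list of live children */
    struct pcb *next_sibling;  /* next entry in the parent's list */
};

/* Puts child on parent's children list and sets its parent pointer. */
void add_child(struct pcb *parent, struct pcb *child)
{
    child->parent = parent;
    child->next_sibling = parent->children;
    parent->children = child;
}

/* Takes child off parent's children list. */
void remove_child(struct pcb *parent, struct pcb *child)
{
    struct pcb **link = &parent->children;
    while (*link != NULL && *link != child)
        link = &(*link)->next_sibling;
    assert(*link == child);   /* child must be on the list */
    *link = child->next_sibling;
    child->next_sibling = NULL;
}

/* Called when dying exits: it leaves its parent's children list and
 * hands every live child of its own over to Init. */
void exit_bookkeeping(struct pcb *dying, struct pcb *init)
{
    remove_child(dying->parent, dying);
    while (dying->children != NULL) {
        struct pcb *c = dying->children;
        remove_child(dying, c);
        add_child(init, c);
    }
}

/* The scheduler's halt test: no live process is left anywhere. */
int no_processes_left(const struct pcb *init)
{
    return init->children == NULL;
}
```

With Init as the parent of a shell that has one child, the shell's exit moves the child under Init, so no_processes_left() stays false until that child exits too.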
If your first process (the one created in KOS()) is a child of Init, and any time an orphan is created it becomes a child of Init, and you clean up zombies when their parent exits, then whenever at least one process is "alive" there will be at least one process on the children list for Init. As a result, when the scheduler runs and the children list of Init is empty, no more processes are alive and the OS can shut down.

Step 24 -- There's one final case missing: a process exits, but it has zombies on its waiters list. These need to be cleaned up. There are two ways to do this that are relatively easy. The first is to realize that when a process dies without having called wait, and it has dead children (zombies) on the waiters list, the exit codes for these zombie children will never be returned to a user process, and Init doesn't care about them. As a result, when a process dies, it can simply deallocate the PCBs on its waiters list. The other, slightly more complicated way to do this is to move all of the PCBs from the process's waiters list to the waiters list of Init. Then, in the scheduler, check the waiters list of Init each time it is called and deallocate any PCBs that are there. Whatever you do, test it -- call "hw &" from ksh, wait for it to finish, and then exit ksh. The zombie process should get cleaned up.

Step 25 -- Finally, we're missing one thing -- the timer. This is trivial. Call start_timer() at the end of InitUserProcess() with an arbitrary initial value (your TAs chose 10). If you are handling interrupts properly, when you field any interrupt (including a timer interrupt) you will put Current_PCB back on the end of readyq and then call kt_joinall() followed by the scheduler. If there are multiple processes on readyq, they will each get a turn to run after an interrupt. Test this by calling the cpu program in the background a bunch -- see how its behavior differs from when the timer is not implemented. Is it working?
Make sure that your code works in the face of errors. Test everything that you can think of. Run the code in the executables directory and make sure that you understand what is going on in KOS step by step.

Step 26 -- Give yourself a well deserved pat on the back. You have processes working!