Advice from Your Aunt Heloise for the Pipe Lab

Okay, there are essentially four things to do in this lab:

Implement file descriptors
Implement a pipe data structure (including synchronization)
Access the pipe data structure via your existing read/write calls
Implement the dup, dup2, and pipe system calls

You do not have to do these things in this order. It is just a way to break the problem down into subproblems that you can attack. Moreover, you may implement part of one and then go back and finish its implementation when you are doing part of another. That is -- this isn't a cook book. Rather, it is a set of notes that discuss the different aspects of what you'll need to get working to complete the lab.

File Descriptors

The first thing to understand is that a file descriptor is an integer that refers to a structure in your PCB (see the System Call Lecture). Technically you should create such a record for file descriptors 0, 1, and 2 which are stdin, stdout, and stderr respectively. For the lab, however, it is okay if you treat 0, 1, and 2 as being "special" and always have them refer to the keyboard and console.

The way to indicate this specialness is up to you, but the easiest thing is to put a flag in the file descriptor data structure that indicates whether the descriptor refers to the console or not. Initially, the flag is set (indicating the console) for file descriptors 0, 1, and 2. If the code closes a file descriptor that refers to the console, then this flag should be cleared. If the file descriptor is "duped" then the flag must be duped as well. Don't worry about getting open() to work for the console. If all file descriptors for the console are closed, then it is okay if there is no way to reopen it.

Further, in a regular file system, a file descriptor points to an open file table entry. Since we won't be implementing files in this class, it is okay to have your file descriptors point to your pipe data structure. That is, there will be two kinds of file descriptors: 0, 1, and 2 which work the same way as they do in the previous lab, and file descriptors that are created by the pipe() system call that point to a pipe data structure you design.

Read the man page on pipe() carefully. In particular, note that it returns two separate file descriptors -- one for reading and another for writing -- that can each be closed separately. However, both file descriptors refer to the same pipe. Thus your file descriptor should contain a way to distinguish a read file descriptor from a write file descriptor, since closing a reader will mean something different than closing a writer.
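To make the description above concrete, here is one minimal sketch of what a file descriptor record might look like in C. All of the names (fd_entry, fd_kind, MAX_FDS, fd_table_init) are made up for illustration and are not part of KOS. The sketch folds the console flag and the read/write distinction into a single kind field, which is only one of several reasonable designs; a separate flag works just as well.

```c
#include <stddef.h>

#define MAX_FDS 16   /* hypothetical fixed table size */

struct pipe;         /* the pipe data structure, designed separately */

enum fd_kind { FD_FREE, FD_CONSOLE, FD_PIPE_READ, FD_PIPE_WRITE };

struct fd_entry {
    enum fd_kind kind;   /* console "flag" and read/write end in one field */
    struct pipe *pipe;   /* NULL unless kind is FD_PIPE_READ or FD_PIPE_WRITE */
};

/* Set up a fresh table: 0, 1, and 2 refer to the console, the rest are free. */
static void fd_table_init(struct fd_entry table[MAX_FDS])
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        table[fd].kind = (fd <= 2) ? FD_CONSOLE : FD_FREE;
        table[fd].pipe = NULL;
    }
}
```

With a table like this in each PCB, the file descriptor a user program passes in is just an index, and the kind field tells read(), write(), and close() which behavior applies.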

Pipe Data Structure

The data structure for a pipe itself should look, conceptually, similar to the bounded buffer examples we discussed in lecture. In particular, each pipe has a set of writers (these will be processes in this lab) and a set of readers (also processes in this lab). Writers can only write if there are spaces to write in and readers can only read when there is valid data in the spaces. For pipes, the spaces are bytes.

Think about how the bounded buffer problem works (recall that we studied the bounded buffer problem in the Client/Trader Lectures). You'll need to be able to determine when there is data in the pipe (because it has been written), where that data is, and when it has been delivered to a reader. Each byte goes into the pipe when it is written, and comes out of the pipe when it is read. Further, the data must be read out of the pipe in the order that it is written (i.e. in FIFO order).
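As a rough illustration of the FIFO byte buffer described above, here is a small circular buffer sketch in C. The names (pipe_buf, PIPE_CAP, pipe_buf_put, pipe_buf_get) and the tiny capacity are assumptions chosen for the example. It deliberately leaves out all synchronization: in your kernel, the cases where put stores fewer bytes than asked, or get finds nothing, are exactly where a process would need to block.

```c
#include <stddef.h>

#define PIPE_CAP 8   /* hypothetical capacity; tiny so wraparound is easy to see */

struct pipe_buf {
    char   data[PIPE_CAP];
    size_t head;    /* index of the oldest unread byte */
    size_t count;   /* number of bytes currently stored */
};

/* Copy in as many bytes as fit; returns how many were stored. */
static size_t pipe_buf_put(struct pipe_buf *p, const char *src, size_t n)
{
    size_t stored = 0;
    while (stored < n && p->count < PIPE_CAP) {
        p->data[(p->head + p->count) % PIPE_CAP] = src[stored++];
        p->count++;
    }
    return stored;
}

/* Copy out up to n bytes in FIFO order; returns how many were delivered. */
static size_t pipe_buf_get(struct pipe_buf *p, char *dst, size_t n)
{
    size_t taken = 0;
    while (taken < n && p->count > 0) {
        dst[taken++] = p->data[p->head];
        p->head = (p->head + 1) % PIPE_CAP;
        p->count--;
    }
    return taken;
}
```

The head and count fields answer the three questions in the paragraph above: count says whether there is data, head says where it is, and advancing head records that a byte has been delivered.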

Pipes add an additional wrinkle that has to do with what happens when there are multiple writers and multiple readers. Let's imagine that there are two processes writing a pipe and two processes reading it (don't worry yet how this situation came to be -- I'll discuss it below). Technically, if the slots are bytes, you are entitled to schedule the writers round-robin so that the bytes go into the pipe in an interleaved way. Similarly, you are entitled to interleave the readers on the read side.

Pipes are different from the bounded buffer example, however, because they try to preserve buffer boundaries. For example, imagine that

writer 1 executes write(pd[1],buf,10)
writer 2 executes write(pd[1],buf1,10)
reader 1 executes read(pd[0],rbuf,10)
reader 2 executes read(pd[0],rbuf1,10)

The funny thing about pipes is that, regardless of the order in which these reads and writes happen, the pipe will attempt to ensure that one of the following two conditions is true after all four system calls have completed. Either

rbuf contains the contents of buf and rbuf1 contains the contents of buf1
rbuf1 contains the contents of buf and rbuf contains the contents of buf1

That is, the pipe will not distribute the contents of different write calls among possible different read calls. Put another way, the pipe tries to make read and write calls atomic.

Notice that this can get tricky when the sizes do not match. For example, what happens if the writers in this example each write three bytes but the readers try to read 10? That's tricky because if the two writers run before any reader, the first reader has enough space in her read buffer for both writes. You get some leeway in this case, but in my view the correct thing to do would be to return 6 bytes (3 bytes from the first write and 3 bytes from the second) to the first reader and have the second reader block.

When in doubt, ask Linux. That is, write a test program that tests these scenarios on Linux and try (as best you can) to emulate the Linux functionality. However, don't go overboard. We won't be trying to trip you up by testing wicked corner cases, so you should not spend a ton of time trying to make sure you exactly match Linux.
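Here is a small example of the kind of "ask Linux" test suggested above, written as a sketch rather than a finished test code. It checks the mismatched-size case within a single process: two 3-byte writes followed by one 10-byte read. The function name probe_two_small_writes is made up for this example; pipe(), read(), write(), and close() are the standard calls from unistd.h.

```c
#include <unistd.h>

/* Write two 3-byte messages into a fresh pipe, then ask for 10 bytes.
 * Returns whatever read() returns, or -1 if any setup step fails. */
static long probe_two_small_writes(void)
{
    int pd[2];
    char rbuf[10];

    if (pipe(pd) < 0)
        return -1;
    if (write(pd[1], "abc", 3) != 3 || write(pd[1], "def", 3) != 3) {
        close(pd[0]);
        close(pd[1]);
        return -1;
    }
    long n = (long)read(pd[0], rbuf, sizeof rbuf);   /* how many bytes came back? */
    close(pd[0]);
    close(pd[1]);
    return n;
}
```

Call it from main() and print the result. On Linux the read returns 6, which matches the behavior recommended above for the first reader. Cross compiling the same function for your KOS gives you a direct comparison.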

However, one thing you should not do is make a reader or a writer block indefinitely when it is possible to make progress. For example, imagine that

writer 1 executes write(pd[1],buf,5)
reader 1 executes read(pd[0],buf,10)

and there are no other writers and readers. The reader should not block waiting for the writer to send another 5 bytes. Instead, your pipe implementation should notice that the reader has some data ready for it when the write completes. It should copy those 5 bytes into the reader's buffer and return to the reader with a return value of 5, rather than continuing to block the reader until 10 bytes are available.

Read and Write System Calls

Read and write should work more or less the way they do in your previous labs, with the exception that when the file descriptor refers to a pipe (initially, any descriptor greater than 2) you won't be delivering data from/to the console, but rather from/to your pipe. You will need to make sure that you block and unblock the processes correctly. In the previous lab you did this with semaphores. You will want to do the same here, but the blocking and unblocking should happen as a result of the state of the pipe and not the console interrupt firing.

One additional wrinkle concerns what happens when a reader or a writer calls close() on an end of a pipe. You will need to implement close() for file descriptors. Additionally, your implementation will need to recognize when the last writer has closed and deliver an EOF to any readers after the last byte from the pipe has been read. For example, imagine

writer executes write(pd[1],buf,10) followed by close(pd[1])
reader executes read(pd[0],buf,10) followed by read(pd[0],buf,10)

and that the writer runs both of its commands before the reader runs any command. After the writer calls close(), your implementation should note that the pipe has no writers left, but that there is a reader and there is undelivered data. The first read should return the 10 bytes in the order that the writer wrote them. The second read, however, should return immediately with an EOF since there is no data in the pipe and no writers left to write any data. If the writer had not called close() then the second read should block until either more data is written to the pipe or the writer calls close().

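In the reverse case, if there are no readers for a pipe, a writer attempting to write to the pipe should get an EBADF error (or some other error, but EBADF is a good choice) indicating that the file descriptor is invalid. You need to be careful with test codes here: on Linux, writing to a pipe that has no readers raises SIGPIPE (and write() fails with EPIPE if that signal is ignored), so a test that behaves one way on Linux may behave differently on your KOS.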
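The rules in this section boil down to a small decision made each time a read or a write looks at the pipe. The sketch below writes that decision out as two pure functions so the cases are easy to check. Everything here (pipe_action, read_action, write_action) is a hypothetical naming of my own, and the choice to let a write store only the bytes that currently fit is an assumption; a stricter policy would wait until the whole write fits.

```c
#include <stddef.h>

enum pipe_action { PIPE_TRANSFER, PIPE_BLOCK, PIPE_EOF, PIPE_ERROR };

/* Decide what a read of `want` bytes should do given the pipe's state.
 * On PIPE_TRANSFER, *n is the number of bytes to hand back. */
static enum pipe_action read_action(size_t in_pipe, int writers,
                                    size_t want, size_t *n)
{
    if (in_pipe > 0) {
        *n = (want < in_pipe) ? want : in_pipe;   /* short reads are fine */
        return PIPE_TRANSFER;
    }
    *n = 0;
    /* Empty pipe: EOF if nobody can ever write again, otherwise wait. */
    return (writers == 0) ? PIPE_EOF : PIPE_BLOCK;
}

/* Decide what a write of `want` bytes should do: fail if nobody can ever
 * read the data, wait if the pipe is full, otherwise copy in what fits. */
static enum pipe_action write_action(size_t free_space, int readers,
                                     size_t want, size_t *n)
{
    *n = 0;
    if (readers == 0)
        return PIPE_ERROR;            /* e.g. report EBADF to the caller */
    if (free_space == 0)
        return PIPE_BLOCK;
    *n = (want < free_space) ? want : free_space;
    return PIPE_TRANSFER;
}
```

In a real implementation PIPE_BLOCK is where the process would wait on a semaphore, and the decision would be re-evaluated when it wakes up, since a close() may have changed the reader or writer count in the meantime.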

Dup, Dup2, and Pipe System Calls

Read the man pages carefully for these calls. The tricky part here is that you are going to need to keep track of how many file descriptors point to the writer side and how many point to the reader side. The easiest way to do this is with reference counters for readers and writers. For example, when a process calls pipe(pd) you'll create a pipe data structure and point two different file descriptors to it (which you return as the entries of pd -- see the man page). You will also want to set a reader reference count to 1 and a write reference count to 1 to indicate that there are open read and write file descriptors.

You will also need to allocate two free file descriptors from the PCB whenever a pipe system call is executed by a user program. It is okay for the file descriptor table to be of fixed size (make it large enough to handle several pipes simultaneously along with stdin, stdout, and stderr). If you are out of file descriptors, then the pipe system call should return an error.

Similarly, you'll need to free file descriptors when a file descriptor is closed.
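Putting the last few paragraphs together, the kernel side of pipe() might look something like the following sketch. The names (do_pipe, fd_alloc, fd_entry, MAX_FDS) and the use of malloc() for the pipe structure are assumptions made for the example, and the pipe's buffer is omitted. The one detail worth noticing is the error path: if the second descriptor cannot be allocated, the first one has to be given back.

```c
#include <stdlib.h>

#define MAX_FDS 16   /* hypothetical fixed table size */

struct pipe { int readers; int writers; /* buffer omitted */ };

enum fd_kind { FD_FREE, FD_CONSOLE, FD_PIPE_READ, FD_PIPE_WRITE };
struct fd_entry { enum fd_kind kind; struct pipe *pipe; };

/* Lowest-numbered free slot, or -1 if the table is full. */
static int fd_alloc(struct fd_entry *t)
{
    for (int fd = 0; fd < MAX_FDS; fd++)
        if (t[fd].kind == FD_FREE)
            return fd;
    return -1;
}

/* Kernel-side pipe(): on success fills pd[0] (read end) and pd[1]
 * (write end) and returns 0; returns -1 if descriptors or memory run out. */
static int do_pipe(struct fd_entry *t, int pd[2])
{
    int rfd = fd_alloc(t);
    if (rfd < 0)
        return -1;
    t[rfd].kind = FD_PIPE_READ;           /* reserve so the next search skips it */
    int wfd = fd_alloc(t);
    struct pipe *p = (wfd < 0) ? NULL : malloc(sizeof *p);
    if (p == NULL) {
        t[rfd].kind = FD_FREE;            /* undo the reservation */
        return -1;
    }
    p->readers = 1;                       /* one open read descriptor */
    p->writers = 1;                       /* one open write descriptor */
    t[rfd].pipe = p;
    t[wfd].kind = FD_PIPE_WRITE;
    t[wfd].pipe = p;
    pd[0] = rfd;
    pd[1] = wfd;
    return 0;
}
```

In a process whose table starts with only 0, 1, and 2 in use, the first call hands back descriptors 3 and 4.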

Now let's imagine that the process forks. You will need to make copies of the file descriptors in the child process so that the parent and child file descriptor tables look the same. Notice also, though, that you should bump both reference counts to 2, since there are now two open write descriptors (one in the parent and one in the child) and also two open read descriptors.

Now let's say that the parent closes pd[0] (the read side of the pipe) and the child closes pd[1] (the write side of the pipe). That is, the parent intends to write into the pipe but never to read and, similarly, the child intends to read from the pipe but never to write. What should the reference counts be after the two close operations? The writer count should be 1, and the reader count should be 1.

Notice that you'll need to make sure the pipe reference counts are correct in your process exit processing as well. For example, imagine that the writer in this case exits. The call to exit() in your OS should decrement the writer reference count and treat the reader as if the writer has closed the pipe. That is, the file descriptor gets closed either when the process calls close() on it, or when it dies.
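The reference-count bookkeeping for fork, close, and exit described above can be sketched as follows. As before, the names (fd_table_fork, fd_close, fd_table_exit, fd_entry) are hypothetical, the pipe structure is stripped down to its two counters, and freeing the pipe with free() once both counts reach zero is an assumption about how you allocated it.

```c
#include <stdlib.h>

#define MAX_FDS 16   /* hypothetical fixed table size */

struct pipe { int readers; int writers; /* buffer omitted */ };

enum fd_kind { FD_FREE, FD_CONSOLE, FD_PIPE_READ, FD_PIPE_WRITE };
struct fd_entry { enum fd_kind kind; struct pipe *pipe; };

/* fork(): the child gets a copy of every entry, and each copied pipe
 * end counts as one more open reader or writer. */
static void fd_table_fork(const struct fd_entry *parent, struct fd_entry *child)
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        child[fd] = parent[fd];
        if (child[fd].kind == FD_PIPE_READ)
            child[fd].pipe->readers++;
        else if (child[fd].kind == FD_PIPE_WRITE)
            child[fd].pipe->writers++;
    }
}

/* close(): returns 0, or -1 if fd is not open. Frees the pipe once
 * neither end is open anywhere. */
static int fd_close(struct fd_entry *t, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || t[fd].kind == FD_FREE)
        return -1;
    struct pipe *p = t[fd].pipe;          /* NULL for console entries */
    if (t[fd].kind == FD_PIPE_READ)
        p->readers--;
    else if (t[fd].kind == FD_PIPE_WRITE)
        p->writers--;   /* at 0, blocked readers should be woken to see EOF */
    if (p != NULL && p->readers == 0 && p->writers == 0)
        free(p);
    t[fd].kind = FD_FREE;
    t[fd].pipe = NULL;
    return 0;
}

/* exit(): a dying process implicitly closes everything it still has open. */
static void fd_table_exit(struct fd_entry *t)
{
    for (int fd = 0; fd < MAX_FDS; fd++)
        if (t[fd].kind != FD_FREE)
            fd_close(t, fd);
}
```

Routing exit through the same fd_close() path means there is only one place where the counts change on the way down, which makes it much harder for close() and exit() to disagree about when a reader should see EOF.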

Final Hints

Since there is no cook book for this lab, the best thing to do is to understand the pipe(), dup(), and dup2() system calls as thoroughly as possible and to write a bunch of your own test codes that you both compile for Linux and cross compile for your KOS. Using ksh to launch programs with pipes is an excellent way to drive your system but, as your last lab may have shown, there is a lot happening with ksh that you may not understand. Writing a simple test code that exercises the exact features you are implementing at any one time can save you time and frustration.