CS170 Lecture notes -- Dining Philosophers

Rich Wolski and James Plank
CS170: Operating Systems
Directory: /cs/faculty/rich/public_html/cs170/notes/DiningPhil
Lecture notes: http://www.cs.ucsb.edu/~rich/class/cs170/notes/DiningPhil/index.html
Examples: http://www.cs.ucsb.edu/~rich/class/cs170/notes/DiningPhil/example or on Github in https://github.com/richwolski/cs170-lecture-examples.git in the DiningPhil subdirectory

Dealing with Deadlock -- The Life of a Professional Academic

The purpose of this lecture, in addition to covering another of the classical synchronization problems, is to understand the concept of deadlock and the tradeoffs associated with avoiding it.

Deadlock occurs in a concurrent program when multiple threads of execution are blocked and cannot make progress because each is waiting for a condition in order to continue that only the others can make true.

There are four "requirements" for deadlock:

mutual exclusion: each thread of control must gain mutually exclusive access to a subset of the set of state variables that define the progress condition,
hold-and-wait: each thread of control must remain inside its mutual exclusion region while it waits for the condition to become true,
circular wait: some thread of control waits on a set of threads and conditions that ultimately depend on it to make progress,
no preemption: each thread of control will wait indefinitely until the program logic defines the progress condition to be true.

Removing any of these properties from an algorithm or program that can deadlock removes the possibility that deadlock will occur. As we will see, some deadlock avoidance solutions are more efficient (in terms of, say, fairness) than others.
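To make the "remove one requirement" idea concrete, here is a minimal sketch (it is not one of the lecture examples) that removes circular wait by imposing a global lock order. The names res, lock_pair(), unlock_pair(), worker() and run_ordered_demo() are made up for illustration. Two threads ask for the same two mutexes in opposite orders, which is the usual recipe for deadlock, but lock_pair() always locks the lower-numbered mutex first, so no cycle of waiting threads can form.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NRES 2

static pthread_mutex_t res[NRES] = {
  PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static int shared_count = 0;   /* only touched while holding both locks */

/* Lock two distinct resources, always lower index first (hypothetical helper). */
void lock_pair(int a, int b)
{
  int lo = (a < b) ? a : b;
  int hi = (a < b) ? b : a;

  assert(a != b);
  pthread_mutex_lock(&res[lo]);
  pthread_mutex_lock(&res[hi]);
}

/* The unlock order does not matter for deadlock. */
void unlock_pair(int a, int b)
{
  pthread_mutex_unlock(&res[a]);
  pthread_mutex_unlock(&res[b]);
}

/* Each worker names the two resources in its own preferred order. */
static void *worker(void *arg)
{
  int first = *(int *) arg;
  int second = 1 - first;
  int i;

  for (i = 0; i < 10000; i++) {
    lock_pair(first, second);
    shared_count++;
    unlock_pair(first, second);
  }
  return NULL;
}

/* Run two workers that request the locks in opposite orders. */
int run_ordered_demo(void)
{
  pthread_t t[2];
  int ids[2] = {0, 1};
  int i;

  shared_count = 0;
  for (i = 0; i < 2; i++) pthread_create(&t[i], NULL, worker, &ids[i]);
  for (i = 0; i < 2; i++) pthread_join(t[i], NULL);
  return shared_count;
}
```

If lock_pair() instead locked a and then b as given, the two workers could each hold one mutex and wait forever for the other. With the ordering in place, run_ordered_demo() always finishes.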

Dining Philosophers

The dining philosophers problem is a "classical" synchronization problem. Taken at face value, it is a pretty meaningless problem, but it is typical of many synchronization problems that you will see when allocating resources in operating systems.

The book (chapter 5) has a description of dining philosophers. I'll be a little more sketchy.

The problem is defined as follows: There are 5 philosophers sitting at a round table. Between each adjacent pair of philosophers is a chopstick. In other words, there are five chopsticks. Each philosopher does two things: think and eat. The philosopher thinks for a while, and then stops thinking and becomes hungry. When the philosopher becomes hungry, he/she cannot eat until he/she owns the chopsticks to his/her left and right. When the philosopher is done eating he/she puts down the chopsticks and begins thinking again.

Of course, the definition of this problem always leads me to ask a few questions:

If these philosophers are so smart, shouldn't they be worried about communicable diseases?
Why chopsticks? For some reason I envision philosophers liking soup or peanuts.
Evidently conversing isn't in here -- why do they need to be at the same table? I'm not sure if I'd enjoy having a philosopher philosophize while I'm eating. But then again, I'm not a philosopher.
Shouldn't bathing be in the equation somewhere?

Since these are either unwashed, stubborn, and deeply committed philosophers or unwashed, clueless, and basically helpless philosophers, there is a possibility for deadlock. In particular, if all philosophers simultaneously grab the chopstick on their left and then reach for the chopstick on their right (waiting until one is available) before eating, they will all starve. The challenge in the dining philosophers problem is to design a protocol so that the philosophers do not deadlock (i.e. the entire set of philosophers does not stop and wait indefinitely), and so that no philosopher starves (i.e. every philosopher eventually gets his/her hands on a pair of chopsticks).

Dining Philosophers Testbed with pthreads

What we've done is hack up a general driver for the dining philosophers problem using pthreads, and then implemented several "solutions".

The driver is in dphil_skeleton.c. The header file is dphil.h. It works as follows. First it calls initialize_state(), which is a procedure that is undefined. It expects a (void *) in return. This pointer will be passed to all procedures as part of the Phil_struct struct, and the user should initialize it however he/she likes.

Next, the driver does a few other things, and finally forks off the philosopher threads (five in the examples here). After doing so, the main thread wakes up every ten seconds and prints out how long each philosopher has been blocked, waiting to eat. This is so that you can make some assessment of how good the protocols are at letting philosophers eat.

Now, the philosophers basically go through the following steps.

while(1) {
  think for a random number of seconds
  pickup(p);
  eat for a random number of seconds
  putdown(p);
}

p is the philosopher's Phil_struct. The pickup() call is timed and this time is added to the total blocked time for each thread.
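The driver source is not reproduced in these notes, so the following is only a guess at how the timing around pickup() might look, assuming whole-second wall-clock timing with time(). Timed_phil, instant_pickup() and timed_pickup() are hypothetical names, not part of dphil.h.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the Phil_struct fields that matter here. */
typedef struct {
  int id;
  long blocktime;    /* total seconds spent inside pickup() so far */
} Timed_phil;

/* A pickup() that never blocks, used only to exercise the timer. */
void instant_pickup(Timed_phil *p)
{
  (void) p;
}

/* Call pickup_fn(p), add the elapsed wall-clock seconds to p->blocktime,
   and return the number of seconds this one call took. */
long timed_pickup(Timed_phil *p, void (*pickup_fn)(Timed_phil *))
{
  time_t start;
  long waited;

  assert(p != NULL && pickup_fn != NULL);
  start = time(NULL);
  pickup_fn(p);
  waited = (long) difftime(time(NULL), start);
  p->blocktime += waited;
  return waited;
}
```

Because the clock has one-second resolution in this sketch, a call that never blocks can still be charged one second if it happens to straddle a tick.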

Each solution to this problem must implement initialize_v(), pickup() and putdown() to manage the chopsticks. pickup() and putdown() should be written so that no philosopher starves (i.e. wants to eat, but never gets the chopsticks), and so that deadlock doesn't occur (a special case of starvation, because it would mean that all philosophers starve). They should also try to minimize the amount of time that the philosophers spend waiting for chopsticks.

Solutions to the dining philosophers

Each program will expect two arguments:

the number of philosophers you wish to simulate
the maximum number of seconds that a philosopher can eat or think at a time.

Here are descriptions of the solution programs:

A mutex for each chopstick

dphil_2.c implements the procedures by having each chopstick be represented by a mutex lock, and each philosopher will attempt to pick up the chopstick on his left first (by locking it), then right, then eat, then put down the right one, and then put down the left one.

typedef struct sticks {
  pthread_mutex_t *lock[MAXTHREADS];
  int phil_count;
} Sticks;

     
void pickup(Phil_struct *ps)
{    
  Sticks *pp;
  int i;
  int phil_count;

  pp = (Sticks *) ps->v;
  phil_count = pp->phil_count;

  pthread_mutex_lock(pp->lock[ps->id]);       /* lock up left stick */
  pthread_mutex_lock(pp->lock[(ps->id+1)%phil_count]); /* lock up right stick */
}

void putdown(Phil_struct *ps)
{ 
  Sticks *pp;
  int i;
  int phil_count;
  
  pp = (Sticks *) ps->v;
  phil_count = pp->phil_count;
  
  pthread_mutex_unlock(pp->lock[(ps->id+1)%phil_count]); /* unlock right stick */
  pthread_mutex_unlock(pp->lock[ps->id]);  /* unlock left stick */
}

This is prone to deadlock, although on this system you really won't ever see it because of the granularity of timeslicing between threads. The only time that this solution is a problem is if a philosopher's thread gets preempted between picking up the first and the second mutex. That doesn't really ever happen here, so it looks like it works just fine.

In dp_2_out.txt, you'll see the output of running dphil_2 5 5 for 300 seconds. There's no deadlock, but as you'll see later, the threads spend more time blocked than they should. I'll let you think about why.

Showing how you get deadlock with the mutex solution

dphil_3.c is the same as dphil_2.c, but it puts a 3-second delay between picking up chopstick 1 and chopstick 2. You get deadlock instantly if all the philosophers try to pick up their chopsticks at once:

UNIX> dphil_3 5 3
  0 Total blocktime:     0 :     0     0     0     0     0
  0 Philosopher 0 thinking for 2 seconds
  0 Philosopher 1 thinking for 3 seconds
  0 Philosopher 2 thinking for 1 seconds
  0 Philosopher 3 thinking for 2 seconds
  0 Philosopher 4 thinking for 3 seconds
  1 Philosopher 2 no longer thinking -- calling pickup()
  2 Philosopher 0 no longer thinking -- calling pickup()
  2 Philosopher 3 no longer thinking -- calling pickup()
  3 Philosopher 4 no longer thinking -- calling pickup()
  3 Philosopher 1 no longer thinking -- calling pickup()
 10 Total blocktime:     0 :     0     0     0     0     0
 20 Total blocktime:     0 :     0     0     0     0     0
  ...

The delay inserted in the pickup routine:

void pickup(Phil_struct *ps)     
{
  Sticks *pp;
  int phil_count;

  pp = (Sticks *) ps->v;
  phil_count = pp->phil_count;

  pthread_mutex_lock(pp->lock[ps->id]);       /* lock up left stick */
  
  sleep(3);
  
  pthread_mutex_lock(pp->lock[(ps->id+1)%phil_count]); /* lock up right stick */
}

An asymmetrical deadlock-free solution

dphil_4.c is the same as dphil_2.c, only odd philosophers start left-hand first, and even philosophers start right-hand first. This does not deadlock, even if you put a delay in between picking up chopsticks 1 and 2.
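The code for dphil_4.c is not listed here, so what follows is a minimal sketch of the odd/even idea rather than the actual file. It uses a fixed table of five mutexes instead of the Sticks struct, and the names asym_pickup(), asym_putdown() and run_asym_demo() are invented for this example. As in dphil_2.c, the left chopstick of philosopher id is id and the right one is (id+1) mod 5.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NPHIL 5
#define MEALS 1000

static pthread_mutex_t stick[NPHIL];
static int meals[NPHIL];   /* meals[i] is written only by thread i */

void asym_pickup(int id)
{
  int left = id;
  int right = (id + 1) % NPHIL;

  assert(id >= 0 && id < NPHIL);
  if (id % 2 == 1) {                    /* odd: left stick first */
    pthread_mutex_lock(&stick[left]);
    pthread_mutex_lock(&stick[right]);
  } else {                              /* even: right stick first */
    pthread_mutex_lock(&stick[right]);
    pthread_mutex_lock(&stick[left]);
  }
}

void asym_putdown(int id)
{
  pthread_mutex_unlock(&stick[id]);
  pthread_mutex_unlock(&stick[(id + 1) % NPHIL]);
}

/* Eat MEALS times with no thinking in between, to stress the protocol. */
static void *philosopher(void *arg)
{
  int id = *(int *) arg;
  int i;

  for (i = 0; i < MEALS; i++) {
    asym_pickup(id);
    meals[id]++;                        /* "eat" */
    asym_putdown(id);
  }
  return NULL;
}

/* Returns the total number of meals eaten once every thread has finished. */
int run_asym_demo(void)
{
  pthread_t t[NPHIL];
  int ids[NPHIL];
  int i, total = 0;

  for (i = 0; i < NPHIL; i++) {
    pthread_mutex_init(&stick[i], NULL);
    meals[i] = 0;
    ids[i] = i;
  }
  for (i = 0; i < NPHIL; i++) pthread_create(&t[i], NULL, philosopher, &ids[i]);
  for (i = 0; i < NPHIL; i++) pthread_join(t[i], NULL);
  for (i = 0; i < NPHIL; i++) {
    total += meals[i];
    pthread_mutex_destroy(&stick[i]);
  }
  return total;
}
```

Philosophers 0 and 1 both reach for stick 1 first, and philosophers 2 and 3 both reach for stick 3 first, so it can never happen that every philosopher holds exactly one chopstick. That is what breaks the cycle.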

There are two problems with this solution. The first is minor. This solution can exhibit starvation depending on how the thread system is implemented. For example, suppose philosopher A is waiting for a chopstick. Eventually, the owner of the chopstick (philosopher B) will eat and put the chopstick down, but there's no guarantee that philosopher A will get it if philosopher B wants to eat again before philosopher A's thread is rescheduled. Given our thread system and the randomness in the sleep() calls, that does not appear to be a problem, but it could well be on a different system with different parameters.

The more major problem is that the philosophers are not equally weighted here. If you look at dp_4_out.txt, you'll see the output of running dphil_4 5 5 for 300 seconds. The interesting thing here is the block-times. You'll note that philosopher #4 blocks for much less time than the rest. Why? The reason is kind of subtle. Suppose all the philosophers want to eat at the same time. Philosophers 0 and 1 will have to fight for their first chopstick, as will philosophers 2 and 3. However, philosopher 4 will always get his first chopstick. This phenomenon (which is really more complex than that, but that's the basis of it) gives philosopher 4 an advantage over the others, meaning he eats more. Thus, if you are looking to give all the philosophers equal weight, you can't use this solution.

The book's solution

In chapter 5, the book gives the dining philosophers problem as a programming project and suggests the use of condition variables to implement a monitor.

One way to do this relies on the use of "state" variables. When a philosopher wants to eat, he/she checks both chopsticks. If they are free, then he eats. Otherwise, he waits on a condition variable. Whenever a philosopher finishes eating, he checks to see if his neighbors want to eat and are waiting. If so, then he calls signal on their condition variable so that they can recheck the chopsticks and eat if possible.

This is coded up in dphil_5.c. You'll note that we don't keep track of the chopsticks explicitly. Instead, we keep track of the philosophers' states.
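Since dphil_5.c itself is not listed, here is a minimal sketch of the state-variable idea, under the assumption of one monitor mutex and one condition variable per philosopher. The identifiers (state_pickup(), state_putdown(), neighbors_idle(), run_state_demo()) are made up for this example and need not match the real file.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NPHIL 5
#define MEALS 1000
enum { THINKING, HUNGRY, EATING };

static pthread_mutex_t mon = PTHREAD_MUTEX_INITIALIZER; /* the monitor lock */
static pthread_cond_t cv[NPHIL];   /* one condition variable per philosopher */
static int state[NPHIL];           /* protected by mon */
static int meals[NPHIL];           /* meals[i] is written only by thread i */

/* True if neither neighbor of id is eating; caller must hold mon. */
static int neighbors_idle(int id)
{
  return state[(id + 1) % NPHIL] != EATING &&
         state[(id + NPHIL - 1) % NPHIL] != EATING;
}

void state_pickup(int id)
{
  pthread_mutex_lock(&mon);
  state[id] = HUNGRY;
  while (!neighbors_idle(id))      /* loop guards against spurious wakeups */
    pthread_cond_wait(&cv[id], &mon);
  state[id] = EATING;
  pthread_mutex_unlock(&mon);
}

void state_putdown(int id)
{
  int left = (id + NPHIL - 1) % NPHIL;
  int right = (id + 1) % NPHIL;

  pthread_mutex_lock(&mon);
  assert(state[id] == EATING);
  state[id] = THINKING;
  /* wake hungry neighbors so they can recheck their chopsticks */
  if (state[left] == HUNGRY) pthread_cond_signal(&cv[left]);
  if (state[right] == HUNGRY) pthread_cond_signal(&cv[right]);
  pthread_mutex_unlock(&mon);
}

static void *philosopher(void *arg)
{
  int id = *(int *) arg;
  int i;

  for (i = 0; i < MEALS; i++) {
    state_pickup(id);
    meals[id]++;                   /* "eat" */
    state_putdown(id);
  }
  return NULL;
}

/* Returns the total number of meals eaten once every thread has finished. */
int run_state_demo(void)
{
  pthread_t t[NPHIL];
  int ids[NPHIL];
  int i, total = 0;

  for (i = 0; i < NPHIL; i++) {
    pthread_cond_init(&cv[i], NULL);
    state[i] = THINKING;
    meals[i] = 0;
    ids[i] = i;
  }
  for (i = 0; i < NPHIL; i++) pthread_create(&t[i], NULL, philosopher, &ids[i]);
  for (i = 0; i < NPHIL; i++) pthread_join(t[i], NULL);
  for (i = 0; i < NPHIL; i++) total += meals[i];
  return total;
}
```

Nothing in this sketch stops a philosopher's two neighbors from taking turns eating forever, so it has the same starvation weakness as the solution it illustrates. The demo terminates only because every thread eats a fixed number of meals.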

A problem with this solution is starvation. For example, trace through dp_5_starve.txt. As you see, after a few seconds, philosophers 0 and 2 get to eat, then 1 and 3, and then 0 and 2 again and so on. Philosopher 4 never gets to eat, because there is never a time when 0 and 3 are both not eating.

In an example with a higher sleep time (dp_5_out.txt), starvation is not a problem, and you'll see that all the threads block for roughly the same amount. Moreover, the total blocking time is similar to dphil_4. Thus, this is a decent solution. Its only problem is that you can get starvation in certain pathological cases.

Preventing starvation

So, in order to prevent starvation you need one of the following:

A guarantee from the thread system that threads will be unblocked from monitors and condition variables in the same order that they are blocked. In such a case, dphil_4.c will work fine, since a blocked philosopher will then be guaranteed to get a chopstick once it is released. Of course, this doesn't solve the problem that philosopher 4 gets to eat more than the rest, but it does prevent starvation.
A do-it-yourself guarantee. In other words, you must write the protocol so that no philosopher can starve. For example, suppose you maintain a queue of philosophers. When a philosopher is hungry, he/she gets put onto the tail of the queue. A philosopher may eat only if he/she is at the head of the queue, and if the chopsticks are free.

Using a Queue to Prevent Starvation

dphil_6.c implements this approach. When a philosopher calls pickup(), if the queue is empty and the chopsticks are not in use, the philosopher is allowed to eat, and pickup() returns. Otherwise (the queue is not empty, or a chopstick is in use), the philosopher is put on the queue and waits.

The heart of the routine is a function that tests the head of the queue to determine if it can run (based on its neighbors' status) and signals the head if it can.

void test_queue(void *v)
{
  int id;
  int phil_count;
  Phil *pp = (Phil *)v;
 
  phil_count = pp->phil_count;

  if (!dll_empty(pp->q)) {
    id = (int) jval_i(dll_val(dll_first(pp->q)));
    if (pp->state[(id+1)%phil_count] != EATING &&
                    pp->state[(id+(phil_count-1))%phil_count] != EATING) {
      pthread_cond_signal(pp->cv[id]);
    }
  }
}

Notice that this function must be called inside a critical section protected by mutex locks. The head of the queue, when it is awake, deletes itself from the queue, sets its state to EATING, and tests the new head of the queue.

void pickup(Phil_struct *ps)
{
  Phil *pp;
  Dllist node;     /* my queue entry; assumes the Dllist type that goes with the dll_* calls */
  int phil_count;

  pp = (Phil *) ps->v;
  phil_count = pp->phil_count;

  pthread_mutex_lock(pp->mon);
  /* eat right away only if nobody is queued and neither neighbor is eating */
  if (dll_empty(pp->q) &&
      pp->state[(ps->id+1)%phil_count] != EATING &&
      pp->state[(ps->id+(phil_count-1))%phil_count] != EATING) {
    pp->state[ps->id] = EATING;
  } else {
    /*
     * technically, we don't need to retest because each Phil has its own
     * cond variable and signal must wake up at least 1.  A spurious wake up,
     * though, would cause a problem, so we retest in a loop
     */
    dll_append(pp->q, new_jval_i(ps->id)); /* put me on the queue */
    node = dll_last(pp->q);
    while((node != dll_first(pp->q)) || (pp->state[(ps->id+1)%phil_count] == EATING) ||
          (pp->state[(ps->id+(phil_count-1))%phil_count] == EATING)) {
        pthread_cond_wait(pp->cv[ps->id], pp->mon);
    }
    dll_delete_node(pp->q->flink); /* I must be at the head of the queue */
    pp->state[ps->id] = EATING;
    test_queue(ps->v); /* signal head of the queue if its neighbors aren't eating */
  }
  pthread_mutex_unlock(pp->mon);
}

Note how this checking must be performed in a critical section. When putdown() is called, the chopsticks are released, and then test_queue() is called, which checks the head of the queue to see if the philosopher there can eat. If so, that philosopher is unblocked, and then he/she can eat.
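The putdown() routine is described above but not listed. Here is a minimal self-contained sketch of how the whole queue protocol, including putdown(), might fit together. To keep it runnable without the list library, it swaps the dll_* queue for a small ring buffer, and all of the names (queue_pickup(), queue_putdown(), test_head(), run_queue_demo()) are hypothetical rather than taken from dphil_6.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NPHIL 5
#define MEALS 500
enum { THINKING, HUNGRY, EATING };

static pthread_mutex_t mon = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv[NPHIL];
static int state[NPHIL];      /* protected by mon */
static int queue[NPHIL];      /* ring buffer of waiting ids, protected by mon */
static int q_head, q_len;
static int meals[NPHIL];      /* meals[i] is written only by thread i */

static int neighbors_idle(int id)
{
  return state[(id + 1) % NPHIL] != EATING &&
         state[(id + NPHIL - 1) % NPHIL] != EATING;
}

/* Signal the head of the queue if it could eat now; caller must hold mon. */
static void test_head(void)
{
  if (q_len > 0 && neighbors_idle(queue[q_head]))
    pthread_cond_signal(&cv[queue[q_head]]);
}

void queue_pickup(int id)
{
  pthread_mutex_lock(&mon);
  if (q_len == 0 && neighbors_idle(id)) {
    state[id] = EATING;
  } else {
    assert(q_len < NPHIL);
    queue[(q_head + q_len) % NPHIL] = id;    /* join the tail */
    q_len++;
    state[id] = HUNGRY;
    while (queue[q_head] != id || !neighbors_idle(id))
      pthread_cond_wait(&cv[id], &mon);
    q_head = (q_head + 1) % NPHIL;           /* leave the head */
    q_len--;
    state[id] = EATING;
    test_head();                             /* maybe the next one can go too */
  }
  pthread_mutex_unlock(&mon);
}

void queue_putdown(int id)
{
  pthread_mutex_lock(&mon);
  state[id] = THINKING;    /* release both chopsticks */
  test_head();             /* let the head of the queue recheck */
  pthread_mutex_unlock(&mon);
}

static void *philosopher(void *arg)
{
  int id = *(int *) arg;
  int i;

  for (i = 0; i < MEALS; i++) {
    queue_pickup(id);
    meals[id]++;
    queue_putdown(id);
  }
  return NULL;
}

/* Returns the total number of meals eaten once every thread has finished. */
int run_queue_demo(void)
{
  pthread_t t[NPHIL];
  int ids[NPHIL];
  int i, total = 0;

  q_head = q_len = 0;
  for (i = 0; i < NPHIL; i++) {
    pthread_cond_init(&cv[i], NULL);
    state[i] = THINKING;
    meals[i] = 0;
    ids[i] = i;
  }
  for (i = 0; i < NPHIL; i++) pthread_create(&t[i], NULL, philosopher, &ids[i]);
  for (i = 0; i < NPHIL; i++) pthread_join(t[i], NULL);
  for (i = 0; i < NPHIL; i++) total += meals[i];
  return total;
}
```

Only the head of the queue is ever signaled, and the fast path in queue_pickup() is closed whenever anyone is waiting, so a hungry philosopher cannot be overtaken indefinitely. The price is the extra blocking when the sticks are free but the queue is not empty.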

Try the program out to see that it works. Moreover, note that there are times when a philosopher can call pickup() and the sticks can be available, but the philosopher blocks. This is because the queue isn't empty. Thus, the solution may not allow philosophers to eat as much as they would like, but it does prevent starvation. Think about ways that you could prevent starvation, but also allow less blocking time for philosophers.

dp_6_out.txt shows the output of dphil_6 5. Note how the total block time here is much higher than dphil_4 and dphil_5. This is because a philosopher might block even though the chopsticks are free, because another philosopher is hungry and on the queue.

So, what are the lessons?

First, when multiple threads or processes access multiple resources exclusively, you must worry about deadlock.
Second, you must worry about starvation, and the only way to prevent starvation is to enforce that all threads/processes get unblocked every now and then. This can be done using a global queue, or some ordering scheme, but the efficiency of the solution (in terms of delaying philosophers that might otherwise eat) is a concern.
Third, often you have to worry about treating all threads equally, so that no one thread gets more resources than the others due to your synchronization protocol.