CS170 Lecture notes -- Semaphores -- Avoiding a Train Wreck


In this lecture, we cover the semaphore abstraction, and various ways to use semaphores to synchronize activities in a computer. Most of this material is drawn from an unpublished book by Allen B. Downey. Thanks are due to Prof Downey for allowing us to use a pre-release copy of the book.

Synchronization

So far we have discussed mutexes and condition variables as the tools of synchronization and of managing critical sections of code. These are not the only tools that can be used for the job, and you are going to find yourselves very soon doing a lab where mutexes and condition variables are not available to you, but semaphores are. So we need to consider semaphores.

The concept of semaphores as used in computer synchronization is due to the Dutch computer scientist Edsger Dijkstra. They have the advantage of being very simple, yet sufficient to construct just about any other synchronization function you would care to have; we will cover a few of them here. There are several versions of the semaphore idea in common use, and you may run into variants from time to time. The end of these notes briefly describes two of the most common, binary semaphores and the SYSV IPC semaphores.

A semaphore is an integer with a difference. Well, actually a few differences. You can initialize its value to anything you like, but after that the only things you may do are increment it or decrement it by one; you cannot read its current value. When a thread decrements a semaphore and the result is negative, the thread blocks until some other thread increments the semaphore. When a thread increments a semaphore and other threads are blocked on it, one of those threads gets unblocked.

There are various ways that these two operations are named and described, more or less interchangeably. This can be confusing, but such things happen in computer science when we try to use metaphors, especially multiple metaphors, to describe what a program is doing. Here are some:

Increment
Dijkstra called this function V(); it is also called signal, unlock, leave or release.
Decrement
Dijkstra called this function P(); it is also called wait, lock, enter, or get.

Implementation

The easiest way for me to think of semaphores is, of course, with code. Here is a little pseudo-code that may help.
typedef struct sem {
  int value;
  /* other stuff, e.g. a queue of the threads blocked on this semaphore */
} *Sem;
There are two actions defined on semaphores: P(Sem s) and V(Sem s). P and V are the first letters of the Dutch words proberen (to test) and verhogen (to increment), which, on balance, makes about as much (or as little) sense as any other set of monikers; Dijkstra, the inventor of semaphores, was very Dutch.
initialize(Sem s, int i)
{
    s->value = i
    return
}

P(Sem s)
{
    s->value--;
    if(s->value < 0)
	block on semaphore
    return
}

V(Sem s)
{
    s->value++;
    if(s->value <= 0)
	unblock one process or thread that is blocked on semaphore
    return
}
You should understand these examples to be protected somehow from preemption, so that no other process could execute between the decrementing and testing of the semaphore value in the P() call, for instance.

If you consider semaphores carefully, you might decide that they are like mutexes, but they don't "lose" extra signals. This is a good way to look at them, but not the only way.

As you can see, the definition of counting semaphores is simple. This has its advantages. For one thing, on many hardware platforms, there are primitive instructions that make implementation easy and efficient. For another, there are no complications to confuse the programmer. As a result, with some care, solutions implemented with semaphores can have a clarity of purpose that makes the code clean and minimizes the chances for bugs to creep in. It is critical to understand, however, that the semaphore operations P() and V() must be performed atomically. That is, the manipulation of the counters and the blocking and unblocking operations must be non-interruptible. Can you see why? If you can't immediately, you might spend a little more time on it, since the question is an excellent test question.

Types of Synchronization Problems

By now, you've seen three types of synchronization mechanisms: mutexes, condition variables, and now semaphores. It turns out that these mechanisms are essentially equivalent in terms of their "power." The power of a language primitive is usually measured by the number of different programming challenges that the primitive can address. In the case of these synchronization mechanisms, each one can be used to implement the others (with some assumptions about how variables are shared and the atomicity of memory read and write instructions).
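To make the equivalence claim concrete, here is a minimal sketch of a counting semaphore built from the Pthreads mutexes and condition variables we already have. (The names CountingSem, csem_init(), csem_P(), and csem_V() are invented for this example.) Unlike the pseudo-code above, this version never lets the value go negative; the waiters are counted implicitly by the condition variable, but the observable behavior is the same.

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;         /* protects value                        */
    pthread_cond_t  positive;     /* signalled when value becomes positive */
    int             value;
} CountingSem;

void csem_init(CountingSem *s, int initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->positive, NULL);
    s->value = initial;
}

void csem_P(CountingSem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)         /* must re-check after every wakeup */
        pthread_cond_wait(&s->positive, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void csem_V(CountingSem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->positive);   /* wake one waiter, if any */
    pthread_mutex_unlock(&s->lock);
}

The while loop matters: a thread that wakes up must re-check the value, because another thread may have slipped in and taken the unit first.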

Much of the research that went into the design of these primitives centered on how "elegantly" they solved different synchronization problems that appear to be common to many asynchronous systems. In addition to the bounded buffer problem, there are a few others.

Signalling

One common problem, sometimes termed signalling, is one in which one thread must complete some portion of its work before another thread is allowed to proceed. This synchronization pattern usually occurs as a result of thread-based modularity. That is, you often want to write threads that solve specific, self-contained tasks, and you need threads doing different tasks to work together (perhaps in sequence) to solve some problem.

For example, consider the design of a web server. You probably want one kind of thread to be responsible for reading a request from the network, checking its validity, making sure it is from an IP address you recognize, etc., and a second kind of thread to be responsible for servicing the request. The request-checker thread must run before the servicer thread. To pull this off, you need to write your program to include synchronization that ensures the servicer thread will not try to service a request before a checker thread has checked it. Think of the situation as a bounded buffer problem where the buffer size is one. The request checker produces a request that the servicer must consume. A bit more generally, let's call the two threads A and B, and assume that the operation in thread A has to happen first. We can use a semaphore with an initial value of 0:
Initialization 
sem = 0
Thread code
a1 statement
a2 sem.V() 
 
b1 sem.P()
b2 statement 
Now if thread B reaches the sem.P() call before thread A has incremented the semaphore, thread B will set the semaphore to -1 and block. The eventual call to sem.V() by thread A will wake up thread B, so that it can continue, but by that time, statement a1 will already have been executed, and that's what we wanted to ensure. On the other hand, if thread A had gotten there first, it would have incremented the semaphore to +1, and thread B would not have to block at all.
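Although the Pthreads API proper has no semaphores, most UNIX systems do provide counting semaphores through the POSIX <semaphore.h> interface, where sem_wait() plays the role of P() and sem_post() plays the role of V(). As an aside, here is a sketch of what the signalling pattern might look like as a real, compilable program:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t sem;

void *thread_a(void *arg)
{
    printf("statement a1\n");     /* the work that must happen first */
    sem_post(&sem);               /* a2: V() -- signal thread B      */
    return NULL;
}

void *thread_b(void *arg)
{
    sem_wait(&sem);               /* b1: P() -- block until A signals */
    printf("statement b2\n");     /* guaranteed to run after a1       */
    return NULL;
}

int main()
{
    pthread_t a, b;

    sem_init(&sem, 0, 0);         /* initial value 0; middle argument 0
                                     means shared between threads, not
                                     processes                          */
    pthread_create(&b, NULL, thread_b, NULL);   /* creation order      */
    pthread_create(&a, NULL, thread_a, NULL);   /* doesn't matter      */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&sem);
    return 0;
}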


Notice that the signal from A to B correctly implements the semantics you are hoping for, regardless of when A and B actually run. When you write threaded code, this type of reasoning is exactly what you must go through for each and every synchronization point. If the outcome of a program that is meant to be deterministic can depend on the order in which the threads execute, you have a race condition.

Rendezvous

The parallel processing folks have a concept of a barrier, which is a point that all threads have to reach before any can proceed. The two-thread simplified version is called a rendezvous. It's just a matter of signalling both ways. Here, we'll need two semaphores, both initialized to zero. But we'll have to be careful, or we'll wind up with a solution like this:
Initialization 
aArrived = 0
bArrived = 0
 
Thread code
a1 statement
a2 bArrived.P()
a3 aArrived.V()
a4 statement
 
b1 statement
b2 aArrived.P()
b3 bArrived.V()
b4 statement

What's wrong? By waiting before signalling, each thread ensures that the other can never reach its signalling statement. This is a classic deadlock, and it happens on every execution, so it's not even a race condition. We can fix it by switching the order of the signal and wait calls in either thread A or thread B. Here, we'll switch statements b2 and b3.
Initialization 
aArrived = 0
bArrived = 0
 
Thread code
a1 statement
a2 bArrived.P()
a3 aArrived.V()
a4 statement
 
b1 statement
b2 bArrived.V()
b3 aArrived.P()
b4 statement

This is better. It may happen that thread A blocks without having signalled thread B, but thread B will eventually wake thread A up, so things will proceed. But this solution is still not the best: on a single-processor system, it can cause the processor to switch between the two threads more often than is strictly necessary. We really should reverse the order of signal and wait in both threads.
Initialization 
aArrived = 0
bArrived = 0
 
Thread code
a1 statement
a2 aArrived.V()
a3 bArrived.P()
a4 statement
 
b1 statement
b2 bArrived.V()
b3 aArrived.P()
b4 statement

Now we've got it right. We'll revisit this idea a bit later for the full-scale barrier problem.
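Using the same POSIX <semaphore.h> calls as in the signalling sketch, the corrected rendezvous is short enough to show in full. Notice that each thread posts its own arrival before waiting for the other's:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t aArrived, bArrived;         /* both initialized to 0 in main() */

void *thread_a(void *arg)
{
    printf("a1\n");
    sem_post(&aArrived);          /* a2: signal my arrival first ... */
    sem_wait(&bArrived);          /* a3: ... then wait for the other */
    printf("a4\n");               /* never prints before b1          */
    return NULL;
}

void *thread_b(void *arg)
{
    printf("b1\n");
    sem_post(&bArrived);
    sem_wait(&aArrived);
    printf("b4\n");               /* never prints before a1          */
    return NULL;
}

int main()
{
    pthread_t a, b;

    sem_init(&aArrived, 0, 0);
    sem_init(&bArrived, 0, 0);
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}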

Locks or Mutexes

So much for simple tasks. You should be seeing that semaphores can be useful, but might be wondering if they can really do the job of the mutexes we've been seeing in the other lectures. They can. Let's go back to the ATM problem that was used to introduce mutexes and mutual exclusion a few weeks ago.

In the previous two examples, the semaphore was initialized to zero, so that if the first operation on the semaphore was a P() call, the calling process would block. But for a mutex, we want the first process to proceed, but block any subsequent P() call until the first process uses V() to indicate that it is finished with the critical section of code. For that, we initialize the semaphore to one instead of zero. We'll call our semaphore mutex to show how we're using it, and indent the calculation to show it's contained within the critical section.
Initialization 
mutex = 1
 
Thread code
a1 mutex.P()
a2   wolski.balance =
          wolski.balance - 400
a3 mutex.V()
 
b1 mutex.P()
b2   wolski.balance =
          wolski.balance - 400
b3 mutex.V()

The first thread to reach the P() call will decrement the mutex to 0, but will proceed into the critical section. If the other thread arrives at the P() call before the first one leaves the critical section, it will decrement the mutex to -1 and block. This second thread will become unblocked when the first thread calls V(). This should all look very familiar, because this is exactly what our Pthread mutexes were doing.
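For completeness, here is the same pattern in the assumed POSIX <semaphore.h> notation, with the semaphore initialized to 1 so it acts as a mutex around the balance update (the starting balance and thread structure are made up for the example):

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t mutex;
int balance = 1000;               /* hypothetical starting balance */

void *withdraw(void *arg)
{
    sem_wait(&mutex);             /* P(): enter the critical section */
    balance = balance - 400;      /* no other thread can interleave  */
    sem_post(&mutex);             /* V(): leave the critical section */
    return NULL;
}

int main()
{
    pthread_t a, b;

    sem_init(&mutex, 0, 1);       /* initial value 1: first P() proceeds */
    pthread_create(&a, NULL, withdraw, NULL);
    pthread_create(&b, NULL, withdraw, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("final balance: %d\n", balance);    /* always 200 */
    return 0;
}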

The Bounded Buffer Problem

The last time we were looking at mutexes, the problem was a bit more complicated than this, though. We were doing the printer simulation, and we had several mutexes and condition variables. Can semaphores do that? They can indeed. We'll use two of the mutex patterns discussed above, one to protect the head of the queue and one to protect the tail. We'll avoid the condition variables entirely by using semaphores directly to keep track of how many queue slots are empty and how many are full. Accordingly, we won't need to keep a count of the number of entries in the queue, at least not explicitly.

Here's what the code looks like, in the format that we were using in the printer simulation lecture notes, if you make allowances for the liberties I'm taking by pretending that semaphores are provided by the Pthreads package.


/* NOTE: this code is notional only; the Pthreads package does not include
 *       support for semaphores.  They are imagined here for purposes of
 *       presentation only.  This code would never compile, let alone run.
 */

#include <stdio.h>
#include <pthread.h>
#include "printqsim.h"

typedef struct {
  Job **jobs;
  int head;
  int tail;

  pthread_semaphore_t *headmutex;
  pthread_semaphore_t *tailmutex;
  pthread_semaphore_t *full;
  pthread_semaphore_t *empty;

} Buffer;
  
void initialize_state(SimParameters *p)
{
  Buffer *b;

  b = (Buffer *) malloc(sizeof(Buffer));
  b->jobs = (Job **) malloc(sizeof(Job *)*p->bufsize);
  b->head = 0;
  b->tail = 0;

  b->headmutex = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
  b->tailmutex = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
  b->full      = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
  b->empty     = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
  pthread_semaphore_init(b->headmutex, 1);
  pthread_semaphore_init(b->tailmutex, 1);
  pthread_semaphore_init(b->full,      0);
  pthread_semaphore_init(b->empty,     p->bufsize);

  p->state = (void *) b;
}

void submit_job(Agent *s, Job *j)
{
	SimParameters *p;
  	Buffer *b;

	/*
	 * get the sim parameters from the agent
	 */
	p = s->p;
	/*
	 * get the queue from the sim parameters
	 */
  	b = (Buffer *) p->state;


	/*
	 * wait until the job will fit
	 */
	pthread_semaphore_P(b->empty);

	/*
	 * insert it at the head; protect the head pointer
	 */

	pthread_semaphore_P(b->headmutex);

		b->jobs[b->head] = j;
		b->head = (b->head + 1) % p->bufsize;

	pthread_semaphore_V(b->headmutex);

	/*
	 * signal one additional slot has a job
	 */

	pthread_semaphore_V(b->full);

	return;
}


Job *get_print_job(Agent *s)
{
	SimParameters *p;
	Buffer *b;
  	Job *j;

	/*
	 * get the sim parameters
	 */
	p = s->p;
	/*
	 * get the buffer from the parameters
	 */
	b = (Buffer *)p->state;


	/*
	 * wait for work
	 */
	pthread_semaphore_P(b->full);

	/*
	 * get the one at the tail; protect the tail pointer
	 */

	pthread_semaphore_P(b->tailmutex);

		j = b->jobs[b->tail];
		b->tail = (b->tail + 1) % p->bufsize;

	pthread_semaphore_V(b->tailmutex);

	/*
	 * signal an additional slot is empty
	 */

	pthread_semaphore_V(b->empty);

	return j;
}

The pattern in general looks like this:
Initialization 
inputmutex  = 1
outputmutex = 1
fullslots   = 0
emptyslots  = queue capacity
 
Thread code
 // source threads
s1 emptyslots.P()
s2 inputmutex.P()
s3    add to queue
s4 inputmutex.V()
s5 fullslots.V()
 
 // consumer threads
c1 fullslots.P()
c2 outputmutex.P()
c3    remove from queue
c4 outputmutex.V()
c5 emptyslots.V()

You may notice that the while-loops, the if-then-else conditions, and the condition variables that were present in the real Pthreads solution are gone from this solution, mostly because the semaphores take care of the counting for us. Waiting on the b->full semaphore both waits for a slot that holds a job and counts that job as claimed the moment the thread is allowed to proceed.

Another point to note about this solution is that user threads and printer threads no longer operate on the same end of the queue, and they don't use the same mutex semaphores. Accordingly, user threads and printer threads don't block each other except when the required resources (empty and full queue slots, respectively) are truly unavailable. Users insert jobs using the head pointer, and protect it from concurrent access with the headmutex semaphore, but that only blocks other user threads. The printer threads only use the tail pointer, and protect it with the tailmutex semaphore, but again that only blocks other printer threads. The count of jobs is kept implicitly in the empty and full semaphores, and the threads incrementing those semaphores never block, because incrementing a semaphore is a non-blocking operation.

All these wonderful qualities of the solution do not relieve us of the responsibility to be careful about race conditions, however. You should convince yourself that this solution works. In doing that, it may be helpful to convince yourself of the following intermediate-level properties of the solution: the slots in the queue are filled and emptied in order; each decrement of empty is matched by an increment of full and vice versa; the counts of full and empty slots never overstate the associated property of the queue; and temporarily understating these properties does no harm.

Rendezvous Revisited: the Barrier

I promised to get back to the Rendezvous, and here it is. The general problem here is to get a group of threads to a certain point in the overall computation, but have none of them proceed until they have all gotten there. This happens very frequently in algorithms that have stages where each stage has parts that can be assigned to separate threads, but the entire stage must be complete before the next stage is begun.

For this purpose, we'll need an integer to count threads, a mutex to protect the count, and a semaphore to use like the Signalling pattern we used in the first example. The signal will be given only when all the threads have arrived at the barrier. We just have to be careful, or we'll get it wrong, like this example:
Initialization
int       count         = threadcount
semaphore mutex         = 1
semaphore barriersignal = 0
Thread code
1 mutex.P()
2   count --
3 mutex.V()

4 if (count == 0):
5     barriersignal.V()

6 barriersignal.P()
7 barriersignal.V()

This code works but it has a few interesting properties. In particular, it is difficult to know what the value of the barriersignal semaphore will be after all threads have passed line 7. Do you see why?

Consider a case in which there are 3 threads and they each run to line 6 without interruption. Thread_0 will decrement the count to 2, test count and find it to be non-zero, and call barriersignal.P(). Because barriersignal is initialized to 0, Thread_0 blocks on the semaphore. Next Thread_1 runs and the exact same thing happens -- it blocks at line 6. At this point in time, the value of count is 1. Finally, Thread_2 runs, decrements count to 0, and then hits the test at line 4. Seeing the count 0, it issues a barriersignal.V(). At this point, just before the call, there are two threads blocked on the semaphore at line 6. One of them is awakened so it can proceed to line 7 and call barriersignal.V() to wake up the other thread. That thread hits line 7 and stores a wake-up, so that when Thread_2 hits line 6 it will immediately unblock and proceed to line 7. At the end of this sequence, no thread is allowed to get past line 6 before the last thread gets to line 5, hence it is a barrier. Further, in this example, the value of barriersignal when the last thread passes line 7 is 1.

Good so far? Okay, now consider a different execution order that involves preemption. Let's say that Thread_0 gets preempted between lines 3 and 4 with the count reading 2. Similarly, Thread_1 gets preempted in the same spot, with the count reading 1. Now let's say that Thread_2 makes it through line 4 and line 5. The count is 0, so it issues barriersignal.V(). Fine. Only what happens if it gets preempted between line 5 and line 6 and Thread_0 resumes? Thread_0 tests count, finds it to be 0, and calls barriersignal.V(). Say that it too is preempted between lines 5 and 6, and Thread_1 runs and has the same thing happen. At this point, all three threads have read the count as 0 and called barriersignal.V(). They will all eventually hit line 6, and because the value of the semaphore is 3 they will all proceed, but they will also hit line 7, leaving the value 3 when the sequence is over.

Notice that, as a result, you can't call this sequence in a loop. Why? Because the value of the semaphore will be positive at the end, but it needs to be 0 at the beginning, and -- remember -- it is not possible to read the value of the semaphore after line 7 to fix it up before the next loop trip. However, as long as the number of calls to barriersignal.V() is greater than or equal to the number of calls to barriersignal.P(), this code implements a barrier, and there is no execution sequence for which this condition is false.

The sequence of a P() and V() in quick succession like this is called a turnstile, because it lets one thread through at a time. It's like an empty mutex, used purely for traffic control. Each thread, except the last one to decrement count, will call P(). After each thread returns from its P() operation, it calls V() so that another thread can complete its P(). The last thread kicks off this sequence, though, by calling V(), indicating that the first -- uh -- P()ing thread can proceed.

Think about how you might write a turnstile-type barrier so that it could work in a loop. One well-known shape of an answer is sketched below.
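That answer, adapted from Downey's book, uses two turnstiles: the first releases all the threads once the last one has arrived, and the second re-closes the barrier behind them so the whole thing is safe to call again on the next loop trip. Sketched here with the POSIX-style calls used earlier (n is the number of participating threads, and all of this state is assumed to be shared and initialized as shown):

#include <semaphore.h>

static int n;                 /* number of threads participating        */
static int count = 0;         /* how many have arrived at this barrier  */
static sem_t mutex;           /* protects count;        initial value 1 */
static sem_t turnstile1;      /* first turnstile;       initial value 0 */
static sem_t turnstile2;      /* second turnstile;      initial value 1 */

void barrier_init(int nthreads)
{
    n = nthreads;
    sem_init(&mutex, 0, 1);
    sem_init(&turnstile1, 0, 0);
    sem_init(&turnstile2, 0, 1);
}

void barrier(void)
{
    /* phase 1: wait for everyone to arrive */
    sem_wait(&mutex);
    count++;
    if (count == n) {
        sem_wait(&turnstile2);   /* re-close the second turnstile */
        sem_post(&turnstile1);   /* open the first one            */
    }
    sem_post(&mutex);

    sem_wait(&turnstile1);       /* all n threads pass through here ... */
    sem_post(&turnstile1);       /* ... one at a time                   */

    /* phase 2: wait for everyone to leave, so the barrier can be reused */
    sem_wait(&mutex);
    count--;
    if (count == 0) {
        sem_wait(&turnstile1);   /* re-close the first turnstile */
        sem_post(&turnstile2);   /* open the second one          */
    }
    sem_post(&mutex);

    sem_wait(&turnstile2);
    sem_post(&turnstile2);
}

The second phase is what the single-turnstile version was missing: by the time the last thread opens turnstile2, both turnstiles and count are back in their initial states, so the values are exactly right for the next trip through the loop.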

Binary semaphores

There are variants of the semaphores described above, and one is a simplification called the binary semaphore. A binary semaphore is much like a normal semaphore except that the integer can only assume the values 0 and 1. They are usually implemented so that attempting to lock a semaphore whose value is zero simply blocks until the value is 1; the caller then unblocks and sets the value back to zero. This means that, like the Pthreads mutex construct, signals to an unlocked binary semaphore are "lost", but multiple threads can be waiting on the semaphore without getting confused. If binary semaphores are available, they are sufficient (with some care) to construct the counting semaphores described here. You may encounter such situations in your future careers, especially in embedded systems, where the underlying hardware capability may be exposed as binary semaphores.

SYSV IPC Semaphores

Just about all modern UNIX variants include support for the System V Interprocess Communication (SYSV IPC) primitives. These include shared memory, message passing, and semaphores. You can look up the semaphores in the man pages for semget, semctl and semop. They differ from the semaphores discussed here mostly in the additional functionality they include.

For one thing, the SYSV IPC semaphores are created in groups, and you can operate on more than one at a time atomically. This means that you can release (signal) one or more semaphores at the same time that you get (wait for) one or more others, and that none of this happens unless it all happens. You can do the same thing with the semaphores we have been discussing, but it is complicated to do it right.

For another thing, you can wait or signal by more than a +1 or -1 at a time. This is not common, but when you need it (for instance, to obtain or release more than one of a given resource), this feature is very handy.

Moreover, you are allowed to obtain the current value of the semaphore so you can decide whether your thread wishes to perform an operation. For example, you could decide to V() a semaphore only when enough threads have blocked. This feature is often very handy.

Finally, you can use semaphores in a non-blocking mode, getting an error condition back in the cases where the semaphore would otherwise block.
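To show the shapes of these calls, here is a small sketch using a single SYSV semaphore within one process: it sets the value with semctl(), performs P() and V() with semop(), and makes a non-blocking attempt with IPC_NOWAIT (error checking is mostly omitted for brevity):

#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* most systems require the caller to define this union for semctl() */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main()
{
    struct sembuf p    = { 0, -1, 0 };           /* P(): subtract 1, block if necessary */
    struct sembuf v    = { 0, +1, 0 };           /* V(): add 1                          */
    struct sembuf tryp = { 0, -1, IPC_NOWAIT };  /* P() that fails instead of blocking  */
    union semun arg;
    int semid;

    /* create a set containing one semaphore, and set its value to 1 */
    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    arg.val = 1;
    semctl(semid, 0, SETVAL, arg);

    semop(semid, &p, 1);          /* enter the critical section */
    /* ... critical section ... */
    semop(semid, &v, 1);          /* leave it */

    semop(semid, &p, 1);          /* take the semaphore back down to 0 ...   */
    if (semop(semid, &tryp, 1) < 0 && errno == EAGAIN)
        printf("semaphore busy; not blocking\n");   /* ... so this one fails */

    semctl(semid, 0, IPC_RMID, arg);   /* remove the semaphore set */
    return 0;
}

A single semop() call can also carry an array of operations on several semaphores in the set, and the kernel applies them all atomically or not at all; that is how the all-or-nothing behavior described above is obtained.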