The concept of semaphores as used in computer synchronization is due to the Dutch computer scientist Edsger Dijkstra. They have the advantage of being very simple, yet sufficient to construct just about any other synchronization function you would care to have; we will cover a few of them here. There are several versions of the semaphore idea in common use, and you may run into variants from time to time. The end of these notes briefly describes two of the most common: binary semaphores and the SYSV IPC semaphores.
A semaphore is an integer with a difference. Well, actually a few differences.
There are various ways that these operations are named and described, more or less interchangeably. This can be confusing, but such things happen in computer science when we try to use metaphors, especially multiple metaphors, to describe what a program is doing. Here are some common pairs of names: P() and V(), wait() and signal(), down() and up(), acquire() and release().
Here is the data structure, in schematic form:

    typedef struct sem {
        int value;
        /* ... other bookkeeping, such as a queue of blocked threads ... */
    } *Sem;

There are two actions defined on semaphores: P(Sem s) and V(Sem s). P and V are the first letters of two Dutch words, proberen (to test) and verhogen (to increment), which, on balance, makes about as much (or as little) sense as any other set of monikers. The inventor of semaphores was Edsger Dijkstra, who was very Dutch.
Here they are, along with initialization, in pseudocode:

    initialize(Sem s, int i) {
        s->value = i;
        return;
    }

    P(Sem s) {
        s->value--;
        if (s->value < 0)
            block on semaphore;
        return;
    }

    V(Sem s) {
        s->value++;
        if (s->value <= 0)
            unblock one process or thread that is blocked on semaphore;
        return;
    }

You should understand these examples to be protected somehow from preemption, so that no other process can execute between the decrementing and testing of the semaphore value in the P() call, for instance.
If you consider semaphores carefully, you might decide that they are like mutexes that don't "lose" extra signals: a V() with no thread waiting is remembered in the count, rather than being discarded the way an extra condition-variable signal would be. This is a good way to look at them, but not the only way.
As you can see, the definition of counting semaphores is simple. This has its advantages. For one thing, on many hardware platforms, there are primitive instructions to make implementation easy and efficient. For another, there are no complications to confuse the programmer. As a result, with some care, solutions implemented with semaphores can have a clarity of purpose that makes the code clean and minimizes the chances for bugs to creep in. It is critical to understand, however, that the semaphore operations P() and V() must be performed atomically. That is, the manipulation of the counters and the blocking and unblocking operations must be non-interruptible. Can you see why? If you can't immediately, you might spend a little more time, since the question is an excellent test question.
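To make the atomicity requirement concrete, here is a minimal sketch of one way such a semaphore might be built in user space out of a Pthreads mutex and condition variable. The struct fields and the name sem_initialize() are invented for illustration, and note that this version keeps the count non-negative rather than letting it go below zero; the mutex is what makes the test-and-decrement atomic with respect to other threads:

    /* sketch only: one plausible user-space semaphore, not any library's API */
    #include <pthread.h>

    typedef struct sem {
        int value;
        pthread_mutex_t lock;     /* makes the bodies of P() and V() atomic */
        pthread_cond_t nonzero;   /* where P() blocks while value is 0 */
    } *Sem;

    void sem_initialize(Sem s, int i) {
        s->value = i;
        pthread_mutex_init(&s->lock, NULL);
        pthread_cond_init(&s->nonzero, NULL);
    }

    void P(Sem s) {
        pthread_mutex_lock(&s->lock);
        while (s->value == 0)                       /* nothing to take: block */
            pthread_cond_wait(&s->nonzero, &s->lock);
        s->value--;                                 /* safe: we hold the lock */
        pthread_mutex_unlock(&s->lock);
    }

    void V(Sem s) {
        pthread_mutex_lock(&s->lock);
        s->value++;
        pthread_cond_signal(&s->nonzero);           /* wake one blocked P(), if any */
        pthread_mutex_unlock(&s->lock);
    }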
Much of the research that went into the design of these primitives centered on how "elegantly" they solved different synchronization problems that appear to be common to many asynchronous systems. In addition to the bounded buffer problem, there are a few others.
For example, consider the design of a web server. You probably want one kind of thread to be responsible for reading a request from the network, checking its validity, making sure it is from an IP address you recognize, etc., and a second kind of thread to be responsible for servicing the request. The request-checker thread must run before the servicer thread. To pull this off, you need to write your program to include synchronization that ensures the servicer thread will not try to service a request before a checker thread has checked it. Think of the situation as a bounded buffer problem where the buffer size is one. The request checker produces a request that the servicer must consume. A bit more generally, let's call the two threads A and B, and assume that the operation in thread A has to happen first. We can use a semaphore with an initial value of 0:
Initialization:

    sem = 0

Thread A                    Thread B

a1  statement               b1  sem.P()
a2  sem.V()                 b2  statement
Notice that the signal from A to B correctly implements the semantics you are hoping for, regardless of when A and B actually run. When you write threaded code, this type of reasoning is exactly what you must go through for each and every synchronization opportunity. If thread execution order can change the outcome of a program that is supposed to be deterministic, you have a race condition.
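Here is what this signalling pattern looks like in compilable form. This is a sketch using the POSIX semaphore API from <semaphore.h> (a real API, though separate from the Pthreads calls); the printed statements are placeholders for the real work:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    sem_t sem;    /* will be initialized to 0: the first P() blocks */

    void *thread_a(void *arg) {
        printf("a1: the statement that must happen first\n");
        sem_post(&sem);               /* a2: V() -- announce A's work is done */
        return NULL;
    }

    void *thread_b(void *arg) {
        sem_wait(&sem);               /* b1: P() -- block until A has signalled */
        printf("b2: the statement that must happen second\n");
        return NULL;
    }

    int main(void) {
        pthread_t a, b;

        sem_init(&sem, 0, 0);         /* pshared = 0 (threads, not processes); value = 0 */
        pthread_create(&b, NULL, thread_b, NULL);   /* creation order doesn't matter */
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }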
Now suppose we want more than a one-way signal: a rendezvous, in which each thread must wait until the other has arrived at a certain point before either proceeds. A tempting first attempt is to have each thread wait for the other's arrival before announcing its own:

Initialization:

    aArrived = 0
    bArrived = 0

Thread A                    Thread B

a1  statement               b1  statement
a2  bArrived.P()            b2  aArrived.P()
a3  aArrived.V()            b3  bArrived.V()
a4  statement               b4  statement
What's wrong? By waiting before signalling, we ensure that neither thread can reach its signal statement: A blocks at a2 waiting for a signal that B can only send from b3, and B blocks at b2 waiting for a signal that A can only send from a3. This is a classic deadlock, and it's always going to happen, so it's not even a race condition. We can fix it by switching the order of the signal and wait calls in either thread A or thread B. Here, we'll switch statements b2 and b3.
Initialization:

    aArrived = 0
    bArrived = 0

Thread A                    Thread B

a1  statement               b1  statement
a2  bArrived.P()            b2  bArrived.V()
a3  aArrived.V()            b3  aArrived.P()
a4  statement               b4  statement
This is better. It may happen that thread A will block without signalling thread B, but thread B will eventually wake up thread A, so things will proceed. But this solution is still not the best. On a single-processor system, the processor may switch between the two threads more often than is strictly necessary. We really should reverse the order of signal and wait in both threads.
Initialization:

    aArrived = 0
    bArrived = 0

Thread A                    Thread B

a1  statement               b1  statement
a2  aArrived.V()            b2  bArrived.V()
a3  bArrived.P()            b3  aArrived.P()
a4  statement               b4  statement
Now we've got it right. We'll revisit this idea a bit later for the full-scale barrier problem.
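For the record, here is the working rendezvous in compilable form, again a sketch using the POSIX semaphore API, with placeholder statements standing in for the real work:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    sem_t aArrived, bArrived;   /* both initialized to 0 */

    void *thread_a(void *arg) {
        printf("a1: before the rendezvous\n");
        sem_post(&aArrived);        /* a2: signal first ... */
        sem_wait(&bArrived);        /* a3: ... then wait */
        printf("a4: after the rendezvous\n");
        return NULL;
    }

    void *thread_b(void *arg) {
        printf("b1: before the rendezvous\n");
        sem_post(&bArrived);        /* b2: signal first ... */
        sem_wait(&aArrived);        /* b3: ... then wait */
        printf("b4: after the rendezvous\n");
        return NULL;
    }

    int main(void) {
        pthread_t a, b;

        sem_init(&aArrived, 0, 0);
        sem_init(&bArrived, 0, 0);
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }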
In the previous two examples, the semaphore was initialized to zero, so that if the first operation on the semaphore was a P() call, the calling process would block. For a mutex, however, we want the first process to proceed, and any subsequent P() call to block until the first process uses V() to indicate that it is finished with the critical section of code. For that, we initialize the semaphore to one instead of zero. We'll call our semaphore mutex to show how we're using it, and indent the calculation to show it's contained within the critical section.
Initialization:

    mutex = 1

Thread A                                        Thread B

a1  mutex.P()                                   b1  mutex.P()
a2      wolski.balance = wolski.balance - 400   b2      wolski.balance = wolski.balance - 400
a3  mutex.V()                                   b3  mutex.V()
The first thread to reach the P() call will decrement the mutex to 0, but will proceed into the critical section. If the other thread arrives at the P() call before the first one leaves the critical section, it will decrement the mutex to -1 and block. This second thread will become unblocked when the first thread calls V(). This should all look very familiar, because this is exactly what our Pthread mutexes were doing.
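Here is the same mutual-exclusion pattern in compilable form, a sketch using POSIX semaphores; the account structure and the starting balance are invented to match the wolski.balance example above:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    /* sketch only: the account and amounts are made up for illustration */
    struct account { int balance; } wolski = { 1000 };
    sem_t mutex;    /* initialized to 1: the first P() proceeds */

    void *withdraw(void *arg) {
        sem_wait(&mutex);                       /* P(): enter the critical section */
        wolski.balance = wolski.balance - 400;  /* read-modify-write, now safe */
        sem_post(&mutex);                       /* V(): leave the critical section */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;

        sem_init(&mutex, 0, 1);
        pthread_create(&a, NULL, withdraw, NULL);
        pthread_create(&b, NULL, withdraw, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("final balance: %d\n", wolski.balance);  /* always 200 */
        return 0;
    }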
Here's what the code looks like, in the format that we were using in the printer simulation lecture notes, if you make allowances for the liberties I'm taking by pretending that semaphores are provided by the Pthreads package.
    /* NOTE: this code is notional only; the Pthreads package does not include
     * support for semaphores. They are imagined here for purposes of
     * presentation only. This code would never compile, let alone run.
     */
    #include <stdio.h>
    #include <pthread.h>
    #include "printqsim.h"

    typedef struct {
        Job **jobs;
        int head;
        int tail;
        pthread_semaphore_t *headmutex;
        pthread_semaphore_t *tailmutex;
        pthread_semaphore_t *full;
        pthread_semaphore_t *empty;
    } Buffer;

    void initialize_state(SimParameters *p) {
        Buffer *b;

        b = (Buffer *) malloc(sizeof(Buffer));
        b->jobs = (Job **) malloc(sizeof(Job *) * p->bufsize);
        b->head = 0;
        b->tail = 0;
        b->headmutex = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
        b->tailmutex = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
        b->full = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));
        b->empty = (pthread_semaphore_t *) malloc(sizeof(pthread_semaphore_t));

        pthread_semaphore_init(b->headmutex, 1);
        pthread_semaphore_init(b->tailmutex, 1);
        pthread_semaphore_init(b->full, 0);
        pthread_semaphore_init(b->empty, p->bufsize);

        p->state = (void *) b;
    }

    void submit_job(Agent *s, Job *j) {
        SimParameters *p;
        Buffer *b;

        /*
         * get the sim parameters from the agent
         */
        p = s->p;

        /*
         * get the queue from the sim parameters
         */
        b = (Buffer *) p->state;

        /*
         * wait until the job will fit
         */
        pthread_semaphore_P(b->empty);

        /*
         * insert it at the head; protect the head pointer
         */
        pthread_semaphore_P(b->headmutex);
        b->jobs[b->head] = j;
        b->head = (b->head + 1) % p->bufsize;
        pthread_semaphore_V(b->headmutex);

        /*
         * signal one additional slot has a job
         */
        pthread_semaphore_V(b->full);

        return;
    }

    Job *get_print_job(Agent *s) {
        SimParameters *p;
        Buffer *b;
        Job *j;

        /*
         * get the sim parameters
         */
        p = s->p;

        /*
         * get the buffer from the parameters
         */
        b = (Buffer *) p->state;

        /*
         * wait for work
         */
        pthread_semaphore_P(b->full);

        /*
         * get the one at the tail; protect the tail pointer
         */
        pthread_semaphore_P(b->tailmutex);
        j = b->jobs[b->tail];
        b->tail = (b->tail + 1) % p->bufsize;
        pthread_semaphore_V(b->tailmutex);

        /*
         * signal an additional slot is empty
         */
        pthread_semaphore_V(b->empty);

        return j;
    }
The pattern in general looks like this:
Initialization:

    inputmutex = 1
    outputmutex = 1
    fullslots = 0
    emptyslots = queue capacity

Source threads                  Consumer threads

s1  emptyslots.P()              c1  fullslots.P()
s2  inputmutex.P()              c2  outputmutex.P()
s3  add to queue                c3  remove from queue
s4  inputmutex.V()              c4  outputmutex.V()
s5  fullslots.V()               c5  emptyslots.V()
You may notice that the while-loops, the if-then-else conditions, and the condition variables that were present in the real Pthreads solution are gone from this solution, mostly because the semaphores take care of the counting for us. Waiting on the b->full semaphore both waits for a slot that has a job in it and counts that job as claimed the moment the waiting thread is allowed to proceed.
Another point to note about this solution is that user threads no longer share data with printer threads, and they don't use the same mutex semaphores. Accordingly, user threads and printer threads don't block each other except when the required resources (empty and full queue slots, respectively) are truly unavailable. Users insert jobs using the head pointer, and protect it from concurrent access with the headmutex mutex, but that only blocks other user threads. The printer threads only use the tail pointer, and protect it with the tailmutex mutex, but again that only blocks other printer threads. The count of jobs is kept implicitly in the empty and full semaphores, and the threads incrementing those semaphores never block because semaphore incrementation is a non-blocking call.
All these wonderful qualities of the solution do not relieve us of the responsibility to be careful about race conditions, however. You should convince yourself that this solution works. In doing that, it may be helpful to convince yourself of the following intermediate-level properties of the solution: the slots in the queue are filled and emptied in order; each decrement of empty is matched by an increment of full and vice versa; the counts of full and empty slots never overstate the associated property of the queue; and temporarily understating these properties does no harm.
Now let's return to the barrier problem promised earlier: no thread may proceed past the barrier until every thread has arrived at it. For this purpose, we'll need an integer to count threads, a mutex to protect the count, and a semaphore to use like the signalling pattern we used in the first example. The signal will be given only when all the threads have arrived at the barrier. We just have to be careful, or we'll get it wrong in subtle ways, as this example shows:
Initialization:

    int count = threadcount
    semaphore mutex = 1
    semaphore barriersignal = 0

Thread code (the same for every thread):

1   mutex.P()
2   count--
3   mutex.V()
4   if (count == 0):
5       barriersignal.V()
6   barriersignal.P()
7   barriersignal.V()
This code works but it has a few interesting properties. In particular, it is difficult to know what the value of the barriersignal semaphore will be after all threads have passed line 7. Do you see why?
Consider a case in which there are 3 threads and they each run to line 6 without interruption. Thread_0 will decrement the count to 2, test count and find it to be non-zero, and call barriersignal.P(). Because barriersignal is initialized to 0, Thread_0 blocks on the semaphore. Next Thread_1 runs and the exact same thing happens -- it blocks at line 6. At this point in time, the value of count is 1. Finally, Thread_2 runs, decrements count to 0, and then hits the test at line 4. Seeing the count 0, it issues a barriersignal.V(). At this point, just before the call, there are two threads blocked on the semaphore at line 6. One of them is awakened so it can proceed to line 7 and call barriersignal.V() to wake up the other thread. That thread hits line 7 and stores a wake-up so that when Thread_2 hits line 6 it will immediately unblock and proceed to line 7. At the end of this sequence, no thread is allowed to get past line 6 before the last thread gets to line 5, hence it is a barrier. Further, in this example, the value of barriersignal when the last thread passes line 7 is 1.
Good so far? Okay, now consider a different execution order that involves preemption. Let's say that Thread_0 gets preempted between lines 3 and 4 with the count reading 2. Similarly, Thread_1 gets preempted in the same spot, with the count reading 1. Now let's say that Thread_2 makes it through lines 4 and 5. The count is 0, so it issues barriersignal.V(). Fine. But what happens if it gets preempted between lines 5 and 6 and Thread_0 resumes? Thread_0 tests count, finds it 0, and calls barriersignal.V(). Say that it too is preempted between lines 5 and 6, and Thread_1 runs and has the same thing happen. At this point, all three threads have read the count as 0 and called barriersignal.V(). They will all eventually hit line 6, and because the value of the semaphore is 3 they will all proceed; but they will also all hit line 7, leaving the value 3 when the sequence is over.
Notice that, as a result, you can't call this sequence in a loop. Why? Because the value of the semaphore will be positive at the end, but it needs to be 0 at the beginning. Moreover, it is not possible to check the value of the semaphore after line 7 before the next loop trip. However, as long as the number of calls to barriersignal.V() is greater than or equal to the number of calls to barriersignal.P(), this code implements a barrier, and there is no execution sequence for which this condition is false.
The sequence of a P() and V() in quick succession like this is called a turnstile because it lets one thread through at a time. It's like an empty mutex, just used for traffic control. Each thread, except the last one to decrement count, will call P(). After each thread returns from its P() operation, it will call V() so that another thread can complete its P(). The last thread kicks off this sequence, though, by calling V(), indicating that the first -- uh -- P()ing thread can proceed.
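Here is the one-shot barrier above in compilable form, a sketch using POSIX semaphores; THREADCOUNT and the worker() body are invented for illustration. The line-number comments match the pseudocode, and the unprotected read of count at line 4 is preserved deliberately, since the preemption discussion above depends on it:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define THREADCOUNT 3   /* an arbitrary choice for this sketch */

    int count = THREADCOUNT;
    sem_t mutex;            /* protects count; initialized to 1 */
    sem_t barriersignal;    /* the turnstile; initialized to 0 */

    void *worker(void *arg) {
        sem_wait(&mutex);               /* line 1 */
        count--;                        /* line 2 */
        sem_post(&mutex);               /* line 3 */
        if (count == 0)                 /* line 4: note, read outside the mutex */
            sem_post(&barriersignal);   /* line 5 */
        sem_wait(&barriersignal);       /* line 6: the turnstile */
        sem_post(&barriersignal);       /* line 7 */
        printf("past the barrier\n");
        return NULL;
    }

    int main(void) {
        pthread_t t[THREADCOUNT];
        int i;

        sem_init(&mutex, 0, 1);
        sem_init(&barriersignal, 0, 0);
        for (i = 0; i < THREADCOUNT; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (i = 0; i < THREADCOUNT; i++)
            pthread_join(t[i], NULL);
        return 0;
    }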
Think about how you might write a turnstile type barrier so that it could work in a loop.
So how do the SYSV IPC semaphores differ from the counting semaphores described above? For one thing, the SYSV IPC semaphores are created in groups, and you can operate on more than one at a time atomically. This means that you can release (signal) one or more semaphores at the same time that you get (wait for) one or more others, and that none of this happens unless it all happens. You can do the same thing with the semaphores we have been discussing, but it is complicated to do it right.
For another thing, you can wait or signal by more than 1 at a time. This is not common, but when you need it (for instance, to obtain or release more than one unit of a given resource), this feature is very handy.
Moreover, you are allowed to obtain the current value of the semaphore so you can decide whether your thread wishes to perform an operation. For example, you could decide to V() a semaphore only when enough threads have blocked. This feature is often very handy.
Finally, you can use semaphores in a non-blocking mode, getting an error condition back in the cases where the semaphore would otherwise block.
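To make these features concrete, here is a short sketch using the real SYSV calls semget(), semctl(), and semop(). The group size, initial value, and operation amounts are arbitrary choices for illustration:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    /* the caller must define this union for semctl() on most systems */
    union semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
    };

    int main(void) {
        int semid;
        struct sembuf ops[2];
        union semun arg;

        /* create a private group of two semaphores */
        semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
        if (semid < 0) { perror("semget"); return 1; }

        /* give semaphore 0 an initial value of 3 -- say, three units of a resource */
        arg.val = 3;
        semctl(semid, 0, SETVAL, arg);

        /* atomically take two units of semaphore 0 and release one unit of
         * semaphore 1; none of this happens unless all of it can happen */
        ops[0].sem_num = 0; ops[0].sem_op = -2; ops[0].sem_flg = 0;
        ops[1].sem_num = 1; ops[1].sem_op = +1; ops[1].sem_flg = 0;
        semop(semid, ops, 2);

        /* the same two-unit decrement in non-blocking mode: with only one
         * unit left, this fails with EAGAIN instead of blocking */
        ops[0].sem_flg = IPC_NOWAIT;
        if (semop(semid, ops, 1) < 0)
            perror("semop (non-blocking)");

        /* read the current value of semaphore 0 without changing it */
        printf("semaphore 0 value: %d\n", semctl(semid, 0, GETVAL));

        semctl(semid, 0, IPC_RMID, arg);    /* remove the semaphore set */
        return 0;
    }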