CS170 Lecture notes -- Kthreads


The Kthreads Library

Kthreads is a relatively simple, non-preemptive threads library that we will be using to implement our operating system later in the labs. For our purposes, it has an advantage over pthreads because it works in the debugger and because it will not interfere with the simulator (making your development life easier). It's also simple enough to understand, and therefore makes for a good teaching tool.

The kthreads library is at /cs/faculty/rich/cs170/lib/libkt.a, the source is at /cs/faculty/rich/cs170/src/libkt/, and the header is at /cs/faculty/rich/cs170/include/kt.h. You will also need to link to /cs/faculty/rich/cs170/lib/libfdr.a which provides some basic C functions like linked lists and red-black trees.


The Interface

Using kthreads is pretty straightforward, so we will not go into it in much detail. You should all understand the basic thread primitives by now although you need to remember that (unlike in the case of pthreads) kthreads do not pre-empt each other. The kthreads calls are:

Kthreads includes counting semaphores as its only synchronization primitive: The basic kt primitives (fork, exit, join) play the same role that pthread_create, pthread_exit, and pthread_join play for pthreads. Of the other functions, kt_yield() interrupts the current thread and lets the scheduler run a new one. This primitive is nice in non-pre-emptive thread systems because it allows a kind of "polling" of the scheduler. A thread calling kt_yield() blocks itself and allows other threads that can run to go ahead. When no more runnable threads are available, the yielding thread will be resumed at the point of the yield.

The call kt_sleep() sleeps the current thread for a specified time period. Again, because the thread is non-pre-emptive, the thread will be "awakened" and made runnable after the specified time, but it will not actually run until it is given the CPU.

The call kt_self() returns the thread id. No confusion here.

The function kt_joinall() is a useful function that causes the current thread to block until all other threads have either exited or blocked on a semaphore.

You will find that this function is particularly handy in designing your OS.

The semaphore primitives are exactly as we discussed. make_kt_sem() creates a semaphore with a value greater than or equal to zero, and kill_kt_sem() destroys (and frees) it. P_kt_sem() decrements the semaphore's value by 1, and if it is negative blocks it. V_kt_sem() increments the value by one, and if it is zero or less it unblocks one thread.

There is also a call to interrogate the current value of the semaphore -- kt_getval(). While not strictly part of the semaphore API, there are occasions where the ability to know how many threads are blocked on a semaphore is quite handy.


Example

Here is a hopefully familiar example of how to use this library using the Client-trader Simulation from previous classes. Here is a version of market-kthreads.c that uses the kthreads library. There are a couple things worth noting here. First, notice that it doesn't bother protecting against any of the race conditions (the mutexes are gone) that the pthreads version does. That is, there is a distinct lack of calls to any primitive implementing mutual exclusion. This is because we know kthreads is strictly non-preemptive, and so there are no race conditions between running threads. Don't be confused, however. In the OS you build, you can create race conditions -- it just won't be between runnable threads. We'll discuss this at length later. For now it is enough for you to know that race conditions are possible in your labs, but not in this example. However, with that difference and small differences in the return values when threads are created and destroyed, the code is the same as the semaphore version of the client-trader simulation. As we saw before, this is a pretty elegant solution to the problem. Run it a few times, however, and you'll see that there is no speed-up gained through the use of multiple threads. Why? Kthreads doesn't use multiple processors -- it is strictly for decomposing a problem into thread like tasks and as this example shows converting something from kthreads to pthreads and back again is pretty simple.

This single-threaded implementation comes in VERY handy when you will be developing your OS. Trust me.

Understanding kt_joinall()

One curious primitive in the interface is kt_joinall() which bears some discussion. For reasons that will become clear when you begin working on your OS, it is sometimes convenient to have a way for a "master" thread to block and wait until there is nothing else that can run. That is, imagine you wanted to create a "watcher" thread that waits around for all of the other threads to do their work and, when they have finished or they are all blocked waiting for something, this "watcher" thread wakes up and takes some action.

Don't ask why just yet -- just imagine it to be true.

You could try to pull this trick off using kt_yield() where each thread would yield right before any call to P_kt_sem() and the watcher thread were in a loop calling yield, but it would be tricky to make sure you don't deadlock, and the watcher thread is essentially spinning and burning CPU.

Instead, kthreads includes the kt_joinall() command. It has the following properties:

You can also think of this as a way of setting a place in the code where you wish to continue once all other work has been finished.

It may sound confusing. The code in joinall-1.c attempts to illustrate how it works. In it, the thread simply increments a shared counter (a pointer to which is passed as an argument) in a loop and does so under the control of a semaphore. The main thread (which will be the master in this example) waits until all threads have finished and then exits. It does so in a loop where it prints out the progress of the threads and calls kt_joinall():


        /*
         * loop showing progress
         */
        while(done_threads < threads) {
                printf("shared counter: %f\n",counter);
                fflush(stdout);
                kt_joinall(); /* gets here when all else is stopped */
        }


What should the output of this program be? Here is the execution of the program with two worker threads, each one having to make two trips (as indicated by the -C argument). With two worker threads the final count should be 4 since each thread runs the count up by two.
./joinall-1 -t 2 -C 2
shared counter: 0.000000
final count: 4.000000
Is this what you expected? Try it with the -V flag:
./joinall-1 -t 2 -C 2 -V
shared counter: 0.000000
thread 0: incremented shared counter to 1.000000
thread 0: incremented shared counter to 2.000000
thread 1: incremented shared counter to 3.000000
thread 1: incremented shared counter to 4.000000
final count: 4.000000
Notice that both threads run and do their increments. However the master thread seems not to go around the progress loop more than once.

This execution sequence is correct. The master thread spawns both worker threads and calls V on the semaphore to make sure one starts (the initial semaphore value is 0). It then enters the loop because the threads aren't done, prints the "shared counter:" message, and calls kt_joinall() at which point it blocks waiting for all threads to either exit or block.

The two worker threads then run (first thread 0 and then thread 1), finish their work, and exit. The kt_joinall() command then completes, the master loops around and tests, finds them all completed and exits.

Now take a look at joinall-2.c. Notice that there is no call to V_kt_sem() in either thread. That is, each thread blocks in a call to P_kt_sem() but they never unblock another thread.

Instead, the master thread calls V_kt_sem() in its progress loop as well as kt_joinall():

        /*
         * loop showing progress
         */
        while(done_threads < threads) {
                V_kt_sem(sema); /* enable a thread */
                printf("shared counter: %f\n",counter);
                fflush(stdout);
                kt_joinall(); /* gets here when all else is stopped */
        }

Run this code and you'll see
./joinall-2 -t 2 -C 2
shared counter: 0.000000
shared counter: 1.000000
shared counter: 2.000000
shared counter: 3.000000
final count: 4.000000
and with the verbose flag set
/joinall-2 -t 2 -C 2 -V
shared counter: 0.000000
thread 0: incremented shared counter to 1.000000
shared counter: 1.000000
thread 0: incremented shared counter to 2.000000
shared counter: 2.000000
thread 1: incremented shared counter to 3.000000
shared counter: 3.000000
thread 1: incremented shared counter to 4.000000
final count: 4.000000
What happens in this case is that the master thread calls V to enable some thread blocked on the semaphore, prints its progress message, and blocks in kt_joinall(). Whichever thread was awakened (thread 0 in this case) runs, increments, and then blocks again in its call to P at which point there are no other runnable threads. Take a moment to understand that last sentence. After thread 0 in this example runs for the first time, it calls P again. No other thread has called V and the master thread is blocked so all threads are blocked. Because the master thread is blocked in kt_joinall(), the call unblocks at this moment, the master runs, tests to see if all threads have completed, calls V to enable another thread, and calls kt_joinall() again.

Thus, kt_joinall() specifies a place to continue when all other threads have blocked or have exited which will come in handy at some point in your future.


Implementation

We will be discussing the logic that the kthreads library uses to make all of its calls, and also discuss some of the more important concepts. I regret that I will not have the time to go into the nitty gritty details on how this library works. It is relatively simple and relatively short, and I urge you all to do this on your own. It should not be difficult to relate the source code to the ideas I will discuss. Also, Dr. Plank and Dr. Beck at UT have a very detailed lecture on kthreads implementation here. To understand this explanation in depth, you will need to understand the Linux calls setjmp() and longjmp(). If you don't understand the man pages now, at some point before this class ends read them and you will certainly be able to see how they work.

Global Data

Kthreads uses a few pieces of global data to keep track of the current process. Let's go over those first:

Thread Structure

Also, there are some variables associated with each thread:

Semaphore structure

The Scheduler

The first thing to understand is that the scheduler is really managing a set of queues, each one of which contains jobs in a certain state. The implementation does not necessarily use a linked list for each of these queues, but logically, you can think of them as queues. They are: There is also a global pointer to the currently running thread.

The scheduler, called KtSched(), is the core of the threads system. It is called whenever a thread is abdicating the CPU, and its job is to manage these queues and set the next runnable thread. Since this system is non-premptive, threads will only abdicate the CPU on their own initative. The calls that do this are kt_join(), kt_exit(), kt_sleep(), kt_yield(), kt_joinall(), and P_kt_sem(). When a thread abdicates the CPU, KtSched() takes the thread that is at the head of the Run Queue and makes it the running thread. It sets the global pointer and switches from the current thread (which is the one that is abdicating) to the new "running" thread. The scheduler does not return until there is a thread that can be run or when there are no more threads in the system. We'll discuss this behavior in greater detail as we discuss the other primitives. The important thing to know about the scheduler, however, is that its job is to switch from the currently running thread to the next runnable thread, and to manage the internal queues.

The Kthreads functions

The first function that we discuss is kt_yield() since it is now easy to understand. When a thread calls kt_yield() is simply adds itself to the end of the Run Queue and calls ktSched(). Here is the source code for kt_yield().

void kt_yield()
{
        InitKThreadSystem();

        ktRunning->state = RUNNABLE;
        dll_append(ktRunnable,new_jval_v(ktRunning));
        KtSched();
        return;
}


Simple, huh? The call to InitKThreadSystem() is there just to handle the case when kt_yield() is the first thread call made by a thread. This idempotent call prevents us from having to make an explicit initialization call in a KThread code. A caller of kt_yield() sets its state to RUNNABLE (it will no longer be RUNNING), appends itself to the end of the Run Queue, and calls the scheduler. If there are other runnable threads, they will each be run in turn and then, when this thread's turn comes up, it will be run again.

The next function we will cover is kt_fork(). It is going to create a stack and a state for the thread, which we will discuss later. It will then set the func and arg fields to their approprate values, choose a unique id for the thread, set its state to RUNNABLE, and add it to the end of ktRunning.

The easiest function is kt_self(). It simply returns its unique thread id of ktRunning typecast to a (void *). The only reason it returns a (void *) is to hide some of the inner workings of the library from the user.

kt_sleep(int sec) is pretty easy too. For this, you set the state to SLEEPING, calculate the wakeup time (time(NULL) + sec), and insert it into ktSleeping keyed on its wakeup time.

kt_join(void *ktid) is similar. First off, check the thread at ktid exists. If it doesn't, then we figure that it has exited and we simply return. Technically, this allows a caller to join with a thread id that has never existed, but in order to keep track, we'd have to have a list of all valid thread ids ever used. We'll leave this as a subtle point.

Next we need to see if there is a thread in ktBlocked keyed on the joinee's id. Each thread can only have one other thread waiting to join with it. Think about that for a minute. Thread A tries to join with Thread B and later Thread C tries to join with Thread B. What do you want to have happen? Your options are

Kthreads takes this latter approach and exits your program.

If we make it this far, then we set ktid's joining field to point to ourself, set our state to BLOCKED, and add ourself to the blocked tree keyed on ktid's id.

With this in mind, kt_joinall() is pretty simple. We do the same as kt_join(), but we treat it is if we were joining with a thread with id 0 which we will never assign internally. The scheduler takes one last look at the Blocked Queue before it decides to exit, and if it sees a thread trying to join with 0, it wakes that thread as the joinall thread. Again, at most one thread can call kt_joinall(). Multiple calls cause the program to exit.

Last, kt_exit() is going to free up all of the data for the thread and simply run the scheduler again without putting the current thread back in the list. I am glossing over the details of this because it is easier said than done. The only thing that it needs to do is check to see if it has a joiner. It checks ktBlocked to see if anyone is blocked on its id. If there is, we remove it, set the state to RUNNABLE, and append it to ktRunnable.

Semaphore functions

Our semaphores are going to work similar to the kt_join() and kt_joinall() code. When they are created with the make_kt_sem() call, our semaphores are going to get their own unique ids. These are not going to overlap with the kthread ids because we will block exactly the same way as above. We will also set the semaphore up with an initial value when we create it.

The function P_kt_sem() will decrement the value of the semaphore, and if this value is less than zero, it will block the thread. To do this, it will set its state to BLOCKED and insert it into ktBlocked keyed on the id of the semaphore. Here is the code:

void P_kt_sem(kt_sem iks){
        Ksem ks = (Ksem)iks;
        K_t me = ktRunning;

        InitKThreadSystem();
        ks->val--;

        if(ks->val < 0)
        {
                /*
                 * use the semaphore tid as the blocking key
                 */
                ktRunning->ks = ks;
                BlockKThread(ktRunning,ks->sid);
                KtSched();
                ktRunning->ks = NULL;
                return;
        }

        return;
}
Again -- the code is fairly simple once you understand that KtSched() is doing all of the hard work associated with switching between threads.

V_kt_sem() increments the counter on the semaphore and checks to see if the value is less than or equal to zero. If it is, it searches ktBlocked for a thread keyed on its id, sets its state to RUNNABLE, and appends it to the end of ktRunnable. Notice that there is no guarantee that the threads are going to be unblocked in FIFO order. It would be hard to do, but we couldn't keep up the nice, generic system we have. So it goes.

Here is the code:

void V_kt_sem(kt_sem iks)
{
        Ksem ks = (Ksem)iks;
        K_t wake_kt;

        InitKThreadSystem();

        ks->val++;

        if(ks->val <= 0)
        {
                wake_kt = jval_v(jrb_val(jrb_find_int(ktBlocked,ks->sid)));
                WakeKThread(wake_kt);
        }

        return;
}
Notice that it picks some thread off the list of threads blocked on the semaphore and wakes it up. The wake code is here:
void
WakeKThread(K_t kt)
{
        /*
         * look through the various blocked lists and try to wake the
         * specified thread
         */

        if (kt->state == RUNNING || kt->state == RUNNABLE
                                 || kt->state == DEAD) return;

        jrb_delete_node(kt->blocked_list_ptr);
        kt->state = RUNNABLE;
        kt->blocked_list = NULL;
        kt->blocked_list_ptr = NULL;
        dll_append(ktRunnable,new_jval_v(kt));
        return;
}

Last is kill_kt_sem(). There is not much to say about this except that it checks to see if there are any threads blocked on the semaphore; if there are, its flags an error and exits.

The scheduler revisited

So now that we know how the functions work I think we are in a better position to discuss the scheduler. It basically goes through a set of steps before it takes the next thread and runs it. Here is what it does:
  1. Check ktSleeping to see if there are any threads that are ready to wake up. If there are, take them out of ktSleeping, set their state to RUNNABLE, and add them to the end of ktRunnable
  2. If there are threads in ktRunnable, take the first one off the list and run it. At this point, the scheduler returns (actually it stops and never returns)
  3. If there are threads in ktSleeping, take the next thread to wakeup and sleep until it is time for it to run again. Wake up and run the thread as step 2
  4. If there is a joinall thread, make it runnable and run it as in step 2
So that's the scheduler, and now we understand how how kthreads works in general. Its surprisingly simple, isn't it? The only "hard" part we haven't discussed is what a thread actually is made of, and the scheduler switches between them. You need to understand the Unix calls setjmp() and longjmp() very clearly before we can address these questions. Unfortunately, we won't do so explicitly, but by the time you've implemented processes in your OS, you'll know enough to be able to work through the man pages and the inner working of ktSched().