Threads are often called "lightweight processes". Whereas a typical process in Unix consists of CPU state (i.e. registers), memory (code, globals, heap and stack), and OS info (such as open files, a process ID, etc.), in a thread system there is a larger entity, called a "task", a "pod", or sometimes a "heavyweight process."
Tasks, programs, threads

We will alter this definition slightly. For our work, a "task" or "program" will refer to a collection of resources (registers, memory space, file descriptors, network connections, etc.). A thread (or a set of threads) will refer to the active execution (a moving program counter and local variables) through a program. Under these definitions, then, a Unix process is a "task" (the resources defined by the process charged to a user ID) with a single "thread" of execution within it.
Virtual parallel processing

When you program with multiple threads explicitly, you assume that they execute simultaneously within their task. In other words, it should appear to you as if each thread is executing on its own CPU, and that all the threads share the same memory, network connections, file descriptors, disk storage, files, etc.
program: a list of instructions that direct the machine to perform a desired computation.
state: a set of values contained in a specified set of variables (memory locations).
process: the state associated with a running program.
thread: a fundamental unit of computation, consisting of a program counter and local state variables.

Using these definitions, a program becomes a process when it is initiated. It may contain one or more threads, each characterized by an individual program counter and local state variables, all accessing a shared global set of state variables.
The first generic primitive creates a new thread that runs a given procedure with given arguments. Sometimes the arguments are omitted, and sometimes only one argument (a (void *)) is allowed. It returns a pointer to the new thread (which I'll call a thread control block, or TCB).
The second, thread_join(), waits for the thread represented by tcb to finish executing. Often thread_join() returns an integer or a (void *) as its exit value. You can think of thread_join() as analogous to wait() in Unix: it waits for the specified thread to complete, and gathers information about the thread's exit status.
To make use of Posix threads in your program, you need to have the following include directive:
    #include <pthread.h>

You also have to link libpthread.a with your object files. The tricky part is that some Unix and Linux systems build libpthread.a into the standard C library. The easiest way to make sure you get what you are paying for is to use the -lpthread link option:
    UNIX> gcc -c main.c
    UNIX> gcc -o main main.o -lpthread
There's a lot of junk in the pthread library. You can read about it in the various man pages. Start with "man pthreads". The two basic primitives defined above are the following in Posix threads:
    int pthread_create(pthread_t *new_thread_ID, const pthread_attr_t *attr,
                       void * (*start_func)(void *), void *arg);
    int pthread_join(pthread_t target_thread, void **status);

This isn't too bad, and not too far off from my generic description above. Instead of returning a pointer to a thread control block, pthread_create() has you pass the address of one, and it fills it in. Don't worry about the attr argument; just use NULL. Then start_func is the function, and arg is the argument to the function, which is a (void *). When pthread_create() returns, the TCB (which uniquely identifies the created thread) is in *new_thread_ID, and the new thread has been created to run start_func(arg), though it may not have been scheduled yet.
pthread_join() has you specify a thread and give a pointer to a (void *). When the specified thread exits, the pthread_join() call returns, and *status holds that thread's return or exit value.
All of the Posix thread calls return an integer. If it is zero, everything went OK. Otherwise, an error has occurred, and the returned value is itself the error number (these calls report errors through their return value rather than through errno). As with system calls, it is always good to check the return values of these calls to see if there has been an error. In my code here in the lecture notes, I'll omit error checking, but it is in the files, and you should do it.
How does a thread exit? By returning from its start function, or by calling pthread_exit().
Ok, so check out the following program (in hw.c):
    /*
     * hw.c -- hello world with posix threads
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>   /* exit() */
    #include <string.h>   /* strerror() */

    /* Thread body: print a greeting and exit by returning NULL. */
    void *printme(void *arg)
    {
        printf("Hello world\n");
        return NULL;
    }

    int main()
    {
        pthread_t tcb;
        void *status;
        int err;

        err = pthread_create(&tcb, NULL, printme, NULL);
        if (err != 0) {
            /* pthread calls return an error number; they don't set errno */
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            exit(1);
        }

        err = pthread_join(tcb, &status);
        if (err != 0) {
            fprintf(stderr, "pthread_join: %s\n", strerror(err));
            exit(1);
        }

        return 0;
    }

Try copying hw.c to your home area, compiling it, and running it. It should print out "Hello world".
Here's the output of print4.c when run on the department's Linux systems:
    Hi. I'm thread 1147619072
    Hi. I'm thread 1139226368
    Hi. I'm thread 1156011776
    main thread -- Hi. I'm thread 1156015936
    I'm 1156015936 Trying to join with thread 1156011776
    Hi. I'm thread 1130833664
    1156015936 Joined with thread 1156011776
    I'm 1156015936 Trying to join with thread 1147619072
    1156015936 Joined with thread 1147619072
    I'm 1156015936 Trying to join with thread 1139226368
    1156015936 Joined with thread 1139226368
    I'm 1156015936 Trying to join with thread 1130833664
    1156015936 Joined with thread 1130833664

So what happened is the following. The main() program forked the first 3 threads and they each ran in turn. Then the main() thread got control, printed its message, and called pthread_join(), and then the 4th thread ran and printed its message. After that, the main thread got control again and the call to pthread_join() completed. Then the main thread tried to join with the other threads, and those joins succeeded. Finally, when main() returns, all the threads are done, and the program exits.
Three things to note. First, the main program is implicitly, itself, a thread: notice that thread 1156015936 was never created, but the call to Ego() works all the same. Second, the order in which created threads run is not defined by pthreads. Third, pthreads is free to choose any way it wants to name threads.
Under OSX, the following output is generated from the same program:
    ./print4
    Hi. I'm thread 236052480
    Hi. I'm thread 236589056
    Hi. I'm thread 237125632
    Hi. I'm thread 237662208
    main thread -- Hi. I'm thread 2077442432
    I'm 2077442432 Trying to join with thread 236052480
    2077442432 Joined with thread 236052480
    I'm 2077442432 Trying to join with thread 236589056
    2077442432 Joined with thread 236589056
    I'm 2077442432 Trying to join with thread 237125632
    2077442432 Joined with thread 237125632
    I'm 2077442432 Trying to join with thread 237662208
    2077442432 Joined with thread 237662208
Notice anything different? It is key to your cosmic wa and general happiness that you understand that both of these executions are absolutely correct. That is, the thread system is free to impose either ordering and any naming scheme it chooses. It is your responsibility to ensure that threads execute in the order you want them to, and we'll discuss how you can control this ordering.
Here, all threads, including the main() program, exit with pthread_exit(). You'll see that the output is the same as print4. Notice, however, that the main thread cannot call printme() and get the same output, since printme() calls pthread_exit(). p4b.c illustrates what happens when we replace the printf statement at line 69 with a call to printme(), which contains a pthread_exit(). The output (for Linux) is:
    ./p4b
    Hi. I'm thread 357766912
    main thread -- Hi. I'm thread 357771072
    Hi. I'm thread 340981504
    Hi. I'm thread 332588800
    Hi. I'm thread 349374208

You'll note that none of the "Trying to join" or "Joined" lines were printed out, because the main thread had exited. However, the other threads ran just fine, and the program terminated when all the threads had exited.
The second thing you need to know is that when a forked thread returns from its initial calling procedure (e.g. printme() in print4.c), that is the same as calling pthread_exit(). However, if the main() thread returns (for example, before the other threads have finished), that is the same as calling exit(), and the whole program dies. Here is where you really need to be careful. Check out p4c.c. Here is the Linux output:
    ./p4c
    Hi. I'm thread 766605056
    Hi. I'm thread 758212352
    Hi. I'm thread 749819648
    main thread -- Hi. I'm thread 766609216
    Hi. I'm thread 741426944

Notice that the main thread runs and prints its output, then the fourth thread runs, and then the program exits. Why?
Here is another run:

    rich@csil:~/public_html/class/cs170/notes/IntroThreads$ ./p4c
    Hi. I'm thread 272631552
    Hi. I'm thread 264238848
    main thread -- Hi. I'm thread 272635712
    Hi. I'm thread 255846144
    Hi. I'm thread 247301888
    Hi. I'm thread 247301888

Um. Yeah. The last thread to run prints its message twice? Technically, I'd call this a bug. However, it turns out that there is an ambiguity in the pthread standard with regard to when threads are scheduled. In particular, because threads are preemptive, they can be scheduled or descheduled at any moment, including when a thread makes a system call. In the first p4c run above, when the main thread prints its output and then calls exit(0), the pthreads scheduler decides to deschedule the main thread before the exit call is processed by the kernel, and to run the next runnable thread (thread 741426944). When that thread is finished, the main thread gets rescheduled and the system call completes, causing the program to exit. Under OSX Mt. Lion, you get:
    main thread -- Hi. I'm thread 45535232
    Hi. I'm thread 44998656
    Hi. I'm thread 44462080
    Hi. I'm thread 46071808
    Hi. I'm thread 2077442432

This exhibits a different thread schedule (the main thread runs first) but makes the same decision with respect to descheduling the main thread when exit(0) is called.
However, under an older version of OSX, the output is:

    main thread -- Hi. I'm thread -1610609172

and that's it. All threads have been created when the main thread exits, but they haven't run yet. This version of pthreads decided to complete the exit(0) call before scheduling the other runnable threads. When the main thread returns, the task is terminated, and thus the threads do not run.
Again, it is critical that you understand that all of these programs are correct from the perspective of the standard.
Finally, look at p4d.c. Here, the threads call exit() instead of pthread_exit(). Run it a bunch of times on the CSIL systems. You'll note that the output varies. Here is one run:

    rich@csil:~/class/cs170/notes/IntroThreads$ ./p4d
    Hi. I'm thread 2101860096
    Hi. I'm thread 2093467392
    Hi. I'm thread 2093467392
    Hi. I'm thread rich@csil:~/class/cs170/notes/IntroThreads$

And here is another run of exactly the same program:

    rich@csil:~/class/cs170/notes/IntroThreads$ ./p4d
    Hi. I'm thread 23750400
    Hi. I'm thread 15357696
    Hi. I'm thread 15357696
    rich@csil:~/class/cs170/notes/IntroThreads$

And another:

    ./p4d
    rich@csil:~/class/cs170/notes/IntroThreads$ ./p4d
    Hi. I'm thread -703346944
    rich@csil:~/class/cs170/notes/IntroThreads$

Can you explain how each of these runs happened?
The thread entry point, called AddIt(), takes a single void * argument. It converts that pointer to a pointer to a structure of type struct thread_arg so that it can extract the two fields: value and increment. It then mallocs a structure for the return value and puts into it the sum of the value and the increment that were passed. Finally, it frees the argument structure and passes the pointer to the return structure, cast to a void *, to pthread_exit(). The calling thread gets this pointer through a call to pthread_join() and, once the return value is printed out, frees the malloced space.
You should study this code very carefully. Not only does it illustrate the common method of parameter and return value passing under pthreads, but it covers most of the important C concepts (e.g. malloc(), casting, structures, pointers and addresses) that you will need for the remainder of this class. If this code is not 100% crystal clear, you should consider brushing up on your C.