CS170: Project 4 - Thread Local Storage (20% of project score)

Project Goals

The goals of this project are:

to provide protected memory regions for threads
to understand basic concepts of memory management

Administrative Information

This is an individual project. It is due on Tuesday, May 28, 2019, 23:59:59 PST (no deadline extensions or late turn-ins).

Implementing thread local storage

The goal of this project is to implement a library that provides protected memory regions for threads, which they can safely use as local storage. As you will recall from class, all threads share the same memory address space. This can be good, because it allows threads to easily share information. However, it can also be a problem when, typically due to programming bugs, one thread accidentally modifies values that another thread has stored in a variable. To protect data from being overwritten by other threads, it would be convenient for each thread to possess a protected storage area that only this thread can read from and write to. We call this protected storage area local storage. Your task is to implement support for local storage, either on top of your existing user mode thread library or on top of the existing Linux pthread implementation. In both cases, your code will be a library that offers the functions introduced below.

To provide support for thread local storage (TLS), you are supposed to implement the following functions in a library:

      int tls_create(unsigned int size)

This function creates a local storage area (LSA) for the currently executing thread that can hold (at least) size bytes. The function returns 0 on success. It is an error to create a local storage area for a thread that already has one, and the size has to be larger than 0. In the case of an error, the function returns -1. After an LSA has been created, it can be read and written by the following two functions.

      int tls_write(unsigned int offset, unsigned int length, char *buffer)

This function reads length bytes, starting from the memory location pointed to by buffer, and writes them into the local storage area of the currently executing thread, starting at position offset. The function returns 0 on success. It is an error when the function would need to write more data than the LSA can hold (i.e., offset + length > size of LSA). In this case, no data is written, and the function returns -1. It is also an error when the current thread has no LSA, and the function returns -1. Finally, the function trusts that the buffer from which data is read holds at least length bytes. If not, the result of the call is undefined.

      int tls_read(unsigned int offset, unsigned int length, char *buffer)

This function reads length bytes from the local storage area of the currently executing thread, starting at position offset, and writes them into the memory location pointed to by buffer. The function returns 0 on success. It is an error when the function would need to read past the end of the LSA (i.e., offset + length > size of LSA). In this case, no data is read, and the function returns -1. It is also an error when the current thread has no LSA, and the function returns -1. Finally, the function trusts that the buffer into which data is written is large enough to hold at least length bytes. If not, the result of the call is undefined.
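Both tls_write and tls_read reject any request where offset + length exceeds the size of the LSA. Since all three values are unsigned int, the sum itself can wrap around for very large arguments. A minimal sketch of a check that avoids computing the sum is shown below; the helper name tls_range_ok is hypothetical and not part of the required interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of the required API): returns true when
 * the byte range [offset, offset + length) fits inside an LSA that holds
 * lsa_size bytes. The sum offset + length is never computed, so it cannot
 * wrap around for large unsigned arguments. */
bool tls_range_ok(unsigned int offset, unsigned int length,
                  unsigned int lsa_size)
{
    return length <= lsa_size && offset <= lsa_size - length;
}
```

With a check of this shape, a call such as tls_read(4294967295u, 2, buf) is rejected even though the naive sum offset + length would wrap around to 1.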

      int tls_destroy()

This function frees a previously allocated local storage area of the currently executing thread. The function returns 0 on success and -1 when the thread does not have a local storage area.

      int tls_clone(pthread_t tid)

This function clones the local storage area of a target thread identified by tid. When a thread local storage is cloned, the content is not simply copied. Instead, the storage areas of both threads initially refer to the same memory location. Only when one thread writes to its own LSA (using the tls_write function) does the TLS library create a private copy of the region that is written. Note, though, that the remaining, untouched areas still remain shared. This approach is called CoW (copy-on-write), and it is done to save memory space and to avoid unnecessary copy operations. The function returns 0 on success. It is an error when the target thread has no LSA, or when the currently executing thread already has an LSA. In both cases, the function returns -1.

Whenever a thread attempts to read from or write to any thread local storage area, including its own, without using the appropriate tls_read and tls_write functions, then this thread should be terminated (by calling pthread_exit on its behalf). The remaining threads continue to run unaffected.

Since we have to implement TLS in user space and cannot modify the operating system, we introduce the following two simplifications to make our lives easier:

First, whenever a thread calls tls_read or tls_write, you can temporarily unprotect this thread's local storage area. That is, when a thread A is executing one of these two functions, it would be possible for another thread B (that happens to interrupt thread A) to access A's local storage (and only that of thread A) without being terminated.

Second, we will see that most memory protection operations do not work with byte granularity but with page granularity. Thus, we relax the sharing requirement for tls_clone. Assume that thread B clones the local storage of thread A, which has a size of 2*page-size (page-size is typically 4096 bytes and can be determined by calling the library routine getpagesize()). Now, let's assume that thread A writes one byte at the beginning of its own local storage. Originally, we required that only this byte is copied. For convenience, we relax this requirement and allow the entire first page (i.e., the entire first 4096 bytes of the local storage) to be copied. The second page, however, still remains shared between thread A and thread B.

Note that it is possible that more than two threads share the same local storage. That is, multiple threads can tls_clone the LSA of the same target thread, and all these threads would then point to the same memory region (pages). When one thread writes to its storage, only this thread gets its own copy. The remaining threads would still share the same region.
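One possible way to track how many threads share a page is a per-page reference count. The sketch below is only an illustration under that assumption: struct tls_page, its ref_count field, and tls_page_make_private are hypothetical names, and each page is assumed to have been obtained individually from mmap. When a thread is about to write to a page that others also reference, it receives a private copy and the original loses one reference.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and getpagesize on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping record for one page of local storage. */
struct tls_page {
    void *addr;      /* page-aligned memory obtained from mmap */
    int   ref_count; /* number of LSAs that currently refer to this page */
};

/* Returns a page that the caller may modify without affecting other
 * threads. A page with a single owner is returned unchanged. A shared
 * page is copied into a fresh page, and the original loses one reference.
 * Both pages are left with PROT_NONE. Returns NULL on failure. */
struct tls_page *tls_page_make_private(struct tls_page *shared)
{
    size_t ps = (size_t)getpagesize();
    struct tls_page *copy;

    if (shared->ref_count == 1)
        return shared; /* sole owner: nothing to copy */

    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    copy->addr = mmap(NULL, ps, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (copy->addr == MAP_FAILED) {
        free(copy);
        return NULL;
    }
    copy->ref_count = 1;

    /* Open the shared page for reading just long enough to copy it. */
    if (mprotect(shared->addr, ps, PROT_READ) != 0) {
        munmap(copy->addr, ps);
        free(copy);
        return NULL;
    }
    memcpy(copy->addr, shared->addr, ps);
    mprotect(shared->addr, ps, PROT_NONE);
    mprotect(copy->addr, ps, PROT_NONE);

    shared->ref_count--; /* the caller no longer refers to the original */
    return copy;
}
```

In this design, the threads that did not write keep pointing at the original page, so the original is only released once its count drops to zero.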

      void* tls_get_internal_start_address()

This function returns a pointer to the starting address of the local storage area for the current thread. If the current thread did not allocate any local storage area, the function should return NULL.

Implementation

First, you need a way to create a local storage that cannot be directly accessed by a thread. To this end, I suggest that you use the library function mmap. mmap has two advantages: First, it allows one to obtain memory that is aligned with the start of a page, and the function allocates memory in multiples of the page size. Second, mmap allows you to create pages that have no read/write permissions, and thus, cannot be accessed arbitrarily.
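As a rough illustration of the mmap approach, the sketch below rounds a requested size up to whole pages and maps a single page with no access permissions. The helper names tls_page_count and tls_alloc_page are hypothetical, and mapping each page separately is an assumption made here because it keeps later per-page copying simple; mapping the whole area in one call would also work.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and getpagesize on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of whole pages needed to hold size bytes (rounds up). */
size_t tls_page_count(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return (size + page_size - 1) / page_size;
}

/* Hypothetical helper: map one anonymous page that no thread may read or
 * write. The returned address is page-aligned. Returns NULL on failure. */
void *tls_alloc_page(void)
{
    void *p = mmap(NULL, (size_t)getpagesize(), PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

For example, with 4096-byte pages a request for 4097 bytes needs two pages, while a request for exactly 4096 bytes needs one.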

Now that we have pages that are properly aligned and that cannot be accessed by any thread, the next question is how we can realize the tls_read and tls_write functions. For this, the library routine mprotect is very handy: it allows us to "unprotect" a memory region at the beginning of a read or write operation and to "reprotect" it when we are done. Note that mprotect can only assign coarse-grained permissions at the level of individual pages. This is another reason why it is convenient to create the local storage area as a multiple of the page size.
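The unprotect, access, reprotect pattern might look roughly like the following for a single page. The helper names tls_page_store and tls_page_load are hypothetical; the sketch assumes that page is page-aligned and that off + len stays within one page.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and getpagesize on glibc */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from src into a protected page,
 * starting at byte off within that page. Returns 0 on success and -1 if
 * a call to mprotect fails. */
int tls_page_store(void *page, size_t off, const char *src, size_t len)
{
    size_t ps = (size_t)getpagesize();

    assert(off <= ps && len <= ps - off);
    if (mprotect(page, ps, PROT_READ | PROT_WRITE) != 0) /* unprotect */
        return -1;
    memcpy((char *)page + off, src, len);
    return mprotect(page, ps, PROT_NONE);                /* reprotect */
}

/* Counterpart for reading: copy len bytes out of a protected page. */
int tls_page_load(void *page, size_t off, char *dst, size_t len)
{
    size_t ps = (size_t)getpagesize();

    assert(off <= ps && len <= ps - off);
    if (mprotect(page, ps, PROT_READ) != 0)              /* unprotect */
        return -1;
    memcpy(dst, (char *)page + off, len);
    return mprotect(page, ps, PROT_NONE);                /* reprotect */
}
```

Between the two mprotect calls the page is accessible to every thread in the process, which is exactly the window that the first simplification above permits.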

It is important to observe that the local storage area of a thread can contain both shared pages and pages that are private copies. Hence, it is clear that these pages are not always contiguous in memory. As a result, when you perform read and write operations that span multiple pages of the local storage, you need to break up these operations into accesses to the individual pages.
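The splitting itself is plain arithmetic on the offset. The sketch below computes, for a byte range in the LSA, the piece that falls into the first page it touches; repeating this until no bytes remain visits every page. The names struct tls_chunk, tls_first_chunk, and tls_count_chunks are hypothetical, and page_size is assumed to be non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the part of an access that lies in one page. */
struct tls_chunk {
    size_t page_idx; /* index of the page within the LSA */
    size_t in_page;  /* byte offset inside that page */
    size_t len;      /* number of bytes to transfer in that page */
};

/* Returns the first per-page piece of the range [offset, offset + length). */
struct tls_chunk tls_first_chunk(size_t offset, size_t length,
                                 size_t page_size)
{
    struct tls_chunk c;
    size_t room;

    assert(page_size > 0);
    c.page_idx = offset / page_size;
    c.in_page  = offset % page_size;
    room       = page_size - c.in_page; /* bytes left in this page */
    c.len      = length < room ? length : room;
    return c;
}

/* Demonstrates the loop: counts how many pages an access touches. A real
 * tls_read or tls_write would transfer c.len bytes where the counter is
 * incremented. */
size_t tls_count_chunks(size_t offset, size_t length, size_t page_size)
{
    size_t chunks = 0;

    while (length > 0) {
        struct tls_chunk c = tls_first_chunk(offset, length, page_size);
        offset += c.len;
        length -= c.len;
        chunks++;
    }
    return chunks;
}
```

For instance, with 4096-byte pages an access of 10 bytes at offset 4090 is split into 6 bytes at the end of page 0 and 4 bytes at the start of page 1.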

Finally, the question arises what happens when a thread directly accesses a protected memory page (one that belongs to an LSA). In this case, the operating system sends a signal (SIGSEGV) to the offending thread. Thus, you could install a signal handler for SIGSEGV, and whenever such a signal is caught, simply terminate the currently running thread (by calling pthread_exit). Unfortunately, this alone is not correct, because there could be other reasons for a segmentation fault (a normal programming error). In that case, you do not want to terminate only the currently running thread, but kill the entire process and dump the core. Thus, your signal handler must be able to distinguish between a segmentation fault caused by a thread that incorrectly accesses a local storage area and a regular fault where no LSA is involved. To this end, I suggest that you look closely at the manual page for sigaction and try to find out how the siginfo_t structure might help you to achieve your goal.
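One way this distinction could be made is sketched below. It assumes a handler installed with the SA_SIGINFO flag, so that the faulting address is available in the si_addr field of siginfo_t, and a simple fixed-size registry of LSA page addresses. The registry, TLS_MAX_PAGES, and all tls_ function names are hypothetical, and the address masking assumes that the page size is a power of two. Treat it as an illustration of the idea rather than a complete solution.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes getpagesize on glibc */
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define TLS_MAX_PAGES 1024                 /* assumed registry capacity */
static uintptr_t tls_pages[TLS_MAX_PAGES]; /* start addresses of LSA pages */
static size_t    tls_page_total;

/* Record a page so that faults inside it can be recognised later. */
int tls_register_page(void *page)
{
    if (tls_page_total == TLS_MAX_PAGES)
        return -1;
    tls_pages[tls_page_total++] = (uintptr_t)page;
    return 0;
}

/* Returns 1 when addr lies inside a registered LSA page, 0 otherwise. */
int tls_owns_address(const void *addr)
{
    uintptr_t ps    = (uintptr_t)getpagesize();
    uintptr_t start = (uintptr_t)addr & ~(ps - 1); /* round down to page */
    size_t i;

    for (i = 0; i < tls_page_total; i++)
        if (tls_pages[i] == start)
            return 1;
    return 0;
}

static void tls_handle_segv(int sig, siginfo_t *si, void *context)
{
    (void)context;
    if (tls_owns_address(si->si_addr))
        pthread_exit(NULL); /* LSA violation: end only this thread */

    /* Ordinary fault: restore the default action and raise the signal
     * again so that the process is killed and the core is dumped. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Installs the handler. Returns 0 on success and -1 on failure. */
int tls_install_segv_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags     = SA_SIGINFO; /* request the three-argument handler */
    sa.sa_sigaction = tls_handle_segv;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Note that the registry here is not protected against concurrent modification; a library used by several threads would have to take care of that separately.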

Deliverables

Please follow the instructions below exactly!

We use gradescope to manage your project submissions and to communicate the results back to you. You will submit all files that are part of your project via the gradescope web interface.
Your files must be in a directory named vmm. The library that we will test and link must be called tls.o, and the implementation must be done in C/C++.
All files that you need to build your library must be included (sources, headers, makefile) in that folder. We will just call make and expect that the object file tls.o is built from your sources. Please do not include any object or executable files.
Gradescope does support built-in autograding, but, currently, we do not intend to use it. Instead, we will test your projects in our own environment. So, do not worry if you don't get immediate feedback or if the system tells you that the autograder is not running.
Your project must compile on a CSIL machine. If you worked on a Windows machine or your laptop at home, then make sure it still works on CSIL or modify it appropriately!
Include a README with this project. Explain what you did in the README. If you had problems, tell us what they were and why they occurred.

Created by Christopher Kruegel (© 2008, using Apache Cocoon).