Project #3: Virtual Memory Management and File-System
Due 11:59:59pm on Nov 28, 2011
Part I (Virtual Memory)
- Tasks
- Additional Information
- Click here for suggested implementation steps.
Part II (File System)
- Introduction
- Tasks (What to Extend)
- Issues to Consider
- Click here for suggested implementation steps.
General Info
- Required Output
- Testing Strategy
- Report/code submission
Part I Tasks
For this part of the assignment, you will be working in the vm directory
of NACHOS. Your goal is to implement demand-paged virtual memory.
The main things that you have to do are summarized as follows.
- For this assignment, you will need to read and write data to the backing
store, which should be a file called SWAP. The swap file should
be accessed using the NACHOS OpenFile class.
You can use the filesystem stub routines (the default), so that the
SWAP file is created as a Unix file. You do not need to retest later with the Part 2 file system.
- Implement a page fault handler and a page replacement policy using LRU or FIFO with second chance.
The page table should contain dirty bits to avoid unnecessary disk writes.
You should test your code under various conditions of system load,
including one process with an address space larger than physical memory,
several concurrently running processes with combined address spaces
larger than physical memory, and random process switching (random yielding).
The sort program in the test directory is an example of
a program designed to stress the virtual memory system.
Page tables were used in assignment 2 to simplify memory allocation
and to keep failures in one address space from affecting other
programs. In this assignment we will use page tables to tell the
hardware which pages are resident in physical memory and which are
only resident on the disk. If the valid bit in a particular page table
entry is set, the hardware assumes that the corresponding virtual page
is loaded into physical memory in the physical page frame specified by
the physicalPage field. Whenever the program generates an access to
that virtual page, the hardware accesses the physical page. If the
valid bit is not set, the hardware generates a page fault exception
whenever the program accesses that virtual page. Your exception
handler will then find a free physical page frame, read the page in
from the backing store (typically a paging file) to that physical page
frame, update the page table to reflect the new virtual to physical
mapping, and restart the program. The hardware will then resume the
execution of the program at the instruction that generated the
fault. This time the access should go through, since the page has been
loaded into physical memory.
To find a free frame, your page fault handler may need to eject a
cached page from its physical page frame. If the ejected page has been
modified in the physical memory, the page fault handling code must
write the page out to the backing store before reading the accessed
page into its physical page frame. The hardware maintains some
information that will help the page fault handler determine which
steps need to be taken on a page fault. In addition to the valid bit,
every page table entry contains a use and a dirty bit. The hardware
sets the use bit every time it accesses the corresponding page; if the
access is a write the hardware also sets the dirty bit. Your code may
use these bits in its page fault handling code. For example, your
code can use the dirty bit to determine if it needs to write an
ejected page back to the backing store. When a page is read in from
disk, the page fault handler should clear the dirty and use bits in
its page table entry. If the page is ever ejected, the page fault
handler checks its dirty bit. If the dirty bit is still clear, the
copy of the page on disk is identical to the copy in physical memory
and there is no need to write the page back to disk before ejecting
it.
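The fault-handling and dirty-bit logic described above can be sketched as a small stand-alone simulation. This is only an illustration under stated assumptions, not Nachos code: `PagingSim`, `Frame`, `kPageSize`, and the in-memory `swap_` buffer (standing in for the SWAP file) are hypothetical names, and the victim is chosen in plain FIFO order for simplicity.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

constexpr int kPageSize = 128;   // hypothetical page size for this sketch

// Mirrors the page table fields named in the text.
struct PageEntry {
    int physicalPage = -1;
    bool valid = false;
    bool use = false;
    bool dirty = false;
};

struct Frame {
    int ownerPage = -1;          // virtual page held by this frame, -1 if free
    char data[kPageSize] = {};
};

class PagingSim {
public:
    int swapWrites = 0;          // counts write-backs to the backing store

    PagingSim(int numPages, int numFrames)
        : table_(numPages), frames_(numFrames),
          swap_(static_cast<std::size_t>(numPages) * kPageSize, 0) {}

    // Simulated store: faults the page in if needed, then sets use and dirty.
    void Write(int vpn, int offset, char value) {
        if (!table_[vpn].valid) HandleFault(vpn);
        PageEntry& e = table_[vpn];
        frames_[e.physicalPage].data[offset] = value;
        e.use = true;
        e.dirty = true;
    }

    // Simulated load: faults the page in if needed, then sets use only.
    char Read(int vpn, int offset) {
        if (!table_[vpn].valid) HandleFault(vpn);
        PageEntry& e = table_[vpn];
        e.use = true;
        return frames_[e.physicalPage].data[offset];
    }

private:
    void HandleFault(int vpn) {
        int frame = nextVictim_;                       // plain FIFO choice
        nextVictim_ = (nextVictim_ + 1) % static_cast<int>(frames_.size());
        Evict(frame);
        // Read the page in from the backing store and install the mapping.
        std::memcpy(frames_[frame].data, &swap_[vpn * kPageSize], kPageSize);
        frames_[frame].ownerPage = vpn;
        PageEntry& e = table_[vpn];
        e.physicalPage = frame;
        e.valid = true;
        e.use = false;                                 // cleared on load
        e.dirty = false;
    }

    void Evict(int frame) {
        int vpn = frames_[frame].ownerPage;
        if (vpn < 0) return;                           // frame was free
        PageEntry& e = table_[vpn];
        if (e.dirty) {                                 // write back only if modified
            std::memcpy(&swap_[vpn * kPageSize], frames_[frame].data, kPageSize);
            ++swapWrites;
        }
        e.valid = false;
        frames_[frame].ownerPage = -1;
    }

    std::vector<PageEntry> table_;
    std::vector<Frame> frames_;
    std::vector<char> swap_;                           // stands in for the SWAP file
    int nextVictim_ = 0;
};
```

With two frames, writing page 0 and then reading pages 1, 2, and 3 causes exactly one write-back (for the dirty page 0); the clean page 1 is dropped without touching the backing store.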
On a page fault, the kernel must decide which page to replace;
ideally, it will throw out a page that will not be referenced for a
long time, keeping in memory those pages that are soon to be
referenced. Another consideration is that the operating system may be
able to avoid the overhead of writing modified pages to disk inside
the page fault handler by writing modified pages to disk in
advance. The page fault handler can take advantage of the clean
physical page frame to complete subsequent page faults more
quickly. FIFO replacement policy is the easiest one to implement, so
you may want to start from there to test your page swapping mechanism.
In practice, however, plain FIFO is rarely used because it performs
poorly. Your final version must implement a more effective replacement
policy, such as FIFO with second chance or LRU.
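As an illustration of FIFO with second chance, the sketch below keeps one use bit per physical frame and sweeps a circular "hand" over the frames. The class name `ClockReplacer` and its methods are hypothetical; in Nachos the use bit would come from the page table entry that the hardware updates, not from a separate array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: FIFO with second chance over a fixed set of frames.
class ClockReplacer {
public:
    explicit ClockReplacer(int numFrames) : use_(numFrames, false), hand_(0) {
        assert(numFrames > 0);
    }

    // Called when a frame is referenced (models the hardware setting the use bit).
    void Touch(int frame) { use_[frame] = true; }

    // Returns the frame to evict. A frame whose use bit is set gets a second
    // chance: the bit is cleared and the hand moves on. The loop ends within
    // two sweeps because every bit it passes is cleared.
    int PickVictim() {
        for (;;) {
            int candidate = hand_;
            hand_ = (hand_ + 1) % static_cast<int>(use_.size());
            if (use_[candidate]) {
                use_[candidate] = false;
            } else {
                return candidate;
            }
        }
    }

private:
    std::vector<bool> use_;
    int hand_;
};
```

With three frames and frames 0 and 2 recently touched, the first call to `PickVictim()` skips frame 0 (clearing its bit) and returns frame 1.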
Click here for suggested implementation steps.
The multiprogramming and virtual memory assignments used the stub
version of the Nachos file system.
The last phase of your project is to use Nachos' own file system
and enhance its functionality. Your implementation should be under the sub-directory "filesys".
The first step is to read and understand the partial file system we
have written for you under the filesys sub-directory. Run the program `nachos -f -cp test/small small'
for a simple test case of our code - `-f' formats the emulated physical
disk, and `-cp' copies the UNIX file `test/small' onto that disk.
The files to focus on are:
- fstest.cc -- a simple test case for our file system.
- filesys.h, filesys.cc -- top-level interface to the file system.
- directory.h, directory.cc -- translates file names to disk file headers; the directory data structure is stored as a file.
- filehdr.h, filehdr.cc -- manages the data structure representing the
layout of a file's data on disk.
- openfile.h, openfile.cc -- translates file reads and writes to disk sector reads and writes.
- synchdisk.h, synchdisk.cc -- provides synchronous access to the asynchronous physical disk, so that threads
block until their requests have completed.
- disk.h, disk.cc -- emulates a physical disk, by sending requests to
read and write disk blocks to a UNIX file and then generating an interrupt after some period of time. The details
of how to make read and write requests vary tremendously from disk
device to disk device; in practice, you would want to hide these
details behind something like the abstraction provided by this module.
Our file system has a UNIX-like interface, so you may also wish to read
the UNIX man pages for creat, open, close, read, write, lseek, and
unlink (e.g., type "man creat"). Our file system has calls that are
similar (but not identical) to these; the file system translates these
calls into physical disk operations. One major difference is that our
file system is implemented in C++. Create (like UNIX creat), Open
(open), and Remove (unlink) are defined on the FileSystem object, since
they involve manipulating file names and directories. FileSystem::Open
returns a pointer to an OpenFile object, which is used for direct file
operations such as Seek (lseek), Read (read), and Write (write). An open
file is "closed" by deleting the OpenFile object.
Many of the data structures in our file system are stored both in
memory and on disk. To provide some uniformity, all these data
structures have a "FetchFrom" procedure that reads the data off disk
and into memory, and a "WriteBack" procedure that stores the data back
to disk. Note that the in-memory and on-disk representations do not
have to be identical.
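The FetchFrom/WriteBack convention can be illustrated with a toy structure. Everything here is a hypothetical stand-in: `FakeDisk` is an in-memory array of sectors rather than the Nachos disk classes, and `Counter` is an invented structure whose `modified` flag exists only in memory, showing that the two representations may differ.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <vector>

constexpr int kSectorSize = 128;   // sector size stated in the assignment

// Hypothetical in-memory stand-in for a sector-addressed disk.
class FakeDisk {
public:
    explicit FakeDisk(int numSectors) : sectors_(numSectors) {}
    void ReadSector(int s, char* out) const {
        std::memcpy(out, sectors_[s].data(), kSectorSize);
    }
    void WriteSector(int s, const char* in) {
        std::memcpy(sectors_[s].data(), in, kSectorSize);
    }
private:
    std::vector<std::array<char, kSectorSize>> sectors_;  // zero-filled
};

// Hypothetical persistent structure following the FetchFrom/WriteBack pattern.
struct Counter {
    int value = 0;           // stored on disk
    bool modified = false;   // in-memory only, never written to disk

    void Set(int v) { value = v; modified = true; }

    // Load the on-disk representation into memory.
    void FetchFrom(const FakeDisk& disk, int sector) {
        char buf[kSectorSize];
        disk.ReadSector(sector, buf);
        std::memcpy(&value, buf, sizeof(value));
        modified = false;
    }

    // Store the in-memory state back to its sector.
    void WriteBack(FakeDisk& disk, int sector) {
        char buf[kSectorSize] = {};
        std::memcpy(buf, &value, sizeof(value));
        disk.WriteSector(sector, buf);
        modified = false;
    }
};
```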
You are asked to modify the file system so that a file can be as large as
100K bytes. In the basic
file system, each file is limited to a size of just under 4K bytes.
Each file has a header (class FileHeader, the counterpart of the i-node in Unix) that is a table of direct
pointers to the disk blocks for that file. Since the header is stored
in one 128-byte disk sector, the maximum size of a file is limited by the number
of pointers that fit in one disk sector. Two things need to be done.
- Implement single indirect pointer blocks so that the file size limit can be increased to 100K bytes.
You may choose how many indirect pointers the file header holds.
One option is to use the minimum number of indirect pointers
that satisfies the file size constraint (namely up to 100K bytes) and use the remaining entries as direct data pointers.
Another option is to treat every entry as an indirect pointer.
Discuss the trade-offs of these two design options in the writeup by listing their advantages and
disadvantages, and justify your choice.
- Make sure that the size of each file can actually grow.
In Nachos' basic file system, files are not extensible:
the file size is specified at file creation time. To allow files of up to 100K bytes, you would be forced to create every file
with that size, and the entire Nachos disk could then hold only one file.
In UNIX and most other file systems, a file is initially
created with size 0 and is then expanded every time a write is made off
the end of the file.
There are two options to solve this problem.
The first option is to initially allocate sufficient space for the i-node structure as if the size were
100K bytes, but not to allocate disk blocks for data
until they are needed when executing the system call Write().
The second option is to initially set the file size to 0 and gradually expand
the file. Choose one option for your code implementation.
In your submitted writeup, you should compare these two designs, discuss the trade-offs, and explain
what you have chosen with detailed implementation steps.
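The size limits discussed above follow from simple arithmetic, sketched here under explicit assumptions: sector numbers are 4-byte integers, the header keeps two 4-byte bookkeeping integers (byte count and sector count) ahead of the pointer table, and "100K bytes" means 100 * 1024 bytes. The names `Capacity`, `MinIndirectSlots`, and `Locate` are hypothetical helpers, and `Locate` assumes the direct slots come first in the header.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kSectorSize = 128;                                 // from the text
constexpr int kPointerSize = sizeof(std::int32_t);               // assumed 4 bytes
constexpr int kBookkeeping = 2 * sizeof(std::int32_t);           // assumed header fields
constexpr int kHeaderSlots = (kSectorSize - kBookkeeping) / kPointerSize;  // 30
constexpr int kBasicMaxSize = kHeaderSlots * kSectorSize;        // 3840, just under 4K
constexpr int kPtrsPerIndirect = kSectorSize / kPointerSize;     // 32
constexpr int kBytesPerIndirect = kPtrsPerIndirect * kSectorSize;  // 4096
constexpr int kTargetSize = 100 * 1024;                          // assumed "100K bytes"

// Capacity when numIndirect header slots are single indirect pointers and
// the remaining slots are direct pointers.
constexpr int Capacity(int numIndirect) {
    return numIndirect * kBytesPerIndirect +
           (kHeaderSlots - numIndirect) * kSectorSize;
}

// Smallest number of indirect slots that reaches the target size, or -1.
int MinIndirectSlots(int target) {
    for (int k = 0; k <= kHeaderSlots; ++k) {
        if (Capacity(k) >= target) return k;
    }
    return -1;
}

// Where a byte offset lives: header slot, plus the index inside the
// indirect block (-1 when the slot is a direct pointer).
struct Location {
    int slot;
    int indexInBlock;
};

Location Locate(int offset, int numDirect) {
    int sector = offset / kSectorSize;            // logical sector in the file
    if (sector < numDirect) return {sector, -1};
    int rest = sector - numDirect;
    return {numDirect + rest / kPtrsPerIndirect, rest % kPtrsPerIndirect};
}
```

Under these assumptions the minimal mixed layout uses 25 indirect and 5 direct slots (103040 bytes), while treating all 30 slots as indirect gives 122880 bytes. The mixed layout reaches small files without an extra disk read; the all-indirect layout is more uniform but always pays one extra sector access.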
Here are some things you should be sure to get right:
- Be sure your file header is not larger than one disk sector.
- Be sure to support direct file access with "holes" in the file. In other
words, your index data structure
should only point to sectors that have actually had data written into
them by some file system operation. The exception is that if the file
is created to be a given size, your implementation should allocate as
many sectors as required to hold that amount of data.
- Be sure to implement extensible files - if the program writes
beyond the end of the file, be sure to automatically extend the file to hold the written data.
- Be sure to gracefully handle the case when the program attempts to
write more data to the disk when the disk is full or the maximum size limit is reached.
- Be sure to reclaim the disk blocks when a file is removed (e.g. use nachos -r).
- Be sure that each disk block is allocated to at most one file.
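The checklist above can be sketched with an in-memory model that tracks only which sectors a file owns, not their contents. `SectorPool` and `GrowableFile` are hypothetical names for this illustration; a real implementation would work through the Nachos free-sector bitmap and file header instead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kSectorSize = 128;
constexpr int kNoSector = -1;      // marks a hole (no sector allocated)

// Hypothetical free-sector pool; a sector is never handed out twice.
class SectorPool {
public:
    explicit SectorPool(int n) : used_(n, false) {}
    int Allocate() {
        for (int i = 0; i < static_cast<int>(used_.size()); ++i) {
            if (!used_[i]) { used_[i] = true; return i; }
        }
        return kNoSector;          // disk full
    }
    void Free(int s) { assert(used_[s]); used_[s] = false; }
    int NumFree() const {
        int n = 0;
        for (bool u : used_) if (!u) ++n;
        return n;
    }
private:
    std::vector<bool> used_;
};

// Hypothetical extensible file that allocates sectors only when written.
class GrowableFile {
public:
    GrowableFile(SectorPool& pool, int maxBytes) : pool_(pool), maxBytes_(maxBytes) {}
    int Size() const { return size_; }

    // Returns the number of bytes accepted; this is short (possibly 0) when
    // the disk is full or the maximum file size is reached.
    int WriteAt(int offset, int numBytes) {
        if (offset < 0 || numBytes <= 0) return 0;
        int end = std::min(offset + numBytes, maxBytes_);
        if (end <= offset) return 0;
        int first = offset / kSectorSize;
        int last = (end - 1) / kSectorSize;
        if (last >= static_cast<int>(map_.size())) map_.resize(last + 1, kNoSector);
        int written = 0;
        for (int s = first; s <= last; ++s) {
            if (map_[s] == kNoSector) {
                int phys = pool_.Allocate();
                if (phys == kNoSector) break;          // stop gracefully
                map_[s] = phys;
            }
            int segStart = std::max(offset, s * kSectorSize);
            int segEnd = std::min(end, (s + 1) * kSectorSize);
            written += segEnd - segStart;
        }
        if (written > 0) size_ = std::max(size_, offset + written);
        return written;
    }

    // Reclaims every sector the file owns, as on file removal.
    void Release() {
        for (int phys : map_) if (phys != kNoSector) pool_.Free(phys);
        map_.clear();
        size_ = 0;
    }

private:
    SectorPool& pool_;
    int maxBytes_;
    int size_ = 0;
    std::vector<int> map_;         // logical sector -> physical sector or hole
};
```

Writing 10 bytes at offset 256 of an empty file allocates one sector and leaves logical sectors 0 and 1 as holes; a later write that cannot get a sector returns 0 without changing the file size.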
Note that for the input program file, you can still use the regular
Unix file system to read its content.
Currently, the basic file system code assumes it is accessed by a single thread at a time.
You may keep this assumption for this part of the assignment, although in a real system
synchronization is needed to allow multiple threads to use the file system concurrently.
Given that everybody is busy, especially towards the end of the quarter,
it is very useful to make trade-offs in completing this project. Understand the
design options, but choose something simple and easy to do.
You should first try to complete this project with minimal effort
to pass the public test programs. The hidden test programs used in
grading are very similar to these public cases, and
complex test cases will not be used for grading.
When requirements for certain features are vague or not specified,
choose something that is easy to do.
Click here for suggested implementation steps.
Part 1
For the following outputs, [pid] is the id of the process on whose
behalf the operation is performed, [virtualPage] is the virtual page number involved (i.e.,
the page index into the process's virtual address space), and [physicalPage] is the
physical page number involved (i.e., the page index into the physical memory of the Nachos virtual
machine).
- Whenever a page is loaded into physical memory, print
L [pid]: [virtualPage] -> [physicalPage]
- Whenever a page is evicted from physical memory and written to the swap area, print
S [pid]: [physicalPage]
- Whenever a page is evicted from physical memory and not written to the swap area, print
E [pid]: [physicalPage]
- (This output is no longer required, because the copy-on-write implementation is not required.)
Whenever a process writes to a shared page (and this page needs to be
duplicated), print
D [pid]: [virtualPage]
- Whenever a process obtains a zero-filled demand page for the first time (i.e., when you
allocate and zero the page out), print
Z [pid]: [virtualPage]
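One way to keep these lines consistent is to build them in one place. The helpers below are hypothetical (the names `FormatLoad`, `FormatSwapOut`, `FormatEvict`, and `FormatZeroFill` are not part of Nachos) and simply follow the templates above literally; the D line is omitted because it is not required.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers that build the Part 1 trace lines.
std::string FormatLoad(int pid, int virtualPage, int physicalPage) {
    return "L " + std::to_string(pid) + ": " + std::to_string(virtualPage) +
           " -> " + std::to_string(physicalPage);
}

std::string FormatSwapOut(int pid, int physicalPage) {      // evicted and written
    return "S " + std::to_string(pid) + ": " + std::to_string(physicalPage);
}

std::string FormatEvict(int pid, int physicalPage) {        // evicted, not written
    return "E " + std::to_string(pid) + ": " + std::to_string(physicalPage);
}

std::string FormatZeroFill(int pid, int virtualPage) {      // first zero-filled page
    return "Z " + std::to_string(pid) + ": " + std::to_string(virtualPage);
}
```

For example, `FormatLoad(1, 7, 2)` yields `L 1: 7 -> 2`, which can then be printed with whatever output routine the kernel code already uses.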
Part 2
For the following outputs, [pid] is the id of the process on whose
behalf the operation is performed, [fileID] is the ID of the file upon which the
operation is performed, [oldSize] is the previous size of the object (in bytes or entries), and
[newSize] is the size it was extended to.
- Whenever a process extends a file from some size to another, print
F [pid][fileID]: [oldSize] -> [newSize]
You need to design and develop test cases. You can test part 1 without having part 2 implemented,
and part 2 without having part 1 implemented.
You can wrap your part 1 and part 2 implementations in #defines so
that each can be "removed", allowing the other to be tested
independently.
As a simplification,
you do not need to test cases that combine virtual memory with the extended file system.
The following tips are useful:
- Clean out all the .c files in the code/test directory. There might be
name conflicts from previous projects and you don't want to run old
userprogs.
- Get sample test programs in part1 and part2 of the ~cs170/nachos-projtest/proj3 directory.
- Test part 1. Test as you would for project 2. You can run
./nachos in the vm/ or userprog/ directory.
You can use the Unix file system for the test.
- Test part 2. It can be convenient to create another test directory (e.g., a test2 directory or filesys/test).
In order to test userprogs with the filesys version
of nachos, you will need to load the userprog that is initially run when nachos
starts (the -x argument on the command line) onto the Nachos filesystem before it
can be run. You can do this using the -cp option (see below). This is
necessary because the filesys/nachos binary does not recognize files in the Unix filesystem.
Examples of usage:
csil$ pwd
cs170/nachos/code/filesys
csil$ ./nachos -f -cp ../test2/Prog1 Prog1 -x Prog1
If Prog1 references more than one file (via Exec),
pass each one on the command line like this:
csil$ ./nachos -f -cp ../test2/Prog1 Prog1 -cp ../test2/Prog2 Prog2 -x Prog1
Note that the Open and Close calls now work in a _flat_ namespace:
there is no "mkdir" in nachos, so all files on the Nachos disk are in
the same unnamed directory. This might mean changing the .c files
you are testing part 2 with by removing all references to ../test.
- You have to provide a writeup in a file named HW3_WRITEUP in which you state
which portions of the assignment you have been able to complete and, for the ones
that are incomplete, what their current status is.
If you submit something that does not work and give no explanation, expect to receive no credit.
This report will be graded.
- Describe the design of your project.
- For virtual memory extension, describe your design and what has been changed.
- For file system extension,
- 1) Describe design options and their trade-offs (advantages and disadvantages) for
i-node design and for handling file size growth. Explain your reasons for choosing your current design.
- 2) List and explain the main components and code modifications in implementing your design.
- Provide a list of test program names and explain what you try to test with each test program, and your findings for each test.
- Also include both group members' names.
- Go to your 'nachos' directory.
- Turn in your 'code' directory with the command:
- turnin PROJECT3@cs170 code
You can turn in up to 3 times per project and no more than that!
Earlier versions will be discarded.
Note: only one turnin per group is accepted!