CS270 -- File Systems Class Project

Rich Wolski --- Fall, 2023

this page last updated: Wed Sep 6 11:24:43 PDT 2023


Getting Started

Before you can start the project, you will need Eucalyptus credentials so that you can access the campus private cloud. To get them, you need to email me with Once I receive such an email, I will respond to all team members with a shared set of credentials.

File System Project

The pedagogical goal for the overall class project is for you to have the experience of building a working file system for Linux. Doing so will hopefully acquaint you with the concepts of system calls, the Linux file abstraction, and device I/O. The resulting file system will (if complete) be a full functional equivalent of a file system that ships with Linux (e.g. ext4, xfs), though possibly with lower performance. However, achieving this level of functionality can require more work than one might anticipate. Thus, the plan is to complete the overall assignment in two phases.

Phase 1

To get the process started, your first assignment is to create a "fake" file system that sends me mail when I run a small test program. The result of this "Phase 1" assignment will be useful to completing the final project.

The main software tool we will use to integrate your eventual file system implementation into Linux as a working file system is FUSE (Filesystem in Userspace) and, in particular, the LibFUSE library. FUSE is a way to interpose callback functions that you write between the Linux file system calls and the actual file system. That is, you write callback functions that FUSE invokes when a program (any program) makes a Linux system call that accesses the file system.
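To make the callback idea concrete, here is a toy sketch in C (the language libFUSE itself is written in) of a table of function pointers plus a dispatcher. It is not the libFUSE API: struct toy_operations, my_open, my_write, and the dispatch_* functions are hypothetical names, and the real struct fuse_operations has many more members whose callbacks take additional arguments. The dispatch_* functions stand in for the kernel module and library that route a program's open(2) or write(2) to your code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified stand-in for libfuse's
 * struct fuse_operations. The real structure has many more members, and
 * its callbacks take extra arguments (e.g. struct fuse_file_info *). */
struct toy_operations {
    int (*open)(const char *path);
    int (*write)(const char *path, const char *buf, size_t size, off_t offset);
};

/* "Your" callbacks: this sketch only records what they were asked to do. */
static char last_opened[256];
static size_t total_written;

static int my_open(const char *path)
{
    strncpy(last_opened, path, sizeof last_opened - 1);
    last_opened[sizeof last_opened - 1] = '\0';
    return 0;                      /* 0 signals success, as in FUSE */
}

static int my_write(const char *path, const char *buf, size_t size,
                    off_t offset)
{
    (void)path; (void)buf; (void)offset;
    total_written += size;
    return (int)size;              /* number of bytes accepted */
}

static const struct toy_operations my_ops = {
    .open  = my_open,
    .write = my_write,
};

/* Stand-in for the FUSE kernel module and library: a system call on the
 * mounted file system is routed to the matching callback; in this sketch
 * a missing callback yields -ENOSYS ("function not implemented"). */
static int dispatch_open(const struct toy_operations *ops, const char *path)
{
    return ops->open ? ops->open(path) : -ENOSYS;
}

static int dispatch_write(const struct toy_operations *ops, const char *path,
                          const char *buf, size_t size, off_t offset)
{
    return ops->write ? ops->write(path, buf, size, offset) : -ENOSYS;
}
```

In an actual solution you would fill in libFUSE's own struct fuse_operations and hand it to fuse_main(); check the libFUSE headers installed on your VM for the exact callback signatures of that version.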

Ultimately, in Phase 2, the callbacks you write will implement as much of the Linux file semantics as you can manage by the end of the quarter. However, in Phase 1, your goal is to write a set of callbacks that allow a program I will write to open a file (the name doesn't matter) and then write a string into the resulting file descriptor; your file system must package that string in an email and send it to my email address. Writing this "fake" file system (which really supports only a few system calls) is intended to achieve the following goals:

This last feature is increasingly peculiar to operating systems. Most software today is designed to operate as an Internet-accessible service that can be manipulated, debugged, and "repaired" while it is in use. Operating systems "ship" as software releases because they must install and run on machines outside of the administrative domain of the developer. Thus, one must prepare an operating system to be installed and used by an unseen and unaccountable user, with no recourse once the software is released. To some extent, we will attempt to have this experience (and to understand its impact on system software development) in this course.

In terms of preparation, if the terms "system call" and "file system" are not part of your current technical understanding, then you should review the following materials:

We will review (but only review) the latter in a lecture. If you are new to computer science (e.g. you do not have an undergraduate background in computer science) and these lecture materials seem utterly foreign, you should consider taking an undergraduate OS class before attempting this project.

Summarizing Phase 1

In Phase 1, you are to To accomplish this phase, you will need to familiarize yourself with the FUSE documentation (such as it is). You will also need to learn how to configure Linux in a cloud (the Eucalyptus campus cloud, in our case). Finally, you will need to spend some time writing and -- most importantly -- testing your instructions.
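The heart of the Phase 1 file system is buffering what the test program writes and turning it into an email. The sketch below shows only that part, under assumed names (phase1_write, phase1_compose_email) and a single fixed-size buffer. It formats To and Subject headers, a blank line, and the body, following the header/body layout of RFC 5322; a complete message also needs From and Date headers, and the delivery step is deliberately left out.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define MSG_MAX 4096

/* Hypothetical single-file buffer: the write path appends what the test
 * program writes; the email is built only when the file is closed. */
static char   msg_buf[MSG_MAX];
static size_t msg_len;

static int phase1_write(const char *buf, size_t size, off_t offset)
{
    if (offset < 0 || (size_t)offset > msg_len)
        return -EINVAL;            /* this sketch does not allow holes */
    if ((size_t)offset + size >= MSG_MAX)
        return -EFBIG;             /* leave room for the terminating NUL */
    memcpy(msg_buf + offset, buf, size);
    if ((size_t)offset + size > msg_len)
        msg_len = (size_t)offset + size;
    msg_buf[msg_len] = '\0';
    return (int)size;
}

/* Format headers, a blank line, and the buffered text as the body.
 * Returns the message length, or -1 if it does not fit in out. */
static int phase1_compose_email(const char *to, const char *subject,
                                char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "To: %s\r\nSubject: %s\r\n\r\n%s\r\n",
                     to, subject, msg_buf);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

In a FUSE-based version, phase1_write would be called from your write callback and the compose-and-send step from your release (close) callback. How you deliver the message (for example, piping it to a local mail transfer agent with popen(), if one is installed on the VM) is part of your design and belongs in your instructions.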

Grading for Phase 1

To grade your assignment I will read your instructions and follow them (largely using cut-and-paste). If, at some point, they fail, I will stop and assign a grade. If, at the end of the recipe, I receive an email, your Phase 1 will receive full credit.

Note that I will make no assumptions regarding what you "mean" in your build and test instructions. I will expect the precise commands I will need to cut-and-paste from the document to

Often, when creating such a recipe, you need to include commands that take (as parameters) values that are not known until previous commands are executed. For example, when you run an instance in Eucalyptus, its IP address and DNS name are assigned (at random) by the cloud. Thus you cannot write a command containing the IP address into a document from which I can directly cut-and-paste.

You should indicate these parameters appropriately. For example, part of your recipe will instruct me to start a VM using a specific ami (image identifier). The ami corresponds to a specific Linux distribution, but the command to launch it requires that I use a key identifier that belongs to my user in the cloud. It is fine to indicate this as

aws ec2 run-instances --key-name [mykeyname.key] --image-id ami-b157b02b9b51c65ce --instance-type t2.medium
where the square brackets indicate that I should insert whatever key I plan to use for my test.

Using Eucalyptus

To complete both Phase 1 and Phase 2, you will need to mount a file system and (for Phase 2) read and write a raw disk device. To mount a file system using FUSE you will need root privileges. To make it possible to provide you with a Linux environment where you can have root, we will use the Eucalyptus private cloud here at UCSB. This tutorial explains the basics. Eucalyptus is essentially a private-cloud equivalent to Amazon's AWS. For this class, you can start Linux VMs that you can use for development and to run your file system. You can also create volumes that are virtualized raw disk partitions to use as secondary storage.

Phase 2

From an engineering perspective (i.e. what it is you need to do), Phase 2 of the project decomposes into three tasks: Implicitly, there is a fourth task which is to integrate these three tasks into a single, working file system.

File System Abstractions

Here, you have some latitude. As long as you implement the semantics of the file system calls correctly, the design and implementation of the internals are up to you. Any internal organization is acceptable; however, it must be implemented in memory only (i.e. you can't just stick things in a database or open a Linux file in your FUSE callbacks).
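As one possible internal organization, the sketch below keeps a flat, single-directory table of files in process memory, each with a heap-allocated data buffer. The names mem_create, mem_write, and mem_read and the fixed limits are hypothetical; your FUSE create, write, and read callbacks would translate paths into calls like these. Errors are returned as negative errno values, the convention FUSE callbacks use.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define MAX_FILES    64
#define NAME_MAX_LEN 64

/* Hypothetical in-memory file record in a flat, single-directory table. */
struct mem_file {
    int    in_use;
    char   name[NAME_MAX_LEN];
    mode_t mode;
    size_t size;
    char  *data;                   /* heap buffer holding the file contents */
};

static struct mem_file files[MAX_FILES];

static struct mem_file *mem_lookup(const char *name)
{
    for (int i = 0; i < MAX_FILES; i++)
        if (files[i].in_use && strcmp(files[i].name, name) == 0)
            return &files[i];
    return NULL;
}

static int mem_create(const char *name, mode_t mode)
{
    if (strlen(name) >= NAME_MAX_LEN) return -ENAMETOOLONG;
    if (mem_lookup(name)) return -EEXIST;
    for (int i = 0; i < MAX_FILES; i++) {
        if (!files[i].in_use) {
            files[i] = (struct mem_file){ .in_use = 1, .mode = mode };
            strcpy(files[i].name, name);
            return i;
        }
    }
    return -ENOSPC;
}

static int mem_write(const char *name, const char *buf, size_t size, off_t off)
{
    struct mem_file *f = mem_lookup(name);
    if (!f) return -ENOENT;
    if (off < 0) return -EINVAL;
    if (size == 0) return 0;
    size_t end = (size_t)off + size;
    if (end > f->size) {
        char *p = realloc(f->data, end);
        if (!p) return -ENOMEM;
        memset(p + f->size, 0, end - f->size);  /* holes read back as zeros */
        f->data = p;
        f->size = end;
    }
    memcpy(f->data + off, buf, size);
    return (int)size;
}

static int mem_read(const char *name, char *buf, size_t size, off_t off)
{
    struct mem_file *f = mem_lookup(name);
    if (!f) return -ENOENT;
    if (off < 0) return -EINVAL;
    if ((size_t)off >= f->size) return 0;       /* at or past end of file */
    if (size > f->size - (size_t)off) size = f->size - (size_t)off;
    memcpy(buf, f->data + off, size);
    return (int)size;
}
```

A fuller design would separate names (directory entries) from inodes so that directories and hard links work; because everything here lives only in memory, persistence is the job of the secondary-storage layer.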

Secondary Storage

The third part of the exercise is to write the secondary storage management layer that persists the file system state across machine reboots. The goal is to be able to shut down your file system (either through an unmount or a machine reboot) and have all of the files remain intact and in the same state when the file system is remounted.
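Persistence usually starts with a block the file system can recognize at mount time. The sketch assumes a hypothetical superblock stored in block 0 (one 4K block) that carries a magic number and the locations of the other on-disk structures; on mount, a failed check would mean the volume is blank and needs formatting. The field names, the magic value, and the layout are all illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define FS_MAGIC   0x43533237u   /* arbitrary, hypothetical magic number */

/* Hypothetical on-disk superblock, stored in block 0 of the partition. */
struct superblock {
    uint32_t magic;
    uint32_t block_size;
    uint64_t total_blocks;
    uint64_t bitmap_start;       /* first block of the free-space bitmap */
    uint64_t inode_table_start;  /* first block of the inode table */
};

_Static_assert(sizeof(struct superblock) <= BLOCK_SIZE,
               "superblock must fit in one block");

/* Serialize into a full, zero-padded 4K block ready to be written. */
static void sb_encode(const struct superblock *sb, unsigned char *block)
{
    memset(block, 0, BLOCK_SIZE);
    memcpy(block, sb, sizeof *sb);
}

/* Returns 1 if block holds a superblock written by this code, 0 otherwise
 * (e.g. a blank volume that still needs formatting). */
static int sb_decode(const unsigned char *block, struct superblock *sb)
{
    memcpy(sb, block, sizeof *sb);
    return sb->magic == FS_MAGIC && sb->block_size == BLOCK_SIZE;
}
```

Copying the struct with memcpy ties the on-disk format to the compiler's layout and the machine's byte order, which is workable on a single VM; a portable format would encode each field explicitly.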

For this part of the project, you will need to read and write a raw storage partition in 4K blocks. That is, all accesses to persistent storage must read or write a complete 4K block.
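One way to enforce the whole-block rule is to funnel all device I/O through two helpers. The sketch below uses POSIX pread and pwrite at byte offset blockno * 4096, looping until the full block has been transferred, and shows how a small update becomes a read-modify-write of the containing block. The function names are hypothetical, and the device path passed to dev_open is whatever your attached Eucalyptus volume appears as on the VM; it is not specified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

/* Open the raw partition read/write; returns an fd or a negative errno. */
static int dev_open(const char *path)
{
    int fd = open(path, O_RDWR);
    return fd < 0 ? -errno : fd;
}

/* Read exactly one whole block; 0 on success, negative errno on failure. */
static int block_read(int fd, uint64_t blockno, void *buf)
{
    size_t done = 0;
    while (done < BLOCK_SIZE) {
        ssize_t n = pread(fd, (char *)buf + done, BLOCK_SIZE - done,
                          (off_t)(blockno * BLOCK_SIZE + done));
        if (n < 0) {
            if (errno == EINTR) continue;
            return -errno;
        }
        if (n == 0) return -EIO;           /* ran off the end of the device */
        done += (size_t)n;
    }
    return 0;
}

/* Write exactly one whole block; 0 on success, negative errno on failure. */
static int block_write(int fd, uint64_t blockno, const void *buf)
{
    size_t done = 0;
    while (done < BLOCK_SIZE) {
        ssize_t n = pwrite(fd, (const char *)buf + done, BLOCK_SIZE - done,
                           (off_t)(blockno * BLOCK_SIZE + done));
        if (n < 0) {
            if (errno == EINTR) continue;
            return -errno;
        }
        if (n == 0) return -EIO;
        done += (size_t)n;
    }
    return 0;
}

/* Change len bytes at device byte offset off without a partial-block
 * access: read the containing block, patch it, write the whole block back. */
static int bytes_update(int fd, uint64_t off, const void *src, size_t len)
{
    unsigned char blk[BLOCK_SIZE];
    uint64_t blockno = off / BLOCK_SIZE;
    size_t within = (size_t)(off % BLOCK_SIZE);
    int rc;

    if (within + len > BLOCK_SIZE)
        return -EINVAL;                    /* spanning blocks not handled here */
    if ((rc = block_read(fd, blockno, blk)) != 0)
        return rc;
    memcpy(blk + within, src, len);
    return block_write(fd, blockno, blk);
}
```

Because the device is opened without O_DIRECT, the kernel's page cache still sits underneath these calls; deciding when data must actually reach the disk (for example, before unmount) is part of your design, and fsync() is the standard way to force it.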

Phase 2 Project Deliverables

Dates and Grading Procedures

Phase 1 of the project is due

Grading Procedure for Phase 1

You should create a tar file with your code and instructions and email it to me. Please do not send me links to repositories or other online services (e.g. Google Docs). Your instructions (which should be in a text file or a PDF) should explain exactly how to build and test your solution. I will execute your instructions and assign a grade using the contents of your tar file.

Grading Procedure for Phase 2

The final project is due at the end of the class. I will schedule a time slot to meet with each team during the class period on one of the following two days: You will present your file system and demonstrate your final project (this activity will take place in lieu of a final). The format for this evaluation is that you will give me access to your file system so that I can run a series of tests on it and ask you questions about how it responds.

For the demonstration, you will use a Eucalyptus VM and a 30GB volume that you will have formatted and mounted. My tests will assume that the file system has at least 30GB of capacity and at least 20GB of usable space. You will
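A quick sizing check for the demonstration volume, assuming 30GB means 30 GiB (2^30-byte units, as AWS-style volume sizes are specified) and assuming a free-space bitmap with one bit per 4K block, which is only one possible design: 30 GiB holds 7,864,320 blocks, and such a bitmap needs 983,040 bytes, i.e. 240 blocks.

```c
#include <stdint.h>

#define BLOCK_SIZE 4096ULL
#define GIB        (1024ULL * 1024ULL * 1024ULL)

/* Number of whole 4K blocks on a device of the given size in bytes. */
static uint64_t blocks_on_device(uint64_t bytes)
{
    return bytes / BLOCK_SIZE;
}

/* Blocks needed for a free-space bitmap with one bit per block,
 * rounding up both to whole bytes and to whole blocks. */
static uint64_t bitmap_blocks(uint64_t nblocks)
{
    uint64_t bytes = (nblocks + 7) / 8;
    return (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```

Under these assumptions the bitmap occupies under 1 MiB, so metadata of this kind is small compared with the 20GB of usable space the tests expect.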

During your assigned time slot, I will log in to your VM and run a series of test codes on your file system. Afterwards, you will make a brief presentation to the class about your file system development experience.

It is best if you work in a team of 2, 3, or 4. If you wish to work alone or to form a team larger than 4, please contact me so we can discuss the feasibility and likelihood of success. All members of the team will receive the same grade.

I will randomly assign each group (or individual) a presentation time slot during those two days.

Referencing Existing Work

As mentioned, the goal of the project is to provide an opportunity to build a file system from scratch -- an opportunity that is hopefully as beneficial as it is rare professionally. Still, there are myriad open source file systems that have been developed using FUSE, and much or all of the functionality necessary to succeed with this project is likely to be available as freely accessible code. Further, it is often easiest to understand how to implement some of these abstractions through code examples.

For this project, it is acceptable to use existing code as reference material; however, it is not acceptable to incorporate code snippets or routines from other projects. That is, you may read code you find that helps illuminate a particular concept, just as you may read a paper or other text; however, you may not copy code (either by hand or through electronic means) for use in your project. Further, as part of your code base, you must include a README or LICENSE file that cites whatever references (text or code) you have chosen to consult.