CS270 -- File Systems Class Project

Rich Wolski --- Fall, 2024

this page last updated: Wed Sep 25 11:03:26 PDT 2024


Getting Started

Before you can start the project, you will need Eucalyptus credentials so that you can access the campus private cloud. To get them, you need to email me with Once I receive such an email, I will respond to all team members with an email that has the IP address of a VM that has been created for your team. I will add the ssh public keys to the ubuntu shared user account so all team members will have access. You must use this VM to complete the course project.

File System Project

The pedagogical goal for the overall class project is for you to have the experience of building a working file system for Linux. Doing so will hopefully acquaint you with the concepts of system calls, the Linux file abstraction, and device I/O. The resulting file system will (if it is complete) operate as a complete functional equivalent to a file system that ships with Linux (e.g. ext4, xfs, etc.), with possibly lower performance. However, achieving this level of functionality can require more work than one might anticipate. Thus, the plan is to complete the overall assignment in two phases.

Phase 1

To get the process started, your first assignment is to create a "fake" file system that sends me mail when I run a small test program. The result of this "Phase 1" assignment will be useful to completing the final project.

The main software tool we will use to integrate your eventual file system implementation as a working file system for Linux will be the FUSE file system utility and, in particular, LibFUSE. FUSE is a way to interpose call-back functions you write between the Linux file system calls and the actual file system. That is, you write call-back functions that FUSE invokes when a program (any program) makes a Linux system call that accesses the file system.

Ultimately, by the end of Phase 2, the call-backs you write will implement as much of the Linux file semantics as you can manage by the end of the quarter. However, in Phase 1, your goal is to write a set of call-backs that allow a program I will write to open a file (the name doesn't matter) and then to write a string into the resulting file descriptor that you package in an email and send to my email address. Writing this "fake" file system (that really only supports a few system calls) is intended to achieve the following goals:

This last feature is increasingly peculiar to operating systems. Most software today is designed to operate as an Internet-accessible service that can be manipulated, debugged, and "repaired" while it is in use. Operating systems "ship" as software releases because they must install and run on machines outside of the administrative domain of the developer. Thus, one must prepare an operating system to be installed and used by an unseen and unaccountable user, with no recourse once the software is released. To some extent, we will attempt to have this experience (and to understand its impact on system software development) in this course.

In terms of preparation, if the terms "system call" and "file system" are not part of your current technical understanding, then you should review the following materials:

We will review (but only review) the latter in this class in a lecture. If you are new to computer science (e.g. you do not have an undergraduate background in computer science) and these lecture materials seem utterly foreign, you should consider taking an undergraduate OS class before attempting this project.

Summarizing Phase 1

In Phase 1, you are to

To accomplish this phase, you will need to familiarize yourself with the FUSE documentation (such as it is). You will also need to learn about how to configure a Linux VM. Finally, the result of Phase 1 will be a working FUSE installation that you can use to complete Phase 2.

Grading for Phase 1

To grade your submission, I will log in, become root, create a file in /cs270, and write a string into it. If I get an email with that string, your Phase 1 is working.
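
The grading step above amounts to an open() followed by a write() on a path under the mount point. The following is a minimal sketch of such a test in C, not the actual grading program: the helper name write_test_string is hypothetical, and the open flags and file mode are assumptions. Running it against a file in your mounted directory should cause your FUSE call-backs to be invoked in roughly the same way the grading test would.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the assignment): create or truncate
 * the file at `path` and write the string `msg` into it.
 * Returns 0 on success and -1 on any failure.
 */
int write_test_string(const char *path, const char *msg)
{
    /* open() is the first system call the file system will see */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    size_t done = 0;

    /* write() may accept fewer bytes than requested, so loop */
    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* close() reports the final status of the descriptor */
    return close(fd);
}
```

For example, with the file system mounted on /cs270, calling write_test_string("/cs270/anyname", "hello") from a small main() run as root exercises the create, open, write, and release paths.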

Using Your Instance

To complete both Phase 1 and Phase 2, you will need to mount a file system and (for Phase 2) read and write a raw disk device. To mount a file system using FUSE you will need root privileges. To make it possible to provide you with a Linux environment where you can have root, the IT staff will create a VM for you to use for this class. When you send me your team information, I will send all members of the team an ssh key pair that will allow you to have access.

Part of the experience is for you to discover, on your own, how to manage Linux development at the operating system level, in general, and for the use of FUSE, in particular. However, to get you started, here are a few basics.

The VM runs Ubuntu 20.04 and has a 50GB raw disk volume attached. We will be using Ubuntu 20.04 because later versions of Ubuntu have had stability issues with respect to FUSE and CentOS Stream is too unstable. The image from which your instance has been created is minimally configured with respect to development and management tools. For Ubuntu, it is possible to install open source software packages in a number of different ways. The easiest is via the apt utility. Before you begin developing, then, you should log into the instance and install some basic packages. Depending on what technology stack you plan to use, you might need additional packages. Here, though, are the basic commands to execute. Note that once you do this installation, these packages will persist in your instance. You don't need to install them each time you log in.

Begin by logging into the instance, using ssh, as the ubuntu user. Put the private key in your home directory's .ssh directory. For my test installation, I put the private key in

~rich/.ssh/cs270_rsa
The IT staff will have installed the public key in the .ssh directory for the user ubuntu.

My test instance has the IP address "169.231.231.85" so I would type

ssh -i ~rich/.ssh/cs270_rsa ubuntu@169.231.231.85 
which should then log me in.

The ubuntu user has the ability to become root (without a password) in the instance. Thus, to install software packages, you can use the sudo utility.

My test solution for the class is written in C and uses make. To use a new instance to build and develop, I need the relevant packages to be installed. Here is the command sequence I used.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install gcc make gdb fuse3 libfuse3-dev
The apt utility will ask you if you wish to continue during the upgrade and install commands. Enter "y" each time and it will proceed to install the necessary packages.

After this command sequence, I can build (using make) my C version and debug it using gdb.

Unlocking the file system/Rebooting the Instance

FUSE interacts with the Linux kernel through the file system interface in a way that is fairly invasive. As such, it is possible for the code you write for FUSE to call to cause FUSE to "hang" indefinitely while it waits for what it considers to be rational behavior by your code. Sometimes, while it is hanging, the rest of Linux also hangs and your instance becomes frozen.

There are two courses of action for you to pursue if this occurs. You should pursue them in this order. First, FUSE includes a "file interface" utility which allows you to interact with it using the file system. Modern Linux is fond of this kind of utility (cf. /proc and cgroups). If your FUSE file system hangs, it is possible to abort whatever call to the file system caused the hang by writing a character to a specific file.

When you run your FUSE daemon, you should see a connection number for your file system in /sys/fs/fuse/connections. On my test instance, that number is 55 which is, itself, a directory. If you were to type

ls /sys/fs/fuse/connections/55
on my instance, you'd see
abort  congestion_threshold  max_background  waiting
Writing anything onto the file /sys/fs/fuse/connections/55/abort causes FUSE to reset the file system connection. Thus, when I lock up my FUSE file system, I type
echo "0" > /sys/fs/fuse/connections/55/abort
which writes the string "0" onto that fictitious file, causing FUSE to abort.

If this fails, you can contact me and I can reboot the instance manually. I will be fairly busy this quarter outside of class and office hours so I cannot promise I will be able to respond to your need for a reboot immediately.

Phase 2

From an engineering perspective (i.e. what it is you need to do), Phase 2 of the project decomposes into three tasks: Implicitly, there is a fourth task which is to integrate these three tasks into a single, working file system.

You will need to make this file system work with a raw disk. Your VM has such a disk attached to it as /dev/vdb. It is 50GB but your formatted file system need only be 30GB in size (the extra space is so that your bookkeeping records do not spill outside 30GB).

File System Abstractions

Here, you have some latitude. As long as you implement the semantics of the file system calls correctly, the design and implementation of the internals is up to you. Any internal organization is acceptable; however, it must be in memory only (i.e. you can't just stick things in a database or open a Linux file in your FUSE call-backs).

Secondary Storage

The third part of the exercise is to write the secondary storage management that persists the file system state in /dev/vdb across machine reboots. The goal is to be able to shut down your file system (either through an unmount or a machine reboot) and have all of the files remain intact and in the same state when the file system is remounted.

For this part of the project, you will need to read and write a raw storage partition in 4K blocks. That is, all accesses to persistent storage must read or write a complete 4K block.
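
One way to enforce the whole-block rule is to funnel every device access through a pair of helpers that only ever transfer one 4K block. The following is a minimal sketch under stated assumptions, not a required design: the names open_disk, read_block, and write_block are hypothetical, "30GB" is interpreted as 30 GiB, and the device is accessed through ordinary pread() and pwrite() calls on a file descriptor.

```c
#define _XOPEN_SOURCE 700   /* for pread/pwrite under strict -std=c11 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096ULL                        /* 4K blocks, per the text */
#define FS_BYTES   (30ULL * 1024 * 1024 * 1024)   /* assumes 30GB means 30 GiB */
#define FS_BLOCKS  (FS_BYTES / BLOCK_SIZE)        /* 7864320 blocks */

/* Hypothetical helper: open the raw device (e.g. /dev/vdb) read/write. */
int open_disk(const char *path)
{
    return open(path, O_RDWR);
}

/* Read block `blkno` into `buf`, which must hold BLOCK_SIZE bytes.
 * Returns 0 on success, -1 on error, short read, or out-of-range block. */
int read_block(int fd, uint64_t blkno, void *buf)
{
    if (blkno >= FS_BLOCKS)
        return -1;
    off_t off = (off_t)(blkno * BLOCK_SIZE);
    size_t done = 0;
    while (done < BLOCK_SIZE) {
        ssize_t n = pread(fd, (char *)buf + done, BLOCK_SIZE - done,
                          off + (off_t)done);
        if (n <= 0)           /* error or end of device */
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Write BLOCK_SIZE bytes from `buf` to block `blkno`.
 * Returns 0 on success, -1 on error or out-of-range block. */
int write_block(int fd, uint64_t blkno, const void *buf)
{
    if (blkno >= FS_BLOCKS)
        return -1;
    off_t off = (off_t)(blkno * BLOCK_SIZE);
    size_t done = 0;
    while (done < BLOCK_SIZE) {
        ssize_t n = pwrite(fd, (const char *)buf + done, BLOCK_SIZE - done,
                           off + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```

With helpers like these, the rest of the file system names storage by block number only, and the byte offset arithmetic lives in one place. The same functions work on an ordinary file, which can be convenient for testing before touching /dev/vdb.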

Phase 2 Project Deliverables

Dates and Grading Procedures

Phase 1 of the project is due

Grading Procedure for Phase 1

You should send me an email with the IP address of your instance and the directory where you have your emailing FUSE file system mounted. I will login to your instance as the ubuntu user, become root, and write a file with a test string into that directory. If I receive an email with that string, your project gets full credit.

Grading Procedure for Phase 2

The final project is due at the end of the class. I will schedule a time slot to meet with each team during the class period on one of the following two days: You will present your file system and demonstrate your final project (this activity will take place in lieu of a final). The format for this evaluation is that you will provide me with access to your file system so that I can run a series of tests on it and ask you questions about its response.

For the demonstration, you will use your VM and the 50GB volume that you will have formatted and mounted. My tests will assume that the file system has at least 30GB. You will

During your assigned time slot, I will log in to your VM and run a series of test codes on your file system. Afterwards, you will make a brief presentation to the class about your file system development experience.

It is best if you work in a team of 2, 3, or 4. If you wish to work alone or to form a team larger than 4, please contact me so we can discuss the feasibility and likelihood of success. All members of the team will receive the same grade.

I will assign presentation times randomly to each group or individuals for time slots during those two days.

Referencing Existing Work

As mentioned, the goal of the project is to provide an opportunity to build a file system from scratch -- an opportunity that is hopefully as beneficial as it is rare professionally. Still, there exists a myriad of open source file systems that have been developed using FUSE and much or all of the functionality necessary to succeed with this project is likely to be available as freely accessible code. Further, it is often easiest to understand how to implement some of these abstractions through code examples.

For this project, it is acceptable to use existing code as reference material; however, it is not acceptable to incorporate code snippets or routines from other projects. That is, you may read code you find that helps illuminate a particular concept just as you may read a paper or other text; however, you may not copy (either by hand or through electronic means) code for use in your project. Further, as part of your code base, you must include a README or LICENSE file that cites whatever references (text or code) you have chosen to consult.