this page last updated:
Wed Sep 25 11:03:26 PDT 2024
Getting Started
Before you can start the project, you will need Eucalyptus credentials so that
you can access the campus private cloud. To get them, you need to email me
with
- The Name of your Team
- The names of each team member (as they appear in egrades)
- The email addresses of each team member (for an email they check
regularly)
- An ssh public key for each project member to use to be able to access a
shared VM.
Once I receive such an email, I will respond to all team members with an email
that has the IP address of a VM that has been created for your team. I will
add the ssh public keys to the ubuntu shared user account so all team
members will have access. You must use this VM to complete the course
project.
File System Project
The pedagogical goal for the overall class project is for you to have the experience
of building a working file system for Linux. Doing so will hopefully acquaint
you with the concepts of system calls, the Linux file abstraction, and device
I/O. The resulting file system will (if it is complete) operate as a complete
functional equivalent to a file system that ships with Linux (e.g. ext4, xfs,
etc.) with possibly lower performance. However, achieving this level of
functionality can require more work than one might anticipate.
Thus, the plan is to complete the overall assignment in two phases.
Phase 1
To get the
process started, your first assignment is to create a "fake" file system that
sends me mail when I run a small test program. The result of this "Phase 1"
assignment will be useful to completing the final project.
The main software tool we will use to integrate your eventual file system
implementation as a working file system for Linux will be the FUSE file
system utility and, in particular, LibFUSE.
FUSE is a way to interpose call-back functions you write between the Linux
file system calls and the actual file system. That is, you write call-back
functions that FUSE invokes when a program (any program) makes a Linux system
call that accesses the file system.
Ultimately, by the end of Phase 2, the call-backs you write will implement
as much of the Linux file semantics as you can manage by the end of the
quarter. However, in Phase 1, your goal is to write a set of call-backs that
allow a program I will write to open a file (the name doesn't matter) and then
to write a string into the resulting file descriptor that you package in an
email and send to my email address. Writing this "fake" file system (that
really only supports a few system calls) is intended to achieve the following
goals:
- an understanding of how to instantiate and configure a Linux VM in a cloud
that you are free to modify using root privileges,
- an understanding of how to install, build, and develop using the FUSE
utility,
- an understanding of how to install the Linux dependencies in addition to
LibFUSE (e.g. sendmail) that are required to complete the assignment, and
- a basic understanding of the concept of "software release" in which your
code and documentation will be tested (without you present) to determine its
functionality.
This last feature is increasingly peculiar to operating systems. Most
software today is designed to operate as an Internet-accessible service that
can be manipulated, debugged, and "repaired" while it is in use. Operating
systems "ship" as software releases because they must install and run on
machines outside of the administrative domain of the developer. Thus, one
must prepare an operating system to be installed and used by an unseen and
unaccountable user, with no recourse once the software is released. To some
extent, we will attempt to have this experience (and to understand its impact
on system software development) in this course.
In terms of preparation, if the terms "system call" and "file system" are not
part of your current technical understanding, then you should review the
following materials:
We will review (but only review) the latter in this class in a lecture. If
you are new to computer science (e.g. you do not have an undergraduate
background in computer science) and these lecture materials seem utterly
foreign, you should consider taking an undergraduate OS class before
attempting this project.
Summarizing Phase 1
In Phase 1, you are to
- create a FUSE program that allows a calling program to open a
fictitious file for writing and to write a string into that file,
- intercept the file system calls using FUSE and run a shell "call out" to email my email address with that string, and
- create the directory /cs270 (as root) and run this file system in your instance under the directory /cs270
owned by root.
To accomplish this phase, you will need to familiarize yourself with the FUSE
documentation (such as it is). You will also need to learn about how to
configure a Linux VM. Finally, the result of Phase 1 will be a working FUSE
installation that you can use to complete Phase 2.
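The shell "call out" in the second step can be done in several ways. One minimal sketch in C uses popen() to pipe the string to the standard input of a command. The function name pipe_to_command is hypothetical, and the sketch assumes that whatever mailer you install reads its message from standard input.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Pipe len bytes from buf to the standard input of a shell command.
 * Returns 0 if the command ran and exited with status 0, -1 otherwise.
 * (pipe_to_command is a hypothetical helper, not part of FUSE.)
 */
int pipe_to_command(const char *command, const char *buf, size_t len)
{
    FILE *cmd_in = popen(command, "w");   /* runs: sh -c command */
    if (cmd_in == NULL)
        return -1;

    size_t written = fwrite(buf, 1, len, cmd_in);
    int status = pclose(cmd_in);          /* waits for the command to finish */

    if (written != len || status == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In a write call-back, the command might be something like sendmail -t, with the To: and Subject: headers placed ahead of the string in the buffer. Treat that as one option to check against the mailer you actually install, not as the required approach.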
Grading for Phase 1
To grade your submission,
I will log in, become root, create a file in /cs270, and write a string into
it. If I get an email with that string, your Phase 1 is working.
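A grading-style test can be sketched in a few lines of C. This is not the actual test program, only an illustration under assumptions: the function name write_test_string and any file name passed to it (for example, a file under /cs270) are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open (creating if necessary) the file at path for writing and write
 * the string msg into it. Returns 0 on success, -1 on any failure.
 * (write_test_string is a hypothetical name used for illustration.)
 */
int write_test_string(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);   /* one write call carries the string */
    int rc = close(fd);

    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}
```

Calling write_test_string("/cs270/testfile", "hello") as root while your file system is mounted should exercise the open (or create) and write paths of your call-backs, which makes it a convenient self-check before you submit.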
Using Your Instance
To complete both Phase 1 and Phase 2, you will need to mount a file system and
(for Phase 2) read and write a raw disk device.
To mount a file system using FUSE you will need root privileges. To make it
possible to provide you with a Linux environment where you can have root, the
IT staff
will create a VM for you to use for this class.
When you send me your team information, I will send all members of the team an
ssh key pair that will allow you to have access.
Part of the experience is for you to discover, on your own, how to manage
Linux development at the operating system level, in general, and for the use
of FUSE, in particular. However, to get you started, here are a few basics.
The VM runs Ubuntu 20.04 and has a 50GB raw disk volume attached. We will be
using Ubuntu 20.04 because later versions of Ubuntu have had stability issues
with respect to FUSE and CentOS Stream is too unstable. The image from which
your instance has been created is minimally configured with respect to
development and management tools. For Ubuntu, it is possible to install
open source software packages in a number of different ways.
The easiest is via the apt utility. Before you begin developing, then,
you should log into the instance and install some basic packages. Depending
on what technology stack you plan to use, you might need additional packages.
Here, though, are the basic commands to execute. Note that once you do this
installation, these packages will persist in your instance. You don't need to
install them each time you log in.
Begin by logging into the instance, using ssh, as the ubuntu user.
Put the private key in your home directory's .ssh directory.
For my test installation, I put the private key in
~rich/.ssh/cs270_rsa
The IT staff will have installed the public key in the .ssh directory for the
user ubuntu.
My test instance has the IP address "169.231.231.85" so I would type
ssh -i ~rich/.ssh/cs270_rsa ubuntu@169.231.231.85
which should then log me in.
The ubuntu user has the ability to become root (without a password) in
the instance. Thus, to install software packages, you can use the sudo
utility.
My test solution for the class is written in C and uses make. To use a
new instance to build and develop, I need the relevant packages to be
installed. Here is the command sequence I used.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install gcc make gdb fuse3 libfuse3-dev
The apt utility will ask you if you wish to continue during the upgrade and
install commands. Enter "y" each time and it will proceed to install the
necessary packages.
After this command sequence, I can build (using make) my C version and
debug it using gdb.
Unlocking the file system/Rebooting the Instance
FUSE interacts with the Linux kernel through the file system interface in a
way that is fairly invasive. As such, it is possible for the call-backs you
write to cause FUSE to "hang" indefinitely while it waits for what it
considers to be rational behavior by your code. Sometimes, while it is
hanging, the rest of Linux also hangs and your instance becomes frozen.
There are two courses of action for you to pursue if this occurs. You should
pursue them in this order. First, FUSE includes a "file interface" utility
which allows you to interact with it using the file system. Modern Linux
is fond of this kind of utility (cf /proc and cgroups). If your FUSE
file system hangs, it is possible to abort whatever call to the file system
caused the hang by writing a character to a specific file.
When you run your FUSE daemon, you should see a connection number for your
file system in /sys/fs/fuse/connections. On my test instance, that
number is 55 which is, itself, a directory. If you were to type
ls /sys/fs/fuse/connections/55
on my instance, you'd see
abort congestion_threshold max_background waiting
Writing anything onto the file /sys/fs/fuse/connections/55/abort
causes FUSE to reset the file system connection. Thus, when I lock up my
FUSE file system, I type
echo "0" > /sys/fs/fuse/connections/55/abort
which writes the string "0" onto that fictitious file, causing FUSE to abort.
Second, if this fails, you can contact me and I can reboot the instance manually. I
will be fairly busy this quarter outside of class and office hours so I cannot
promise I will be able to respond to your need for a reboot immediately.
Phase 2
From an engineering perspective (i.e. what it is you need to do), Phase 2 of
the project
decomposes into three tasks:
- system call implementation -- implementing the system calls that
can be issued by Linux on files (hopefully by replacing and then adding to the
FUSE call-backs you implemented for Phase 1),
- implementing the file abstraction -- building the internal data
structures and procedures necessary to implement files, and
- implementing secondary storage management -- building the parts
of the file system that persist in secondary storage.
Implicitly, there is a fourth task which is to integrate these three tasks
into a single, working file system.
You will need to make this file system work with a raw disk. Your VM has such
a disk attached to it as /dev/vdb. It is 50GB but your formatted file
system need only be 30GB in size (the extra space is so that your bookkeeping
records do not spill outside 30GB).
File System Abstractions
Here, you have some latitude. As long as you implement the semantics of the
file system calls correctly, the design and implementation of the internals is
up to you. Any internal organization is acceptable; however, it must be in
memory only (i.e. you can't just stick things in a database or open a Linux
file in your FUSE call-backs).
Secondary Storage
The third part of the exercise is to write the secondary storage management
that persists the file system state in /dev/vdb
across machine reboots. The goal is to be
able to shut down your file system (either through an unmount or a machine
reboot) and have all of the files remain intact and in the same state when
the file system is remounted.
For this part of the project, you will need to read and write a raw storage
partition in 4K blocks. That is, all accesses to persistent storage must read
or write a complete 4K block.
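One way to enforce the whole-block rule is to route every device access through a pair of helpers that always transfer exactly 4096 bytes. The following is a minimal sketch, assuming the device (or a test file standing in for it) is already open as a file descriptor. The names FS_BLOCK_SIZE, block_read, and block_write are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64           /* 64-bit offsets even on 32-bit builds */
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define FS_BLOCK_SIZE ((size_t)4096)   /* every access moves one 4K block */

/* Read block number blk from fd into buf (FS_BLOCK_SIZE bytes).
 * Returns 0 on success, -1 on error or if the block is past the end. */
int block_read(int fd, uint64_t blk, void *buf)
{
    char *p = buf;
    size_t done = 0;
    off_t base = (off_t)blk * (off_t)FS_BLOCK_SIZE;

    while (done < FS_BLOCK_SIZE) {
        ssize_t n = pread(fd, p + done, FS_BLOCK_SIZE - done, base + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;                 /* error or end of device */
        done += (size_t)n;
    }
    return 0;
}

/* Write FS_BLOCK_SIZE bytes from buf to block number blk on fd.
 * Returns 0 on success, -1 on error. */
int block_write(int fd, uint64_t blk, const void *buf)
{
    const char *p = buf;
    size_t done = 0;
    off_t base = (off_t)blk * (off_t)FS_BLOCK_SIZE;

    while (done < FS_BLOCK_SIZE) {
        ssize_t n = pwrite(fd, p + done, FS_BLOCK_SIZE - done, base + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;                 /* error or no progress */
        done += (size_t)n;
    }
    return 0;
}
```

The same helpers work on an ordinary file, which lets you test the rest of your code without touching /dev/vdb. Taking 1GB to mean 2^30 bytes, a 30GB file system holds 7,864,320 such blocks.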
Phase 2 Project Deliverables
- Must Haves -- Implement the basic open/close/read/write/seek and
directory functions using FUSE. To
complete this part of the project you must
- implement mkfs to make a file system using a disk block
device,
- implement the basic file abstractions (block management, block maps,
directories,
etc.),
- integrate the file system implementation with FUSE
so that you should have a basic file system that works for most
programs that use the minimal POSIX file system interface
(open/close/read/write/seek).
- Good to Haves -- Complete as many of the FUSE file operations as you can
and optimize the performance. You need to implement a basic working
file system but it is likely to be quite slow and/or incomplete if you are careful about
persistence. Ideally, you should be able to run any regular Linux
commands (e.g. tar, gcc, grep, vi, etc.) in your file system just as on the
Linux file system itself. To do so, you'll need to make sure that you are
handling issues like access times, permissions, etc.
That is, a successful project implements a more complete version of the
necessary Linux functionality than the basic requirements and may also
improve performance.
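As an illustration of the block management mentioned under the Must Haves, here is a minimal sketch of a free-block bitmap along with the sizing arithmetic for a 30GB file system in 4K blocks. The block size and file system size come from the requirements above. Everything else (the names, the bit order, the linear scan) is an assumption, and other designs such as a free list are equally acceptable.

```c
#include <stddef.h>
#include <stdint.h>

#define FS_BLOCK_SIZE  4096ULL
#define FS_BYTES       (30ULL << 30)                 /* 30GB, with 1GB = 2^30 bytes */
#define FS_NUM_BLOCKS  (FS_BYTES / FS_BLOCK_SIZE)    /* 7,864,320 blocks */
#define BITMAP_BYTES   (FS_NUM_BLOCKS / 8)           /* one bit per block: 983,040 bytes */
#define BITMAP_BLOCKS  ((BITMAP_BYTES + FS_BLOCK_SIZE - 1) / FS_BLOCK_SIZE)  /* 240 blocks */

/* Return 1 if block blk is marked in use, 0 if it is free. */
static inline int bitmap_test(const uint8_t *map, uint64_t blk)
{
    return (map[blk / 8] >> (blk % 8)) & 1;
}

/* Mark block blk as in use. */
static inline void bitmap_set(uint8_t *map, uint64_t blk)
{
    map[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

/* Mark block blk as free. */
static inline void bitmap_clear(uint8_t *map, uint64_t blk)
{
    map[blk / 8] &= (uint8_t)~(1u << (blk % 8));
}

/* Find the lowest-numbered free block among the first nblocks blocks,
 * mark it in use, and return its number. Returns -1 if none is free. */
int64_t bitmap_alloc(uint8_t *map, uint64_t nblocks)
{
    for (uint64_t blk = 0; blk < nblocks; blk++) {
        if (!bitmap_test(map, blk)) {
            bitmap_set(map, blk);
            return (int64_t)blk;
        }
    }
    return -1;
}
```

A bitmap of this size fits comfortably in memory, and because it occupies a whole number of 4K blocks it can be written back to the disk block by block. The linear scan is simple but slow on a nearly full disk, which is one of the places where performance work could pay off.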
Dates and Grading Procedures
Phase 1 of the project is due
Grading Procedure for Phase 1
You should send me an email with the IP address of your instance and the
directory where you
have your emailing FUSE file system mounted. I will log in to your instance as the
ubuntu user, become root, and write a file with a test string into that
directory. If I receive an email with that string, your project gets full
credit.
Grading Procedure for Phase 2
The final project is due
at the end of the class. I will schedule a time slot to meet with each team
during the class period on one of the following two days:
- Dec. 2, 2024
- Dec. 4, 2024
You will present
your file system
and demonstrate your final project (this activity will
take place in lieu of a final). The format for this evaluation is that you
will provide me with access to your file system so that I can run a series of
tests on it and ask you questions about its response.
For the demonstration, you will use your VM
and 50GB volume that you will have formatted and mounted. My tests will
assume that the file system has at least 30GB.
You will
- create a file system on the raw volume in /dev/vdb
using your version of mkfs, and
- mount the file system.
During your assigned time slot, I will log in to your VM and run a series of
test codes on your file system. Afterwards, you will make a brief presentation
to the class about your file system development experience.
It is best if you work in a team of either 2, 3, or 4. If you wish to work
alone or to form a team larger than 4, please contact me so we can discuss the
feasibility and likelihood of success.
All members of the team will receive the same grade.
I will assign presentation times randomly to each group or individual for time
slots during those two
days.
Referencing Existing Work
As mentioned, the goal of the project is to provide an opportunity to build a
file system from scratch -- an opportunity that is hopefully as beneficial as
it is rare professionally. Still, there exists a myriad of open source file
systems that have been developed using FUSE and much or all of the
functionality necessary to succeed with this project is likely to be
available as freely accessible code. Further, it is often easiest to
understand how to implement some of these abstractions through code examples.
For this project, it is acceptable to use existing code as reference material;
however, it is not acceptable to incorporate code snippets or routines from
other projects. That is, you may read code you find that helps illuminate a
particular concept just as you may read a paper or other text; however, you may
not copy (either by hand or through electronic means) code for use in your
project. Further, as part of your code base, you must include a README or
LICENSE file that cites whatever references (text or code) you have
chosen to consult.