CS293B -- Cloud Computing Projects, Spring 2020

You are free to define a project of your own or to choose one of several I have outlined below. You may work in teams of up to three.

The project requirements are as follows.

You must hand in (by email) a written project plan no later than 11:59 PM on Monday, April 13, 2020. Your project plan must
- list the names and email addresses of all team members
- describe the problem you are planning to solve
- outline your approach to solving it
- describe how you will test whether you have solved it or the degree to which you have addressed the problem
The plan must be sent to me as a PDF and should be in prose form.
Projects can come in two forms:
- application projects that explore some combination of cloud computing, edge or fog computing, and IoT "end-to-end." For this style of project, you will need either to find live data streams to use as "sensor" inputs or to "fake" sensor data by replaying existing data sets. The data (which could be telemetry, images, audio, etc., in any combination) must then be manipulated and analyzed at the edge and/or in the cloud.
- technology projects that attempt to improve upon existing technologies for cloud, edge, and IoT. For a technology project, some comparison with the "best known" solution or solutions must be part of the evaluation.
Each team will meet with me (and any collaborators) during the last full week of classes, June 1 through June 5, 2020. There will be no lectures during that week. Instead, I will schedule a 15- to 30-minute meeting with each team so that I can enjoy and focus on your demo. My intention is to use the lecture periods on June 1 and June 3 for the demo time slots, but (depending on course enrollment) we may also meet on other days that week.
Project materials that must be turned in include
- a project write-up that describes the project, explains how it tracked the project plan, and discusses its evaluation.
- a short slide presentation (maximum of 10 slides) that presents the project
- all code and data sets that you used
- instructions for building and executing your demo so that your results are reproducible

I will use the materials and the discussion we have during your demo sessions to determine the project grade for this course.

Example Application Projects

Here are several example application projects you might consider. Any reasonable adaptation would be equally appropriate, if not more so.

Where's the Bear 2.0: A couple of years ago we developed an edge computing solution that uses Google's TensorFlow and the Inception v3 model to automatically classify camera trap data from the Sedgwick Reserve. The project was called "Where's the Bear?" and the paper is

Elias, Andy Rosales, et al. "Where's the Bear?-Automating Wildlife Image Processing Using IoT and Edge Cloud Systems." Internet-of-Things Design and Implementation (IoTDI), 2017 IEEE/ACM Second International Conference on. IEEE, 2017.

This work can easily be extended in several ways. First, the original system only identified images containing a single species. Identifying images with multiple species (which are rare but useful) would be a great improvement. Second, the ecologists who operate the camera traps would like to use them to count animals (e.g. for population estimates). They would REALLY like counts separated by age group (e.g. youngsters versus adults). The original system does no counting. Third (and this is probably hard), the ecologists would like to know whether it is possible to identify individual animals. In addition, we would like to understand whether there is a relationship (temporal or spatial) between image capture and environmental conditions (meteorological, seasonal, drought, etc.). In particular, to what degree is it possible to predict when an image of a given species will be taken? The authors and their collaborators, as well as various data sets, are available locally as resources for this project. You can find out more here, but you will need a UCSB NetID to access the images.

Nanoclimate forecasting: One big area of interest for IoT and cloud is agriculture (as evidenced by Microsoft's FarmBeats project). Estimating meteorological conditions at a fine-grained level is turning out to be an important capability that IoT can provide for agriculture. For example, agricultural engineers and scientists believe that it is possible to use highly localized temperature and humidity measurements to optimize crop management (e.g. frost prevention, differential irrigation scheduling, etc.). However, it is often infeasible to instrument growing areas with densely distributed sensors. Doing so often carries a large infrastructure cost (in terms of both installation and maintenance) as well as the potential to interfere with farm operations. Thus, nanoclimate sensing and forecasting requires heavy use of analytics to make inferences and predictions.

For example, at one orchard in the Central Valley of California, the growers would like to combine data from a few carefully placed temperature sensors with the plethora of available mesoscale and microclimate meteorological data to make fine-grained inferences and predictions of temperature and humidity at meter scale.

Another example project would be to determine the specific sets of data and data-fusion analytics that can infer the temperature in an arbitrary square meter of the orchard. For example, knowing the temperature at one location, the prevailing wind, and the solar radiation, it is possible to infer the temperature at another location nearby. How accurately can this inference be made? Where should sensors be placed? What is the minimum sensor-to-error ratio that is possible?
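A natural first cut at "how accurately can this inference be made?" is to treat it as regression: predict the temperature at a target location from the temperature at a reference sensor, the wind speed, and the solar radiation, then measure the error on held-out samples. The sketch below is only an illustration under loud assumptions: a purely linear relationship, ordinary least squares solved via the normal equations, and synthetic data generated from a made-up formula in the hypothetical `synthetic_samples` function (these are not orchard measurements). All function names are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.
    Each row is [t_ref, wind, solar]; an intercept column is prepended."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def rmse(w, rows, targets):
    errs = [(predict(w, r) - y) ** 2 for r, y in zip(rows, targets)]
    return math.sqrt(sum(errs) / len(errs))

def synthetic_samples(n, rng):
    """HYPOTHETICAL data: target = 0.9*t_ref - 0.3*wind + 0.004*solar + 1 + noise."""
    rows, ys = [], []
    for _ in range(n):
        t_ref = rng.uniform(-2.0, 30.0)   # deg C at the reference sensor
        wind = rng.uniform(0.0, 8.0)      # m/s
        solar = rng.uniform(0.0, 900.0)   # W/m^2
        y = 0.9 * t_ref - 0.3 * wind + 0.004 * solar + 1.0 + rng.gauss(0.0, 0.5)
        rows.append([t_ref, wind, solar])
        ys.append(y)
    return rows, ys

if __name__ == "__main__":
    rng = random.Random(42)
    train_x, train_y = synthetic_samples(500, rng)
    test_x, test_y = synthetic_samples(200, rng)
    w = fit_ols(train_x, train_y)
    print("coefficients:", [round(v, 3) for v in w])
    print("held-out RMSE (deg C):", round(rmse(w, test_x, test_y), 3))
```

Replacing `synthetic_samples` with real paired sensor readings, and repeating the fit for different reference-sensor placements, is one way to begin on the placement and sensor-to-error questions; nonlinear or spatial models may well beat this linear baseline.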

Forecasting (predicting a future temperature value) is another important area that is related to the inference problem. For frost prevention, for example, an inference is sufficient to allow the system to send an alert when frost is imminent, but it would be better to predict frost several hours before it occurs.
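The difference between inferring a current value and forecasting a future one can be illustrated with the simplest possible forecaster. The sketch below assumes hourly readings at a single location and naively fits a straight line to them, extrapolating to the hour at which the line crosses 0 °C. The function names, the hourly sampling, the 0 °C threshold, and the 4-hour alert lead time are all hypothetical choices; a real system would presumably use the fused inputs discussed above and a far better model.

```python
from typing import List, Optional

def hours_until_frost(readings: List[float], threshold: float = 0.0) -> Optional[float]:
    """Naively forecast frost from hourly temperatures (oldest first, deg C).

    Fits a least-squares straight line to the readings and extrapolates it to the
    hour at which it crosses `threshold`. Returns 0.0 if the fitted line is already
    at or below the threshold, and None if temperatures are not falling.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx                             # deg C per hour
    latest = mean_y + slope * (n - 1 - mean_x)    # fitted value at the newest reading
    if latest <= threshold:
        return 0.0
    if slope >= 0:
        return None
    return (threshold - latest) / slope

def frost_alert(readings: List[float], lead_time_hours: float = 4.0) -> bool:
    """Alert if frost is forecast within the (assumed) lead time."""
    eta = hours_until_frost(readings)
    return eta is not None and eta <= lead_time_hours

if __name__ == "__main__":
    print(hours_until_frost([6.0, 5.0, 4.0, 3.0]))  # cooling 1 deg C/hour from 3 deg C
    print(frost_alert([6.0, 5.0, 4.0, 3.0]))
```

An inference-only system could raise the alert only once the current (inferred) temperature reaches the threshold; the forecast buys lead time, at the cost of depending on how good the model is.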

All of the above also applies to inferring and forecasting humidity at the nanoclimate level. Our group has access to an instrumented orchard and historical sensor data to support this project. In addition, this paper describes some early attempts at nanoclimate temperature inference using internal CPU temperatures as explanatory variables.

Both of these examples are intended to stimulate your imagination about what is possible (although they are both available as projects for this class as well). What is key, though, is that the solution is an "end-to-end" solution -- one that addresses a real-world problem using a combination of infrastructure for cloud/edge/IoT and analytics. To succeed, you must often develop a novel technology or amalgamate a set of existing technologies in an entirely new way.

Example Technology Projects

An alternative style of project for this class is one that hypothesizes the need for a new technology that is generally more useful (under some measure) than existing approaches. There is currently a raft of incumbent technologies for cloud/edge/IoT (e.g. Microsoft Azure IoT Hub, AWS Greengrass, etc.), but their development is nascent and frequently driven by commercial expediency rather than computer science analysis. Another approach you could take in this class, then, is to explore alternatives to these incumbents that are better suited to your understanding of what is necessary.

For example, we have developed a multi-scale, distributed Functions as a Service (FaaS) infrastructure called CSPOT

CSPOT: A Serverless Platform of Things

CSPOT makes several innovations. First, it defines a common, universal "append-only" storage abstraction for FaaS programs. This abstraction is simple enough to be implemented at the microcontroller level, yet powerful enough to serve as the main storage abstraction for IoT applications at the edge and in the cloud. Second, it uses an append-only log as its runtime system, so that it leaves behind a record of the causal dependencies between computations. Thus, by construction, it is possible to recover causal execution chains even in highly scalable deployments. Third, CSPOT functions have very low latency -- two orders of magnitude lower than comparable AWS or Microsoft technologies. CSPOT is available as open source. It could enable a number of new technological advances, including

- CSPOT-FS: a log-structured file system for the CSPOT storage abstraction
- SPOT-FU: distributed transactions for CSPOT
- SPOXOS: Paxos for CSPOT
- OSPOT: a native unikernel implementation of CSPOT
- SPOT-Leash: an implementation of chain replication for CSPOT, as described in this paper by van Renesse and Schneider
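Chain replication is compact enough to sketch. In van Renesse and Schneider's scheme, replicas form a chain: updates enter at the head and are passed along to the tail, queries are answered by the tail, and an update is acknowledged once the tail has applied it. The in-memory sketch below keeps that shape but assumes synchronous, failure-free message delivery, so it omits the protocol's handling of in-flight updates when a node fails. The class names are hypothetical, and each replica's history is kept as an append-only list only to echo the CSPOT storage abstraction, not to reproduce CSPOT's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Replica:
    name: str
    log: List[str] = field(default_factory=list)  # append-only update history
    successor: Optional["Replica"] = None

    def apply(self, update: str) -> int:
        """Append locally, then forward down the chain; return the tail's seqno."""
        self.log.append(update)
        if self.successor is None:
            return len(self.log) - 1  # tail reached: the update is now committed
        return self.successor.apply(update)

class Chain:
    """Synchronous chain replication: writes enter at the head and reads are
    served by the tail, so a read only ever observes committed updates."""

    def __init__(self, names: List[str]):
        self.replicas = [Replica(n) for n in names]
        self._relink()

    def _relink(self) -> None:
        for a, b in zip(self.replicas, self.replicas[1:]):
            a.successor = b
        self.replicas[-1].successor = None

    def write(self, update: str) -> int:
        return self.replicas[0].apply(update)

    def read(self) -> List[str]:
        return list(self.replicas[-1].log)

    def fail(self, name: str) -> None:
        """Drop a crashed replica and splice its neighbours together."""
        self.replicas = [r for r in self.replicas if r.name != name]
        self._relink()

if __name__ == "__main__":
    chain = Chain(["a", "b", "c"])
    chain.write("x=1")
    chain.write("x=2")
    chain.fail("a")          # the head crashes; "b" becomes the new head
    chain.write("x=3")
    print(chain.read())
```

Because every replica downstream of the head holds a prefix-consistent copy of the head's history, removing the head or the tail only shortens the chain; the harder cases in the paper (a middle node failing with updates still in flight) are exactly what this synchronous sketch cannot exhibit.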

These enhancements all emphasize the use of append-only data structures, wide-area causal dependency tracking, and FaaS programming as "universal" concepts in a cloud/edge/IoT setting. While you would not necessarily need to implement an application "end-to-end" to demonstrate your work, you would need to make a meaningful comparison to existing "state of the art" approaches.
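To make "append-only storage plus a record of causal dependency" concrete, here is a minimal in-memory sketch. The names (`AppendOnlyLog`, `Entry`, `causal_chain`) are hypothetical and this is not CSPOT's API or runtime; it assumes only that every appended entry records which earlier entry, possibly in a different log, triggered it. Because entries are never modified, those pointers stay valid, and an execution chain can be reconstructed after the fact by following them backwards.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Ref = Tuple[str, int]  # (log name, sequence number): a stable address for an entry

@dataclass(frozen=True)
class Entry:
    seqno: int
    payload: str
    cause: Optional[Ref]  # the entry whose handler appended this one, if any

class AppendOnlyLog:
    """Hypothetical append-only log: entries can be appended and read, never changed."""

    def __init__(self, name: str):
        self.name = name
        self._entries: List[Entry] = []

    def append(self, payload: str, cause: Optional[Ref] = None) -> Ref:
        entry = Entry(len(self._entries), payload, cause)
        self._entries.append(entry)
        return (self.name, entry.seqno)

    def get(self, seqno: int) -> Entry:
        return self._entries[seqno]

def causal_chain(logs: Dict[str, AppendOnlyLog], ref: Ref) -> List[Tuple[Ref, str]]:
    """Follow cause pointers back from `ref`; return the chain oldest-first."""
    chain: List[Tuple[Ref, str]] = []
    current: Optional[Ref] = ref
    while current is not None:
        entry = logs[current[0]].get(current[1])
        chain.append((current, entry.payload))
        current = entry.cause
    chain.reverse()
    return chain

if __name__ == "__main__":
    # Hypothetical sensor -> edge -> cloud pipeline, one log per tier.
    logs = {n: AppendOnlyLog(n) for n in ("sensor", "edge", "cloud")}
    r1 = logs["sensor"].append("temp=3.1C")
    r2 = logs["edge"].append("frost_risk=low", cause=r1)
    r3 = logs["cloud"].append("alert=none", cause=r2)
    for step in causal_chain(logs, r3):
        print(step)
```

In a wide-area deployment the logs would live on different machines and the lookup in `causal_chain` would become a remote read; the design point the sketch illustrates is that provenance falls out of the storage abstraction rather than requiring a separate tracing system.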

Create Your Own Project

The examples described above are intended to stimulate your interest. You are free to choose one of them, to use one of them as a jumping-off point for a different idea, or to come up with something entirely new using your limitless creativity. By all means, if you have an idea for a project, contact me so we can discuss it. As long as it explores this new architectural space for distributed applications and is validated either by a real-world application or against a state-of-the-art competitor, it is almost assuredly in scope.

Resources

There are two campus clouds available to you in this class: one that runs Eucalyptus (which is API-compatible with Amazon AWS) and another that runs OpenStack. In addition, I can probably arrange access to HTCondor -- a high-throughput computing system with many powerful features. HTCondor is useful if you wish to build something that requires a great deal of scale but can also tolerate a great deal of "churn" in resource availability. If you think your project might fit this model, please get in touch and we can discuss it. Finally, if your project will use GPUs, UCSB is just now setting up a Pacific Research Platform node. If you would like access to any of these infrastructures, please email me. However, if you request access, plan to use the infrastructure for this class (i.e. don't ask for access just to have access).

Additionally, you are free to use any other cloud platform (e.g. in the free tier) to which you can gain access. Unfortunately, we do not have class credits from the public cloud vendors for this class.