Give a man a program, frustrate him for a day. <i>Teach</i> a man to program, frustrate him for a lifetime.

Assignment 2: A Basic Compute Farm

Purpose

Motivation

Each large Internet computing project, such as SETI@HOME, tackles some problem that has a simple parallel decomposition. We call such "embarrassingly parallel" problems piecework-parallel, indicating that a problem in this class has a piecework decomposition: The problem decomposes into objects that implement Task, and whose execute methods return values that can be composed into a solution to the original problem.

Piecework Decomposition

Fig. 1: Piecework task decomposition topology

Paper Summary

Submit a 1-page summary, entirely in your own words, of the paper titled, "How to Build a ComputeFarm."

Specification

In this assignment, you build a basic compute farm infrastructure for hosting piecework-parallel problems. The client decomposes the problem, constructing a set of Task objects. These tasks are passed to a Space, which makes them available to compute servers, called Computer objects, which function much like those in your first assignment. The results computed by Computers are returned to the Space. The client retrieves results from the Space, composing them into a solution to the original problem.

The API

Task

package api;

import java.io.Serializable;

public interface Task<T extends Serializable> extends Serializable
{
    Result<T> execute();
}

The Result container

The Result<T> class encapsulates a task's return value, together with the task's run time. A definition follows:
package api;

import java.io.Serializable;

public class Result<T extends Serializable> implements Serializable
{
    private T taskReturnValue;
    private long taskRunTime;

    public Result( T taskReturnValue )
    {
        assert taskReturnValue != null; // disable in production
        this.taskReturnValue = taskReturnValue;
    }

    public T getTaskReturnValue() { return taskReturnValue; }

    public void setTaskRunTime( long taskRunTime )
    {
        assert taskRunTime >= 0;
        this.taskRunTime = taskRunTime;
    }

    public long getTaskRunTime() { return taskRunTime; }
}
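To make the API concrete, here is a minimal, self-contained sketch of a Task implementation. The class name SumTask and the range-summing problem are illustrative only, not part of the assignment; local simplified copies of Task and Result stand in for the api package.

```java
import java.io.Serializable;

// Simplified local copies of the api types, for a self-contained sketch.
interface Task<T extends Serializable> extends Serializable { Result<T> execute(); }

class Result<T extends Serializable> implements Serializable {
    private T taskReturnValue;
    private long taskRunTime;
    public Result(T taskReturnValue) { this.taskReturnValue = taskReturnValue; }
    public T getTaskReturnValue() { return taskReturnValue; }
    public void setTaskRunTime(long taskRunTime) { this.taskRunTime = taskRunTime; }
    public long getTaskRunTime() { return taskRunTime; }
}

// Hypothetical task: sums the integers in [lo, hi).
class SumTask implements Task<Long> {
    private final int lo, hi;
    SumTask(int lo, int hi) { this.lo = lo; this.hi = hi; }
    public Result<Long> execute() {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = lo; i < hi; i++) sum += i;
        Result<Long> result = new Result<>(sum);
        result.setTaskRunTime(System.nanoTime() - start); // record the run time
        return result;
    }
}

public class SumTaskDemo {
    public static void main(String[] args) {
        Result<Long> r = new SumTask(0, 10).execute();
        System.out.println("sum = " + r.getTaskReturnValue()); // prints sum = 45
    }
}
```

Note that execute itself fills in the run time before returning; the Space and Computer never need to inspect the value.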

The Computer interface

package system;

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

import api.Result;
import api.Task;

public interface Computer extends Remote
{
    <T extends Serializable> Result<T> execute( Task<T> task ) throws RemoteException;

    void exit() throws RemoteException;
}

The Space interface

package api;

import java.rmi.Remote;
import java.rmi.RemoteException;

public interface Space extends Remote
{
    public static String SERVICE_NAME = "Space";

    void put( Task task ) throws RemoteException;

    Result take() throws RemoteException;

    void exit() throws RemoteException;
}

The client decomposes the problem into a set of Task objects and passes them to the Space via the put method. In principle, these task objects can be processed in parallel by Computers. Since the client puts tasks into the Space and the Computers take tasks from the Space (or the Space takes tasks to give to Computers), the process fits the Producer-Consumer design pattern. Please consult section 5.3 of the textbook for advice. After passing all the task objects to the Space, the client retrieves the associated Result objects via the take method: this method blocks until a Result is available to return to the client. Since Computers put Result objects into the Space and the client "consumes" them, we again see the Producer-Consumer design pattern. If the client sent 10 Task objects to the Space, it could execute:

Result[] results = new Result[10]; 
for ( int i = 0; i < results.length; i++ ) 
{ 
    results[i] = space.take(); // waits for a result to become available. 
}

If a particular Result needs to be associated with a particular Task (e.g., a Mandelbrot Result), this information is passed as a component of the Task's execute method's return value. Based on this association, when it matters, the client composes the result values into a solution to the original problem.
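One way to carry that association, sketched below with hypothetical names (IdentifiedValue is illustrative, not part of the API): embed a task identifier in the execute method's return value, so the client can place each result, which may arrive in any order, into its slot in the solution.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical container: embeds the task's identity in execute()'s return
// value, so the client can tell which Result belongs to which Task.
class IdentifiedValue implements Serializable {
    final int taskId;    // e.g., the row index of a Mandelbrot strip
    final double value;  // the computed piece of the answer
    IdentifiedValue(int taskId, double value) { this.taskId = taskId; this.value = value; }
}

public class ComposeDemo {
    // The client composes out-of-order results by indexing on taskId.
    static double[] compose(IdentifiedValue[] results, int n) {
        double[] solution = new double[n];
        for (IdentifiedValue r : results) solution[r.taskId] = r.value;
        return solution;
    }

    public static void main(String[] args) {
        IdentifiedValue[] outOfOrder = {
            new IdentifiedValue(2, 2.0), new IdentifiedValue(0, 0.0), new IdentifiedValue(1, 1.0)
        };
        System.out.println(Arrays.toString(compose(outOfOrder, 3))); // [0.0, 1.0, 2.0]
    }
}
```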

The exit method is a deployment convenience: When invoked, the space invokes the exit method on each registered Computer, then exits itself. These exits are implemented via System.exit( 0 ), perhaps after logging some suitable comment that it is going down, with a time stamp. The implementation of each exit method thus causes the invoker to receive a RemoteException.
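The invoker should therefore treat that RemoteException as the expected outcome of shutdown, not as an error. A sketch of this client-side handling follows; the remote stub is simulated by a local method that throws, since after System.exit( 0 ) runs in the remote JVM, the pending invocation surfaces on the caller as a RemoteException.

```java
import java.rmi.RemoteException;

public class ExitDemo {
    // Local stand-in for the Space stub: once the Space calls System.exit(0),
    // the client's pending exit() invocation surfaces as a RemoteException.
    static void exitStub() throws RemoteException {
        throw new RemoteException("connection refused: Space has exited");
    }

    // The client treats the exception as the normal sign of a completed shutdown.
    static boolean shutDown() {
        try {
            exitStub();
            return false; // unreachable in this sketch
        } catch (RemoteException expected) {
            return true;  // normal: the remote JVM is gone
        }
    }

    public static void main(String[] args) {
        System.out.println("shutdown acknowledged: " + shutDown());
    }
}
```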

The SpaceImpl class

Computers register with the Space via the Computer2Space interface:

package system;

public interface Computer2Space extends java.rmi.Remote 
{   
    void register( Computer computer ) throws java.rmi.RemoteException;
}
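The semantics of register can be sketched in a single JVM, with the RMI plumbing (Remote, RemoteException, stubs) omitted: the Space simply records each Computer so it can later dispatch tasks to it and invoke exit on it at shutdown. The names SpaceSketch and registeredCount are illustrative only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// In-JVM sketch of Computer2Space.register; RMI plumbing is omitted.
interface Computer { String name(); }

class SpaceSketch {
    // thread-safe list: register may be invoked concurrently by many Computers
    private final List<Computer> computers = new CopyOnWriteArrayList<>();
    void register(Computer computer) { computers.add(computer); }
    int registeredCount() { return computers.size(); }
}

public class RegisterDemo {
    static int demo() {
        SpaceSketch space = new SpaceSketch();
        space.register(() -> "computer-1");
        space.register(() -> "computer-2");
        return space.registeredCount();
    }

    public static void main(String[] args) {
        System.out.println(demo() + " computers registered"); // 2 computers registered
    }
}
```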

Faulty Computers

For the purposes of this assignment, a computer is defined to be faulty when a Remote method invoked on it returns a RemoteException. The Space accommodates faulty computers: If a computer that is running a task returns a RemoteException, the task is assigned to another computer.

The Space implementation's main method instantiates a Space object and binds it into its rmiregistry.
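A minimal sketch of that main-method pattern, using a trivial Ping service in place of the real Space implementation (Ping and the literal service name "Space" are stand-ins here): the object is exported, bound in a registry created in this JVM, and then looked up the way a client would.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Stand-in remote interface; the real code binds a Space implementation.
interface Ping extends Remote { String ping() throws RemoteException; }

public class BindDemo {
    static String demo() throws Exception {
        Ping service = () -> "pong";
        // Export the object so it can receive remote calls, then bind it
        // under the service name (Space.SERVICE_NAME in the real code).
        Ping stub = (Ping) UnicastRemoteObject.exportObject(service, 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("Space", stub);
        // A client (possibly in another JVM) looks it up the same way:
        Ping client = (Ping) LocateRegistry.getRegistry("localhost", 1099).lookup("Space");
        return client.ping();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints pong
        System.exit(0);             // unexport happens implicitly on JVM exit
    }
}
```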

The space's implementation of register should instantiate a ComputerProxy, which is a separate thread. This thread's run method loops forever, removing tasks from a queue, invoking the associated Computer's execute method with the task as its argument, and putting the returned Result object in a data structure for retrieval by the client. These data structures need to be thread-safe. Why? The Java BlockingQueue may be useful, as well as its LinkedBlockingQueue implementation.
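The proxy loop, including the fault handling described below, can be sketched in one JVM. Here a "computer" is just a function that may fail; the ComputerProxy name matches the text, but the int-valued tasks and the retire-on-failure policy are illustrative assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-JVM sketch of a ComputerProxy: RMI is omitted; on failure the task is
// put back on the shared queue so another proxy can pick it up.
public class ProxyDemo {
    interface Computer { int execute(int task) throws Exception; }

    static class ComputerProxy extends Thread {
        private final Computer computer;
        private final BlockingQueue<Integer> tasks, results;
        ComputerProxy(Computer c, BlockingQueue<Integer> t, BlockingQueue<Integer> r) {
            computer = c; tasks = t; results = r; setDaemon(true);
        }
        public void run() {
            while (true) {
                Integer task = null;
                try {
                    task = tasks.take();                 // blocks until a task arrives
                    results.put(computer.execute(task)); // deposit result for the client
                } catch (InterruptedException e) {
                    return;
                } catch (Exception faultyComputer) {
                    tasks.add(task); // reassign the task to another proxy
                    return;          // retire this faulty computer
                }
            }
        }
    }

    static int runDemo() throws InterruptedException {
        BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();
        new ComputerProxy(t -> { throw new Exception("faulty"); }, tasks, results).start();
        new ComputerProxy(t -> t * t, tasks, results).start(); // healthy computer
        for (int i = 1; i <= 3; i++) tasks.put(i);
        int sum = 0;
        for (int i = 0; i < 3; i++) sum += results.take(); // blocks until all 3 arrive
        return sum; // 1 + 4 + 9 = 14, despite the faulty computer
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sum of squares = " + runDemo());
    }
}
```

The LinkedBlockingQueue does double duty here: it is the thread-safe data structure, and its blocking take is exactly the Producer-Consumer behavior the client's take method needs.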

The Computer Implementation

Fig. 2: The client-Space-Computer architecture.

Thread Safety

When an object implements a Remote method, the JVM allows that method to be invoked concurrently by multiple threads. Synchronizing access less than necessary leads to race conditions. One way to avoid race conditions is to declare all of the object's methods synchronized. However, this is not always appropriate. For example, if the object implements the Runnable interface, declaring its run method synchronized would hold the object's lock for the thread's entire lifetime. In this case, when the run method accesses the object's state, put that code fragment in a synchronized block. Synchronizing more than necessary may lead to deadlock and livelock: synchronization must be used carefully.
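A small, self-contained illustration of that guidance: the run method below guards only the shared-state access with a synchronized block, rather than locking for the whole loop, and two threads still lose no updates.

```java
// Sketch: a Runnable whose run() touches shared state only inside a
// synchronized block, not by being declared synchronized itself.
public class SyncDemo implements Runnable {
    private int hits;                     // shared mutable state
    private final Object lock = new Object();

    public void run() {
        for (int i = 0; i < 10_000; i++) {
            // only the state access is guarded, not the whole loop body
            synchronized (lock) { hits++; }
        }
    }

    int hits() { synchronized (lock) { return hits; } }

    static int race() throws InterruptedException {
        SyncDemo shared = new SyncDemo();
        Thread a = new Thread(shared), b = new Thread(shared);
        a.start(); b.start();
        a.join(); b.join();
        return shared.hits();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(race()); // 20000: no updates lost
    }
}
```

Removing the synchronized block turns `hits++` into an unsynchronized read-modify-write, and the count can come up short, which is exactly the race condition the text warns about.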

From section 4.1 of the textbook concerning the design process for a thread-safe class:

Please refer to the textbook for further guidance.

Task classes

For each of the Task classes that you defined in the 1st assignment, define a corresponding Task class that solves part of the original problem. The decompositions need not be masterpieces of efficiency. For the TSP, partition the set of all possible tours. For example, if there are n cities, you can partition the set of tours into n - 1 parts: those that begin with the starting city followed by city i, for each of the other n - 1 cities i.
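That coarse decomposition can be sketched as follows; TspSubtask is a hypothetical placeholder for the real Task subclass, which would also carry the city coordinates and search its part of the tour space.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one coarse TSP decomposition: fix city 0 as the start and create
// one subtask per choice of second city, giving n - 1 subtasks in all.
public class TspDecompose {
    // Hypothetical subtask description: "best tour starting 0 -> secondCity".
    static class TspSubtask {
        final int secondCity;
        TspSubtask(int secondCity) { this.secondCity = secondCity; }
    }

    static List<TspSubtask> decompose(int n) {
        List<TspSubtask> tasks = new ArrayList<>();
        for (int city = 1; city < n; city++) tasks.add(new TspSubtask(city));
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(decompose(5).size() + " subtasks for 5 cities"); // 4 subtasks
    }
}
```

Each subtask's result is its best tour and cost; the client composes by taking the minimum-cost tour over the n - 1 results.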

The clients

Define a client for each application that:

The Job interface

Similar to the Compute Farm, you may wish to define and implement a Job interface. The main advantage of doing so is to increase code reuse among different clients and execution scenarios (see Experiments below). The Don't Repeat Yourself (DRY) design maxim is an energy investment strategy that generally yields dividends.
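One possible shape for such an interface, sketched below with illustrative names (Job, decompose, compose are assumptions, not a prescribed design): a job knows how to decompose itself into tasks and compose the results, so a single generic client loop serves every application. Integers stand in for Task and Result objects to keep the sketch self-contained.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical Job interface: a client written against it is reusable
// across applications and deployment scenarios.
interface Job<T> {
    List<Integer> decompose();          // produce the tasks (ints stand in for Task)
    T compose(List<Integer> results);   // combine the results into a solution
}

public class JobDemo {
    // Example job: sum 1..n by decomposing into one "task" per integer.
    static Job<Integer> sumJob(int n) {
        return new Job<Integer>() {
            public List<Integer> decompose() {
                return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
            }
            public Integer compose(List<Integer> results) {
                return results.stream().mapToInt(Integer::intValue).sum();
            }
        };
    }

    // A generic client: this loop never changes, whatever the application.
    static <T> T run(Job<T> job) { return job.compose(job.decompose()); }

    public static void main(String[] args) {
        System.out.println(run(sumJob(10))); // 55
    }
}
```

In the real system, run would put each task into the Space and take the results back; only the Job implementations differ between the Mandelbrot and TSP clients.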

Experiments

Repeat the above steps for c = 1, 2, and 4. For each problem, graph the elapsed time as a function of c. For the case of c = 1, get completion time for 4 deployment scenarios:

Analysis

Deliverables

Directories

Files



cappello@cs.ucsb.edu © Copyright 2010 Peter Cappello 2012.04.27