Job execution at TSCC can be conducted in two modes: batch and interactive. This cluster uses the TORQUE scheduler, also known as the Portable Batch System (PBS), to run jobs, with four different queues available. For this class, always use the "hotel" queue.
The files you see from the login node are typically shared among all nodes in the cluster, but use the login machine only for light, general tasks. Allocate one node to compile your Java program, and use the queuing system discussed below to run parallel jobs on the compute cluster. To copy a directory from csil.cs.ucsb.edu, allocate one TSCC node (to avoid overloading the login node) and run scp on that node to copy the data with the following command:
scp -r UserName@csil.cs.ucsb.edu:pathname .

Storage

For a small amount of data, copy your application source files to your $HOME directory. For a large amount of data, copy your data to a Data Oasis area with directory name:

/oasis/tscc/scratch/<username>

where <username> is your TSCC login name.
Note that Data Oasis storage is not backed up; files stored on this system may be lost or destroyed without recourse to restore them. Long-term file storage should be maintained in your $HOME directory.
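For example, you could stage a large dataset directly into your scratch area from an allocated node. This is a sketch: the source directory name bigdata is hypothetical, and it assumes you run it on TSCC so that $USER expands to your TSCC login name.

# Run this on an allocated TSCC node, not the login node.
scp -r UserName@csil.cs.ucsb.edu:bigdata /oasis/tscc/scratch/$USER/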
See the TSCC user guide for more details on storage and compiling code.
Every class user is allocated a certain number of service units (SUs). Use the gbalance -u username command to check the amount of time remaining in your account.
A user account is billed according to the following formula (and most likely the same formula still applies):

#CPUs x #nodes x wall-clock time

The SU charge is rounded to the nearest CPU-hour. For example, if you run with 8 CPUs (cores) on 1 node for 12 minutes, your account will be charged 2 SUs.
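Working through that example: 8 CPUs x 1 node x (12/60) hours = 1.6 CPU-hours, which rounds to a charge of 2 SUs.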
It is easy to accidentally burn through your time allocation by running a program that deadlocks or by leaving an interactive session open. One safeguard is to always specify a maximum time the job can execute, for example 10-20 minutes; the interactive and batch examples below both set a walltime limit.
The gcc compiler and other software packages are available; see the TSCC user guide for more information on installed software.
To start an interactive job on two nodes (each with 1 core) for at most 15 minutes, run:

qsub -I -l nodes=2:ppn=1 -l walltime=00:15:00
Once you run it, the command waits until nodes are available and then opens an SSH session on one of them. Your job lasts until you exit that SSH session.
Caveats: interactive jobs are scheduled just like script-based jobs, so they wait in the queue as long as any other job. Right now this does not seem to be a problem, so running through the batch queue is fine. If the queue is slow, remember to cancel the request (Control-C works) so that you don't waste a bunch of time doing nothing.
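For this class, remember to request the hotel queue as well. A minimal sketch (the node counts and walltime are placeholders to adjust for your job):

qsub -I -q hotel -l nodes=2:ppn=1 -l walltime=00:15:00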
Running

qsub job-script-file

will submit your job to the scheduler and assign it a job number. Be careful when submitting multiple jobs: output files will be overwritten by subsequent or concurrent jobs writing to the same file. The sample script below shows one way to keep output files distinct.
How can you specify the number of machine nodes to be allocated? Use the -l nodes=N:ppn=M option, either on the qsub command line or as a #PBS directive inside the script, as in the sketch below. See the TSCC user guide for more details on batch job submission.
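As a sketch for a Java program, a batch script might look like the following; the job name and program name are hypothetical, and the resource requests should be adjusted for your job.

#!/bin/bash
# Request the class queue, 2 nodes with 2 cores each, and a 15-minute cap
# so a hung job cannot drain your SU allocation.
#PBS -q hotel
#PBS -N myjob
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:15:00

# PBS_O_WORKDIR is set by TORQUE to the directory from which qsub was invoked.
cd $PBS_O_WORKDIR
# Tag the output file with the job ID so concurrent runs do not overwrite each other.
java MyProgram > result.$PBS_JOBID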
To cancel a job, run

qdel jobnumber

where jobnumber is the number assigned by qsub.
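A typical sequence might look like this (the script name is hypothetical and the job number is made up):

qsub myjob.sh        # prints the new job's number, e.g. 123456
qstat -u $USER       # check the status of your queued jobs
qdel 123456          # cancel the job by number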