Job execution at TSCC can be conducted in two modes: batch and interactive. This cluster uses the TORQUE scheduler, also known as the Portable Batch System (PBS), to run jobs, with four different queues available. For this class, always use the "hotel" queue.
The files you see from the login node are typically shared among all nodes in the cluster, but use the login machine only for light, general tasks. Allocate one node to compile your Java program, and use the queuing system discussed below to run parallel jobs on the compute cluster. To copy a directory from csil.cs.ucsb.edu, allocate one TSCC node (to avoid overloading the login node) and run scp on that node to copy the data with the following command:
scp -r UserName@csil.cs.ucsb.edu:pathname .

Storage

For a small amount of data, copy your application source files to your $HOME directory. For a large amount of data, copy your data to a Data Oasis area with directory name:

/oasis/tscc/scratch/<username>

where <username> is your TSCC login name.
Note that Data Oasis storage is not backed up; files stored on this system may be lost or destroyed without recourse to restore them. Long-term file storage should be maintained in your $HOME directory.
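For example, you could stage a large dataset directly into your scratch area from an allocated node. This is a sketch: the source directory name bigdata is hypothetical, and it assumes you run it on TSCC so that $USER expands to your TSCC login name.

# Run this on an allocated TSCC node, not the login node.
scp -r UserName@csil.cs.ucsb.edu:bigdata /oasis/tscc/scratch/$USER/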
See the TSCC user guide for more details on storage and compiling code.
Every class user is allocated a certain number of service units (SUs). Use the gbalance -u username command to check the amount of time remaining in your account.
A user account is billed according to the following formula (and most likely the same formula still applies):

#CPUs x #nodes x wall-clock time

The SU charge is rounded to the nearest CPU-hour. For example, if you run with 8 CPUs (cores) on 1 node for 12 minutes, your account will be charged 2 SUs.
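Working through that example: 8 CPUs x 1 node x (12/60) hours = 1.6 CPU-hours, which rounds to a charge of 2 SUs.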
It is easy to accidentally burn through your time allocation by running a program that deadlocks or by leaving an interactive session open. One safeguard is to always specify a maximum time the job can execute, for example 10-20 minutes; the interactive and batch examples below both set a walltime limit.
The gcc compiler and other software packages are available; see the TSCC user guide for more information on installed software.
To start an interactive job on two nodes (each with 1 core) for at most 15 minutes, run:

qsub -I -l nodes=2:ppn=1 -l walltime=00:15:00
Once you run it, the command waits until nodes are available and then opens an SSH session on one of them. Your job lasts until you exit that SSH session.
Caveats: interactive jobs are scheduled just like script-based jobs, so they wait in the queue as long as any other job. Right now this does not seem to be a problem, so running through the batch queue is fine. If the queue is slow, remember to cancel the request (Control-C works) so that you don't waste a bunch of time doing nothing.
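For this class, remember to request the hotel queue as well. A minimal sketch (the node counts and walltime are placeholders to adjust for your job):

qsub -I -q hotel -l nodes=2:ppn=1 -l walltime=00:15:00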
Running

qsub job-script-file

will submit your job to the scheduler and assign it a job number. Be careful when submitting multiple jobs: output files will be overwritten by subsequent or concurrent jobs writing to the same file. The sample script below shows one way to keep output files distinct.
How can you specify the number of machine nodes to be allocated? Use the -l nodes=N:ppn=M option, either on the qsub command line or as a #PBS directive inside the script, as in the sketch below. See the TSCC user guide for more details on batch job submission.
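As a sketch for a Java program, a batch script might look like the following; the job name and program name are hypothetical, and the resource requests should be adjusted for your job.

#!/bin/bash
# Request the class queue, 2 nodes with 2 cores each, and a 15-minute cap
# so a hung job cannot drain your SU allocation.
#PBS -q hotel
#PBS -N myjob
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:15:00

# PBS_O_WORKDIR is set by TORQUE to the directory from which qsub was invoked.
cd $PBS_O_WORKDIR
# Tag the output file with the job ID so concurrent runs do not overwrite each other.
java MyProgram > result.$PBS_JOBID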
To cancel a job, run

qdel jobnumber

where jobnumber is the number assigned by qsub.
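A typical sequence might look like this (the script name is hypothetical and the job number is made up):

qsub myjob.sh        # prints the new job's number, e.g. 123456
qstat -u $USER       # check the status of your queued jobs
qdel 123456          # cancel the job by number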