Here are some questions and answers related to CS140.
If you would like to be able to log in to the Expanse server without typing a password, follow the guide here
$ ssh <username>@login.expanse.sdsc.edu
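As a minimal sketch (assuming a standard OpenSSH client on your local machine; the linked guide is authoritative), you can generate a key pair and copy the public key to Expanse:
ssh-keygen -t ed25519
ssh-copy-id <username>@login.expanse.sdsc.edu
After that, ssh should log you in without prompting for your account password (a passphrase on the key itself, if you set one, is still requested unless you run an ssh agent).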
The files you see from the login node are typically shared among all nodes in the cluster.
You may also access the cluster through a web interface:
https://portal.expanse.sdsc.edu/pun/sys/dashboard
The Expanse login node imposes a memory constraint on each user process, so you cannot run memory-intensive jobs there. To run a parallel job or certain sequential jobs, submit a job that executes the binary using the sbatch command. If you want to debug interactively or perform a memory-intensive compilation, allocate a CPU or GPU node for an interactive session (see below).
The login node may not have certain software modules pre-loaded for compilation and other special commands, including job submission. You may have to use module load to load the required modules. See the Expanse guide on compiling and running for more explanation.
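For example, the following commands (standard in the module system Expanse uses; the module names are just illustrations) show what is currently loaded and search for available modules:
module list
module avail gcc
module spider openmpi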
To check your allocation usage and remaining balance, you can load the sdsc module and run the expanse-client tool:
module load sdsc
expanse-client user -p
Your account is billed to the nearest CPU or GPU node hour. Expanse has its own charging policy; please read the Expanse web documentation for details.
It is easy to accidentally burn through your time allocation by running a program that deadlocks or by leaving an interactive session open. To avoid excessive usage, specify a maximum time the job can execute whenever possible, or cancel your job as shown below if you suspect something is wrong.
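For example, you can set a time limit on the sbatch command line (a sketch; the script name is a placeholder, and the limit should fit your job):
sbatch --time=00:05:00 run-expanse
The same limit can be set inside the script with a "#SBATCH -t 00:05:00" line; the system kills the job once the limit is reached, which caps what you can be charged.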
Files for CS140 are relatively small and your home directory can hold them.
Submitting a job allows machine nodes to be allocated dynamically when they become available. That is required if the job takes significant time to run.
Before using this makefile to compile an MPI program with gcc on the Expanse login node, you may need to load certain software modules first:
module reset
module load gcc openmpi
make
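The actual makefile is distributed with the course materials; as a rough sketch only (the file names and flags here are made up), a makefile for an MPI program built with the mpicc wrapper typically looks like this:
# Sketch of an MPI makefile; mv_mult_test_mpi.c is a placeholder name.
# Recipe lines must start with a tab character.
CC = mpicc            # MPI compiler wrapper made available by the openmpi module
CFLAGS = -O2 -Wall

mv_mult_test_mpi: mv_mult_test_mpi.c
	$(CC) $(CFLAGS) -o $@ $<

clean:
	rm -f mv_mult_test_mpi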
This makefile also shows how to submit a job that executes an MPI binary. For example, it uses the script run-mv_mult_test_mpi.sh to submit a job with the sbatch command in order to execute the matrix-vector multiplication MPI code. Once a job is submitted using "sbatch run-mv_mult_test_mpi.sh", the system shows a job number and places the job in a queue, where it waits for an allocation of the required computing nodes. After a set of nodes is allocated to execute the job, the system saves the job's output in a file specified in the script. The "srun" or "ibrun" command in this script executes the MPI binary with a selected number of processes once computing nodes are allocated to the job.
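The exact script is provided with the course files; as a minimal sketch of what such a submission script contains (the partition, account, time limit, and process count below are assumptions to adapt):
#!/bin/bash
#SBATCH --job-name=mv_mult_mpi         # name shown in the queue
#SBATCH --output=mv_mult_mpi.%j.out    # output file; %j expands to the job ID
#SBATCH --partition=shared             # CPU partition for small jobs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4            # number of MPI processes
#SBATCH --time=00:05:00                # maximum run time
#SBATCH --account=csb175               # class account

module load gcc openmpi
srun ./mv_mult_test_mpi                # run the MPI binary with 4 processes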
In this example, the makefile for a multi-threaded C program shows how to submit a job that executes a binary. It uses the script run-expanse to submit a job with the sbatch command in order to execute the binary named "ex1". Once the job is submitted using "sbatch run-expanse", the system shows a job number and places the job in a queue, where it waits for an allocation of the required computing nodes under an account (change the account in the script to csb175, the account allocated for this class). After a set of nodes is allocated to execute the job, the system saves the job's output in a file specified in the script.
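The run-expanse script itself ships with the example; a minimal sketch for a multi-threaded binary (the values below are placeholders) might look like:
#!/bin/bash
#SBATCH --job-name=ex1
#SBATCH --output=ex1.%j.out         # file where the job's output is saved
#SBATCH --partition=shared
#SBATCH --nodes=1
#SBATCH --ntasks=1                  # a single process...
#SBATCH --cpus-per-task=4           # ...with 4 cores for its threads
#SBATCH --time=00:05:00
#SBATCH --account=csb175            # the account allocated for this class

export OMP_NUM_THREADS=4            # only relevant if ex1 uses OpenMP
./ex1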
Use the following command to check your job status:
module load slurm
squeue -u UserName
Use the following command to cancel a job:
scancel JobID
The following commands, modified from the Expanse website, start an interactive session on a compute node:
module load slurm
srun --pty --nodes=1 --ntasks-per-node=1 -p debug -t 00:10:00 -A csb175 /bin/bash
DO NOT FORGET to exit this interactive job (otherwise your account is still being charged for CPU and/or GPU time). To do that, just type the shell command "exit".
If the "debug" partition is not available, use "shared". Please use only "debug" and "shared" for CPU-only jobs, and "gpu-debug" and "gpu-shared" for GPU jobs.
Do NOT use the other partitions, which are expensive. For example, in the "compute" or "ind-compute" partition you are charged for an entire node with all of its cores even if you use only one core on one machine.
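For a GPU job, an analogous interactive command would be the following sketch (GPU jobs may also need memory flags, and your GPU account name can differ from the CPU account; check the Expanse guide):
srun --pty --nodes=1 --ntasks-per-node=1 --gpus=1 -p gpu-debug -t 00:10:00 -A csb175 /bin/bash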