
240A Winter 2016 HW1: Parallel matrix multiplication

You will port and parallelize code for matrix multiplication, a basic building block in many scientific computations. The most naive code to multiply square matrices is:

  for i = 1 to n
    for j = 1 to n
      for k = 1 to n
        C[i,j] = C[i,j] + A[i,k] * B[k,j]
      end
    end
  end
Initialize A[i,j] = i+j and B[i,j] = i*j.

There are three options for implementing the sequential code.

Sample C/C++ code for the three options above, with a timing and test driver, is available from this tar file. These three options implement the core function

  void square_dgemm( int n, double *A, double *B, double *C )
The matrices are stored in column-major order, i.e. entry C(i,j) is stored at C[i+j*n]. The tar file includes:

dgemm-naive.cpp: a naive implementation of matrix multiply using three loops
dgemm-blocked.cpp: a simple blocked implementation of matrix multiply
dgemm-blas.cpp: a wrapper for the BLAS3 dgemm library function
benchmark.cpp: the driver program that measures runtime and verifies correctness
You can call the BLAS dgemm() through the Intel MKL library on the Comet cluster. A small include-file change for using dgemm() is here; a sample makefile for linking MKL is here.

What to do

  1. Port the code to Comet with the Intel MKL library. Report the megaflops numbers for the three options above with n = 100, 200, 400, 800, and 1600 on one core.

  2. Parallelize the naive sequential program using OpenMP. Report megaflops numbers, parallel time, and speedup for n = 1600 with 4, 8, 16, and 24 cores.

  3. Parallelize the naive program using MPI, with process 0 collecting the final results from all processes. Report megaflops numbers, parallel time, and speedup for n = 1600 with 4, 8, 16, and 32 processes.

  4. Write optimized pthreads code for parallel matrix multiplication so that you obtain the "best" megaflops performance for n = 1600 running on a cluster node with 8 and 24 cores. Report the megaflops numbers and parallel time achieved.

What to submit

Reference links: