MSc in Network Centered Computing

Module: PVM and MPI

What is PVM?

Parallel processing, the method of having many small tasks solve one large problem, has emerged as a key enabling technology in modern computing. Two major developments now dominate the field: massively parallel processors (MPPs) and distributed computing. The Parallel Virtual Machine (PVM) system uses the message-passing model to allow programmers to exploit distributed computing across a wide variety of computer types, including MPPs. PVM allows a heterogeneous collection of workstations and supercomputers to function as a single high-performance parallel machine. PVM is portable and runs on a wide variety of modern platforms. It has been well accepted by the global computing community and used successfully for solving large-scale problems in science, industry, and business.
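
To give a flavour of the message-passing model, here is a minimal, hedged sketch of a PVM program (not part of the official module material): the parent spawns one copy of itself and exchanges a single integer with the child. The executable name "hello" and the message tags are assumptions of this sketch.

/* hello.c -- minimal PVM sketch (illustrative only).  The parent
 * spawns one copy of itself; parent and child exchange one integer. */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int mytid  = pvm_mytid();          /* enrol in PVM, get own task id   */
    int parent = pvm_parent();         /* PvmNoParent => we are the master */

    if (parent == PvmNoParent) {       /* master branch */
        int child, n = 21;
        pvm_spawn("hello", NULL, PvmTaskDefault, "", 1, &child);
        pvm_initsend(PvmDataDefault);  /* pack one integer and send it   */
        pvm_pkint(&n, 1, 1);
        pvm_send(child, 1);
        pvm_recv(child, 2);            /* wait for the reply             */
        pvm_upkint(&n, 1, 1);
        printf("master %x got %d back\n", mytid, n);
    } else {                           /* slave branch */
        int n;
        pvm_recv(parent, 1);           /* receive, then unpack           */
        pvm_upkint(&n, 1, 1);
        n *= 2;                        /* trivial "work"                 */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&n, 1, 1);
        pvm_send(parent, 2);
    }
    pvm_exit();                        /* leave the virtual machine      */
    return 0;
}

The pack/send and recv/unpack pairs are the whole of the model: data moves only when one task explicitly sends it and another explicitly receives it.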

Further information:

PVM manual
PVM book

Read this to compile your code and to start PVM:
using Linux: PVM shell scripts
using SUN Classic Cluster: PVM shell scripts

PVM assignment
additional information for the assignment

What is MPI?

The primary goal of the MPI specification is to demonstrate that users need not compromise among efficiency, portability, and functionality. The MPI concept is based on message-passing systems: the processes executing in parallel have separate address spaces. Communication occurs when a portion of one process's address space is copied into another process's address space. MPI is the first specification that allows writers of parallel libraries to produce truly portable libraries.
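
For comparison, a minimal MPI sketch (illustrative only): every process is started from the same executable, and rank 0 collects one integer from each of the other ranks. The explicit send/receive pair is exactly the copy from one address space into another described above.

/* hello_mpi.c -- minimal MPI sketch (illustrative only).           */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* who am I?             */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many of us?       */

    if (rank == 0) {
        int i, n;
        MPI_Status status;
        for (i = 1; i < size; i++) {         /* collect from everyone */
            MPI_Recv(&n, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
            printf("rank 0 received %d from rank %d\n", n, i);
        }
    } else {
        int n = rank * rank;                 /* trivial "work"        */
        MPI_Send(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}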

Further information:

MPI manual
MPI Reference
Tutorial on MPI: The Message-Passing Interface

Read this to compile your code and to start MPI:
using Linux: MPI shell scripts
using SUN Classic Cluster: MPI shell scripts

MPI assignment
additional information for the assignment

What is SPMD?

The PVM and MPI computing model is based on the idea that an application consists of several tasks, each responsible for a part of the application's computational workload. Sometimes an application is parallelized along its functions: each task performs a different function, for example input, problem setup, solution, output, or display. This is often called functional parallelism. A more common method of parallelizing an application is data parallelism: all the tasks are the same, but each one knows and works on only a small part of the data. This is also referred to as the SPMD (single-program multiple-data) model of computing.
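
The SPMD idea can be made concrete with a short hedged sketch in MPI: every task runs the identical program below, but MPI_Scatter hands each one its own slice of the data. The array size N = 8, the demo values, and divisibility of N by the number of tasks are assumptions of this sketch.

/* spmd.c -- SPMD sketch (illustrative only): one program, many data. */
#include <stdio.h>
#include <mpi.h>

#define N 8                         /* assume N is divisible by size   */

int main(int argc, char *argv[])
{
    double a[N], part[N], sum = 0.0, total;
    int rank, size, i, chunk;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    chunk = N / size;

    if (rank == 0)                  /* only rank 0 holds the full data */
        for (i = 0; i < N; i++) a[i] = i + 1.0;

    /* same code on every task, different slice of the data            */
    MPI_Scatter(a, chunk, MPI_DOUBLE, part, chunk, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    for (i = 0; i < chunk; i++) sum += part[i];
    MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("total = %g\n", total);
    MPI_Finalize();
    return 0;
}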

What is the Master/Slave Principle?

The master/slave programming model is a very popular model in distributed computing. In this model there are two separate programs, a master program and a slave program. The master has control over the running application: it manages all the data and calls the slaves to do their work.
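
A master-side skeleton in PVM may make this concrete (illustrative only; the slave executable name "slave", the number of slaves, and the message tags are assumptions of this sketch). The master spawns the slaves, deals out the work, and collects the results; the matching slave program simply receives from pvm_parent(), computes, and sends its result back.

/* master.c -- master/slave skeleton in PVM (illustrative only).    */
#include <stdio.h>
#include "pvm3.h"

#define NSLAVE 4

int main(void)
{
    int tids[NSLAVE], i, work, result;

    pvm_mytid();                              /* enrol the master     */
    pvm_spawn("slave", NULL, PvmTaskDefault, "", NSLAVE, tids);

    for (i = 0; i < NSLAVE; i++) {            /* deal out the work    */
        work = i;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&work, 1, 1);
        pvm_send(tids[i], 1);
    }
    for (i = 0; i < NSLAVE; i++) {            /* gather the results   */
        pvm_recv(tids[i], 2);
        pvm_upkint(&result, 1, 1);
        printf("slave %d returned %d\n", i, result);
    }
    pvm_exit();
    return 0;
}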

Matrix-by-Matrix Multiplication and how it works

The program should calculate C = AB, where C, A, and B are all square matrices. For simplicity we assume that n tasks will be used to calculate the solution; each task calculates one block of the resulting matrix C. The block size and the value of n are given as parameters to the program (these can be the first two values in the input file). The matrices A and B are likewise stored as blocks distributed over the n tasks.

Before delving into the details of the program, let us first describe the algorithm at a high level. Assume the n tasks form a ring. Each task gets one block of A (A1, A2, ..., An) and one block of B (B1, B2, ..., Bn). All blocks have the same size, which must be compatible with the number of processors. Each B block is in turn split into n sections of the same size (B11, B12, ..., B1n). In the first step, each task multiplies its A block with the matching section of its B block. After the multiplication, the A block is shifted right to the next slave, and the current slave gets another part of A from its left neighbour. This part is multiplied with the next section of B (reached by stepping through B in the same direction), and the results are accumulated in the C block. The shift-and-multiply step is repeated until every block of A has been used. After all multiplications, the master collects the results from all slaves and puts them together.

The program for this algorithm should be written in C, which is a row-oriented language: matrices are stored in memory row by row. Under these circumstances it is better to read the matrices transposed from the input file, but one must not forget to store the result transposed again.
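
The shift-and-multiply loop might look like the following sketch, written in MPI for brevity (a PVM version would exchange the A blocks with pvm_send/pvm_recv in the same ring pattern). The matrix size N, the demo values, and the column-major (transposed) storage are assumptions of this sketch; a real program would read the matrices from the input file and gather the C blocks at the master.

/* matmul_ring.c -- ring sketch of the block multiply (illustrative
 * only).  Each of the p tasks owns block-columns A_r and B_r (N x b,
 * b = N/p) and accumulates C_r = sum over k of A_k * B_kr, passing
 * the A blocks around the ring after every step.                   */
#include <stdlib.h>
#include <mpi.h>

#define N 4                                /* small demo size        */

int main(int argc, char *argv[])
{
    int p, r, b, owner, step, i, j, k;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &r);
    b = N / p;                             /* assume N % p == 0      */

    /* column-major (transposed) storage, as recommended above       */
    double *A = malloc(N * b * sizeof *A); /* current A block        */
    double *B = malloc(N * b * sizeof *B); /* own B block            */
    double *C = calloc(N * b, sizeof *C);  /* result block           */
    for (j = 0; j < b; j++)
        for (i = 0; i < N; i++) {          /* demo values            */
            A[j * N + i] = 1.0;
            B[j * N + i] = (double)(i + j);
        }

    owner = r;                             /* which A block we hold  */
    for (step = 0; step < p; step++) {
        /* C_r += A_owner * (section `owner' of our B block)         */
        for (j = 0; j < b; j++)
            for (k = 0; k < b; k++)
                for (i = 0; i < N; i++)
                    C[j * N + i] += A[k * N + i] * B[j * N + owner * b + k];
        /* shift: pass A right around the ring, receive from the left */
        MPI_Sendrecv_replace(A, N * b, MPI_DOUBLE, (r + 1) % p, 0,
                             (r - 1 + p) % p, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
        owner = (owner - 1 + p) % p;       /* the left neighbour's block */
    }
    /* a real program would now send C_r back to the master           */
    free(A); free(B); free(C);
    MPI_Finalize();
    return 0;
}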

Matrix-by-Vector Multiplication and how it works

The program should calculate C = AB, where A is a square matrix and B and C are vectors. Each task calculates one block of the resulting vector C. The block size and the value of n are given as parameters to the program (these can again be the first two values in the input file). The matrix A is likewise stored as blocks distributed over the n tasks; all blocks have the same size, which must be compatible with the number of processors. Each task receives one block of A and the whole vector B. The algorithm is similar to matrix-by-matrix multiplication, but no shift operation is needed, so the computation is shorter. After all calculations, the master task collects the results from the slaves and puts them together.
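
A sketch of the whole scheme, again in MPI for brevity (the matrix size N and the demo values are assumptions of this sketch): scatter the row blocks of A, broadcast B to everyone, multiply locally, and gather the slices of C at the master.

/* matvec.c -- matrix-vector sketch (illustrative only).            */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 4                                 /* small demo size       */

int main(int argc, char *argv[])
{
    int p, r, b, i, j;
    double *A = NULL, *C = NULL, Bvec[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &r);
    b = N / p;                              /* rows per task          */

    double *Ablk = malloc(b * N * sizeof *Ablk);
    double *Cblk = malloc(b * sizeof *Cblk);
    if (r == 0) {                           /* master sets up A and B */
        A = malloc(N * N * sizeof *A);
        C = malloc(N * sizeof *C);
        for (i = 0; i < N; i++) {
            Bvec[i] = 1.0;
            for (j = 0; j < N; j++) A[i * N + j] = (double)(i + j);
        }
    }
    /* one row block of A per task, the whole of B for everyone      */
    MPI_Scatter(A, b * N, MPI_DOUBLE, Ablk, b * N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    MPI_Bcast(Bvec, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = 0; i < b; i++) {               /* local slice of C = AB  */
        Cblk[i] = 0.0;
        for (j = 0; j < N; j++) Cblk[i] += Ablk[i * N + j] * Bvec[j];
    }
    MPI_Gather(Cblk, b, MPI_DOUBLE, C, b, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (r == 0)                             /* master assembles C     */
        for (i = 0; i < N; i++) printf("C[%d] = %g\n", i, C[i]);
    free(Ablk); free(Cblk); free(A); free(C);
    MPI_Finalize();
    return 0;
}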

© Oliver Otto - The University of Reading / Salford