Consider the example of matrix-matrix multiplication as C = AB, with A(l
x n), B(n x m) and C(l x m). Partition A and B
into blocks as
The dimension have to be compatible so that
with
is well defined.
It is reasonable to assume that all the blocks Aik are all the
same size (k1 x k2) and that all the blocks Bkj
are all the same size (k2 x k3) and hence that all the
blocks Cij are all the same size (k1 x k3),
with k1, k2 and k3 not necessarily the same.
If we define the block columns
then
Then on a ring of r processors we can use the following algorithm:
LOAD PROCESSOR i:
Alocnum=I; Aloc = A(i); Bloc = B(i); Cloc = 0
FOR k = 1 TO r
DO IN PARALLEL i = 1 TO r
j = Alocnum;
MULTIPLY LOCAL BLOCKS,
Cloc = Cloc + Aloc * Bloc(j,i)
SHIFT CYCLIC Aloc,
Alocnmu right 1 place
END PARALLEL
END
Where Alocnum integer indicating the column number i, Bloc(k,i)
= Bki, A(i) = Ai, B(i) = Bi, Cloc means column
Ci allocated on processor i and Aloc means column AAlocnum
allocated currently on processor i. Final result is kept in block columns
Ci on processor i. Thus you have to collect these block
columns in order to assemble matrix C, for example in the master, in order
to save and print it.
[50 points]
© Oliver Otto - The University of Reading / Salford