Consider the example of matrix-matrix multiplication as C = AB, with A(l
x n), B(n x m) and C(l x m). Partition A and B
into blocks as
The dimension have to be compatible so that
with
is well defined.
It is reasonable to assume that all the blocks Aik are all the
same size (k1 x k2) and that all the blocks Bkj
are all the same size (k2 x k3) and hence that all the
blocks Cij are all the same size (k1 x k3),
with k1, k2 and k3 not necessarily the same.
If we define the block columns
then
Then on a ring of r processors we can use the following algorithm:
LOAD PROCESSOR i:
Alocnum=I; Aloc = A(i); Bloc = B(i); Cloc = 0
FOR k = 1 TO r
DO IN PARALLEL i = 1 TO r
j = Alocnum;
MULTIPLY LOCAL BLOCKS,
Cloc = Cloc + Aloc * Bloc(j,i)
SHIFT CYCLIC Aloc,
Alocnmu right 1 place
END PARALLEL
END
[50 points]
© Oliver Otto - The University of Reading / Salford