Hi everybody,

I would be very grateful if someone could help me with this. Suppose, for simplicity, that we have a parallel loop and two matrices A and B, both of dimension m*n, on P processors. Matrix A is distributed row-wise and B is distributed column-wise over the P processors. Is there an elegant and fast way to implement the following pseudo-code in C using MPI calls (such as MPI_Gather, MPI_Type_struct, ...)?

**for k=1,...,KMAX** // parallel loop on all P processors

// some parallel calculation on B to produce new results

// dim A= dim B= m*n

// A is distributed row-wise over the P processors, e.g. in local_A

// B is distributed column-wise over the P processors, e.g. in local_B

**A=B** // ???? how to do this fast in parallel, using MPI

**end for**

As an example, suppose that P=3 (**p0, p1, p2**) and m=n=4, and note that the first row of A and the first column of B are stored on p0, and so on (see the diagram below). Each process stores a different amount of A and B, and we want a direct assignment, i.e. a(i,j)=b(i,j) (no transpose). The bottleneck is that A=B has to be performed many times inside the loop while the pieces held by each process have different sizes.
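To make the layout concrete, here is a minimal sketch of the bookkeeping that a single collective exchange per iteration would need. It is not a tested MPI solution. All names (`block_count`, `block_start`, `alltoallv_counts`, `unpack_rows`, `recvbuf`) are hypothetical. It assumes that every rank gets floor(n/P) rows or columns and the last rank also takes the remainder (which reproduces p0, p1, p2, p2 in the example), that `local_B` is stored row-major as m x (my columns), and that `local_A` is stored row-major as (my rows) x n.

```c
#include <assert.h>
#include <string.h>

/* Number of rows (or columns) owned by `rank` when n items are split over
   nprocs ranks. ASSUMPTION: the last rank takes the remainder. */
int block_count(int n, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = n / nprocs;
    return (rank == nprocs - 1) ? base + n % nprocs : base;
}

/* Global index of the first row (or column) owned by `rank`. */
int block_start(int n, int nprocs, int rank)
{
    return rank * (n / nprocs);
}

/* Fill the four arrays that MPI_Alltoallv expects on rank `me`.
   The piece of B that `me` must send to rank p is (rows of p) x (my columns);
   with the assumed row-major local_B these pieces are already contiguous. */
void alltoallv_counts(int m, int n, int nprocs, int me,
                      int *sendcounts, int *sdispls,
                      int *recvcounts, int *rdispls)
{
    int my_cols = block_count(n, nprocs, me);
    int my_rows = block_count(m, nprocs, me);
    int soff = 0, roff = 0;
    for (int p = 0; p < nprocs; ++p) {
        sendcounts[p] = block_count(m, nprocs, p) * my_cols;
        recvcounts[p] = my_rows * block_count(n, nprocs, p);
        sdispls[p] = soff;
        rdispls[p] = roff;
        soff += sendcounts[p];
        roff += recvcounts[p];
    }
}

/* Copy the received pieces into local_A. The piece from rank q is
   my_rows x (columns of q) and belongs at column offset block_start(q). */
void unpack_rows(int n, int nprocs, int my_rows, const int *rdispls,
                 const double *recvbuf, double *local_A)
{
    for (int q = 0; q < nprocs; ++q) {
        int cs = block_start(n, nprocs, q);
        int cc = block_count(n, nprocs, q);
        const double *blk = recvbuf + rdispls[q];
        for (int i = 0; i < my_rows; ++i)
            memcpy(&local_A[i * n + cs], &blk[i * cc], cc * sizeof(double));
    }
}
```

Under these assumptions the counts and displacements do not depend on k, so they could be computed once before the loop. Each iteration would then reduce to one call of the form `MPI_Alltoallv(local_B, sendcounts, sdispls, MPI_DOUBLE, recvbuf, recvcounts, rdispls, MPI_DOUBLE, MPI_COMM_WORLD)` followed by `unpack_rows`. Whether this is the fastest option is exactly what I am unsure about.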

Thanks very much in advance

Best regards,

Ham. Sha.

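**set A=B in MPI**

| | Mat. A | | | | | Mat. B | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | | | | | **p0** | **p1** | **p2** | **p2** |
| **p0** | a11 | a12 | a13 | a14 | | b11 | b12 | b13 | b14 |
| **p1** | a21 | a22 | a23 | a24 | **=** | b21 | b22 | b23 | b24 |
| **p2** | a31 | a32 | a33 | a34 | | b31 | b32 | b33 | b34 |
| **p2** | a41 | a42 | a43 | a44 | | b41 | b42 | b43 | b44 |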