I am trying to use MPI to solve a problem of the form
C = A*B
where A=A(M,N), B=B(N,O).
Before calculating C, I need to create A and B using MPI-parallelized code. However, for some reason, A can only be parallelized using M as the distribution index, i.e. A is distributed as AM(1:N) across the CPUs. On the other hand, B can only be distributed as BO(1:N). Since both A and B are very large, neither a gather nor a broadcast is acceptable in terms of memory. So I am thinking of just keeping A and B distributed as they are. When I calculate C, I use the BN distribution as the CPU index, and when I need the information from A (i.e. AM), I go to the responsible CPU to get that AM, like this:
// do l=1,N // parallelized,
mpi_send(AM, i, ..., o,..)
mpi_recv(AM,o, ..., i,...)
Of course, this will not work, because the send and the receive are both issued from a single thread.
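To make the access pattern I have in mind concrete, here is a minimal single-process Python sketch. It does not use MPI at all: the "ranks" are simulated with plain lists, and it assumes C is the ordinary matrix product of A and B. The sizes, the number of ranks P, and the helper names `fetch_rows` and `local_product` are all made up for illustration; `fetch_rows` is only a stand-in for the communication step I do not know how to write correctly.

```python
# Serial sketch of the intended data layout; "ranks" are simulated with lists.
# All sizes and names below are illustrative assumptions, not from real code.

M, N, O = 4, 3, 6      # A is M x N, B is N x O
P = 2                  # number of simulated ranks (assumes P divides M and O)

# Full matrices exist here only to build the pieces and to check the result.
A = [[i + j for j in range(N)] for i in range(M)]
B = [[i * j + 1 for j in range(O)] for i in range(N)]

rows_per_rank = M // P
cols_per_rank = O // P

# Rank r owns a block of rows of A: A_local[r] is (M/P) x N.
A_local = [A[r * rows_per_rank:(r + 1) * rows_per_rank] for r in range(P)]
# Rank r owns a block of columns of B: B_local[r] is N x (O/P).
B_local = [[row[r * cols_per_rank:(r + 1) * cols_per_rank] for row in B]
           for r in range(P)]


def fetch_rows(owner):
    """Hypothetical stand-in for 'go to the responsible CPU and get AM'."""
    return A_local[owner]


def local_product(rank):
    """Compute the M x (O/P) column block of C owned by `rank`."""
    c_block = []
    for owner in range(P):              # visit one row block of A at a time
        a_block = fetch_rows(owner)     # only one extra block held at once
        for a_row in a_block:
            c_block.append([
                sum(a_row[l] * B_local[rank][l][k] for l in range(N))
                for k in range(cols_per_rank)
            ])
    return c_block


C_blocks = [local_product(r) for r in range(P)]

# Reassemble C from the column blocks and compare with a direct product.
C = [sum((C_blocks[r][i] for r in range(P)), []) for i in range(M)]
C_ref = [[sum(A[i][l] * B[l][k] for l in range(N)) for k in range(O)]
         for i in range(M)]
print(C == C_ref)
```

In this sketch each rank only ever holds its own pieces of A and B plus one borrowed row block of A, which is the memory behaviour I am after. What I am missing is how to turn `fetch_rows` into real MPI calls when every rank is busy in the same loop.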
So I am here to ask for help. Is there a better approach? Thanks.