Hello,

I'm running two parallel matrix multiplication algorithms on two machines, using OpenMPI, to assess scalability. The first machine is a cluster with 4 quad-core processors (16 CPUs available in total); the second is a Dell PC with 16 GB of RAM and an Intel Core i7 processor (8 CPUs available in total).

Algorithm 1 performs the multiplication as follows:

{
    unsigned int i, j, k;
    double sum;

    for (i = 0; i < A.m; i++)            // Rows
    {
        for (j = 0; j < B.n; j++)        // Cols
        {
            sum = 0;
            for (k = 0; k < A.n; k++)
                sum += A.rows[i][k] * B.rows[k][j];
            C.rows[i][j] = sum;
        }
    }
}

In algorithm 2 I used pointers instead, hoping to make the multiplication faster:

{
    unsigned int i, j, k;
    double *c_ptr = &C.rows[0][0];
    double *b_ptr = &B.rows[0][0];
    double *a_ptr = &A.rows[0][0];

    for (i = 0; i < A.m; i++)            // Rows
    {
        for (j = 0; j < B.dim; j++)      // Cols
        {
            double sigma = 0;
            double *A_ptr = (a_ptr + i*A.dim);
            double *B_ptr = b_ptr + j*B.dim;

            for (k = 0; k < A.dim; k++)
            {
                sigma += (*A_ptr) * (*B_ptr);
                A_ptr++;
                B_ptr++;
            }
            *c_ptr++ = sigma;
        }
    }
}
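
To make the memory access pattern of the two kernels easier to compare, here is a minimal serial sketch of both. It assumes each matrix is stored as one contiguous row-major array; the `Mat` struct and the function names `mul_indexed` and `mul_pointer` are hypothetical and are not the types used in the actual programs. Under that assumption the first kernel reads B down a column (stride of one row per step), while the second advances both pointers by one element, so it pairs row i of A with row j of B. That is the product of A with the transpose of B, unless B is already stored transposed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contiguous row-major matrix: element (i, j) is data[i*n + j]. */
typedef struct {
    size_t m;      /* number of rows */
    size_t n;      /* number of columns */
    double *data;  /* m*n values, row-major */
} Mat;

/* Style of algorithm 1: C = A * B.
 * B is read down column j, i.e. with a stride of B->n doubles per step. */
void mul_indexed(const Mat *A, const Mat *B, Mat *C)
{
    assert(A->n == B->m && C->m == A->m && C->n == B->n);
    for (size_t i = 0; i < A->m; i++) {
        for (size_t j = 0; j < B->n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < A->n; k++)
                sum += A->data[i * A->n + k] * B->data[k * B->n + j];
            C->data[i * C->n + j] = sum;
        }
    }
}

/* Style of algorithm 2: both pointers advance by one element, so row i of A
 * is combined with row j of B. The result is A * B^T, read contiguously. */
void mul_pointer(const Mat *A, const Mat *B, Mat *C)
{
    assert(A->n == B->n && C->m == A->m && C->n == B->m);
    double *c_ptr = C->data;
    for (size_t i = 0; i < A->m; i++) {
        for (size_t j = 0; j < B->m; j++) {
            const double *a = A->data + i * A->n;  /* start of row i of A */
            const double *b = B->data + j * B->n;  /* start of row j of B */
            double sigma = 0.0;
            for (size_t k = 0; k < A->n; k++)
                sigma += (*a++) * (*b++);
            *c_ptr++ = sigma;
        }
    }
}
```

Calling both functions on the same small pair of square matrices is a quick way to check whether the two kernels agree for a given storage layout before comparing their timings.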

The MPI structure and data decomposition are the same for both programs.

Algorithm 1 shows linear scalability up to 8 processors on the cluster and up to 4 processors on the PC. Algorithm 2 shows linear scalability up to 8 processors on the cluster but does not scale at all on the PC. Tests were performed by multiplying dense square matrices of size 1000x1000 and 5000x5000.

Does anyone know what could cause the difference? Is it in the algorithm or in the machine? Are there issues with dynamic memory allocation in an MPI environment?

Thanks for your help,

Carlo Maria