I've written a simple parallel matrix-vector multiplication using OpenMP.
!Parallel matrix-vector multiplication, row packed matrix
!$OMP PARALLEL DEFAULT(SHARED) NUM_THREADS(4)
!$OMP DO SCHEDULE(DYNAMIC) PRIVATE(i,j,itn,Sum)
do i=1,n
   Sum = 0.
   itn = (i-1)*n
   do j=1,n
      Sum = Sum + A(itn+j)*x(j)
   end do
   y(i) = Sum
end do
!$OMP END DO NOWAIT
!$OMP END PARALLEL

With an Intel Q6600 I get a serial to parallel time ratio of nearly 4, with a matrix size of 1000x1000.
When I increase the matrix size to 2000x2000, this ratio drops to ~1.4.
What might be causing this dramatic decrease of parallel performance compared to the serial performance?
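
For anyone who wants to reproduce the measurement, here is a minimal, self-contained sketch of how such a serial-to-parallel ratio could be timed. It is not my original test harness: the program name, the double precision kind, the constant fill values for A and x, the variable names s, y_ser, y_par, t_ser and t_par, and the use of omp_get_wtime for timing are all assumptions made for illustration. The loop body is the same as in the snippet above.

```fortran
program matvec_timing
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumption: double precision data
  integer, parameter :: n = 2000           ! assumption: change to 1000 to compare
  real(dp), allocatable :: A(:), x(:), y_ser(:), y_par(:)
  real(dp) :: s, t0, t_ser, t_par
  integer :: i, j, itn

  allocate(A(n*n), x(n), y_ser(n), y_par(n))
  A = 1.0_dp                               ! placeholder data, not the real matrix
  x = 2.0_dp

  ! Serial reference: row-packed matrix times vector
  t0 = omp_get_wtime()
  do i = 1, n
     s = 0.0_dp
     itn = (i-1)*n
     do j = 1, n
        s = s + A(itn+j)*x(j)
     end do
     y_ser(i) = s
  end do
  t_ser = omp_get_wtime() - t0

  ! Parallel version: same loop body, same clauses as in the post
  t0 = omp_get_wtime()
  !$omp parallel do default(shared) private(i,j,itn,s) schedule(dynamic) num_threads(4)
  do i = 1, n
     s = 0.0_dp
     itn = (i-1)*n
     do j = 1, n
        s = s + A(itn+j)*x(j)
     end do
     y_par(i) = s
  end do
  !$omp end parallel do
  t_par = omp_get_wtime() - t0

  ! Sanity check: both versions must produce the same vector
  if (maxval(abs(y_par - y_ser)) > 1.0e-9_dp) error stop "serial and parallel results differ"

  print '(a,i0)',     'n                     = ', n
  print '(a,es12.4)', 'serial time   [s]     = ', t_ser
  print '(a,es12.4)', 'parallel time [s]     = ', t_par
  print '(a,f8.3)',   'serial/parallel ratio = ', t_ser / t_par
end program matvec_timing
```

This needs to be compiled with OpenMP enabled (for example -fopenmp with gfortran). A single timed pass like this is noisy and the first parallel region also pays for thread start-up, so repeating each loop several times and averaging would probably give a more reliable ratio.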
Gregor Seitlinger



