I've written a simple parallel matrix-vector multiplication using OpenMP.

    ! Parallel matrix-vector multiplication, row packed matrix
    !$OMP PARALLEL DEFAULT(SHARED) NUM_THREADS(4)
    !$OMP DO SCHEDULE(DYNAMIC) PRIVATE(i,j,itn,Sum)
    do i=1,n
       Sum = 0.
       itn = (i-1)*n
       do j=1,n
          Sum = Sum + A(itn+j)*x(j)
       end do
       y(i) = Sum
    end do
    !$OMP END DO NOWAIT
    !$OMP END PARALLEL

On an Intel Q6600 I get a serial-to-parallel time ratio of nearly 4 for a 1000x1000 matrix.
When I increase the matrix size to 2000x2000, this ratio drops to ~1.4.
What might be causing this dramatic drop in parallel performance relative to serial performance?
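
For anyone who wants to reproduce the numbers, here is a minimal sketch of how such a serial-to-parallel ratio could be measured. It is not the exact benchmark behind the figures above. The program name, the repetition count `nrep`, the all-ones test data, and the use of default (single-precision) `real` arrays are assumptions made for illustration only; the loop itself is the one shown in the question.

```fortran
! Illustrative timing harness (hypothetical names; not the original benchmark).
program matvec_timing
  use omp_lib                       ! provides omp_get_wtime
  implicit none
  integer, parameter :: n = 2000    ! matrix dimension (assumed; try 1000 as well)
  integer, parameter :: nrep = 50   ! repetitions so the time is measurable (assumed)
  real, allocatable :: A(:), x(:), y(:)   ! precision is an assumption
  real :: Sum
  double precision :: t0, t_serial, t_parallel
  integer :: i, j, itn, rep

  allocate(A(n*n), x(n), y(n))
  A = 1.0                           ! simple test data (assumed)
  x = 1.0

  ! Serial reference: same loop without OpenMP directives
  t0 = omp_get_wtime()
  do rep = 1, nrep
     do i = 1, n
        Sum = 0.
        itn = (i-1)*n
        do j = 1, n
           Sum = Sum + A(itn+j)*x(j)
        end do
        y(i) = Sum
     end do
  end do
  t_serial = omp_get_wtime() - t0
  print *, 'serial:   y(1) =', y(1), ' time (s) =', t_serial

  ! Parallel version: the loop from the question
  t0 = omp_get_wtime()
  do rep = 1, nrep
     !$OMP PARALLEL DEFAULT(SHARED) NUM_THREADS(4)
     !$OMP DO SCHEDULE(DYNAMIC) PRIVATE(i,j,itn,Sum)
     do i = 1, n
        Sum = 0.
        itn = (i-1)*n
        do j = 1, n
           Sum = Sum + A(itn+j)*x(j)
        end do
        y(i) = Sum
     end do
     !$OMP END DO NOWAIT
     !$OMP END PARALLEL
  end do
  t_parallel = omp_get_wtime() - t0
  print *, 'parallel: y(1) =', y(1), ' time (s) =', t_parallel

  print *, 'serial/parallel time ratio =', t_serial/t_parallel
end program matvec_timing
```

With gfortran this can be built with `gfortran -fopenmp`. Printing `y(1)` after each timed section is only there so the result is visibly used; changing `n` between 1000 and 2000 lets the two cases be compared under the same conditions.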