I'm investigating a problem that I could boil down to plain array initialization, which does not seem to give me any speedup on Windows.
Consider the following piece of code:
    integer, dimension(:), allocatable :: a
    integer :: dim=8000000
    integer i

    allocate(a(1:dim))

    !$OMP PARALLEL DO
    do i=1,dim
      a(i)=0
    enddo
    !$OMP END PARALLEL DO
With this, I get virtually no speedup from the parallel region.
Is this expected, or am I doing something wrong?
My first guess was that NUMA behaves differently on Windows, so that the first-touch policy known from Linux is "less valid" there. However, I also observe the problem when I restrict the run to a single socket.
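For anyone who wants to reproduce this, here is a minimal self-contained sketch of how the loop could be timed. It is not my exact test program: the program name `init_timing`, the parameter `n` (standing in for `dim`), and the timer variables `t0` and `t1` are made up for illustration, and it assumes the compiler ships the standard `omp_lib` module and is invoked with its OpenMP flag.

```fortran
program init_timing
  use omp_lib                          ! provides omp_get_wtime, omp_get_max_threads
  implicit none
  integer, dimension(:), allocatable :: a
  integer, parameter :: n = 8000000    ! hypothetical name, same size as dim above
  integer :: i
  double precision :: t0, t1           ! hypothetical timer variables

  allocate(a(1:n))

  ! Time only the parallel initialization loop (wall-clock time)
  t0 = omp_get_wtime()
  !$OMP PARALLEL DO
  do i = 1, n
    a(i) = 0
  enddo
  !$OMP END PARALLEL DO
  t1 = omp_get_wtime()

  ! Print a(n) as well so the compiler cannot discard the loop
  print *, 'threads:', omp_get_max_threads(), ' seconds:', t1 - t0, ' a(n):', a(n)

  deallocate(a)
end program init_timing
```

Running it once with the environment variable `OMP_NUM_THREADS` set to 1 and once with a larger value should show whether the loop scales at all on a given machine.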