I am trying to parallelize a serial preconditioned conjugate gradient (PCG) solver for a 3D fire simulation using OpenMP (Intel compiler), but the performance does not improve.
The grid dimension is 79x81x79 and the solver converges after 565 iterations. The serial code takes 3.39 seconds and the OpenMP version takes 3.86 seconds on an Intel i7-2600.
Please help me check the code. Thanks a lot.
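The poster's solver code is not shown, so as a point of reference here is a minimal sketch of the kind of kernel a PCG iteration parallelizes (the `dot` helper is hypothetical, not the poster's code). Kernels like this are memory-bound, so on a ~79x81x79 grid the speedup is often limited by memory bandwidth and by the fork/join overhead paid on every one of the 565 iterations, which can explain an OpenMP version running no faster than the serial one:

```c
#include <stddef.h>

/* Hypothetical CG building block: a dot product with an OpenMP
 * reduction.  Each thread accumulates a private partial sum; the
 * runtime combines the partials at the end of the loop. */
double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```

With two such dot products, a sparse matrix-vector product, and a few vector updates per iteration, each iteration opens several parallel regions; on a short-running solver that overhead is easy to measure.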
We are trying to speed up a parallel program using the Intel compiler and the OpenMP library.
We have observed that the program hangs in one of the parallel loops after running fine for 3-4 days. The binary keeps running but stays in a permanent waiting state. Here is the gdb stack trace:
I am running a hybrid MPI/OpenMP code:
call MPI_Init( ierr )
call MPI_Comm_rank( MPI_COMM_WORLD, rank, ierr )
call MPI_Comm_size( MPI_COMM_WORLD, size, ierr )
t1 = MPI_Wtime( )
!$omp parallel do private(i, x) reduction(+ : pi_partial)
do i = rank, N-1, size
x = (dble( i ) + 0.5_DP) * dx
pi_partial = pi_partial + f( x )
end do
!$omp end parallel do
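The post does not define `f`, `N`, or `dx`, but the stride-by-`size` loop over midpoints is the classic midpoint-rule estimate of pi, which assumes f(x) = 4/(1+x*x) and dx = 1/N. Under that assumption, the OpenMP part of the loop can be sketched in C for a single rank (the MPI decomposition and the final `MPI_Reduce` of the partial sums are omitted):

```c
/* Midpoint-rule estimate of pi = integral from 0 to 1 of 4/(1+x*x) dx.
 * In the hybrid code each MPI rank handles i = rank, rank+size, ...;
 * here a single process is shown, so the loop covers every i. */
double pi_midpoint(long n)
{
    double dx  = 1.0 / (double)n;
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        double x = ((double)i + 0.5) * dx;   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * dx;
}
```

In the full hybrid version, each rank's `pi_partial` would still need an `MPI_Reduce` with `MPI_SUM` to produce the final value on rank 0.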
I have studied the OpenMP reduction function kmp_reduce and my conclusions are:
if (lck!=NULL) ==> we can do CRITICAL version
if (reduce_data!=NULL && reduce_func!=NULL) ==> we can do TREE version
if ((loc->flags & KMP_IDENT_ATOMIC_REDUCE) == KMP_IDENT_ATOMIC_REDUCE) ==> we can do ATOMIC version
So I have created 3 flags that test the conditions above (CRITICAL, TREE, ATOMIC).
I have run some OpenMP reduction tests to look at the values of these flags, but it always seems to be 1.
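At the source level, all three lowerings compute the same result; which one the runtime picks is an internal decision driven by the conditions above, not something visible in user code. As an illustration (both functions are mine, not from the runtime), here is a `reduction` clause next to the hand-written equivalent of the CRITICAL fallback:

```c
/* The reduction clause: the runtime chooses how to combine the
 * per-thread partials (atomic, tree, or critical section). */
long sum_reduction(const int *v, int n)
{
    long s = 0;
    #pragma omp parallel for reduction(+ : s)
    for (int i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Hand-written equivalent of the CRITICAL lowering: each thread
 * builds a private partial sum, then combines it under a lock. */
long sum_critical(const int *v, int n)
{
    long s = 0;
    #pragma omp parallel
    {
        long local = 0;              /* per-thread partial sum */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            local += v[i];
        #pragma omp critical         /* combine partials one thread at a time */
        s += local;
    }
    return s;
}
```

Both return the same total; only the combining strategy differs, which is why instrumenting the runtime's internal flags is the only way to see which path was taken.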
Intel has just launched the Intel® Modern Code Developer Community to help HPC developers code for maximum performance on current and future hardware. The initiative is targeted at more than 400,000 HPC-focused developers and partners.