OpenMP parallel loop crashes (?)

Hello everybody,

I am trying to make this section of my code run in parallel:

....
     
     

      
    EL=0.0d0

!$OMP parallel DO SHARED(S,COUL) PRIVATE(I1,J1,ID,JD) reduction(+:EL)
    DO J1=1,NY
      DO I1=1,NX

        IF ((J1/=J.OR.I1/=I).AND.(J1/=J.OR.I1/=IP(I)).AND.(J1/=J.OR.I1/=IM(I)).AND. &
            (J1/=JP(J).OR.I1/=I).AND.(J1/=JM(J).OR.I1/=I)) THEN

          IF (ABS(FLOAT(I)-FLOAT(I1)) <= ABS(FLOAT(I)+LLEN-FLOAT(I1))) THEN
            ID= INT(ABS(FLOAT(I)-FLOAT(I1)))
          ELSE
            ID= INT(ABS(FLOAT(I)+LLEN-FLOAT(I1)))
          END IF

          IF (ABS(FLOAT(J)-FLOAT(J1)) <= ABS(FLOAT(J)+LLEN-FLOAT(J1))) THEN
            JD= INT(ABS(FLOAT(J)-FLOAT(J1)))
          ELSE
            JD= INT(ABS(FLOAT(J)+LLEN-FLOAT(J1)))
          END IF

          EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
!         Cen(I,J)= Cen(I,J) +  LAMDA*COUL(ID,JD)*dble(S(I1,J1))

        END IF

      END DO
    END DO
!$OMP END PARALLEL DO

...

where COUL is a matrix determined earlier in the code.

I get no compilation or build errors but at run time the program exits when it enters the parallel loop. It just crashes with no run-time error!

Any ideas?

Thanks,

Marios

 

 

 

Try turning on array subscripting bounds checks.

If nothing is obvious, insert some PRINT statements to trace the progress.

I assume LLEN and LAMDA are defined.

Jim Dempsey

www.quickthreadprogramming.com

Also,

If this is a release mode issue, then from VS click on

Debug | Start Without Debugging

This is different from Run.

Run closes the CMD window when the program exits, so if errors were displayed, you won't see them.

Start Without Debugging leaves the CMD window open after the run, so any error messages displayed there can still be read.

Jim Dempsey


I ran it on Linux with ifort:

 ifort -O3 -warn all -xSSE4.2 -parallel -par-report[1] -openmp -o run.out Source1.f90

and got the run-time error message:

Segmentation fault (core dumped)

At least now I do get an error message! Any ideas about how to fix it?

Note that I ran the command ulimit -s unlimited prior to compiling.

Marios

You might want to review the SHARED list or, as Jim has indicated, consider what to do if ID or JD = 0:

   EL=0.0d0
!$OMP parallel DO SHARED(LAMDA,COUL,S,LLEN,I,J,IP,IM,JP,JM) PRIVATE(I1,J1,ID,JD) reduction(+:EL)
   DO J1=1,NY
      DO I1=1,NX
!        not sure if this test is sufficient
         IF ( (J1/=J    .OR.I1/=I)     .AND.  &           !  .not. ( J1==j     .and. I1==I     )
              (J1/=J    .OR.I1/=IP(I)) .AND.  &           !  .not. ( J1==j     .and. I1==IP(I) )
              (J1/=J    .OR.I1/=IM(I)) .AND.  &           !  .not. ( J1==j     .and. I1==IM(I) )
              (J1/=JP(J).OR.I1/=I)     .AND.  &           !  .not. ( J1==JP(J) .and. I1==I     )
              (J1/=JM(J).OR.I1/=I)           ) THEN       !  .not. ( J1==JM(J) .and. I1==I     )
!        could be (replacing the IF ... THEN / END IF block):
!        if ( j1==j .and. (i1==i .or. i1==IP(i) .or. i1==IM(i)) ) cycle
!        if ( i1==i .and. (j1==j .or. j1==JP(J) .or. j1==JM(j)) ) cycle

            IF (ABS(FLOAT(I)-FLOAT(I1)) <= ABS(FLOAT(I)+LLEN-FLOAT(I1))) THEN
               ID = INT (ABS(FLOAT(I)-FLOAT(I1)))
            ELSE
               ID = INT (ABS(FLOAT(I)+LLEN-FLOAT(I1)))
            END IF
            IF (ABS(FLOAT(J)-FLOAT(J1)) <= ABS(FLOAT(J)+LLEN-FLOAT(J1))) THEN
               JD = INT (ABS(FLOAT(J)-FLOAT(J1)))
            ELSE
               JD = INT (ABS(FLOAT(J)+LLEN-FLOAT(J1)))
            END IF
!
!           Could be written as
!           ID = MIN ( ABS(I-I1), ABS(I+LLEN-I1) )
!           JD = MIN ( ABS(J-J1), ABS(J+LLEN-J1) )
!
!           if (ID==0 .or. JD==0 ) ?????? what happens for COUL(ID,JD)

            EL = EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
!           Cen(I,J)= Cen(I,J) +  LAMDA*COUL(ID,JD)*dble(S(I1,J1))
         END IF
      END DO
   END DO
!$OMP END PARALLEL DO


Do not use ulimit -s unlimited for multi-threaded programs. Pick a reasonable stack size.

Jim Dempsey


Dear John Campbell,

ID and JD can't both be zero, so that's OK. The OpenMP statement is correct; at least I don't get an error message.

The changes you proposed made my code run a bit faster, so thank you!

One note though:

The IF-CYCLE construct should be like this:

IF (J1==J .AND. (I1==I .OR. I1==IP(I) .OR. I1==IM(I))) CYCLE

IF (I1==I .AND. (J1==JP(J) .OR. J1==JM(J))) CYCLE

since the case I1==I, J1==J is already excluded by the first IF.

 

 

 

The OpenMP statement is correct, at least I don't get an error message. I get a stack overflow message when I execute it in parallel. If I turn on the heap arrays compiler option, the program runs normally but it is slower than the sequential version. Any ideas about that?

The concern that I had related to the use of COUL(ID,JD) when ID or JD is zero, which depends on how COUL is declared. To avoid a problem, it would need to be something like  real COUL(0:md,0:md). ( I am assuming COUL is an array and not a function )

With regard to the $OMP parallel DO declaration, my preference is to explicitly declare all variables as shared or private.
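As a minimal sketch of that preference, the loop can be given DEFAULT(NONE) so the compiler rejects any variable whose data-sharing attribute was not stated. Everything concrete here is an assumption made for illustration: the sizes, the placeholder values of COUL, S and LAMDA, the 0-based declaration of COUL, and the simplified exclusion test (only the site itself is skipped).

```fortran
program explicit_sharing_sketch
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: nx = 16, ny = 16, llen = 16   ! hypothetical sizes
   real(dp) :: coul(0:llen,0:llen), lamda, el          ! COUL starts at index 0 (assumed)
   integer  :: s(nx,ny), i, j, i1, j1, id, jd

   ! Placeholder data; the real values come from the original program
   coul  = 1.0_dp
   s     = 1
   lamda = 0.5_dp
   i = 3
   j = 5

   el = 0.0_dp
   ! default(none): every variable used in the loop must be listed explicitly.
   ! Named constants (nx, ny, llen) are not variables and need no clause.
   !$omp parallel do default(none) shared(coul,s,lamda,i,j) &
   !$omp    private(i1,j1,id,jd) reduction(+:el)
   do j1 = 1, ny
      do i1 = 1, nx
         if (i1 == i .and. j1 == j) cycle   ! simplified: skip only the site itself
         id = min(abs(i-i1), abs(i+llen-i1))
         jd = min(abs(j-j1), abs(j+llen-j1))
         el = el + lamda*coul(id,jd)*dble(s(i1,j1))
      end do
   end do
   !$omp end parallel do

   print *, 'EL =', el
end program explicit_sharing_sketch
```

Without an OpenMP flag the directives are plain comments, so the same source also builds and runs serially, which gives a quick way to compare the two results.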

Finally, for !$OMP to be effective, the DO loops must perform enough computation to outweigh the overhead of setting up the threads. The code structure is effectively:

!$OMP parallel DO SHARED(S,COUL) PRIVATE(I1,J1,ID,JD) reduction(+:EL)

   DO J1=1,NY

!     get an available thread

!     perform the inner loop with the allocated thread

   END DO  ! J1

!$OMP END PARALLEL DO

The inner loop is executed by the allocated thread, and all private variables must be allocated for that thread.
This loop is:
   DO I1=1,NX

   IF ((J1/=J.OR.I1/=I).AND.     &
       (J1/=J.OR.I1/=IP(I)).AND. &
       (J1/=J.OR.I1/=IM(I)).AND. &
       (J1/=JP(J).OR.I1/=I).AND. &
       (J1/=JM(J).OR.I1/=I)) THEN

   IF (ABS(FLOAT(I)-FLOAT(I1)) <= ABS(FLOAT(I)+LLEN-FLOAT(I1))) THEN
   ID= INT(ABS(FLOAT(I)-FLOAT(I1)))
   ELSE
   ID= INT(ABS(FLOAT(I)+LLEN-FLOAT(I1)))
   END IF

   IF (ABS(FLOAT(J)-FLOAT(J1)) <= ABS(FLOAT(J)+LLEN-FLOAT(J1))) THEN
   JD= INT(ABS(FLOAT(J)-FLOAT(J1)))
   ELSE
   JD= INT(ABS(FLOAT(J)+LLEN-FLOAT(J1)))
   END IF
   EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
!   Cen(I,J)= Cen(I,J) +  LAMDA*COUL(ID,JD)*dble(S(I1,J1))

   END IF
   END DO

This is essentially only:
   DO I1=1,NX
      EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
   END DO

This loop might vectorise much better.
If the percentage of IF tests that exclude the computation is very small, then it might be better to replace the IF test by a zero factor in COUL, remove LAMDA from the loop and take the performance gains from vectorisation, although the use of ID and JD could limit vectorisation.
( Could LAMDA*COUL(ID,JD) be converted to a vector coul_jd(1:NX) outside the DO I1 loop, and a dot_product then be used for this loop? )
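One way that idea might look is sketched here: for each J1 a row vector coul_jd is filled with a zero for every excluded site, the row is reduced with dot_product, and LAMDA is applied once at the end. The sizes, the placeholder COUL and S data, the helper names idist and excluded, and the assumption that IP/IM/JP/JM are the periodic nearest neighbours are all hypothetical. The program checks the result against the element-by-element loop.

```fortran
program dot_product_sketch
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: nx = 8, ny = 8, llen = 8   ! hypothetical sizes
   integer, parameter :: i = 3, j = 5               ! hypothetical reference site
   real(dp), parameter :: lamda = 0.5_dp            ! hypothetical value
   real(dp) :: coul(0:llen,0:llen), coul_jd(nx), el, el_ref
   integer  :: s(nx,ny), idist(nx), i1, j1, jd

   ! Placeholder data; the real COUL and S come from the original program
   do jd = 0, llen
      do i1 = 0, llen
         coul(i1,jd) = 1.0_dp/real(1 + i1 + 2*jd, dp)
      end do
   end do
   do j1 = 1, ny
      do i1 = 1, nx
         s(i1,j1) = merge(1, -1, mod(i1+j1,2) == 0)
      end do
   end do

   ! ID depends only on I1, so it can be tabulated once
   do i1 = 1, nx
      idist(i1) = min(abs(i-i1), abs(i+llen-i1))
   end do

   ! Reference: the original element-by-element accumulation
   el_ref = 0.0_dp
   do j1 = 1, ny
      jd = min(abs(j-j1), abs(j+llen-j1))
      do i1 = 1, nx
         if (excluded(i1,j1)) cycle
         el_ref = el_ref + lamda*coul(idist(i1),jd)*dble(s(i1,j1))
      end do
   end do

   ! Vector form: zero factor for excluded sites, LAMDA applied once at the end
   el = 0.0_dp
   do j1 = 1, ny
      jd = min(abs(j-j1), abs(j+llen-j1))
      do i1 = 1, nx
         coul_jd(i1) = merge(0.0_dp, coul(idist(i1),jd), excluded(i1,j1))
      end do
      el = el + dot_product(coul_jd, dble(s(:,j1)))
   end do
   el = lamda*el

   print *, 'loop:', el_ref, ' dot_product:', el
   if (abs(el - el_ref) > 1.0e-12_dp) error stop 'results differ'

contains

   ! Site itself plus its four periodic nearest neighbours (assumed meaning of IP/IM/JP/JM)
   logical function excluded(i1, j1)
      integer, intent(in) :: i1, j1
      integer :: ip, im, jp, jm
      ip = mod(i, nx) + 1
      im = mod(i - 2 + nx, nx) + 1
      jp = mod(j, ny) + 1
      jm = mod(j - 2 + ny, ny) + 1
      excluded = (j1 == j .and. (i1 == i .or. i1 == ip .or. i1 == im)) .or. &
                 (i1 == i .and. (j1 == jp .or. j1 == jm))
   end function excluded

end program dot_product_sketch
```

Whether this is actually faster depends on NX and on how well the compiler vectorises the gather through idist, so it would need timing against the original loop.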

John


In front of:

EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))

Insert some asserts to bounds check the arrays.
The compiler has an option to do this. The symptom you were seeing looks as if the arrays were indexed out of bounds.

IF(ID .LT. LBOUND(COUL, DIM=1)) PRINT *, "ID .LT. LBOUND(COUL, DIM=1)", ID, LBOUND(COUL, DIM=1)
...

*** Do not assume anything about the bounds and validity of COUL and S ***

Also, if COUL and S are DUMMY arguments with explicit shape or assumed size, then ensure that the actual arguments (those of the caller) match the requirements of the DUMMY argument.

If the above does not resolve anything then insert a PRINT in an appropriate place to trace the progress in hope of diagnosing the problem.
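A minimal sketch of such hand-written asserts follows, checking each subscript against both LBOUND and UBOUND before the array reference. The array shapes, the sample subscripts (JD is deliberately out of range) and the helper name check are assumptions made for illustration. The compiler option mentioned above is, as far as I know, -check bounds for ifort and -fcheck=bounds for gfortran.

```fortran
program bounds_assert_sketch
   implicit none
   real(kind(1.0d0)) :: coul(0:4,0:4)   ! hypothetical shape
   integer :: s(8,8)                    ! hypothetical shape
   integer :: id, jd, i1, j1, nbad

   coul = 0.0d0
   s = 1
   id = 2
   jd = 5                               ! deliberately outside 0:4
   i1 = 3
   j1 = 8

   ! These calls would sit just in front of EL = EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
   nbad = 0
   call check('ID', id, lbound(coul, dim=1), ubound(coul, dim=1), nbad)
   call check('JD', jd, lbound(coul, dim=2), ubound(coul, dim=2), nbad)
   call check('I1', i1, lbound(s, dim=1),    ubound(s, dim=1),    nbad)
   call check('J1', j1, lbound(s, dim=2),    ubound(s, dim=2),    nbad)
   print *, 'out-of-range subscripts found:', nbad

contains

   ! Report a subscript that falls outside lo:hi and count the failure
   subroutine check(name, idx, lo, hi, nfail)
      character(len=*), intent(in)    :: name
      integer,          intent(in)    :: idx, lo, hi
      integer,          intent(inout) :: nfail
      if (idx < lo .or. idx > hi) then
         print *, name, ' out of bounds: ', idx, ' not in ', lo, ':', hi
         nfail = nfail + 1
      end if
   end subroutine check

end program bounds_assert_sketch
```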

Jim Dempsey
