stack overflow

Dear users,

I have written a Fortran code (based on Fortran 77). It contains many large loops; with a normal (serial) build it works correctly, but it takes a long time to run.

So I decided to use OpenMP (for the first time). After setting it up correctly, the build completes without problems, but running the program produces the error "forrtl: severe (170): Program Exception - stack overflow". In the debugger, the error is reported at the following C routine:

{
/* assign 0 to _debugger_hook_dummy so that the function is not folded in retail */
(_Reserved);
_debugger_hook_dummy = 0;
}

Would you help me, please?

Regards

Attached files are the source code and the reading data file

Attachments:

w16.txt (3.53 MB)
e3hh.for (4.52 KB)

Recompile with the /link /stack:10000000 option and run.

OK, but how? Please give me more information. My OS is 64-bit Windows 7.

Please read the "Getting Started" section of the Intel Fortran Compiler documentation.

You also need to read up on some fundamentals about OpenMP.

Your code currently has no parallel region. You do have a "!$OMP DO", but this resides in code that is not within a parallel region. Use:

!$OMP PARALLEL DO ...

!$OMP END PARALLEL DO
or
!$OMP PARALLEL ...
!$OMP DO ...
...
!$OMP END DO
!$OMP END PARALLEL
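As a minimal sketch of the first form (not taken from the attached e3hh.for; the array a and the size n are made up for illustration), a combined PARALLEL DO in free-form Fortran could look like this. When compiled without /Qopenmp the directives are treated as comments and the loop simply runs serially.

```fortran
program omp_parallel_do_demo
  implicit none
  integer, parameter :: n = 1000      ! hypothetical problem size
  integer :: i
  real(8) :: a(n)

  ! PARALLEL creates the team of threads; DO splits the iterations among them.
  ! The loop index i is private automatically; a is shared by default.
  !$omp parallel do
  do i = 1, n
     a(i) = dble(i)**2
  end do
  !$omp end parallel do

  ! Each a(i) is written by exactly one iteration, so there is no conflict.
  if (a(n) /= dble(n)**2) error stop 'unexpected result'
  write (*, '(a, f10.1)') 'a(n) = ', a(n)
end program omp_parallel_do_demo
```

A "!$OMP DO" that is not enclosed in a parallel region is executed by a single thread, which is why the original code showed no parallelism.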

Additionally, you are missing a declaration of which variables are to be private and which shared. Your intended parallel region has a large number of variables that need to be private, and relatively few that need to be shared.

Consider !$OMP PARALLEL DO DEFAULT(PRIVATE), SHARED(RR,HF1,HF2,FF...

Your CAZ(IZ1)=... will have sharing issues (you may need to rework this section of code). I did not look closely enough to see whether CAZ should be private. The same issue applies to arrays CAPTETA2, CAPR, CA2, CA1, and possibly others.
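To illustrate the sharing issue, here is a small sketch in which a work array (called scratch here, standing in for something like CAZ) is refilled in every outer iteration. All names and sizes are assumptions for the example, not the poster's code. If scratch were shared, threads would overwrite each other's values; with DEFAULT(PRIVATE) each thread gets its own copy and only the result array is shared.

```fortran
program omp_private_scratch_demo
  implicit none
  integer, parameter :: n_outer = 8, n_inner = 51   ! hypothetical sizes
  integer :: i, k
  real(8) :: scratch(n_inner)   ! per-iteration work space (role of CAZ)
  real(8) :: res(n_outer)       ! one result slot per outer iteration

  ! DEFAULT(PRIVATE) gives every thread its own i, k and scratch;
  ! only the result array is shared.
  !$omp parallel do default(private) shared(res)
  do i = 1, n_outer
     do k = 1, n_inner
        scratch(k) = dble(i) * dble(k)
     end do
     res(i) = sum(scratch)
  end do
  !$omp end parallel do

  ! Sum over k of i*k is i * 51*52/2 = 1326*i.
  do i = 1, n_outer
     if (abs(res(i) - 1326.0d0 * i) > 1.0d-9) error stop 'race or wrong sum'
  end do
  write (*, '(a)') 'all outer iterations produced the expected sums'
end program omp_private_scratch_demo
```

Private arrays are created per thread, so very large private arrays increase the stack demand of every thread.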

Jim Dempsey

www.quickthreadprogramming.com

I would try moving large arrays off the stack, by including the following definition.

COMMON /ZZ/ CAZ,CA2,CA1,CAPFI, CAPTETA1,CAPTETA2,CAPR

I am not sure how this might conflict with your selection of SHARED or PRIVATE for these vectors when implementing !$OMP. Jim might offer some advice on this. You would probably be able to overcome the problems you are having by reducing their size and leaving them out of COMMON.
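An alternative to COMMON for keeping large arrays off the stack is to make them ALLOCATABLE, so that they live on the heap. This is a different technique from the one suggested above, shown only as a sketch with a made-up array name and size. One likely reason the overflow appears only with OpenMP is that, as far as I understand the ifort documentation, /Qopenmp makes local variables automatic (stack-allocated); ifort also has a /heap-arrays option that may be worth trying.

```fortran
program heap_arrays_demo
  implicit none
  integer, parameter :: n = 2000000      ! hypothetical size: 16 MB of real(8)
  real(8), allocatable :: big(:)
  integer :: i, istat

  ! Allocatable arrays are placed on the heap, not on the stack.
  allocate (big(n), stat=istat)
  if (istat /= 0) error stop 'allocation failed'

  ! big is shared by default, so no per-thread copy is made.
  !$omp parallel do
  do i = 1, n
     big(i) = 1.0d0
  end do
  !$omp end parallel do

  write (*, '(a, f12.1)') 'sum = ', sum(big)
  deallocate (big)
end program heap_arrays_demo
```

This only helps for arrays that are shared. Arrays listed as PRIVATE are still copied for each thread, so reducing their size, as suggested above, remains worthwhile.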

I have reviewed the use of these variables in the loops you have:

CAZ(1:51) is used in the inner loop 888
CAPFI(1:51) is set in loop 88 and used in loop 8 (as a variable only ?)
CAPTETA2(1:51) is set in loop 88 and used in loop 8
CAPTETA1(1:51) is set and used in loop 8 (as a variable only ?)
CAPR(1:51) is set in loop 8 and used in loop 7
CA2(1:170) is set in loop 7 and used in loop 6
CA1(1:170) is set in loop 6 and used outside !$OMP

You could revise your code to include:

  integer*4, parameter :: M_I  = 170 
  integer*4, parameter :: M_IZ =  51
  Real*8 CAZ(M_IZ), CAPFI(M_IZ), CAPTETA1(M_IZ), CAPTETA2(M_IZ), CAPR(M_IZ)
  Real*8 CA1(M_I),  CA2(M_I)

You could also remove the use of F,R,W & FD, although I presume these will be used as the code is further developed.

It appears that this is early days in your attempts to implement !$OMP. I am not sure of your reason for the large arrays, but if you can first get your !$OMP working successfully with this problem as is, you can later address the issue of increasing the size of the problem.

You should also check the inner loop 888, as only RPHI and GAMMA are influenced by IZ3. Also CAZ(IZ1) = ... (which is independent of IZ3 ?) is not defined for all values (1:51) on exit from loop 888, before calling FININT. /Qopenmp would have problems with these aspects of this loop. I think you have some more work on this part of the code.

John

 

Thank you all.

I followed your recommendations, but the problem was not fixed. So I put a WRITE statement at the beginning of my code and found that execution never reaches the first executable statement after the declarations. I have attached the revised version and a screenshot of the error.

Please give me more information.

You would probably gain more efficiency by reviewing what loop 888 is doing, as CAZ is the same value for all steps of IZ3.
I have changed the code to demonstrate some possible improvements. (see attached, as free format source)
I think you might not understand how DO loops work, especially the sequential way the 5 loops operate.
At the moment CAZ(1:51) is not fully defined when FININT is called after loop 888 to calculate the integral, as only one value, CAZ(IZ1), has been set. Have you checked that you are getting the correct results?

I would advise getting the calculation working correctly before trying to use OpenMP.
There appear to be errors in the calculation approach, or you should shift those calculations out of loop 888 that don't need to be updated (changed), as I have demonstrated for some variables.
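The two points can be sketched as follows: compute loop-invariant factors once, outside the loop, and define every element of the integrand array before integrating it. The integrand, the grid spacing h and the trapezoid routine are assumptions for illustration; trapezoid is only a stand-in for FININT, whose actual method is not shown in this thread.

```fortran
program fill_then_integrate_demo
  implicit none
  integer, parameter :: m = 51
  real(8), parameter :: h = 0.02d0   ! hypothetical spacing: 50 intervals on [0,1]
  real(8) :: f(m), outer_factor, total
  integer :: iz

  ! Loop-invariant part: computed once, outside the loop.
  outer_factor = 0.5d0 * 2.0d0**2

  ! Define EVERY element of the integrand before integrating.
  do iz = 1, m
     f(iz) = outer_factor * (dble(iz - 1) * h)
  end do

  total = trapezoid(f, m, h)
  write (*, '(a, f8.4)') 'integral = ', total

contains

  ! Simple trapezoid rule, used here only as a stand-in for FININT.
  pure function trapezoid(y, n, dx) result(s)
    integer, intent(in) :: n
    real(8), intent(in) :: y(n), dx
    real(8) :: s
    s = dx * (sum(y) - 0.5d0 * (y(1) + y(n)))
  end function trapezoid

end program fill_then_integrate_demo
```

Here the integrand is 2x on [0,1], so the result should be very close to 1. If only f(1) were set before the call, the integral would be computed from undefined data.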

Read about how grouped DO loops work.
I hope you learn some more about Fortran from this example.

John

OK, dear friends,

I have followed many of your recommendations and improved my knowledge as well as my code.

After this, I ran the code with the following directives. It runs in OpenMP mode very well:

!$OMP PARALLEL SHARED(RKF1,rkf2,RR,FF,FFD,WW)
DO 6 I1=20,170
6 CONTINUE

....(4 more loops here)

!$OMP END PARALLEL

Everything seems fine, and the code appears to run in parallel (judging by what I see on screen).

But when I added the following directives to parallelize the DO loops, I got some errors.

Would you please tell me what is wrong?

Thanks

!$OMP PARALLEL
!$OMP DO SHARED(RKF1,rkf2,RR,FF,FFD,WW)

;;;;

;;;;

!$OMP END DO

!$OMP END PARALLEL

 And the errors:

Error 1 error #5082: Syntax error, found 'SHARED' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION LASTPRIVATE ORDERED SCHEDULE COLLAPSE <END-OF-STATEMENT> ..
Error 3 error #7622: Misplaced part of an OpenMP* parallel directive. 
Error 2 error #7628: This OpenMP* END directive does not match the OpenMP* block directive at the top of the stack. 
Error 4 Compilation Aborted (code 1)
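The first message lists the clauses that the DO directive accepts, and SHARED is not among them; the data-sharing clause belongs on PARALLEL. Errors #7622 and #7628 are probably follow-on errors from the first one. A sketch of the corrected placement, with a made-up loop body and the array name out as an assumption:

```fortran
program omp_clause_placement_demo
  implicit none
  integer, parameter :: n = 170
  integer :: i
  real(8) :: ww(n), out(n)

  ww  = 2.0d0
  out = 0.0d0

  ! Data-sharing clauses such as SHARED go on PARALLEL;
  ! the DO directive only accepts PRIVATE, FIRSTPRIVATE, REDUCTION, etc.
  !$omp parallel shared(ww, out)
  !$omp do
  do i = 20, n
     out(i) = 3.0d0 * ww(i)
  end do
  !$omp end do
  !$omp end parallel

  write (*, '(a, f6.1)') 'out(n) = ', out(n)
end program omp_clause_placement_demo
```

Note also that a PARALLEL region with no DO directive inside makes every thread execute the whole loop. All cores look busy, but the work is duplicated rather than divided, so the earlier version may have appeared parallel without being faster.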

Attachments:

e3h-copy-2.for (5.12 KB)
e3h15.txt (0 bytes)

You claim that everything seems fine, but the following statement cannot work, as CAZ(1:51) (CAZ1 ??) is not defined for this FININT call:

       CALL FININT(CAZ1,51 ,CAPfi(IZ2),D1)

You cannot be getting the correct answers from this calculation.

I notice that the code is basically a 5th order integral. You should consider applying Green's theorem to reduce the order of the integration, as this might provide an order of magnitude improvement, much better than OpenMP could provide.

On your !$OMP PARALLEL, add DEFAULT(NONE), then compile to produce a list of errors for the variables used in the parallel region that are not listed in your SHARED clause. Then, for each variable in error, determine whether it should be SHARED or PRIVATE. Add a PRIVATE clause listing the variables that need to be private, and add those that need to be shared to the SHARED clause.
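A small sketch of this approach, with made-up variable names (x, y, tmp): removing any name from the clauses should make the compiler report that variable, which is the list of errors described above.

```fortran
program omp_default_none_demo
  implicit none
  integer, parameter :: n = 100
  integer :: i
  real(8) :: x(n), y(n), tmp

  do i = 1, n
     x(i) = dble(i)
  end do

  ! With DEFAULT(NONE) the compiler rejects any variable used in the region
  ! that is not listed, which forces an explicit decision for each one.
  !$omp parallel do default(none) shared(x, y) private(i, tmp)
  do i = 1, n
     tmp  = 2.0d0 * x(i)      ! per-iteration temporary: PRIVATE
     y(i) = tmp + 1.0d0       ! each iteration writes its own element: SHARED
  end do
  !$omp end parallel do

  if (y(n) /= 201.0d0) error stop 'unexpected result'
  write (*, '(a, f6.1)') 'y(n) = ', y(n)
end program omp_default_none_demo
```

Named constants such as n are not variables, so they do not need to appear in either clause.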

I second John Campbell's suggestion of investigating an improved algorithm. If it produces suitable results and is faster, then the faster technique can be parallelized too.

I also suggest that you verify your shared versus private assumptions. Your:

    6 CONTINUE
      CALL FININT(CA1,170,CAP,D1)
      
! $OMP END DO     
!$OMP END PARALLEL 
      WRITE(6,2034)ro,CAP,psi

appears to be writing to the private variable CAP (via FININT), which is then used outside the parallel region. It looks as if you haven't fully thought out how you are producing the results in parallel.
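One way to make the result well defined, sketched here with a made-up loop body: fill the shared array CA1 in parallel, then compute the final value after the parallel region, in serial code. SUM stands in for the FININT call; this is an assumption for illustration, not the poster's calculation.

```fortran
program omp_result_out_demo
  implicit none
  integer, parameter :: n = 170
  integer :: i
  real(8) :: ca1(n), cap

  ! Each iteration writes only its own element of the shared array.
  !$omp parallel do default(none) shared(ca1) private(i)
  do i = 1, n
     ca1(i) = dble(i)
  end do
  !$omp end parallel do

  ! Serial part: cap is an ordinary variable here, so its value is
  ! well defined when it is printed. SUM is a stand-in for FININT.
  cap = sum(ca1)
  write (*, '(a, f10.1)') 'cap = ', cap
end program omp_result_out_demo
```

If the final value has to be accumulated inside the loop instead, a REDUCTION clause on the parallel loop is the usual alternative.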

Jim Dempsey

www.quickthreadprogramming.com

For me, this is a very interesting post as it demonstrates some problems with teaching numerical computation.
I am not sure of the source of this problem, but the posted aim has been to improve the speed of computation.
This has been done without first confirming that the serial program is getting the correct answer. There are a number of issues with this calculation where the limited precision of 64-bit computation can affect the accuracy of the computation.

The function HF1 loses precision if abs(R) < 1.e-6, as sin(x) - x*cos(x) loses precision for x < 1.e-6:

      real*8 FUNCTION HF1(R)
      IMPLICIT REAL*8(A-H,O-Z)
       COMMON/FE/RKF1,RKF2
       REAL*8    rkf1,rkf2,X1, R
       X1=R*RKF1
       HF1 = 3.*(DSIN(X1)-X1*DCOS(X1))/X1/X1/X1
       RETURN
      END 
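One common remedy for this kind of cancellation, offered as a sketch and not as part of the posted code, is to switch to the Taylor series for small arguments: 3*(sin(x) - x*cos(x))/x**3 = 1 - x**2/10 + x**4/280 - ... The function names and the switch-over value x_small are assumptions; the argument x corresponds to X1 = R*RKF1 in HF1.

```fortran
program hf1_series_demo
  implicit none
  real(8) :: x

  x = 1.0d-7
  write (*, '(a, es23.15)') 'direct formula: ', hf1_direct(x)
  write (*, '(a, es23.15)') 'with series   : ', hf1_safe(x)

contains

  ! Same expression as in HF1; suffers from cancellation for small x.
  pure function hf1_direct(x) result(h)
    real(8), intent(in) :: x
    real(8) :: h
    h = 3.0d0 * (sin(x) - x * cos(x)) / x**3
  end function hf1_direct

  ! Uses the truncated series below a hypothetical switch-over point.
  pure function hf1_safe(x) result(h)
    real(8), intent(in) :: x
    real(8) :: h, x2
    real(8), parameter :: x_small = 1.0d-2   ! assumed threshold
    if (abs(x) < x_small) then
       x2 = x * x
       h  = 1.0d0 - x2 / 10.0d0 + x2 * x2 / 280.0d0
    else
       h  = 3.0d0 * (sin(x) - x * cos(x)) / x**3
    end if
  end function hf1_safe

end program hf1_series_demo
```

The first neglected term is x**6/15120, which is below double-precision rounding for abs(x) < 1.e-2. The series version also returns 1 at x = 0, where the direct formula divides by zero.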

The computation of CAZ is:

      CAZ(IZ1)=(1./2.)*RO**2*WW(I12)*HI13*DCOS(teta1)*DCOS(teta2)*
      *(R12*R23)**2*(1-(1./4.)*(psi2**2*(H2F23**2+H2F12**2+H2F13**2)+
      +psi1**2*(H1F23**2+H1F12**2+H1F13**2)-
      -psi2**3*H2F13*H2F23*H2F12-psi1**3*H1F13*H1F23*H1F12))
!  Becomes
      CAZ(IZ1) = (1./2.)*RO**2                                              &
                * WW(I12)                                                   &
                * HI13                                                      &
                * DCOS(teta1)                                               &
                * DCOS(teta2)                                               &
                *(R12*R23)**2                                               &
                * (1-(1./4.)*( psi2**2 * (H2F23**2 + H2F12**2 + H2F13**2)   &
                             + psi1**2 * (H1F23**2 + H1F12**2 + H1F13**2)   &
                             - psi2**3 * H2F13*H2F23*H2F12                  &
                             - psi1**3 * H1F13*H1F23*H1F12                ) )

where H1F.. and H2F.. are all functions of HF1 and HF2

My apologies to msabet s., but the real problem to be solved with this computation is not to make it faster, but to get the right answer while using finite-precision computation. Converting to real*16 is probably not the answer.
I wonder how the right answer can be obtained to enable verification?

John

 

Dear John,

Given the physical conditions of my problem and the correct form of the plot obtained from this program (which you helped me to understand and improve), the current calculations are verified.

Thanks for your kindness.

Warm regards

This is an update on the progress I have made in trying to help msabet s. improve the run-time performance of the program he posted. Below is a summary of the changes I have made that appear to improve performance. A significant performance improvement has been achieved on an i7 processor. I have included the test programs, together with the .bat files that build them.
The final program is more a demonstration of the use of !$OMP, as I understand the calculations are still being refined.
I have also included a program (hello_mp5.f90) that tests some alternatives in matrix multiplication. This appears to benefit less from !$OMP; as I understand it, although there are more processes available, there is a bottleneck in memory access, which limits the effectiveness of parallel instructions.
There is still the outstanding issue of effective use of AVX instructions on the i7, which I will continue to investigate.

My testing has been carried out using Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.5.344 Build 20120612. I have found it superior to Version 11. I am yet to test Version 13.

I have previously found it difficult to get !$OMP working effectively. These examples now demonstrate an effective result from the limited subset of !$OMP commands I have used. I would provide these examples as a starting point for using /Qopenmp, without delving into the omp library routines.

Thanks also to Jim who has provided valuable advice to get to this point.

The program provided by msabet s. is well suited to parallel processing. I have used this example to help me learn to better use !$OMP. I hope this example will be of assistance to others like me who are learning to use /Qopenmp with ifort.

John

Discussion of Improvement approach:

Changes that were made to improve performance:

  1. Rationalise the local array dimensions: The local array dimensions were reduced to suit the DO loop range. These were defined by means of a parameter variable.
  2. Update the function definition to use PURE and INTENT: Including both INTENT and PURE appeared to improve the performance of OpenMP. An INTERFACE statement was not included in the main program.
  3. Update the !$OMP to include DEFAULT(PRIVATE) and declare SHARED (arrays): This appeared to have a significant effect on performance, as errors in the list of shared arrays often stopped !$OMP from working.
  4. Break down the calculation of CAZ(IZ1) = … in the inner loop to move parts of the calculation out of the inner loops. This was also changed to CAZ(IZ3) = …: This approach helped in identifying the PRIVATE and shared variables. It was assumed that this rationalisation would improve performance, but was not tested. The optimising compiler might already do a lot of this.

I have also introduced some utilities to help with recording run times and identifying compiler options.

AUDIT.EXE is a simple program that records execution time of a program. This is run in a batch file to monitor the total run time of the test program and report a date/time stamp of the start and stop time. Audit is included in the batch file to run the test options on different computers/processors.

BUILD_LABEL.EXE is a program that takes the compiler options as a command argument and writes out two lines of Fortran code into an include file “build.ins”.

! build.ins example
      build_command = "ifort /Tf hello_mp5.f90 /free /O2 /QxAVX /Qopenmp"
      build_date    = "21-Aug-2013 16:20:39.377"

This is included in a subroutine “Start_Label (program_name)” which takes the program name, together with the compile date/time and compiler options and then reports this at run time to a log file. This is a useful feature, as it is often difficult to recall the compiler options used to build the .exe.  Build_Label is included in the batch file to generate the .exe file.
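A rough sketch of how such a routine might look. The body of Start_Label is not shown in this thread, so everything below is an assumption apart from the two build.ins lines quoted above; the sketch writes to standard output, not to a log file, and the two assignments are written out inline so that it compiles on its own.

```fortran
program build_label_demo
  implicit none
  call start_label('hello_mp5')

contains

  ! Hypothetical version of Start_Label: reports the program name together
  ! with the build date and the compiler options recorded at build time.
  subroutine start_label(program_name)
    character(len=*), intent(in) :: program_name
    character(len=128) :: build_command, build_date

    ! In the scheme described above these two assignments would come from
    !   include 'build.ins'
    build_command = "ifort /Tf hello_mp5.f90 /free /O2 /QxAVX /Qopenmp"
    build_date    = "21-Aug-2013 16:20:39.377"

    write (*, '(a, a)') 'Program : ', program_name
    write (*, '(a, a)') 'Built   : ', trim(build_date)
    write (*, '(a, a)') 'Options : ', trim(build_command)
  end subroutine start_label

end program build_label_demo
```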

List of files in post.zip

avx.bat             batch file to generate .exe using QxAVX instructions
run_test.bat        batch file to run test of 4 programs
sse.bat             batch file to generate .exe using QxHost
x90.bat             batch file to run a test using 3 different compilation options
                    (/QxHost, /QxHost /Qopenmp, /QxAVX /Qopenmp)
audit.f90           program to report elapsed time
build_label.f90     program to generate build.ins
e3h-4_6.f90         revised program with !$OMP for loop 6
e3h-4_7.f90         revised program with !$OMP for loop 7
hello_mp.f90        Test program for matrix multiplication using !$OMP (n=1000)
hello_mp5.f90       Test program for matrix multiplication using !$OMP (n=5000)
e3hh_original.for   Original posting of program
build.ins           build include file
list.txt            list of these files
pc_name.txt         File that has the present PC description as the first line
w16.txt             data file for e3hh_original.for and revisions

 

John Campbell   23 Aug 2013

ps: the description above is pasted from a word document. I love how this forum improves the layout !!

Attachments:

post.zip (201.68 KB)
