A warning when using OpenMP


Dear all:

Recently, I have been trying to use OpenMP with Intel Visual Fortran.

Because I am a beginner at parallel computing, I decided to speed up my do-loops by using multiple processors.

I have read a lot about OpenMP, but I get this warning when compiling:

warning #10247: explicit static allocation of locals specified, overriding OpenMP*'s implicit auto allocation

I really don't know where the problem is. 

If you have any suggestions, please help me.

Thank you very much

--------------------------------part of my program------------------------------------------------

!$omp parallel do default(shared) private(k,d_epsc,cgmci,Eta1)
      do k = 1,nsteel(i)          ! the half of the section
        d_epsc = dd_defN(m) - zz_steel(k,i)*dd_defMy(m) +
     +           yy_steel(k,i)*dd_defMz(m)

        call FrontSteel(i,Fiber,Fbmat,d_epsc,cgmci,k,m,nsm,
     +       Eta1,time,repet,kfc,istep,nskip,secfail,ff_change)

!       computing axial force and bending moment of each fiber of the section
        f11(k) = cgmci*aa_steel(k,i)
        fmy(k) = -cgmci*zz_steel(k,i)*aa_steel(k,i)
        fmz(k) = cgmci*yy_steel(k,i)*aa_steel(k,i)

!       summation of axial force and bending moment of all fibers of the section
        ss_sum11 = ss_sum11 + f11(k)
        ss_sum21 = ss_sum21 + fmy(k)
        ss_sum31 = ss_sum31 + fmz(k)

        f11(k) = 0.
        fmy(k) = 0.
        fmz(k) = 0.
        d_epsc = 0.

      end do
!$omp end parallel do

-----------------------------------------------------------------------------------------------------------------------

 

Best Reply

I am guessing you are not using IMPLICIT NONE; if that is the case, please add it to the top of your subroutine (then fix the declarations).

A second issue is that your summations are going to have data races:

!$omp parallel do default(shared) private(k,d_epsc,cgmci,Eta1) reduction(+:ss_sum11, ss_sum21, ss_sum31)

The reduction clause for operator + provides a private copy of each named variable, zero-initialized, for use within the parallel region. On exit from the parallel region, the + operator is applied in a thread-safe manner to the shared variable in the scope outside the parallel region.
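
For illustration only, a minimal self-contained sketch of a + reduction (the names are placeholders, not from your program):

program reduction_demo
  implicit none
  integer :: k
  real :: partial_sum
  partial_sum = 0.0
!$omp parallel do reduction(+:partial_sum)
  do k = 1, 100
     partial_sum = partial_sum + real(k)   ! each thread adds into its own zeroed copy
  end do
!$omp end parallel do
  print *, 'sum of 1..100 =', partial_sum  ! the private copies are combined with + on exit
end program reduction_demo

Compile it with /Qopenmp and the result is the same no matter how many threads run the loop.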

Jim Dempsey

I suspect you have explicitly set the /Qsave (Fortran->Data->Local Variable Storage = All Variables SAVE) compiler option.

The warning is output because /Qopenmp implies /Qauto (Local Variable Storage = Local Variables AUTOMATIC).
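
For example, a compile line of roughly this shape (the file name is only a placeholder) would not trigger the warning, since /Qopenmp already implies /Qauto and /Qsave is left out:

ifort /Qopenmp /c mysub.f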

Dear Mr. Jim Dempsey

Thank you very much. Actually, I use IMPLICIT REAL*8, not IMPLICIT NONE as you mentioned. I'll fix it immediately.

As for the data race condition, I am afraid I had not noticed this problem. Once again, thank you very much.

I'll follow your suggestion to modify my program.

Pemg-Yu Chen 

Dear Mr. Mark Lewy

Thank you very much. Just as you mentioned, I used the /Qsave (Fortran->Data->Local Variable Storage = All Variables SAVE) compiler option.

After changing to Local Variables AUTOMATIC, there is no longer any warning.

Thank you very much.

Pemg-Yu Chen

Dear all

I have another problem when I try to set the number of threads.

I used "CALL omp_set_num_threads(4)" before the parallel region.

I also put "USE omp_lib" at the top of my program.

There is an error message. 

error LNK2019: unresolved external symbol omp_set_num_threads referenced in function_Main

Please give me some suggestions.

Thank you very much

Check if you have /Qopenmp (Fortran->Language->Process OpenMP Directives = Generate Parallel Code) set for the build configuration that produces LNK2019.  For example, if you set /Qopenmp for the Debug configuration only, you will also need to set it for the Release configuration (or use the All Configurations option for Configuration to set options that apply to all configurations).

Dear Mr. Mark Lewy

Yes, I already used the All Configurations option to set Fortran->Language->Process OpenMP Directives = Generate Parallel Code.

However, I still get the error message.

Pemg-Yu Chen

I think we would have to see what your project settings are.  Can you show us the command line settings for Fortran and Linker you are using?

Dear Mr. Mark Lewy

Here are the command-line settings for Fortran:

/nologo /debug:full /Od /Qsave /iface:cvf /module:"Debug/" /object:"Debug/" /Fd"Debug\vc100.pdb" /traceback /check:bounds /libs:static /threads /dbglibs /c

and the command-line settings for the linker:

/OUT:"Debug/20100619_n_15_3D_10_CYW6.exe" /INCREMENTAL /NOLOGO /MANIFEST /MANIFESTFILE:"C:\Users\SAM\Desktop\Parallel(origion)\Debug\20100619_n_15_3D_10_CYW6.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"Debug/20100619_n_15_3D_10_CYW6.pdb" /STACK:100000000,100000000 kernel32.lib

Thank you very much

I can't see /Qopenmp in your Fortran settings.  It looks like you switched OpenMP off to remove the warning about the clash with /Qsave which means the OpenMP library doesn't get pulled in when you link, hence the unresolved symbol.

Dear Mr. Mark Lewy

I set Fortran->Language->Process OpenMP Directives = Generate Parallel Code again, but I still get the error message.

Here are the command-line settings for Fortran:

/nologo /debug:full /Od /Qopenmp /Qsave /iface:cvf /module:"Debug/" /object:"Debug/" /Fd"Debug\vc100.pdb" /traceback /check:bounds /libs:static /threads /dbglibs /c

and the command-line settings for the linker:

/OUT:"Debug/20100619_n_15_3D_10_CYW6.exe" /INCREMENTAL /NOLOGO /MANIFEST /MANIFESTFILE:"C:\Users\SAM\Desktop\Parallel(origion)\Debug\20100619_n_15_3D_10_CYW6.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"Debug/20100619_n_15_3D_10_CYW6.pdb" /STACK:100000000,100000000 kernel32.lib

Thank you very much

Is your call to omp_set_num_threads in the MAIN program, or in a subroutine called by the MAIN program?

The "use omp_lib" statement needs to be in the same program unit that is making the call to omp_set_num_threads.  It is not enough to put it in the MAIN program when a different subroutine is making the call.
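
For instance, a minimal sketch of a subroutine that makes the call (the subroutine name is just a placeholder):

subroutine setup_threads
  use omp_lib                    ! needed here, in the program unit that makes the call
  implicit none
  call omp_set_num_threads(4)
end subroutine setup_threads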

Also, if you could show the exact error message that might be helpful too.   I was able to reproduce a failure when I removed the "use omp_lib" statement from my tiny program, but the error message was this one:

error LNK2019: unresolved external symbol _OMP_SET_NUM_THREADS@4 referenced in function _MAIN__
 

               --Lorri

 

Write a simple program, see if it works, then look at what is different or same.

program foo
  use omp_lib
  implicit none
!$omp parallel
  write(*,*) omp_get_thread_num()
!$omp end parallel
call omp_set_num_threads(4)
!$omp parallel
  write(*,*) omp_get_thread_num()
!$omp end parallel
end program foo

Jim Dempsey

Dear Miss Lorri Menard

Yes, I call omp_set_num_threads in a subroutine called by the MAIN program, and I only put "USE OMP_LIB" in the main program.

After following your suggestion and putting "USE OMP_LIB" in the subroutine, there is no longer any error message.

Thank you very much

Pemg-Yu Chen

Dear Mr. Jim Dempsey

Yes, I wrote the program you suggested, and it did work.

There is no longer any error message after following Miss Lorri Menard's suggestion.

Thank you very much.

Pemg-Yu Chen

Dear all

Thank you for your suggestions. There are no more error messages while compiling.

However, when I start to debug, something goes wrong:

"program exception - stack overflow"

Pemg-Yu Chen

I'll assume you resolved the reduction bug which Jim pointed out.

Growth of stack usage is a frequent aspect of parallelization, even when you set a reasonable value for OMP_NUM_THREADS.  By default, if HyperThreading is detected, the number of threads will be set to 2 per core, even though that is likely to be excessive (unless your only goal is to max out the meter graph).  Note Jim's advice to start with a reasonable value.

Needing to set the stack limit (via the /link /stack: option or editbin) is not surprising.

If you are using 32-bit mode, OMP_STACKSIZE is preset to 2 MB (4 MB in 64-bit mode). That's the local stack per thread. Evidently, there are more limitations on parallel scaling in 32-bit mode.
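
For reference, both limits can be set explicitly. The 16M below is only a placeholder to tune for your program; the /STACK value is the one already in your linker settings:

per-thread stack for the OpenMP worker threads (environment variable):
set OMP_STACKSIZE=16M

main thread / process stack reserve (linker option):
/STACK:100000000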

Dear all:

Thank you for your suggestions. I tried to modify my program following Mr. Jim's and Mr. Tim Prince's advice.

However, I still have some problems.

If I don't define the stack size, it shows the "Stack Overflow" error.

If I define a big number for the stack size, it still can't run.

I can't explain the problem exactly, so I uploaded some pictures and the subroutine.

Thank you very much for your suggestions.

Pemg-Yu Chen

 

 

Attachments:

123.jpg (484.12 KB)
456.jpg (425.41 KB)
789.jpg (417.87 KB)
parallel zone.f (10.24 KB)

I get a corrupted file for the source code.  Perhaps there are unusual character sets.

Dear Mr. Tim Prince

The code is just a subroutine of my program; I used Notepad to save the code.

I have saved the code again through Fortran.

Thank you very much.

Pemg-Yu Chen

Attachments:

Source1_0.for (16.21 KB)

I'm not certain this part of your source code helps to explain the points you asked about.

This does raise some concerns:

1) Why did the compiler report OpenMP DEFINED LOOP WAS PARALLELIZED in spite of the syntax errors at that point?  If you made those arrays private, in spite of the errors, that would probably require increasing OMP_STACKSIZE as well as setting /link /stack: to a suitable value.  Note that beginners often set OMP_STACKSIZE so large that at most 1 or 2 threads could run.

As you apparently perform sum reductions, the reduction clause would be mandatory.  You probably need Inspector to have such errors pointed out automatically.

If you intended private arrays to be initialized to values inherited from outside the parallel region, you would need explicit copyin or firstprivate.  I don't think any syntax checker would show you that.
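
As an illustration only (the names are placeholders), a minimal sketch of firstprivate, where each thread's private copy starts from the value set outside the region:

program firstprivate_demo
  implicit none
  integer, parameter :: n = 8
  integer :: j
  real :: scale, work(n)
  scale = 2.0                          ! set outside the parallel region
!$omp parallel do firstprivate(scale)
  do j = 1, n
     work(j) = scale*real(j)           ! each thread's private scale starts at 2.0
  end do
!$omp end parallel do
  print *, work
end program firstprivate_demo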

2) It looks difficult to debug combinations of non-standard IBM360-era style with Fortran 90 and OpenMP usage.  In particular, why are you using the 40-year-old style of assumed-size declarations (and what ifort option would raise warnings about it?).

>> As for the data race condition, I am afraid I had not noticed this problem.

With data races you won't notice the problem as a program crash, only as screwy results. Also note that some race conditions are not detected as you develop the code; rather, the race condition is not discovered until the code is in production.

An alternative to using reduction is !$OMP ATOMIC or !$OMP CRITICAL (youNameItHere). When used in a loop, these are generally much slower than the reduction clause: reduction performs only as many atomic/critical operations as there are threads, while the other two perform one per loop iteration (possibly divided by SIMD width). The other two are useful in situations where you want the other threads to be notified immediately (e.g. a parallel search).
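
As an illustration only (the names are placeholders), the same kind of sum written with !$OMP ATOMIC instead of a reduction clause; it gives the same result but pays for synchronization on every iteration:

program atomic_sum
  implicit none
  integer, parameter :: n = 1000
  integer :: k
  real :: x(n), total
  x = 1.0
  total = 0.0
!$omp parallel do
  do k = 1, n
!$omp atomic
     total = total + x(k)       ! one synchronized update per iteration
  end do
!$omp end parallel do
  print *, total                ! same result as reduction(+:total), but slower in a hot loop
end program atomic_sum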

Jim Dempsey

Dear all

Thanks for your suggestions. I am facing another problem now.

Once I increased the stack size, it showed the message below.

[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]

     kernel32.dll!7561338a()  

     ntdll.dll!77b89f72()     

     ntdll.dll!77b89f45()     

Without the !$omp directives, there was no problem.

How can I fix it?

Pemg-Yu Chen

 

When you get those error messages, look at the Call Stack. Usually this is shown in a tab at the bottom of the Visual Studio IDE. If you do not see it, click on: Debug, Windows, Call Stack

Look in the Call Stack, reading from bottom up, until you hit the first line with what looks like part of your program. Double Click on that and the source line should appear in the source code window.

Jim Dempsey

Dear Mr. Jim

Thanks for your suggestion. It seems the problem is due to a call to a subroutine inside the !$omp parallel region.

What could cause this message?

[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]

     kernel32.dll!7561338a()  

     ntdll.dll!77b89f72()     

     ntdll.dll!77b89f45()

Thank you very much.

Pemg-Yu Chen

Those refer to the call stack from the start of the thread.

The above is for a main thread; yours may be rooted in the startup of an OpenMP thread-pool thread. In the call stack, the bold entries are generally those of your application; the greyed ones are typically those used to get the thread going.

If you do not see any portion of your program at the top of the stack, then something occurred during thread startup. This could be something like an undefined reference in a private, copyin, firstprivate, or reduction clause (among others).

Jim Dempsey

Dear Mr. Jim:

I declared my variables after changing "implicit real" to "implicit none", following your suggestion at the beginning.

If I set "Local Variable Storage" to "All Variables SAVE", the program can run, but the result is wrong.

At the same time, it gives the warning "explicit static allocation of locals specified, overriding OpenMP*'s implicit auto allocation".

If I set "Local Variable Storage" to "Local Variables AUTOMATIC", the program cannot run, and the error shows as before.

What is the meaning of this setting, which one is right, and how should I modify my program?

Thank you very much.

 

Pemg-Yu Chen

Procedures called in a parallel region must have their local variables and arrays automatic (scalars will be automatic unless you set /Qsave; arrays will not be automatic unless you set /Qopenmp or /Qauto, or declare the procedure RECURSIVE).  Of course, this could expose problems with undefined variables. It's advisable to check that your program runs correctly with /Qauto /Qsave- before attempting OpenMP.
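
As an illustration only (the routine and variable names are placeholders), a routine whose locals are automatic and which is therefore safe to call from several threads at once:

recursive subroutine fiber_update(n, y)
  implicit none
  integer, intent(in)  :: n
  real,    intent(out) :: y(n)
  real    :: work(n)            ! automatic array: a fresh copy per call, not SAVEd
  integer :: j
  do j = 1, n
     work(j) = real(j)
  end do
  y = work
end subroutine fiber_update

With statically allocated (SAVEd) locals, every thread calling the routine would share the same copy of work, which is the kind of conflict described above.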

Dear Mr. Chen

Taking a quick look at your program, you have:

ss_sum11 = ss_sum11 + f11(k)
ss_sum21 = ss_sum21 + fmy(k)
ss_sum31 = ss_sum31 + fmz(k)

This will present a race condition when multiple threads attempt to update the sum values at the same time.

To correct this, you will need to add the "reduction(+:ss_sum11, ss_sum21, ss_sum31)" clause to your !$omp statement.

*** You also have other summation variables not listed above; those will have to be added to the reduction clause too.

I have not looked at subroutine FrontSteel. It too may have a conflicting use of variables. If it has SAVE'ed local variables to be carried from call to call, it too will be in error.

Jim Dempsey

If you're trying to parallelize a program by trial and error, the Intel Parallel Studio tools Advisor and Inspector ought to be helpful.
