OpenMP problem with local variables

I have code that looks like this (simplified):

              write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

*dec$ if defined (_OPENMP_)
*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
*$omp&  firstprivate ( ..., Tmin, Tmax, ... )

*$omp do schedule(dynamic,3)
*dec$ end if
            do ij = 1,(i2-i1+1)*(j2-j1+1)
            
!                write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

                 call apply_BC ( ..., Tmin, Tmax, ... )

Tmin and Tmax (both real*8) contain nonzero values before the parallel region is entered. If the commented line inside the parallel region is uncommented, Tmin and Tmax have the same values as before the parallel region. However, if I leave that line commented out and execute the same WRITE statement inside the subroutine apply_BC, both Tmin and Tmax are suddenly 0 (in general, a different value). When the program reaches the first line of my example code again, Tmin and Tmax are correct.

So it seems that entering the parallel region causes some problems. It might be a bug in my code (which is quite big: 2300 lines), but I have no idea what to focus on when trying to find the cause of the problem.

The parallel Debug version and a version with enableOpenMP = .false. (no parallelization) both work correctly.
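For comparison, here is a small self-contained sketch of what the loop above is expected to do when FIRSTPRIVATE behaves as specified: each thread gets its own copy of the listed variables, initialised from the values they had just before the parallel region, so the called routine should see the pre-region values on every iteration. All names (`firstprivate_demo`, `record_bounds`, `run_loop`) are hypothetical stand-ins, not the poster's code, and the real `apply_BC` takes many more arguments. The bounds are held in local variables, mirroring the original, where Tmin and Tmax are locals of the subroutine.

```fortran
! Sketch only: names are hypothetical stand-ins for the poster's loop.
module firstprivate_demo
  implicit none
contains

  ! Stand-in for apply_BC: it just records the bounds it receives.
  subroutine record_bounds(tmin, tmax, seen_min, seen_max)
    real(8), intent(in)  :: tmin, tmax
    real(8), intent(out) :: seen_min, seen_max
    seen_min = tmin
    seen_max = tmax
  end subroutine record_bounds

  ! Runs n iterations in parallel; entry ij of seen_min/seen_max holds
  ! the bounds that record_bounds saw on iteration ij.
  subroutine run_loop(n, tmin, tmax, seen_min, seen_max)
    integer, intent(in)  :: n
    real(8), intent(in)  :: tmin, tmax
    real(8), intent(out) :: seen_min(n), seen_max(n)
    real(8) :: lo, hi          ! local bounds, like Tmin/Tmax in the post
    integer :: ij

    lo = tmin
    hi = tmax
    ! Each thread gets private copies of lo/hi initialised from the values
    ! above; the output arrays are shared, but every iteration writes its
    ! own element, so there is no race.
    !$omp parallel default(shared) firstprivate(lo, hi)
    !$omp do schedule(dynamic, 3)
    do ij = 1, n
       call record_bounds(lo, hi, seen_min(ij), seen_max(ij))
    end do
    !$omp end do
    !$omp end parallel
  end subroutine run_loop

end module firstprivate_demo
```

Compiled without OpenMP, the directives are plain comments and the loop runs serially, which gives a quick baseline to compare against.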

From the looks of it, you found an optimization problem. Can you ascertain what is going on from the disassembly?

www.quickthreadprogramming.com

The source code is confusing enough. It makes us wonder whether you misspelled _OPENMP or intentionally defined macros whose function is similar to the standard one, and why you use the dec$ form of conditional compilation for the same purpose as the cpp style, which is required to be available for OpenMP.

I am not able to read disassembly, but I will ask my colleagues to help me with it. I tried disabling optimizations for the file containing the source code above, but it did not help.

_OPENMP_ is not misspelled; I used it intentionally, because I wanted to be able to build the code without any OpenMP directives. The name of the macro may be confusing; I simply did not know that the _OPENMP macro was available, but I hope this does not affect anything. You are right that this is a duplication; I will change it to _OPENMP to make the code clearer.
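OpenMP already provides two standard mechanisms for this, so a private macro such as _OPENMP_ is not strictly needed. When OpenMP is enabled, the preprocessor predefines _OPENMP. Independently of any preprocessor, a line that starts with the conditional-compilation sentinel `!$` followed by a blank is compiled only in an OpenMP build and is an ordinary comment otherwise. The sketch below uses the sentinel in free form; the module and function names are hypothetical.

```fortran
! Sketch (hypothetical names): OpenMP conditional compilation without a
! user-defined macro. "!$ " lines are compiled only when OpenMP is enabled
! (e.g. /Qopenmp); in a serial build they are plain comments.
module omp_conditional_demo
  implicit none
contains
  integer function available_threads()
    !$ use omp_lib, only: omp_get_max_threads
    available_threads = 1                          ! serial build
    !$ available_threads = omp_get_max_threads()   ! OpenMP build only
  end function available_threads
end module omp_conditional_demo
```

The same source therefore builds with and without OpenMP, and no extra /D option has to be kept in sync with /Qopenmp.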

Have you tried running with bounds checking turned on?

I tried taking my Release configuration, setting all run-time checks to Yes (/check:pointer /check:bounds /check:uninit /check:format /check:output_conversion /check:arg_temp_created) and rebuilding the project. I got a compiler error: "Fatal compilation error: Out of memory asking for 8200".
Is it OK to enable these checks in the Release configuration?
Do I need to enable Traceback Information? (I think this will just tell me the location in the code where any check found an error).

I am going to try the Debug configuration with all checks enabled, although the Debug version does not exhibit the problem described in my original post.

>>the write command from the subroutine apply_BC causes that both Tmin and Tmax are suddenly 0 (generally a different value). When the program comes to the first line of my example code again, Tmin and Tmax are correct.

By this, do you mean that the values of Tmin and Tmax as written by WRITE (or examined in the debugger at the WRITE) are different from the values inside subroutine apply_BC(..., Tmin, Tmax,...)?

If so, then I would suspect that Tmin and Tmax are not declared with the same type. In one place they are likely REAL(8) and in the other they are REAL(4).

Try using /gen-interfaces /warn:interfaces.
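/gen-interfaces and /warn:interfaces check calls to external procedures against generated interfaces. A complementary, compiler-independent way to rule out a REAL(4)/REAL(8) mismatch is to put the routine in a module. Every caller that USEs the module then has an explicit interface, so passing an actual argument of the wrong kind becomes a compile-time error. A minimal sketch, with hypothetical names:

```fortran
! Sketch with hypothetical names. Because apply_bc_demo is a module
! procedure, any program unit that USEs this module gets an explicit
! interface, and the compiler checks the kind of every actual argument.
module bc_interface_demo
  implicit none
contains
  subroutine apply_bc_demo(tmin, tmax, span)
    real(8), intent(in)  :: tmin, tmax
    real(8), intent(out) :: span
    span = tmax - tmin
  end subroutine apply_bc_demo
end module bc_interface_demo
```

With this module, a call such as `call apply_bc_demo(1.0, 2.0, s)` (default-kind REAL literals) is rejected by the compiler instead of passing the wrong bits at run time.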

Jim Dempsey

The Debug version works without any problems, even with /gen-interfaces and /warn:interfaces. I have all available checks enabled in the Debug version, but none of them issues any warning. This also means that Tmin and Tmax have the same values at the line with the WRITE and inside the subroutine apply_BC.

In addition, I checked that Tmin and Tmax are always declared as real*8.

When I try the Release version, I see correct values at the WRITE, but the parallel region is entered after it, and the same WRITE statement in the subroutine apply_BC inside the parallel region shows different values (0).

I tried enabling various checks in the Release version (which may be pointless), but I cannot compile the project: the compiler tries to allocate too much memory (I saw 2 GB in Task Manager) and then aborts.

There is one commented line in my original post. After I uncomment it, Tmin and Tmax contain correct values, but I noticed that some other variables suddenly contain wrong values (0).

So I am still searching for the cause of this problem. Thank you all for your suggestions and ideas so far.

Quoting - jirina
I tried taking my Release configuration, changing all run-time checks to yes (/check:pointer /check:bounds /check:uninit /check:format /check:output_conversion /check:arg_temp_created) and rebuild the project. I got a compiler error "Fatal compilation error: Out of memory asking for 8200".
Is it OK to enable these checks in the Release configuration?
Do I need to enable Traceback Information? (I think this will just tell me the location in the code where any check found an error).

I am going to try the Debug configuration with all checks enabled, although the Debug version does not experience the problem described in my original post.

Enabling the checks will hurt performance. Of course you can do it, but I don't think it's advisable in production/release code... By the time you reach that stage, it should have been shown that this kind of error does not happen in regular operation. Put another way: I think it would be OK to enable them in the Release configuration for debugging purposes, but not in the Release build you actually ship...

Ricardo Reis
'Non Serviam'
@ http://www.lasef.ist.utl.pt
@ http://www.radiozero.pt
@ http://rreis.tumblr.com
@ http://www.flickr.com/photos/rreis

Can you post all the flags you are using for the Release version?

Ricardo Reis

Sure, I would not release the Release version with all the checks enabled. I just wanted to see whether any of them gives me a clue where the problem comes from.

Anyway, the whole project is compiled with the following options:

/nologo /D_OPENMP_ /fixed /extend_source:132 /Qopenmp /fpscomp:general /warn:declarations /warn:unused /assume:byterecl /module:"Release" /object:"Release" /libs:static /threads /c /align:all /heap-arrays

and linked with the following settings:

/OUT:"Releasename1.exe" /NOLOGO /DELAYLOAD:"name2.dll" /MANIFEST /MANIFESTFILE:"..." /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"name3.lib" delayimp.lib libguide.lib

>>There is one commented line in my original post. After I uncomment it, Tmin and Tmax contain correct values, but I noticed that some other variables suddenly contain wrong values (0).

This is usually an indication of a calling-convention error, where the stack pointer is not cleaned up properly after a call. These types of errors often occur in mixed-language programs. Is your application a mixture of Fortran and something else? (e.g. C++)

Jim Dempsey

Quoting - jirina
Sure, I would not release the Release version with all the checks enabled. I just wanted to see whether any of them gives me a clue where the problem comes from.

Anyway, the whole project is compiled with following options:

/nologo /D_OPENMP_ /fixed /extend_source:132 /Qopenmp /fpscomp:general /warn:declarations /warn:unused /assume:byterecl /module:"Release" /object:"Release" /libs:static /threads /c /align:all /heap-arrays

and linked with following settings:

/OUT:"Releasename1.exe" /NOLOGO /DELAYLOAD:"name2.dll" /MANIFEST /MANIFESTFILE:"..." /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"name3.lib" delayimp.lib libguide.lib

Why are you requesting such an enormous stack? Could this be the reason your release build fails?

Quoting - gib

Why are you requesting such an enormous stack? Could this be the reason your release build fails?

My application is a CFD (Computational Fluid Dynamics) solver which typically uses about 250 MB of memory. When I tried to run it in parallel, it crashed, so I gradually increased the Stack Reserve Size until the application stopped crashing.

Anyway, the Release build is compiled and linked without any problems if no run-time checks are enabled.

Quoting - jimdempseyatthecove

>>There is one commented line in my original post. After I uncomment it, Tmin and Tmax contain correct values, but I noticed that some other variables suddenly contain wrong values (0).

This is usually an indication of a calling convention error where the stack pointer is not cleaned up properly after a call. These types of errors often occure in mixed language programs. Is your application a mixture of Fortran and something else? (e.g. C++)

Jim Dempsey

My application is written in Fortran. I am using one (delay-loaded) DLL written in C++ which is used to:
- Write to Windows event log - This is disabled in all of my tests.
- Handle signals (Ctrl+C, etc.) - I know SIGNALQQ can be used for this, but I needed additional functionality (e.g. handle console window closing).

Anyway, I removed all usages of the DLL from my project, I removed the DLL from project settings (delay-loaded DLL) and I even tried decreasing the stack reserve size to 0 (see my other post). The problem still occurs.

Thank you for continuing to give me ideas.

See if the following works

!$OMP PARALLEL DO
DO I=1,COUNT
!$OMP SINGLE
your code here
!$OMP END SINGLE
END DO
!$OMP END PARALLEL DO

The above is an outline; work it into your code.

If the above works, then you may have a problem with private/shared scoping (likely an oversight or a typographical error).

By moving the SINGLE/END SINGLE about you might be able to isolate the error.

Jim Dempsey

This is a good idea, but I unfortunately cannot use SINGLE like that. I am getting an error message "error #7917: The workshare construct SINGLE or SECTIONS is invalid in a PARALLELDO which must contain a single DO directive". My sample code looks like this:

*$omp parallel do default ( shared ) private ( ij, j ) reduction ( +: i )
          do ij = 1,100
*$omp single
            j = j+1
*$omp end single
            i = i+1
          end do
*$omp end parallel do

I read the documentation for SINGLE, but I did not find any reason why it should not work in my example.

Oops, my mistake.
I meant to say

!$OMP CRITICAL
...
!$OMP END CRITICAL

This lets one thread through at a time.
If the program works correctly when n threads run that way (one at a time through the critical section), then you can assume you have a shared-variable problem.
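The diagnostic can be sketched as follows (hypothetical names, not the poster's code). The entire loop body sits inside a named CRITICAL section, so threads execute it one at a time. Here `scratch` is a temporary that has been left SHARED. Without the CRITICAL section, two threads could overwrite each other's value between the assignment and its use; with it, the result is always correct. A bug that disappears under CRITICAL therefore points at scoping of this kind.

```fortran
! Sketch of the diagnostic (hypothetical names). The whole loop body is
! serialised with a named CRITICAL section; "scratch" is a temporary that
! was (wrongly) left SHARED, which is racy without the CRITICAL section.
module critical_probe_demo
  implicit none
contains
  subroutine squares_serialised(n, out)
    integer, intent(in)  :: n
    integer, intent(out) :: out(n)
    integer :: i, scratch

    scratch = 0
    !$omp parallel do default(shared) private(i)
    do i = 1, n
       !$omp critical (probe)
       scratch = i * i          ! shared temporary: safe only inside CRITICAL
       out(i) = scratch
       !$omp end critical (probe)
    end do
    !$omp end parallel do
  end subroutine squares_serialised
end module critical_probe_demo
```

Once such a race is confirmed, the proper fix is to remove the CRITICAL section and add the temporary to a PRIVATE clause, since serialising the body throws away the parallel speed-up.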

Sorry about the faux pas

Jim

No problem with CRITICAL vs. SINGLE; it was my fault for not reading the documentation thoroughly. CRITICAL compiles fine.

Anyway, going back to my original code and adding CRITICAL, I have:

              write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
*$omp&  firstprivate ( ..., Tmin, Tmax, ... )

*$omp do schedule(dynamic,3)
            do ij = 1,(i2-i1+1)*(j2-j1+1)
*$omp CRITICAL ! added 2009-02-24
!                write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

                 call apply_BC ( ..., Tmin, Tmax, ... )
                 ...
*$omp END CRITICAL
            end do
*$omp end do
*$omp end parallel

Even after putting the whole contents of the parallel do region into a critical section, the write statement inside apply_BC shows incorrect values of Tmin and Tmax.

Does this mean I should go through all variables and see which of them are (first)private and which of them are shared? Did you mean that there might be a problem with some of the shared variables?
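One way to make that review systematic is the DEFAULT(NONE) clause. With it, the compiler reports every variable referenced in the construct's lexical extent that is not explicitly listed in a data-sharing clause, except those with predetermined attributes such as the loop index. Variables used only inside called subroutines are not covered. A small sketch with hypothetical names:

```fortran
! Sketch (hypothetical names): with DEFAULT(NONE), omitting any of n,
! factor, a, lo, hi or tmp from the clauses below is a compile-time error.
! The loop index i is predetermined private and need not be listed.
module default_none_demo
  implicit none
contains
  subroutine scale_and_clamp(n, factor, a, tmin, tmax)
    integer, intent(in)    :: n
    real(8), intent(in)    :: factor
    real(8), intent(inout) :: a(n)
    real(8), intent(in)    :: tmin, tmax
    real(8) :: lo, hi, tmp
    integer :: i

    lo = tmin
    hi = tmax
    !$omp parallel do default(none) shared(n, factor, a) &
    !$omp   firstprivate(lo, hi) private(tmp)
    do i = 1, n
       tmp = a(i) * factor
       a(i) = min(max(tmp, lo), hi)   ! clamp into [lo, hi]
    end do
    !$omp end parallel do
  end subroutine scale_and_clamp
end module default_none_demo
```

With about 55 FIRSTPRIVATE variables this is tedious to set up once, but afterwards every newly introduced variable has to be classified deliberately.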

PS: I tried the latest version of Intel Visual Fortran (11.0.072), but it did not solve the problem. At least, my code compiles much faster with the new version of IVF. :-)

Second attempt at writing this (damn edit window on this brain-dead forum: when you press Tab, it should indent inside the edit box, not jump out of the edit window and do something else).

Place a breakpoint on the WRITE statement.

At the break, open a Disassembly window.

What is being passed for Tmin and Tmax to the write?
What is being passed for Tmin and Tmax to apply_BC?

Jim Dempsey

I am sorry, I can't read disassembly. I might be wrong, but I assume the Disassembly window is meant to be used with the Debug version. And the key point is that the Debug version works well (Tmin and Tmax have correct values both in the screen output of the WRITE statement and in the Watch window when a breakpoint is placed at the locations you suggested).

I checked the IVF Help and this forum for more information about Disassembly, but everything I found was related to debugging. Nevertheless, am I missing something that the Disassembly window can tell me beyond what the Watch window shows?

Or is there a way of using Disassembly even with the Release version (which might need some adjustments to the compiler and linker options)? If so, I would try to learn to read the disassembly...

jirina,

You can run the debugger on release code. Compile the release code with debug symbol generation enabled as the only difference in your options (remember to turn this off before you ship code to users).

Debugging a release code version is particularly difficult as the optimization process moves, changes and/or eliminates code.

Fortunately, you are not using the debugger in a way where this difficulty matters. You will merely be using the debugger at one breakpoint to look at the disassembly around that breakpoint.

-------------- source -------------
! ASDF.F90
module inModule
real :: inModVar
end module inModule

program ASDF
use inModule

real :: inCommonVar
common /mycom/ inCommonVar



inModVar = 123.
inCommonVar =456.
onStackVar = 789.

call foo(1122.33)
end program ASDF

subroutine foo(ReferenceVar)
use inModule
real :: inCommonVar
common /mycom/ inCommonVar
real, automatic :: onStackVar
onStackVar = inModVar + inCommonVar
write(*,*) inModVar, inCommonVar, onStackVar, ReferenceVar
onStackVar = onStackVar + ReferenceVar
end subroutine foo
--------- end source --------------
---- disassembly for write ----
---- Note, this is 32-bit code, 64-bit code will look different ----
---- My annotations are lines starting with >

> source write statement
write(*,*) inModVar, inCommonVar, onStackVar, ReferenceVar
0040101D  mov         dword ptr [ebp-34h],0 
> inModVar is "_INMODULE_mp_INMODVAR"
00401024  fld         dword ptr [_INMODULE_mp_INMODVAR (4DEC60h)] 
0040102A  fstp        dword ptr [ebp-10h] 
0040102D  lea         eax,[ebp-34h] 
00401030  mov         dword ptr [esp],eax 
00401033  mov         dword ptr [esp+4],0FFFFFFFFh 
0040103B  mov         dword ptr [esp+8],384FF00h 
00401043  mov         dword ptr [esp+0Ch],offset ___xt_z+20h (4B525Ch) 
0040104B  lea         eax,[ebp-10h] 
0040104E  mov         dword ptr [esp+10h],eax 
00401052  call        _for_write_seq_lis (401140h) 
00401057  add         esp,14h 
> inCommonVar is "_MYCOM"
> but when not 1st common variable you would see "_MYCOM+someNumberHere"
0040105A  fld         dword ptr [_MYCOM (4DEC50h)] 
00401060  fstp        dword ptr [ebp-0Ch] 
00401063  add         esp,0FFFFFFF4h 
00401066  lea         eax,[ebp-34h] 
00401069  mov         dword ptr [esp],eax 
0040106C  mov         dword ptr [esp+4],offset ___xt_z+28h (4B5264h) 
00401074  lea         eax,[ebp-0Ch] 
00401077  mov         dword ptr [esp+8],eax 
0040107B  call        _for_write_seq_lis_xmit (402950h) 
00401080  add         esp,0Ch 
> onStackVar are as-is "ONSTACKVAR"
00401083  fld         dword ptr [ONSTACKVAR] 
00401086  fstp        dword ptr [ebp-8] 
00401089  add         esp,0FFFFFFF4h 
0040108C  lea         eax,[ebp-34h] 
0040108F  mov         dword ptr [esp],eax 
00401092  mov         dword ptr [esp+4],offset ___xt_z+30h (4B526Ch) 
0040109A  lea         eax,[ebp-8] 
0040109D  mov         dword ptr [esp+8],eax 
004010A1  call        _for_write_seq_lis_xmit (402950h) 
004010A6  add         esp,0Ch 
> dummy arguments are as-is "REFERENCEVAR"
004010A9  mov         eax,dword ptr [REFERENCEVAR] 
004010AC  fld         dword ptr [eax] 
004010AE  fstp        dword ptr [ebp-4] 
004010B1  add         esp,0FFFFFFF4h 
004010B4  lea         eax,[ebp-34h] 
004010B7  mov         dword ptr [esp],eax 
004010BA  mov         dword ptr [esp+4],offset ___xt_z+38h (4B5274h) 
004010C2  lea         eax,[ebp-4] 
004010C5  mov         dword ptr [esp+8],eax 
004010C9  call        _for_write_seq_lis_xmit (402950h) 
004010CE  add         esp,0Ch 

The 64-bit code will look quite different, but the symbolic information for your variables will have discernible patterns.

Here are the things for you to check:

1) Are the symbolic names for Tmin and Tmax the same in your WRITE statement as in your CALL statement?

2) Before the start of the code for the WRITE statement, record the value of ESP (or RSP); this is your stack pointer. As you can see from the code above, the single WRITE statement was broken up into multiple calls (_for_write_seq_lis, then _for_write_seq_lis_xmit for each further item), each followed by a stack fix-up such as "add esp,0Ch". For 64-bit the code will be a bit different, but I am sure you will have no problem figuring it out. After the stack fix-up following the call to write, recheck ESP (or RSP); it should be the value you recorded before the call.

3) Step 2) was more of an exercise, as I am sure that the WRITE will not mess up the stack. Apply the step 2) technique to the subroutine call causing the error.

Things to note

a) On return from the call, and after the stack fix-up, is ESP (or RSP) correct?
b) Does the text in the disassembly look the same? Watch for re-disassembled code that shows different variables, or expressions relative to registers such as ",[ebp-34h]", in place of what used to be symbolic names. If the names have changed, this might indicate that the frame pointer was not restored properly. This is EBP in 32-bit code and RBP in 64-bit code.

Also note: if the problem can occur at the code you are looking at, it may also have occurred _prior_ to the code you are looking at. One indication of that is that the disassembled symbolic information does not match the expressions in the source statement (WRITE or CALL, as the case may be).

In other words, check item b) of "Things to note", which follows step 3), before doing step 1) above.

I couldn't tell you what to look for at step 1) without first walking you through to step 3) b).

Good luck hunting for the bug.

Jim Dempsey

Jim,

I am grateful for everything you are doing to help me. Your detailed explanation should be enough for me to understand what to do; however, I cannot get past the first step. I have my Release version, I enabled debug information (/debug:full) and rebuilt the project. When I try to start debugging, I get the error message "Debugging information cannot be found or does not match. Binary was not built with debug information". When I continue, execution does not stop at the breakpoint, which is obviously an expected consequence of the error message.

I checked the documentation and it seems that it should be possible to compile an application with both /debug:full and /O2 enabled. Or is there another compiler option which I don't see and which needs to be changed?

I am sorry for continuing to ask (probably) stupid questions.

The linker must also be told to keep the debug information (the linker's /DEBUG option); sorry I didn't mention this.

You can also select your Debug configuration and then, in the project(s), set the optimization to maximum speed (or whatever). Toggling the debug symbols on and off in the Release configuration is better, as it keeps all the optimization switches the same; e.g. you may be using /O3 in most files but /O2, /O1, /O0, etc. in other files. Keeping the optimization switches identical is paramount when trying to track down intermittent problems.

The code you have shown should not exhibit the problems described, _provided_ you haven't made a programming error. Embarrassingly, the preponderance of problems are programmer errors as opposed to compiler errors, but there are occasional compiler errors.

Jim Dempsey

Jim,

I followed your advice and reached the following conclusions from my tests:

1. The symbolic names for Tmin and Tmax are correct in the WRITE statement before the parallel region. However, they are wrong in the subroutine call. Instead of the symbolic names, I see this in the disassembly (RXBC is another real*8 local variable, which seems to be OK):

007B7BC4  lea         eax,[RXBC] 
007B7BCA  mov         dword ptr [esp+94h],eax 
007B7BD1  lea         eax,[ebp-0DFCh] 
007B7BD7  mov         dword ptr [esp+98h],eax 
007B7BDE  lea         eax,[ebp-0DF4h] 
007B7BE4  mov         dword ptr [esp+9Ch],eax

Also, when I am debugging and place the mouse pointer over a local variable I can see its value. This does not work with Tmin and Tmax.

2. The WRITE statement before the parallel region does not mess up the stack; the value of ESP is the same before and after the WRITE statement.

3. However, ESP value is different before and after the subroutine call.

You mentioned that the problem might have occurred before the location I am looking at. I am going to check this, but I am afraid it might be difficult because of the changes introduced into the optimized code. It looks strange to see the subroutine call in the disassembly "interrupted" by code that precedes the call in the source (e.g. parts of the parallel region definition or some lines between the parallel region definition and the subroutine call). Is this normal, or an indication of a bug in my code?

In addition, I checked manually, several times, that the number and types of the subroutine arguments are the same in the declaration and in the call; this seems to be OK.

Thank you for your patient help.

jirina,

Try the following changes (note the comments):

              write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax   
  
! use "!" instead of "*"
!dec$ if defined (_OPENMP_) 
! place "&" at end of next line for continuation, remove "&" from line following next line  
!$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )   &
!$omp  firstprivate ( ..., Tmin, Tmax, ... )   
  
!$omp do schedule(dynamic,3)   
!dec$ end if  
            do ij = 1,(i2-i1+1)*(j2-j1+1)   
               
!                write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax   
  
                 call apply_BC ( ..., Tmin, Tmax, ... )  

Jim

I am using fixed form, and the OpenMP documentation says:

- !$OMP, C$OMP and *$OMP are accepted sentinels and must start in column 1.
- All Fortran fixed-form rules for line length, white space, continuation and comment columns apply to the entire directive line.
- Initial directive lines must have a space or zero in column 6.
- Continuation lines must have a character other than a space or zero in column 6.

I replaced *dec$ with !dec$ and *$omp with !$omp, but I can't do more: putting & at the end of the first line and removing it from column 6 of the next line results in the compiler error message
"error #5082: Syntax error, found '&' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION COPYIN NUM_THREADS SHARED IF DEFAULT , ..."

Replacing * with ! did not help; the problem/bug is still there.

*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
123456789111111111122222222223333333333444444444455555555556666666666777777777788888888889
         012345678901234567890123456789012345678901234567890123456789012345678901234567890

Fixed form ends at column 72 unless you have extended source selected (then at col 132)

Using conditional compilation (so as not to muck up anything):

Try a quick test by removing "if ( enableOpenMP .AND. omp_bc ) num_threads ( threads )" and concatenating the OMP clause from the following line. Make sure the resulting line is no longer than 72 characters.

If this fixes the problem then there is a syntax problem with the omp statements.

Jim Dempsey

Quoting - jimdempseyatthecove

*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
123456789111111111122222222223333333333444444444455555555556666666666777777777788888888889
         012345678901234567890123456789012345678901234567890123456789012345678901234567890

Fixed form ends at column 72 unless you have extended source selected (then at col 132)

If the OP is using the fixed form width of 72 then he is very unlucky that column 72 happens to contain a space, implying that the 'default ( shared )' clause was chopped off without generating a compile error. Another reason to use free form ...

Quoting - jimdempseyatthecove

*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
123456789111111111122222222223333333333444444444455555555556666666666777777777788888888889
         012345678901234567890123456789012345678901234567890123456789012345678901234567890

Fixed form ends at column 72 unless you have extended source selected (then at col 132)

Using conditional compilation (so as to not muck up anyting)

Try a quick test by removing "if ( enableOpenMP .AND. omp_bc ) num_threads ( threads )" and concatinating the omp clause from the following line. Check to assure the resultant line is less than 73 chars.

If this fixes the problem then there is a syntax problem with the omp statements.

Jim Dempsey

I am using fixed form, but extended to 132 characters per line (/fixed /extend_source:132).

I cannot put everything on a single !$OMP line, because there is a long list of variables in FIRSTPRIVATE (about 55 variables).

Anyway, I converted this particular .for file from fixed form to free form, but unfortunately the problem with Tmin and Tmax is still the same. This suggests that the continuation lines of the OMP directive are not causing the problem.

Here is a potential workaround:

Save the FIRSTPRIVATE clauses as comments in the program, placed above the !$OMP PARALLEL...
Add LOGICAL :: InitOnce as a subroutine local variable.

In front of !$OMP PARALLEL...

Add InitOnce = .true.

Replace what used to have been the FIRSTPRIVATE clause with

FIRSTPRIVATE(InitOnce)

Inside the body of the loop, at the top of the loop, add:

if(InitOnce) then
InitOnce = .false.
localTmin = Tmin
localTmax = Tmax
...
endif
call foo(..., localTmin, localTmax, ...)

You can conditionalize this code if you wish.

Not clean but it should work
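A sketch of this workaround in compilable form follows. The names are hypothetical: `copy_bounds` stands in for the call to `foo`/`apply_BC`, and only two of the former FIRSTPRIVATE variables are shown. The FIRSTPRIVATE flag starts as .true. in every thread, so each thread copies the shared originals into its PRIVATE locals on the first iteration it executes, whichever iteration that is. Private variables keep their values across later iterations executed by the same thread.

```fortran
! Sketch of the InitOnce workaround (hypothetical names).
module init_once_demo
  implicit none
contains
  ! Stand-in for foo/apply_BC: records the bounds it receives.
  subroutine copy_bounds(tmin, tmax, out_lo, out_hi)
    real(8), intent(in)  :: tmin, tmax
    real(8), intent(out) :: out_lo, out_hi
    out_lo = tmin
    out_hi = tmax
  end subroutine copy_bounds

  subroutine run_with_init_once(n, tmin, tmax, seen_lo, seen_hi)
    integer, intent(in)  :: n
    real(8), intent(in)  :: tmin, tmax        ! shared originals
    real(8), intent(out) :: seen_lo(n), seen_hi(n)
    real(8) :: local_tmin, local_tmax
    logical :: init_once
    integer :: ij

    init_once = .true.
    !$omp parallel default(shared) firstprivate(init_once) &
    !$omp   private(local_tmin, local_tmax)
    !$omp do schedule(dynamic, 3)
    do ij = 1, n
       if (init_once) then          ! true only on this thread's first iteration
          init_once = .false.
          local_tmin = tmin         ! copy the shared values once per thread
          local_tmax = tmax
       end if
       call copy_bounds(local_tmin, local_tmax, seen_lo(ij), seen_hi(ij))
    end do
    !$omp end do
    !$omp end parallel
  end subroutine run_with_init_once
end module init_once_demo
```

In a serial build the flag is simply cleared on the first iteration, so the same code works with and without OpenMP.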

Jim Dempsey

I understand your point, but I think it won't work - call foo is in a parallel region in my case and it uses many variables which (I believe) must be defined as private or firstprivate.

Anyway, I updated my code from the post starting this discussion to read:

  write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

  initOnce = .true. ! NEW

!$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )  
!$omp&  firstprivate ( ..., Tmin, Tmax, ..., initOnce )
!$omp&  private ( localTmin, localTmax ) ! NEW

!$omp do schedule(dynamic,3)
  do ij = 1,(i2-i1+1)*(j2-j1+1)
    if ( initOnce ) then ! NEW
      initOnce = .false.
      localTmin = Tmin
      localTmax = Tmax
    endif
    write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax  
    call apply_BC ( ..., localTmin, localTmax, ... ) ! NEW instead of ( ..., Tmin, Tmax, ... )  

And this change helped! I can see correct values of Tmin and Tmax everywhere, even inside the parallel region and inside the subroutine apply_BC.

I know that this solution is not so clean, but I am happy it helped. Thank you very much, Jim, for your effort.

Anyway, I have been going through my code looking for a possible bug, but I did not find anything. There are three parts of the source code that are basically the same (each part corresponds to one coordinate direction: X, Y, Z); I checked them several times today. Still, two of them work well in the parallel model, and just one became a nightmare for me and probably also for you and the other people trying to help me.

I am considering submitting this to Intel as a possible compiler bug, but I am afraid to do so, because the source code is not very nice (originally written in Fortran 77 by a non-programmer) and because there might be a bug which I don't see even though I have been trying to find it for more than one week.

Jirina,

I consider myself an experienced programmer.

Most of the time (~99%) when I offhandedly think "this has got to be a compiler bug", I look at the code again and again without seeing the error. However, with luck, I find my error before I give up and submit it to Premier Support. In almost all cases, the error found is the result of lax programming or a stupid mistake on my part. This comes with the business, so I am used to it by now (40 years of programming).

Also,

The problem with Tmin and Tmax may also be a problem with (some of) the remaining FIRSTPRIVATE variables. Until you pin down what is causing the error with Tmin and Tmax, I suggest you consider making localXXX copies of all the FIRSTPRIVATE variables. Additionally, make it possible to compile either way conditionally, and insert some ASSERT sanity checks. Doing this will let you catch additional errors now, as well as try out the next release(s) of the compiler later.

Glad this gave you a workaround so you can put this behind you and get on with your business.

Jim Dempsey

I completely agree with you when it comes to bugs. I am not a real (and experienced) programmer, so it is 99% sure that I have a bug in the code. I just cannot find it.

I will try to use local versions of all FIRSTPRIVATE variables and make the code conditionally compiled to see what happens when the new compiler version is out.

I need to say thank you once more for your kind help.
