OpenMP + Optimization (/O2 or /O3) causes floating overflow

OpenMP + Optimization (/O2 or /O3) causes floating overflow

Hi,

I have a strange (to me) problem. 
In a big project, with OpenMP enabled by default.
In this project, I have 2 modules: A and B, where module A uses module B.

Module B do not have any OpenMP directive, nor Auto-parallelism is enabled.
Inside Module A, the subroutines that calls Module B also do not have any OpenMP directive, nor do have any of the calling subroutines.

When I compile the code, and disable OpenMP, OR disable Optimizations, OR disable both, only for the ModuleB, everything goes fine.
If I compile the ModuleB with both Optimization (/O2 or /O3) and OpenMP, the run craches after sometime with a floatpoint overflow.

Could you guide me to what kind of tests I should do in order to find the problem?

I'm using the latest Intel fortran.

Best regards,
Eduardo

 

24 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Does the code in ModuleB perform a convergence? Optimization may cause different instruction sequences that affect rounding.

A second difference is compiling with OpenMP attributes the subroutines and functions as "recursive". This may move some variables to stack, as opposed to some of those variables implicitly being located as SAVE. If you omit variable initialization, or rely on implicit SAVE'dness, then you will have potential problems.

Jim Dempsey

www.quickthreadprogramming.com

Hello Jim,

Thanks for the fast reply.
 

Nope. The module B is not performing convergence. It just uses values form other parts of the code to populate a 3D matrix.

I think there is no recursive subroutines in the code, and the recursivity is disable in the properties.
And if there is some variables implicitly being SAVE, this would be a bug, in this project. We avoid the use of SAVE. I'll check this anyway.

I think I'll not be able to scape the old "print" debug method... lol

Eduardo

Hi Again,

I made a series of tests and I think that my problem must be related with a compiler bug when using OpenMP and optimizing the code.

The folowing code causes an error:

    subroutine Compute_Multiparameters

        !Local-----------------------------------------------------------------
        integer                                 :: i, j, k

        !Begin-----------------------------------------------------------------

if3D:   if (Me%Is3D) then

             if((Me%ExternalVar%PropertyID == MacroAlgae_) .OR. (Me%ExternalVar%PropertyID == SeagrassesLeaves_) ) then  

                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        Me%ShortWave%ExtinctionCoefField3D(i,j,k) =                           &
                                            Me%ShortWave%ExtinctionCoefField3D(i,j,k)       + &
                                            Me%ExternalVar%ExtinctionParameter              * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)         * &
                                            Me%ExternalVar%ProducerOccupation(i, j, k)    * Me%UnitsCoef)
                    end if

                enddo
                enddo
                enddo

            else 
                
                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        Me%ShortWave%ExtinctionCoefField3D(i,j,k) =                     &
                                            Me%ShortWave%ExtinctionCoefField3D(i,j,k) + &
                                            Me%ExternalVar%ExtinctionParameter        * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)   * Me%UnitsCoef)
                                      
                    end if

                enddo
                enddo
                enddo

            end if 

        else if3D

            do i = Me%Size1D%ILB, Me%Size1D%IUB

                if (Me%ExternalVar%RiverPoints1D(i) == WaterPoint) then 
                
                    Me%ShortWave%ExtinctionCoefField1D(i) =                         &
                                        Me%ShortWave%ExtinctionCoefField1D(i)     + &
                                        Me%ExternalVar%ExtinctionParameter        * &
                                        (Me%ExternalVar%Concentration1D(i)        * Me%UnitsCoef)
                end if

            enddo

        endif if3D

    end subroutine Compute_Multiparameters

While the next code works:

  

    subroutine Compute_Multiparameters

        !Local-----------------------------------------------------------------
        integer                                 :: i, j, k
        real                                    :: a

        !Begin-----------------------------------------------------------------

if3D:   if (Me%Is3D) then

             if((Me%ExternalVar%PropertyID == MacroAlgae_) .OR. (Me%ExternalVar%PropertyID == SeagrassesLeaves_) ) then  

                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        Me%ShortWave%ExtinctionCoefField3D(i,j,k) =                           &
                                            Me%ShortWave%ExtinctionCoefField3D(i,j,k)       + &
                                            Me%ExternalVar%ExtinctionParameter              * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)         * &
                                            Me%ExternalVar%ProducerOccupation(i, j, k)    * Me%UnitsCoef)
                    end if

                enddo
                enddo
                enddo

            else 
                
                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        Me%ShortWave%ExtinctionCoefField3D(i,j,k) =                     &
                                            Me%ShortWave%ExtinctionCoefField3D(i,j,k) + &
                                            Me%ExternalVar%ExtinctionParameter        * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)   * Me%UnitsCoef)
                                      
                        a = Me%ShortWave%ExtinctionCoefField3D(i,j,k)     
                        if ((a > 1.0) .or. (a < 0.0)) then
                            print *, 'Error'
                            print *, Me%ShortWave%ExtinctionCoefField3D(i,j,k)
                            print *, Me%ExternalVar%ExtinctionParameter 
                            print *, Me%ExternalVar%Concentration3D(i, j, k)
                            print *, Me%UnitsCoef
                            stop
                        endif
                    end if

                enddo
                enddo
                enddo

            end if 

        else if3D

            do i = Me%Size1D%ILB, Me%Size1D%IUB

                if (Me%ExternalVar%RiverPoints1D(i) == WaterPoint) then 
                
                    Me%ShortWave%ExtinctionCoefField1D(i) =                         &
                                        Me%ShortWave%ExtinctionCoefField1D(i)     + &
                                        Me%ExternalVar%ExtinctionParameter        * &
                                        (Me%ExternalVar%Concentration1D(i)        * Me%UnitsCoef)
                end if

            enddo

        endif if3D

    end subroutine Compute_Multiparameters

This shows to me that the optimizer is doing something wrong. When I include the variable a and use it inside this sibrotuine, the optimizer is unable to do what it was doing and the code works...

This isn't the first time that adding a instruction solves an error...
I had othe case like this, sometime ago, where putting a "write" instruction would make the code work...

Any ideas?

 

Eduardo

Eduardo,

A funny thing happened on the way to the forum....

Fortran does not have an attribute REENTRANT. Multiple threads concurrently calling the same routine is a reentrancy situation. Due to the requirement of having the routines (possibly) being called reentrantly in OpenMP compiled programs .AND. the lack of REENTRANT attribute the attribute RECURSIVE is implicitly applied (whether you want it or not). RECURSIVE and REENTRANT routines have a requirement that barring explicit attributes, the declared local variables are to be located on the stack. When compiled without OpenMP and without attribute RECURSIVE... the compiler is free to make the variables stack located .OR. in a private (static) global location. Usually (in past experience)scalars end up on stack but arrays (REAL :: VEC(3), MAT(3,3)) were made implicitly SAVE.

Due to implicitly SAVE on non-OpenMP/non-RECURSIVE routines, and when alternate paths are traversed in the routine from call to call (IF... THEN... ELSE), it is quite possible that a first time path happens to initialize a variable that is bypassed for initialization on a subsequent pass.

Compiling as OpenMP would relocate these arrays on the stack, and then depending on code between calls, the code may trash or not trash the memory locations to be used on subsequent call.

The compiler has a runtime check option to test for uninitialized variable usage. Compile for OpenMP and enable this option to see what happens. (remove the option afterwards).

Jim Dempsey

www.quickthreadprogramming.com

Hello Jim,

I aways wanted to know what happens when a routine is called multiple times in parallel.
So, even if a routine is never called inside a OpenMP parallel region, when compiling with OpenMP enabled, the compiler will make all routines RECURSIVE? 

Regarding the implicity SAVE, this would impact only the arrays declared inside the routine, no?

The code of the routine that seems to be causing the error is this:

    subroutine Compute_Multiparameters

        !Local-----------------------------------------------------------------
        integer                                 :: i, j, k

        !Begin-----------------------------------------------------------------

if3D:   if (Me%Is3D) then

           !never enters here

            else 
                
                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        Me%ShortWave%ExtinctionCoefField3D(i,j,k) =                     &
                                            Me%ShortWave%ExtinctionCoefField3D(i,j,k) + &
                                            Me%ExternalVar%ExtinctionParameter        * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)   * Me%UnitsCoef)
                                      
                    end if

                enddo
                enddo
                enddo

            end if 

        else if3D

           !never enters here

        endif if3D

    end subroutine Compute_Multiparameters

Me is a module variable (a Type).
WaterPoint3D is an integer matrix (WaterPoint is a parameter wi 1).
ExtinctionCoefField3D ia a real matrix.
ExtinctionParameter is a real.
Concentration3D is a real matrix.
UnitsCoef is a real.

When this subroutine is called, all the arrays and scalars have theirs values defined.
The ExtinctionCoefField3D is first initialized with 0, and is actualized in other routines also, all of them similar to this one, than the P = P + K form. 

 if I change the code to something like below, the error disappears.

    subroutine Compute_Multiparameters

        !Local-----------------------------------------------------------------
        integer                                 :: i, j, k
        real                                    :: a

        !Begin-----------------------------------------------------------------

if3D:   if (Me%Is3D) then

           !never enters here

            else 
                
                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        Me%ShortWave%ExtinctionCoefField3D(i,j,k) =                     &
                                            Me%ShortWave%ExtinctionCoefField3D(i,j,k) + &
                                            Me%ExternalVar%ExtinctionParameter        * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)   * Me%UnitsCoef)

                        a = Me%ShortWave%ExternalVar%ExtinctionCoefField3D(i, j, k)
                        print *, a
                    end if

                enddo
                enddo
                enddo

            end if 

        else if3D

           !never enters here

        endif if3D

    end subroutine Compute_Multiparameters

When I try to debug the first version, it tells me that all the variables inside the subroutine are register variables, so, for sure that is something related with optimization.

One of the first things I did was to run with all the runtime checks enabled. Nothing appears. 

Any idea?
I confess I didn't understand very well the last part about the code between calls trashing the memory locations where the arrays are allocated...

Thanks, 
Eduardo

/Qopenmp sets among other things /Qauto, which is required for reentrancy, and RECURSIVE would do the same.  This puts local arrays on the stack; ifort's default is to set local arrays as if they were SAVE.  The /Qsave option (which is not on by default) would have a similar effect on local scalars.  Any local variable which gets SAVE one way or another would constitute a race condition if multiple threads try to modify it.

You would also have a race condition if multiple threads are accessing shared module variables, with one or more of them changing their contents.  RECURSIVE declaration or compile options won't change that.  With a shared array under data parallelism, you want to have distinct sections of the array updated by the various threads.

The Advisor application is meant to assist in parallelizing a working serial code, and Inspector in finding data races in parallel code.  Otherwise, it's pretty much accidental if serial run-time checks show up any parallelization problems.

Eduardo,

Is your program structure, effectively:

!$omp parallel do
   do i = 1,n
     call Compute_Multiparameters
   end do
!$omp end parallel do

or:

!$omp parallel do
                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then

                        Me%ShortWave%ExtinctionCoefField3D(i,j,k) =                     &
                                            Me%ShortWave%ExtinctionCoefField3D(i,j,k) + &
                                            Me%ExternalVar%ExtinctionParameter        * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)   * Me%UnitsCoef)

                        a = Me%ShortWave%ExternalVar%ExtinctionCoefField3D(i, j, k)
                        print *, a
                    end if

                enddo
                enddo
                enddo
!$omp end parallel do

If it is the former, then you can have multiple threads calculating the same values:
   Me%ShortWave%ExtinctionCoefField3D(i,j,k) = Me%ShortWave%ExtinctionCoefField3D(i,j,k) + ...
This will cause problems if multiple threads are updating the same memory values.

If it is the latter, you might have more success by explicitly declaring shared and private variables

John

Hello Tim,

Thanks for the explanation. I have yet to learn how to use the inspector.

In our code, we avoid the use of the SAVE attribute, don't initialize the local variables at the same time we declare it (avoid implicity SAVE) nor the /QSave options is used, so all local scalars will not have its value stored between calls and they mus be initialized before used.

Anything different of this would be a "bug".

The OpenMP directives are implemented only for looping and all local variables are declared as private, including the looping indexes. A great care was taken when implementing the OpenMP looping so to avoid Data race conditions, and usually only arrays on the form (i,j,k) are changed, making each thread working on a separated array section.

But in the code I showed, there isn't any OpenMP looping, nor the subroutine is called from inside openmp looping or recursively.

With only Optimization or only OpenMP enabled, the code works. When both are active, the run fails, but the "wrong" results are always the same, so, I don't think this is a data race conditions, as it would cause different results to appear. I tested the code at least a hundred times as of now.

So, to me, is something related to the way the optimization is being done, when using OpenMP, because while this array, ExtictionCoefField3D, in this part of the code, is not used under OpenMP conditions, it is used later in the code. Also, adding the "print" instruction, for sure changes something in the way this routine is optimized, because the problem goes away.

This matrix is declared inside a type, and at the start of the execution is allocated.

    type       T_Wave
        real, pointer, dimension(:)                 :: ExtinctionCoefField1D
        real, pointer, dimension(:,:,:)             :: ExtinctionCoefField3D
        integer                                     :: ExtinctionType       = FillValueInt
        real                                        :: ExtinctionCoef       = FillValueReal
        real                                        :: ExtCoefWater         = FillValueReal
        real                                        :: Percentage           = FillValueReal               
    end type T_Wave

Also, in the module where it is computed, no OpenMP directives exist. Yet, removing the OpenMP from this module also solves the problem, so, Is something related not only with Optimization, but also with the way subroutines under OpenMP, as was said, that becomes implicitly recursive to allow the possibility of reentrance, are optimized.

How is possible that a simple "print" put inside the looping can solve the problem?

Maybe the problem is not here, but somewhere else?

Cheers,
Eduardo
 

Eduardo,

From the sketch code, in particular inserting a print removes bug, I would agree that this looks like a bug.

Does compiling this routine (only) with /Qauto, without OpenMP, with optimizations run properly?
If, yes, then this could be a work around ** provided you have no intentions of making the k,j,i loop parallel in the future.

Placing the print inside the loop will inhibit vectorization of that loop.

Try removing the print and compiling that routine only with /Qsimd-, with OpenMP, with Optimizations.
Note the "-" on /Qsimd-

This suggestion is a probe to help determine the problem. If it works, it can also be used as a work around while Intel investigates this further (assuming you can send them a reproducer).

Your kji loop can be (could be) vectorized. At some point you would like to have it vectorized. If this suggestion works, I also suggest you add a reoccurring reminder to your task list to remind you to revisit this on compiler updates.

Another thought comes to mind. I cannot attest that this is true or not, Tim or Steve would be a better source.

Should you want the kji loop parallelized, recall that the compiler optimizations can collapse loops. I do not know if COLLAPSE is auto-determinable by the compiler. I know COLLAPSE is an !$OMP directive, but this does not mean loop collapsing is not used in non-OpenMP sections of code. Compiling with OpenMP enabled is likely to change the rules used by the compiler to decide if/how to collapse nested loops. What I suggest you do is:

!$OMP COLLAPSE(2)
do k = Me%WorkSize%KLB, Me%WorkSize%KUB
do j = Me%WorkSize%JLB, Me%WorkSize%JUB
do i = Me%WorkSize%ILB, Me%WorkSize%IUB

Tim or Steve, is there a !DEC$ NOCOLLAPSE and/or !DEC$ COLLAPSE(n) directive for non-OpenMP do loops?

Jim Dempsey

 

www.quickthreadprogramming.com

You can turn on optimization reports to see what the compiler did with loops. If you think there is a bug, you'd need to provide a complete test case to us (either here or through Intel Premier Support.)

Offhand I don't think we have any control over loop collapsing other than the OMP directive.

Steve - Intel Developer Support

Hello Again, Jim

"Does compiling this routine (only) with /Qauto, without OpenMP, with optimizations run properly?"

Yes, it does. Compiling only the module with /O3+/Qauto without OpenMP works.

"Try removing the print and compiling that routine only with /Qsimd-, with OpenMP, with Optimizations."

With this configuration (only for the module where the routine is), do not works.
But this is not surprise. The looping is not being vectorized anyway, probably due the fact of being in the form vec(i) = vec(i) + k. At least is what I understood from the message of why this looping is not being vectorized (when compiling without /Qsimd-)

But searching a little deep, seems that the culprit is the Me%ExternalVar%Concentration3D matrix.
No changes happen to it, during the execution of the suspect routine (the one that if I put a print, it solves the problem), The values in matrix, that is only used, not changed here, is always the same before and after the subroutine.

BUT, when the error happens, it is this variable that changes it value, and this leads to utlimately the error on the suspected routine.

Here is why I think it's probably a compiler problem. This matrix is used only for reading in this routine, the routine do not changes it's value, but even being so, without the "print", the optimization do something that causes a problem somewhere else in the code (I have to find where yet)

John,

The program is not structured on either case.
The subroutine is not called inside aby OpenMP looping, and the loopings inside the subroutine are not OpenMP loopings.

Cheers,
Eduardo

Steve,

I'm looking to the results file and I think it will take some time until I can sort out what is being done.
I think I should make the comparison of the report with and without the print instruction, to see what is changing.

I'll dig a little deeper in this before sending a case test.
I want to try/test everything before assuming that is effectively a compiler problem.

Thanks
Eduardo

Try this:

Change "print *,a" to "if (doDebug) print *,a"

Make doDebug a global logical variable initialize to .TRUE.

Run the program and verify it does not crash.
Re-run the program (do not recompile) with break at start, change doDebug to .FALSE., continue.
Does the program complete or crash?

This test will run the exact same code with different code path.

This is an indirect test to see if the call to print, which will modify memory locations deeper on the stack, which is memory supposedly not used by the routine Compute_Multiparameters, but possibly used by code called by Compute_Multiparameters. Note, if a subroutine/function called by Compute_Multiparameters has variables that are supposed to be persistent (or unintentionally require persistence) then the call to the print may accidentally provide the appearance of it fixes the problem. Using the doDebug to enable/disable the printing will test this hypothesis.

>>BUT, when the error happens, it is this variable that changes it value, and this leads to utlimately the error on the suspected routine.

Curious.

Does the LOC(Me%ExternalVar%Concentration3D) and/or SIZE(Me%ExternalVar%Concentration3D) change as well (IOW your pointer got corrupted)?

Jim Dempsey

www.quickthreadprogramming.com

Hi Jim,

I did the test of making the print based on "doDebug", on run letting the default value of .true. and one where I changed the value at the construction for .false. 

Both simulations run to the end, with exact the same results. Removing this instruction from the routine (comment it) makes the error appear.
In the case it works (with the print) the results are identical to a run without OpenMP and one without Optimizations without the print.

When the program crashes, LOC(Me%ExternalVar%Concentration3D) and SIZE(Me%ExternalVar%Concentration3D) both remain the same from beginning to end. No changes at all.

In fact, running a simulation that runs ok (with print or without optimizations or without OpenMP) and comparing the results with one that fails, I noticed a thing.

The error rises in a routine called ModifySolarRadiation, in this  equation: exp(-ShortWaveExtinctionField(i,j,k+1) * Thickness)

ShortWaveExtinctionField is defined in the Compute_MultiParameters routine, is a pointer to Me%ShortWaveExtinctionCoef3D array. This by its time, depends on the value of the Concentration.

Because in the Compute_MultiParameters subroutine everything seems fine, the values are being correctly calculated (I checked), I look for the Concentration3D as the only possibility for the error. In fact, the error starts there.

The Concentration3D array is calculated inside other routines, one of them named AdvectionDiffusion, that is in another module. Looking for the value before and after entering this routine, for both ok and bad runs, I noticed that for many iterations, everything goes well, with both runs having exactly the same value before and after.

Suddenly, in the bad run, the result changes after entering this routine (after many iterations).

What puzzles me is how a simple print in a routine in another module can make this strange behaviour disappear...
I first thought of an OpenMP problem, but the results are always the same for the bad simulation, and for a OpenMP problem I was expecting some kind of differences in the results, from run to run. I was expecting different results also if it was a stack corruption or something like that.

I don't know whether you've looked into it, but of course adding a print could prevent the compiler from vectorizing, possibly reducing the possibility of puzzling results from incorrect code, or could introduce sufficient delay to suppress potential race conditions.  It can equally happen that race conditions are suppressed by code optimization, in case you consider that puzzling.

Hello Tim,

I looked into the report from the optimizer and it seems that he is unable to vectorize the loopings inside the Compute_Multiparameters routine, due some kind of dependence or non-standard looping.

Here is the part of the report (that I attached) regarding the loopings:

------ Build started: Project: MOHIDBase1, Configuration: Release Double OpenMP|x64 ------

Compiling with Intel(R) Visual Fortran Compiler XE 14.0.2.176 [Intel(R) 64]...
ModuleLightExtinction.F90

HLO REPORT LOG OPENED ON Sat Mar 29 21:42:07 2014

<D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90;-1:-1;hlo;MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS;0>
High Level Optimizer Report (MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS)
QLOOPS 9/9/0	ENODE LOOPS 9 unknown 0 multi_exit_do 2 do 7 linear_do 7 lite_throttled 0
LINEAR HLO EXPRESSIONS:  160 / 637 + LINEAR(innermost): 83 / 637
------------------------------------------------------------------------------
HPO VECTORIZER REPORT (MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS) LOG OPENED ON Sat Mar 29 21:42:07 2014

<D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90;-1:-1;hpo_vectorization;MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS;0>
HPO Vectorizer Report (MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS)
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1461:17-1461:17):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  loop was not vectorized: nonstandard loop is not a vectorization candidate
loop was not vectorized: existence of vector dependence
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1465:25-1465:25):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  potential FLOW dependence
potential ANTI dependence
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1460:17-1460:17):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  loop was not vectorized: not inner loop
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1459:17-1459:17):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  loop was not vectorized: not inner loop
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1442:17-1442:17):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  loop was not vectorized: nonstandard loop is not a vectorization candidate
loop was not vectorized: existence of vector dependence
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1446:25-1446:25):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  potential FLOW dependence on .2.93_2.
potential ANTI dependence on .2.93_2.
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1441:17-1441:17):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  loop was not vectorized: not inner loop
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1440:17-1440:17):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  loop was not vectorized: not inner loop
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1491:13-1491:13):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  loop was not vectorized: existence of vector dependence
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1493:53-1495:21):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  potential ANTI dependence between ME and (unknown).
D:\Projects\Software\mohid.codeplex.pristine.tests\Software\MOHIDBase1\ModuleLightExtinction.F90(1495:21-1493:53):VEC:MODULELIGHTEXTINCTION_mp_COMPUTE_MULTIPARAMETERS:  potential FLOW dependence between (unknown) and ME.

Could be something related to "inlining"?
I put also the report when the print is there. It seems that the "inline" thing changes are more important than the vectorization (that do not happens either way).

Attachments: 

Hello Tim,

With or without the print statments, the loopings are not being vectorized according the optmization report.
I compiled just the module where Compute_Multiparameters is, selecting just the routine.

The post with the reports attached (both with and without the print instruction) are waiting for approval.

Maybe is something related with inlining...

I tested a change in the subroutine, using arguments to pass the matrix:

    subroutine Compute_Multiparameters (ExtinctionCoefField3D, ExtinctionCoefField1D)
    
        real, dimension(:,:,:), pointer, optional   :: ExtinctionCoefField3D
        real, dimension(:), pointer, optional       :: ExtinctionCoefField1D

        !Local-----------------------------------------------------------------
        integer                                 :: i, j, k

        !Begin-----------------------------------------------------------------        

if3D:   if (Me%Is3D) then

            if (.not. present(ExtinctionCoefField3D)) stop 'ModuleLightExtiction - Compute_Multiparameters - ERR001'

             if((Me%ExternalVar%PropertyID == MacroAlgae_) .OR. (Me%ExternalVar%PropertyID == SeagrassesLeaves_) ) then  

                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        ExtinctionCoefField3D(i,j,k) =                           &
                                            ExtinctionCoefField3D(i,j,k)       + &
                                            Me%ExternalVar%ExtinctionParameter              * &
                                           (Me%ExternalVar%Concentration3D(i, j, k)         * &
                                            Me%ExternalVar%ProducerOccupation(i, j, k)    * Me%UnitsCoef)
                    end if

                enddo
                enddo
                enddo

            else 
                                
                do k = Me%WorkSize%KLB, Me%WorkSize%KUB
                do j = Me%WorkSize%JLB, Me%WorkSize%JUB
                do i = Me%WorkSize%ILB, Me%WorkSize%IUB

                    if (Me%ExternalVar%WaterPoints3D(i, j, k) == WaterPoint) then 

                        ExtinctionCoefField3D(i,j,k) =                                &
                                            ExtinctionCoefField3D(i,j,k)            + &
                                            Me%ExternalVar%ExtinctionParameter      * &
                                           (Me%ExternalVar%Concentration3D(i, j, k) * Me%UnitsCoef)
                                      
                    end if

                enddo
                enddo
                enddo

            end if 

        else if3D

            if (.not. present(ExtinctionCoefField1D)) stop 'ModuleLightExtiction - Compute_Multiparameters - ERR003'
            
            do i = Me%Size1D%ILB, Me%Size1D%IUB

                if (Me%ExternalVar%RiverPoints1D(i) == WaterPoint) then 
                
                    ExtinctionCoefField1D(i) =                                &
                                        ExtinctionCoefField1D(i)            + &
                                        Me%ExternalVar%ExtinctionParameter  * &
                                        (Me%ExternalVar%Concentration1D(i)  * Me%UnitsCoef)                                        
                end if

            enddo

        endif if3D

    end subroutine Compute_Multiparameters

The use of arguments instead of the "global" variables also has the same effect in results than the "print" instruction, making the error disappear. 

What the optimization can do with a subroutine without arguments that it can't with one with arguments?

Cheers,
Eduardo.

The pass through print and bypass of the print running to completion demonstrates that this is neither a code placement nor consequential stack modification by print issue. IOW this narrows the scope of potential causes of the problem.

>>When the program crashes, LOC(Me%ExternalVar%Concentration3D) and SIZE(Me%ExternalVar%Concentration3D) both remain the same from beginning to end. No changes at all.

Pointer not modified (narrowing the scope of potential causes of the problem).

>>The Concentration3D array is calculated inside other routines

(keep in mind that I am "flying blind" here without seeing your code)

Your code style is to use pointers. There is nothing inherently wrong with this, however the degree of care is increased with respect to pointers to valid objects, then later used after the object lifetime expires. The compiler options /openmp, /Qauto and use of RECURSIVE will affect what happens to stale data. This is the residual values of object memory space after object expires. Meaning after it is deallocated, or goes out of scope on return from subroutine or function.

Check your code for something like

subroutine/function foo(dummyX)
real :: dummyX(:,:,:) ! or with other dimension declarations
real :: localX(nX,nY,nZ)
Concentration3D => dummyX
or
Concentration3D => dummyX(...) ! slice of dummy
or
Concentration3D => localX
or
Concentration3D => localX(...) ! slice of local
 

Where Concentration3D is outside the scope of the subroutine/function (in object or module)

What can be happening here can have the following possibilities:

When compiled without OpenMP localX could (is likely) be equivalent to SAVE and thus the data portion remain immutable after return from procedure. With OpenMP it is on stack and may or may not be altered by other code. From prior discussion on this thread, this would have a low but non-zero probability of being the cause.

The dummy argument case is more interesting.

Depending on point of call, and what is known (or not known) of the actual argument, and known (or not known) of the interface to the subroutine/function, a temporary array may be produced for the actual argument for use by the subroutine/function that initializes the pointer to the Concentration3D. If this is the case, then you now have a pointer (Concentration3D) pointing to an object that has expired (the temporary array created for the call), and subsequently those memory locations are subject to unexpected changes.

Jim Dempsey
 

www.quickthreadprogramming.com

Additional hint.

To quickly testy the hypothesis of the dummy argument pointing to a temporary array:

After the call that sets up the Concentration3D array pointer you can insert a test to verify the pointer is valid.

For example, if the pointer is supposed to point to a slice of a larger array you can test to see if the resultant Concentration3D array (entire array) falls within the known possible positions.

Jim Dempsey

www.quickthreadprogramming.com

Hello,

After 3 days digging for what was happening, I found what seems to be the probable cause of the problem.

The model was running with a parameter that caused a certain algorithm to become too much sensitive to very small changes in the Concentration3D values. This algorithm used concentration to calculate coefficients, used later to calculate new concentrations. Because very small variations in the concentration values caused big changes in the coefficients, at some point the contrary impact of other calculations lost its effectiveness and this create a vicious circle with a big instability that lead, in the end, to the floating overflow.

I still suspects that we have some kind of problem in the code, due to the fact that a simple print in a routine (a routine that did not changed Concentration3D values) could stop this behavior.  So, I'll continue to investigate.

In fact, I was able to find two small bugs (not related to my problem) during this investigation.
I would like to thank everybody that make the effort to help.

If I find the problem and it is something in our code, I came back here to tell.

If is something with the compiler (less probable), I'll create a schematic case test and send to the support (with knowledge here).

Thanks,

Eduardo

 

If you have numerical issues, including those affected by adding prints, you might consider including /assume:protect_parens /Qprec-div /Qprec-sqrt or even /fp:source if this doesn't slow it down.  As far as I'm concerned, protect_parens should be used always.  The option /Qftz is a default on account of the behavior of CPUs prior to corei7-3; there should be no need to use it for performance on the last 2 CPU generations.

Such sensitivities may also indicate a need for algorithmic investigation, as you suggested, or promotion of critical single precision calculations to double.

If you have source code bugs, sometimes they will be triggered in the absence of prints by optimizations such as /Qip.  I'm not advocating removal of that optimization except as a step toward diagnosing a possible bug.

Leave a Comment

Please sign in to add a comment. Not a member? Join today