Question

I compiled my code on my own machine with Intel Fortran compiler 11.1.048.
The code ran fine.

When I compiled and ran the same code on a UNIX cluster with Intel Fortran 11.1, something really weird happened.

First, the program was crashing at some point... Debugging it I found this out:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

This prints a bunch of zeros on the screen, as it should.

However if I do this:

XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

I get nonzero stuff printed.

More confusingly, if I code:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

Then I get zeros everywhere.

How is this happening?

Let me repeat that none of this happens on my own machine with Intel Fortran 11.1.048. It only happens when I move to the UNIX cluster with Intel Fortran 11.1.

Thanks,
Rafael

What do you mean by "non-zero"? Are these large values, or just round-off? Some formats may store the intermediate result, giving zero. Are the compiler defaults the same for both environments? The use of stack vs variable storage may be different.

David

Quoting - David White
What do you mean by "non-zero"? Are these large values, or just round-off? Some formats may store the intermediate result, giving zero. Are the compiler defaults the same for both environments? The use of stack vs variable storage may be different.

David

Thank you for the reply, David.

These are not round-off values, these are big values. The right answer should be zero.
The compiler defaults are not exactly the same, but that should not be the issue since I am using the heap-arrays option in both.

Here is another test:

This produces right results if I print XR(14:17)

a = log(rsk(1))
XR(14) = a**2
a = log(rsk(2))
XR(15) = a**2
a = log(rsk(3))
XR(16) = a**2
a = log(rsk(4))
XR(17) = a**2

This produces wrong results:

a = log(rsk(1))**2
XR(14) = a
a = log(rsk(2))**2
XR(15) = a
a = log(rsk(3))**2
XR(16) = a
a = log(rsk(4))**2
XR(17) = a

XR(14) stores the right number, but XR(15) to XR(17) store wrong numbers.

Just for background, my original code had:

XR(14) =log(rsk(1))**2
XR(15) =log(rsk(2))**2
XR(16) =log(rsk(3))**2
XR(17) = log(rsk(4))**2

and these assignments were producing wrong results.

I tried so many things... Really don't know what's going on.

Rafael,

As has been repeated many times on the forum in recent weeks when strange results occur: have you checked array bounds, etc.? Is there a possibility of trampling over memory giving these results?

David

Quoting - rafadix08

...
XR(14) =log(rsk(1))**2
XR(15) =log(rsk(2))**2
XR(16) =log(rsk(3))**2
XR(17) = log(rsk(4))**2

and these assignments were producing wrong results.

try cleaning your code, and optimize (appropriate declarations inferred); exponentiating to the power 2 is never as efficient or accurate or fast as simple multiplication:

DO j = 1, 4
lrsk = LOG(rsk(j))
xr(13+j) = lrsk * lrsk
END DO

Any satisfactory compiler will perform full optimization of **2 (**2. would be debatable). The C analogue is debatable as well, but we're talking about Fortran consensus going back at least 3 decades. Even the f2c translator can deal with it.
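A minimal sketch for checking the two forms against each other on a given compiler and set of options. The program name, the array names by_power and by_product, and the sample values in rsk are made up for illustration; they are not taken from the original code.

```fortran
program square_forms
  implicit none
  real(8) :: rsk(4), by_power(4), by_product(4), lrsk
  integer :: j

  ! Made-up positive sample values; the real rsk comes from the poster's model.
  rsk = (/ 1.063d0, 2.164d0, 0.893d0, 15.3d0 /)

  do j = 1, 4
    lrsk = log(rsk(j))
    by_power(j)   = log(rsk(j))**2   ! form used in the original post
    by_product(j) = lrsk * lrsk      ! form suggested in the reply
  end do

  ! Any difference larger than round-off would point at the compiler,
  ! not at the choice between **2 and an explicit multiplication.
  print *, 'max abs difference =', maxval(abs(by_power - by_product))
end program square_forms
```

Building this once with the failing options and once with -O0 would show whether the two forms ever diverge in isolation.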

Yes, I know that out-of-bounds array accesses can produce strange results. But I did check the bounds of my arrays... too many times! I also compiled with the Qdiag-enable option to see if the compiler detected something, but it did not.

I did try doing log*log, but the same problem persists.

At some other portion of my code I have assignments of exactly the same type and these work fine.

Quoting - rafadix08
Yes, I know that out-of-bounds array accesses can produce strange results. But I did check the bounds of my arrays... too many times! I also compiled with the Qdiag-enable option to see if the compiler detected something, but it did not.

I did try doing log*log, but the same problem persists.

At some other portion of my code I have assignments of exactly the same type and these work fine.

Let me just add that my code worked in my Windows machine.
I am having trouble executing it in a UNIX cluster.
If the options are not the same, they are very similar...

Here is my compiling line on the UNIX machine:
ifort trsapp.f bigden.f newuoa.f update.f biglag.f newuob.f Global_Data.f90 minim.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Loss_Function_MOD.f90 calfun.f Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -openmp -lpthread -heap-arrays 0

Here is my Build using Microsoft Visual Studio:
Deleting intermediate files and output files for project 'Estimation_4sectors', configuration 'Release|Win32'.
Compiling with Intel Visual Fortran 11.1.048 [IA-32]...
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4newuoa.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4bigden.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4trsapp.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4biglag.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4newuob.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4update.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Global_Data.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4LinReg_MOD.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4minim.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Parallel_Emax_MOD.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Loss_Function_MOD.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Main.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4calfun.f"
Linking...
Link /OUT:"ReleaseEstimation_4sectors.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.exe.intermediate.manifest" /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.lib" "Releasenewuoa.obj" "Releasebigden.obj" "Releasetrsapp.obj" "Releasebiglag.obj" "Releasenewuob.obj" "Releaseupdate.obj" "ReleaseGlobal_Data.obj" "ReleaseLinReg_MOD.obj" "Releaseminim.obj" "ReleaseParallel_Emax_MOD.obj" "ReleaseLoss_Function_MOD.obj" "ReleaseMain.obj" "Releasecalfun.obj"
Link: executing 'link'

Embedding manifest...
mt.exe /nologo /outputresource:"C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.exe;#1" /manifest "C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.exe.intermediate.manifest"

Estimation_4sectors - 0 error(s), 0 warning(s)

Just a silly check: Are rsk and XR defined to be of the same precision? Although the LOG function resolves to the correct version based on the kind of the argument, you may want to test the results by using Real(4). Also, maybe try DLOG as well.

Abhi

Quoting - abhimodak
Just a silly check: Are rsk and XR defined to be of the same precision? Although the LOG function resolves to the correct version based on the kind of the argument, you may want to test the results by using Real(4). Also, maybe try DLOG as well.

Abhi

Yes, XR and rsk are both double precision.
I also tried dlog, but same problem...

I would greatly appreciate a reply from the Intel team.

I have tried to clean up the code as much as I can in order to isolate the problem but it's still there.

Here is a short description of the problem:

I call a function that has the following assignments in its body:
XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

It turns out that XR(14:17) are not being recorded in the right way. I have the following lines that tell me that:

print*, 'log(rsk(1))**2 =', log(rsk(1))**2, 'XR(14) =', XR(14)
print*, 'log(rsk(2))**2 =', log(rsk(2))**2, 'XR(15) =', XR(15)
print*, 'log(rsk(3))**2 =', log(rsk(3))**2, 'XR(16) =', XR(16)
print*, 'log(rsk(4))**2 =', log(rsk(4))**2, 'XR(17) =',XR(17)

This should produce two columns with exactly the same numbers.

Instead, here is what I get:

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

XR(14) is right, but XR(15) is recording log(rsk(3))**2 instead of log(rsk(2))**2, and XR(16:17) are recording values I cannot identify.

Here is a short history of what I have done in order to solve the problem:

I compiled this code on my own Windows machine and the code is working perfectly. The above problem does not show up in my Windows machine.

The above problem shows up only when I compiled the exact same code in a UNIX cluster (Intel 11.1).

I am aware that out-of-bounds array accesses are the first suspects for this type of problem and have thoroughly checked for that. I have compiled the code with -diag-enable sc and with -check all.
The only message I receive is:
forrtl: warning (402): fort: (1): In call to EMAX_HAT, an array temporary was created for argument #1
but from what I know this warning is harmless.

I took away many parts of the code in order to focus on only the portion of code that is causing the problem.

Please let me know what type of settings I could try in order to find out what is going on.

Many thanks,
Rafael

If you would provide a small (if possible) but complete program that demonstrates the problem, I'd be glad to take a look. I don't think speculating based on code excerpts would be worthwhile.

Also, I am a bit confused when you say "UNIX machine", as Intel Fortran doesn't support any "UNIX" systems. We do support Linux, which is of course related to UNIX, but usually people don't call Linux "UNIX". Which Intel compiler version are you using on this UNIX system?

I would suggest "-warn interface" as a useful addition to your compiles. The symptom is that of argument type mismatches.

Retired 12/31/2016

Quoting - Steve Lionel (Intel)
If you would provide a small (if possible) but complete program that demonstrates the problem, I'd be glad to take a look. I don't think speculating based on code excerpts would be worthwhile.

Also, I am a bit confused when you say "UNIX machine", as Intel Fortran doesn't support any "UNIX" systems. We do support Linux, which is of course related to UNIX, but usually people don't call Linux "UNIX". Which Intel compiler version are you using on this UNIX system?

I would suggest "-warn interface" as a useful addition to your compiles. The symptom is that of argument type mismatches.

Hi Steve,

Thanks for offering, I am attaching my program... It is going to print a bunch of stuff, but what I need is that the two columns produce the same results that is (log(rsk(1:4)))**2 and XR(14:17) should be the same.

Here is the operating system / Intel Fortran details: PU_IAS Linux 5 and I have Intel 11.1 on that machine.

I am compiling with the following commands:

ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -openmp -lpthread -heap-arrays

Many thanks for your help.

Rafael

I first tried this on Windows. I had to change the reference to mkl_lapack.f90 as that file is not provided by MKL that comes with Intel Fortran 11.1. Are you using a different MKL?

The program prints out those four values many, many, many times. Earlier you wrote that they should be zero, but they're not when I run it on Windows. What should I be looking for?

Retired 12/31/2016

Quoting - Steve Lionel (Intel)

I first tried this on Windows. I had to change the reference to mkl_lapack.f90 as that file is not provided by MKL that comes with Intel Fortran 11.1. Are you using a different MKL?

The program prints out those four values many, many, many times. Earlier you wrote that they should be zero, but they're not when I run it on Windows. What should I be looking for?

Hi Steve,

Thanks for trying that out.

When I run it on Windows, with my compiler 11.1 everything works fine, so that's the puzzle.

In function Emax_hat in module Emax_MOD (Parallel_Emax_MOD.f90) I have several assignments:
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2

However, these assignments are not working properly.

Something like this is printed repeatedly on the screen:

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

And the first column should be equal to the second one.

Running the exact same code as the one you have in Linux with Intel 11.1 I have different results (see above)...

But I want to emphasize that compiling and running on Windows nothing of this happens.

Many thanks again,
Rafael

Complementing my previous reply:

When I compile on Windows I use:
include 'lapack.f90'
instead of include 'mkl_lapack.f90'

In the Linux cluster, in order to use MKL I load a module: intel-mkl/10.1/015/64

Hope that helps,
Rafael

Ooops... Sorry...
Copied and pasted and forgot to edit...

The assignments that are not working are:
XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

And not

XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2

Rafael,

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

FWIW your XR(15) appears to contain the log(rsk(3))**2 value, not the log(rsk(2))**2 value.
This could potentially be:

1) an SSE versioning error between what your processor supports and what your code requires
2) a cache coherency issue related to parallel programming (you have not stated whether this code is executing in a parallel region, with other threads potentially writing to the same/nearby XR(i) location)

Jim Dempsey

Quoting - jimdempseyatthecove

Rafael,

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

FWIW your XR(15) appears to contain the log(rsk(3))**2 value, not the log(rsk(2))**2 value.
This could potentially be:

1) an SSE versioning error between what your processor supports and what your code requires
2) a cache coherency issue related to parallel programming (you have not stated whether this code is executing in a parallel region, with other threads potentially writing to the same/nearby XR(i) location)

Jim Dempsey

Thank you for your message, Jim.

Yes, I did notice the switch you are mentioning.

About your points:

1) Could you please be more specific? How can I check that? I have successfully compiled and run another version of this code (minor modifications) on exactly the same Linux system.

2) The problem persists if I compile the code serially. But this shouldn't make a difference since XR is a local variable in a function that is not parallelized.

XR is actually a vector of size 81 and this problem of wrong assignments occurs only for the entries XR(14:17).

I am really puzzled and don't know what to do. Just a reminder that this exact same code was successfully compiled and ran well on my Windows machine. So I can only think that there is a compiler option that I should set, there is a compiler bug, or there is some other incompatibility.

>>2) The problem persists if I compile the code serially. But this shouldn't make a difference since XR is a local variable in a function that is not parallelized.

Although this function may not be parallelized, it may be called from a routine that is parallelized. If so, it needs to be thread-safe.

Mark your subroutine as RECURSIVE or use the Intel-specific ", AUTOMATIC" attribute when declaring the array XR.

subroutine foo
real(8) :: XR(81) ! this is a SAVE array (or more precisely NON-guaranteed local array)
end subroutine foo

recursive subroutine foo
real(8) :: XR(81) ! this is a local array
end subroutine foo

subroutine foo
real(8), automatic :: XR(81) ! this is a local array (but automatic is Intel specific)
end subroutine foo

Try addressing 2) first (the changes above address 2).

For 1), pick an older computer architecture such as Pentium 4, then migrate to newer architectures.

Jim Dempsey
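A minimal, self-contained sketch of the thread-safety point: a function called from a parallel loop keeps its work array private to each call. The module name work_mod, the function sum_sq_log and the array sizes are hypothetical, chosen only for illustration; the automatic array sized from the dummy argument is one standard-conforming way to get per-call storage and is an assumption, not the layout of the original Emax_hat.

```fortran
module work_mod
  implicit none
contains
  ! RECURSIVE asks for per-call (stack) storage of locals.
  recursive function sum_sq_log(rsk) result(total)
    real(8), intent(in) :: rsk(:)
    real(8) :: total
    real(8) :: xr(size(rsk))   ! automatic array: created fresh on every call

    xr = log(rsk)**2           ! elementwise, like XR(14:17) in the thread
    total = sum(xr)
  end function sum_sq_log
end module work_mod

program thread_safe_demo
  use work_mod, only: sum_sq_log
  implicit none
  real(8) :: rsk(4, 8), emax(8)
  integer :: kk

  call random_number(rsk)
  rsk = rsk + 1.0d0            ! keep the arguments of LOG positive

  ! The directives are plain comments unless OpenMP is enabled at compile time.
  !$omp parallel do
  do kk = 1, 8
    emax(kk) = sum_sq_log(rsk(:, kk))
  end do
  !$omp end parallel do

  print *, emax
end program thread_safe_demo
```

Each iteration writes only its own emax(kk) and reads its own column of rsk, so the calls share no writable data. Whether this pattern has any bearing on the wrong XR values reported here is not established by the thread.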

Jim,
Many thanks for your responses, I really appreciate it.

I tried a couple of the things you suggested, but none worked.

First, the automatic array declaration and then the recursive array declaration. The problem persists.

In the end I deleted all the parallelization code I had and made it a purely serial code. I also compiled it without the openmp option. Tried the automatic and recursive (one at a time) declarations but they didn't work once again.

I have not tried your solution to 1), but when I run the code in my machine with Intel Core 2 Duo Processors it works.
When I run it on a Linux system equipped with 8 Intel Xeon CPU E5345 processors it doesn't work, and the point where I see the code messing up is exactly at this XR array assignment.

Ok, I was able to make it work now, but I would need some help to figure out what is going on.

Here is my call to function Emax_hat (the one where the problem was):

Emax(s) = Emax_hat(PI_COEF, rsk(:,kk), ExperTomorrow, s)

Here is how I declared the variables in Emax_hat:

============================================================

function Emax_hat(PI, rsk, exper, lag)

USE Global_Data

implicit none

integer , intent(in) :: exper(NSECTORS), lag
real(KIND=DOUBLE), intent(in) :: PI(NREG), rsk(NSECTORS)

integer i, s

real(KIND=DOUBLE) XR(NREG), log_Emax, Emax_hat

integer LagDummy(NSECTORS)

============================================================

NSECTORS and NREG (the sizes of many of the arrays above) are global constants declared in module Global_Data.

I checked and in call:
Emax(s) = Emax_hat(PI_COEF, rsk(:,kk), ExperTomorrow, s)

I have the following declarations:
real(KIND=DOUBLE) PI_COEF(NREG)
real(KIND=DOUBLE) rsk(NSECTORS,INTP)
integer ExperTomorrow(NSECTORS), lag

That is, all the types and sizes match. However, I was having the problem that I described extensively here.

I decided to try to declare the arguments of Emax_hat as assumed shape as follows:

============================================================

function Emax_hat(PI, rsk, exper, lag)

USE Global_Data

implicit none

integer , intent(in) :: exper(:), lag
real(KIND=DOUBLE), intent(in) :: PI(:), rsk(:)

integer i, s

real(KIND=DOUBLE) XR(NREG), log_Emax, Emax_hat

integer LagDummy(NSECTORS)

============================================================

And that worked.

Why is that?

So apparently the problem was indeed related to size declaration of arrays.

My guess is when you configure to compile with errors, that one or more of the callers assumes (or is told) the call interface passes descriptors as opposed to first cell in the array.

Try the options for geninterfaces and warn interfaces. This may point to the errant caller(s).

Jim Dempsey

Quoting - jimdempseyatthecove

My guess is when you configure to compile with errors, that one or more of the callers assumes (or is told) the call interface passes descriptors as opposed to first cell in the array.

Try the options for geninterfaces and warn interfaces. This may point to the errant caller(s).

Jim Dempsey

I have tried gen-interfaces and warn interfaces options. No error was pointed.

Steve, did you have the chance to execute my code? Am I doing something wrong? If yes, why didn't the compiler - with all these options - detect the problem? Why does my code run smoothly on my Windows machine and not on the Linux system?

I feel very insecure going on without the answers to these questions.

I ran your code but did not have a chance to investigate in detail. I won't be able to get back to it for a few days.

Retired 12/31/2016

Quoting - Steve Lionel (Intel)
I ran your code but did not have a chance to investigate in detail. I won't be able to get back to it for a few days.

What about my comments above? Could you please take a look at them and let me know what you think?

Does it make sense that it worked with assumed-shape arrays?

Why didn't Jim's suggestions detect any error? (the options -warn interfaces and -gen-interfaces)

It doesn't make a lot of sense to me just reading what's here. Using assumed-shape arrays for the arguments would require that the caller see an explicit interface specifying that. I will try to look closer but it will be later this week.

Retired 12/31/2016

The function that is called (Emax_hat) is in the same module as the caller. So, I guess I would not need an explicit interface. Am I wrong?

I have a general question that might solve all of this, without the need of going through my code.

Suppose I have a module containing a subroutine and a function.

Here is what I am doing (in general terms):

module my_module

contains

subroutine my_subroutine(vec)

use global_var

real(8) vec(N)

real(8) res

res = my_function(vec)

end subroutine

function my_function(vec)

use global_var

real(8) my_function
real(8), intent(in) :: vec(N)

... function commands ...

end function

end module my_module

Note here that I am using another module, called global_var where constant N is defined.

module global_var

save

integer, parameter :: N = 25

end module global_var

My questions:
1) Is what I described here correct? If yes, then I would need you to look at the code, because I am having assignment errors, as I described. If it's not correct, why didn't the compiler detect any error?

2) What I have done is instead of having "real(8), intent(in) :: vec(N)" in my_function(vec) I have "real(8), intent(in) :: vec(:)", that is vec is declared as an assumed-shape array. That worked.

3) What I have also done is to pass the dimension of vec as an argument of my_function:

function my_function(vec,dim)

real(8) my_function
real(8), intent(in) :: vec(dim)

... function commands ...

end function

That has also worked.

If they are in the same module, then yes, that creates an explicit interface. My guess is that N is not the actual dimension of the array when passed in.

Retired 12/31/2016
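One way to test the guess about the actual extent is to make the dummy assumed shape temporarily and compare size(vec) with N inside the function. The sketch below is self-contained and uses hypothetical names (extent_check_mod, table, res) together with the my_function and global_var names from the question; the body of my_function is a stand-in, not the original code.

```fortran
module global_var
  implicit none
  integer, parameter :: N = 25
end module global_var

module extent_check_mod
  use global_var, only: N
  implicit none
contains
  ! Stand-in for my_function: the assumed-shape dummy receives its extent
  ! from the caller, so a mismatch with N can be reported at run time.
  function my_function(vec) result(total)
    real(8), intent(in) :: vec(:)
    real(8) :: total

    if (size(vec) /= N) then
      print *, 'extent mismatch: got', size(vec), ' expected', N
    end if
    total = sum(log(vec)**2)
  end function my_function
end module extent_check_mod

program check_extent
  use global_var, only: N
  use extent_check_mod, only: my_function
  implicit none
  real(8) :: table(N, 3), res

  table = 2.0d0

  ! Result is stored first so that the PRINT inside my_function never
  ! runs while another PRINT statement is being executed.
  res = my_function(table(:, 2))      ! full column, extent N
  print *, 'full column:  ', res

  res = my_function(table(1:10, 2))   ! deliberately short section
  print *, 'short section:', res
end program check_extent
```

The second call passes only 10 elements, so the mismatch message is printed for it and not for the first call. With an explicit-shape dummy vec(N) the same short actual argument would not be diagnosed this way, since the function would simply assume N elements.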

Quoting - Steve Lionel (Intel)

If they are in the same module, then yes, that creates an explicit interface. My guess is that N is not the actual dimension of the array when passed in.

I printed the dimension of the actual argument before calling the function and the dimension of the dummy argument, inside the function, and they match.

A new weird thing I discovered: if I compile with -debug full it works.

So it would be really helpful if you could go over the code and let me know what is going on. I understand your time constraints.

Please run the code using Linux.

Thank you,
Rafael

Quoting - rafadix08

Please run the code using Linux.

Note that you are posting in the forum for the windows flavour of the compiler.

Bit of speculation here - In the OpenMP 3.0 spec have a read of the Fortran specific bits in section 2.9.3.2 (data environment - shared clause), particularly the bit that starts "Under certain conditions...". There's also an elaborating example in appendix A29. Perhaps this is applicable to your code.

If so, you may have a race condition associated with the temporary copy of a variable that needs to be made to match the array section of an actual argument with an assumed size dummy. The fact that you get "array temporary" warnings is a pointer to this. Making the dummy assumed shape would avoid the need for the copy and hence avoid the race condition - which appears to be what you have found.

I'm not clear about how this potential race condition would result in the specific problem that you see, but obviously it would only apply if you had parallel execution. You claimed earlier that when you ran the code serially you still saw the problem. Are you really sure about that?

I had problems with an older version of the compiler (11.0?) when array dimensions were specified by parameters from a module and OpenMP was in use, but the symptoms were different and, if I recall correctly, it only applied to debug builds.

Quoting - IanH

Note that you are posting in the forum for the windows flavour of the compiler.

Bit of speculation here - In the OpenMP 3.0 spec have a read of the Fortran specific bits in section 2.9.3.2 (data environment - shared clause), particularly the bit that starts "Under certain conditions...". There's also an elaborating example in appendix A29. Perhaps this is applicable to your code.

If so, you may have a race condition associated with the temporary copy of a variable that needs to be made to match the array section of an actual argument with an assumed size dummy. The fact that you get "array temporary" warnings is a pointer to this. Making the dummy assumed shape would avoid the need for the copy and hence avoid the race condition - which appears to be what you have found.

I'm not clear about how this potential race condition would result in the specific problem that you see, but obviously it would only apply if you had parallel execution. You claimed earlier that when you ran the code serially you still saw the problem. Are you really sure about that?

I had problems with an older version of the compiler (11.0?) when array dimensions were specified by parameters from a module and OpenMP was in use, but the symptoms were different and, if I recall correctly, it only applied to debug builds.

Hi Ian,
Thanks for your post.

I removed all the parallelization from the code in order to isolate the problem. So, the problem I described is not due to parallelization issues.

I read somewhere on this forum that "array temporary" warnings could be caused by non-contiguous array sections. Once I pass a contiguous array to the function that has the problem, the warning disappears, but the problem persists.

Yes, I noticed I am in the Windows section... Since I am hopeless now, I am considering posting at the Linux section too.

I could move the whole thread to the Linux section, but at this point I don't see that it's worthwhile. I doubt that it's actually "Linux" that makes a difference, but something different about the environment in which the program is run.

Retired 12/31/2016

In your code version that creates an array temporary, if you declare a local variable to point at the array section and then pass that instead does that fix the issue? If so then maybe the temporary array creation or rollback is going wrong.

real(kind=DOUBLE), pointer :: arry(:)
...
...
arry => rsk(:,kk)
Emax(s) = Emax_hat(PI_COEF, arry, ExperTomorrow, s)

Quoting - Andrew Smith
In your code version that creates an array temporary, if you declare a local variable to point at the array section and then pass that instead does that fix the issue? If so then maybe the temporary array creation or rollback is going wrong.

real(kind=DOUBLE), pointer :: arry(:)
...
...
arry => rsk(:,kk)
Emax(s) = Emax_hat(PI_COEF, arry, ExperTomorrow, s)

Thank you for your post, Andrew.
The code I posted already solved the "array temporary thing"... The first argument was (in another version) a non contiguous array, but in this version of the code PI_COEF is contiguous. With this fix, the warning disappeared, but the original problem persisted.

Hi again,

I was able to make the program work using the debug option. What other options does the debug option activate that might be explaining this difference in behavior?

Here is my compile file (without debug):

ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces

Here is my compile file with debug (only difference is the -debug option in the end):
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces -debug

Just FYI, I have extensively checked consistency of all arguments (types and dimensions)...

Well, setting "debug" turns off optimizations.

You could try your original compile line with -O0

You could try the -debug compile line with -O2

That might give interesting results.

Quoting - Lorri Menard (Intel)

Well, setting "debug" turns off optimizations.

You could try your original compile line with -O0

You could try the -debug compile line with -O2

That might give interesting results.

The original compile with -O0 makes the code work.
With -debug together with -O2 it doesn't.

Still looking for what could be causing the problem.

New piece of information:

Compiling like this:
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interface

Produces the error I have been talking about.

However, adding -check all or just -check bounds "removes the error".
I am not sure how -check interacts with optimization (I didn't see anything about an interaction in the manual), but I added -check all -O2 just for peace of mind. The code still works.

So, summarizing: the code works with -O0 (that's what I said in the previous post) and does not work with -O2. However, the code WORKS with -O2 combined with -check bounds and/or -check all.

Still working on it... Still lost!

Ah! Almost forgot...
If I compile with "-diag-enable sc" I get the following errors:
LinReg_MOD.f90(58): error #12171: dereference of NULL pointer "Y_HAT" set at (file:LinReg_MOD.f90 line:5)
LinReg_MOD.f90(68): error #12171: dereference of NULL pointer "DISP" set at (file:LinReg_MOD.f90 line:5)

However, I am not sure how to interpret that since both Y_hat and disp are local arrays with dimension given by one of the arguments of the function they belong to. The lines of the errors correspond to the initialization of Y_hat and disp.

I THINK I read somewhere that -diag-enable sc produces garbage sometimes... Just wondering whether I should pay attention to this or not...

I would ignore those messages.

Retired 12/31/2016

Hi Steve,

I just gave up looking for the bug in my program. I have spent 9 days on it now and was unable to find the problem. I am attaching the code I sent you earlier, but with some printouts that will tell you what to look at, followed by a pause. I would greatly appreciate it if you could look into it to see what is going on. Again, I understand your time constraints, so please do it at your convenience.

I am sorry I am a bit anxious, but this is crucial for my PhD thesis work. I can't go on without it.

Basically, for each line, XR must be equal to log(rsk)**2. I am not having that for some unknown reasons.

To help you in the process here is a summary of stuff I learnt debugging it:
1) The code works fine if I compile using IVF version 11.1.048 together with Microsoft Visual Studio 2008 on my windows machine. I tried to change many of the options to make the code fail (optimization, check bounds, etc...), but the code ALWAYS worked on my Windows machine.

2) Although one of the files is called Parallel_Emax_MOD.f90, there is nothing parallel in it. I removed all parallelizations in order to focus on the origin of the problem.

3) The code fails when I compile and execute it on a Linux machine. Here is my command line:
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interface

4) The code works on Linux if I add -O0

5) The code does not work on Linux if I add -debug -O2

6) The code works if I add ONLY -check bounds or just -check to the line in 3)

UPDATE:
7) This is quite random, but if I insert a couple of "print*" statements around the code, especially after the assignments that go wrong, that is:
XR(14) = log(rsk(1))**2
print*
XR(15) = log(rsk(2))**2
print*
XR(16) = log(rsk(3))**2
print*
XR(17) = log(rsk(4))**2
print*

The code works... Could I be facing a compiler bug? This behavior sounds very random to me...

The trouble is I need to run it on the Linux machine since I will need to perform some heavy parallelization.

Many, many thanks for all the help.

Rafael

Attachment: Intel.zip (application/zip, 6.8 KB)

Hi Rafael,
I investigated your problem at Steve's suggestion, and it does indeed appear to be a compiler optimization bug on Linux only. I shall pass a small reproducer along to the compiler developers to investigate further.
In the meantime, the simplest, safest way for you to proceed would be to insert a compiler directive
!DIR$ NOOPTIMIZE immediately after the FUNCTION EMAX_HAT statement. That will prevent this function from being optimized, but other functions within the file will still get optimized.

We'll let you know if we have further news or advice.

Martyn
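
As a rough illustration of where the directive goes, here is a minimal sketch. The function Emax_hat_sketch and its single argument are hypothetical simplifications, not the attached code; only the placement of !DIR$ NOOPTIMIZE immediately after the FUNCTION statement follows the advice above. Compilers that do not recognize the directive treat the line as an ordinary comment.

```fortran
module emax_sketch
  implicit none
  integer, parameter :: DOUBLE = kind(1.0d0)
contains

  ! Hypothetical stand-in for Emax_hat; the real argument list differs.
  function Emax_hat_sketch(rsk) result(XR)
    !DIR$ NOOPTIMIZE
    ! The directive sits immediately after the FUNCTION statement, as
    ! suggested, so that only this function is compiled unoptimized.
    real(kind=DOUBLE), intent(in) :: rsk(4)
    real(kind=DOUBLE) :: XR(17)

    XR = 0.0_DOUBLE
    XR(14) = log(rsk(1))**2
    XR(15) = log(rsk(2))**2
    XR(16) = log(rsk(3))**2
    XR(17) = log(rsk(4))**2
  end function Emax_hat_sketch

end module emax_sketch

program noopt_demo
  use emax_sketch
  implicit none
  real(kind=DOUBLE) :: rsk(4), XR(17)

  rsk = [1.5_DOUBLE, 2.0_DOUBLE, 3.25_DOUBLE, 7.0_DOUBLE]  ! illustrative values
  XR = Emax_hat_sketch(rsk)

  ! Each difference should be zero (or negligibly small) if the
  ! assignments inside the function were carried out correctly.
  print *, 'XR(14:17) - log(rsk)**2 =', XR(14:17) - log(rsk)**2
end program noopt_demo
```

Because the directive is scoped to one procedure, the cost is confined to that procedure, which matters if it is called very often.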

Quoting - Martyn Corden (Intel)

Hi Rafael,
I investigated your problem at Steve's suggestion, and it does indeed appear to be a compiler optimization bug on Linux only. I shall pass a small reproducer along to the compiler developers to investigate further.
In the meantime, the simplest, safest way for you to proceed would be to insert a compiler directive
!DIR$ NOOPTIMIZE immediately after the FUNCTION EMAX_HAT statement. That will prevent this function from being optimized, but other functions within the file will still get optimized.

We'll let you know if we have further news or advice.

Martyn

Martyn,
Thank you very much for looking into it. I feel relieved that it was not a programming error.
Is there any chance I can be notified of the result of the investigation and/or intel update (if any)?
Thank you also to Steve and to the other members who tried to help. Really appreciate it.
Rafael

Last question: if there is a bug in the optimizer, why should I disable the optimizer only for Emax_hat? How can I be sure that the rest of the code is being correctly compiled?

Dear Members of the Intel team,

I tried Martyn's suggestion of including !DIR$ NOOPTIMIZE after the Emax_hat function statement, which was where I was having trouble. The directive appears to avoid the problem I was having. However, I use the Emax_hat function VERY intensively, and my code became slower by a factor of 4, which makes my program almost infeasible (it was already taking too long). Please note that the code I sent for analysis was one where I reduced my code to the minimum in order to illustrate my problem.

I would like to ask some questions about this compiler optimization bug, since the answers will help me decide whether I have to switch compilers or not.

1) If there is indeed a compiler optimization bug, why should I disable optimization only for Emax_hat, which is an EXTREMELY SIMPLE function? How can I trust that the compiler optimization is working properly for the other routines?

2) How long should I expect to wait before this bug is corrected and a new version of the Intel compiler is issued?

3) Are there other ways to improve performance of Emax_hat?

Also, please let me know if you happen to have any suggestions.

Thank you,
Rafael

Optimizer bugs are usually very specific to particular code and not something general. If it was general, we'd see it in the extensive testing we do and we'd hear about it from many customers. I would take Martyn's advice here.

It is too soon to know when the bug will be fixed. The next opportunity will be the mid-late January update.

Retired 12/31/2016
Best Reply

Rafael,
The problem was related to a specific optimization involving both the square of a math function and the rerolling of statements involving consecutive array elements to reconstitute a loop. The loop index was getting incremented twice, which led to the pattern noted above where the second element, XR(15), contained the result that should have been in the third element, XR(16). This will be fixed in a future compiler update.
There are, therefore, additional ways in which you could work around this, without reducing the optimization level. You could rewrite the four assignment statements as a loop; then, the compiler would not need to recreate a loop. Or, as you already noted, you could separate the calculation of the logarithms from the calculation of the squares. The former is probably the most elegant: using array notation,
XR(14:17) = log(rsk(1:4))**2
but you'd need to check that you don't have a similar construct anywhere else in your code.

In reply to your last question, an optimizing compiler is a very large and complex piece of software. Bugs are rare, but they do happen. The Intel compiler is run through a very extensive test suite, so any problems are usually only for a very specific set of circumstances, for example, involving the interplay between different optimizations, as here. When a problem is found, a corresponding test is added to the test suite, to ensure that similar problems don't recur in the future.
So whilst you shouldn't expect problems with the rest of your code, provided you check for recurrences of the exact same construct, it is good practice to compare results compiled with optimization against results when compiling without optimization, just as you would check results for a problem with a known solution when testing your own code. It becomes even more important to test and compare to a validated set of results once you begin writing parallel code.

Martyn
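
The workarounds described above can be sketched side by side as follows. This is only an illustration under stated assumptions: the rsk values, the tolerance TOL, and the helper routine check are made up for the example, and the reference is computed in the same executable, so the meaningful comparison is between the output of a build with -O0 and a build with -O2.

```fortran
program reroll_workarounds
  implicit none
  integer, parameter :: DOUBLE = kind(1.0d0)
  real(kind=DOUBLE), parameter :: TOL = 1.0e-12_DOUBLE    ! assumed tolerance
  real(kind=DOUBLE) :: rsk(4), ref(4), t
  real(kind=DOUBLE) :: XR_array(17), XR_loop(17), XR_split(17)
  integer :: i

  rsk = [1.5_DOUBLE, 2.0_DOUBLE, 3.25_DOUBLE, 7.0_DOUBLE]  ! illustrative values

  ! Reference: take the logarithm first, then square it in a separate step.
  do i = 1, 4
     t = log(rsk(i))
     ref(i) = t*t
  end do

  XR_array = 0.0_DOUBLE
  XR_loop  = 0.0_DOUBLE
  XR_split = 0.0_DOUBLE

  ! Workaround 1: array notation instead of four scalar assignments.
  XR_array(14:17) = log(rsk(1:4))**2

  ! Workaround 2: an explicit loop, so nothing has to be rerolled.
  do i = 1, 4
     XR_loop(13 + i) = log(rsk(i))**2
  end do

  ! Workaround 3: separate the logarithm from the squaring.
  XR_split(14:17) = log(rsk(1:4))
  XR_split(14:17) = XR_split(14:17)**2

  call check('array notation', XR_array(14:17))
  call check('explicit loop ', XR_loop(14:17))
  call check('split log/sq  ', XR_split(14:17))

contains

  ! Hypothetical helper: compare one variant against the reference values.
  subroutine check(label, got)
    character(len=*), intent(in) :: label
    real(kind=DOUBLE), intent(in) :: got(4)
    real(kind=DOUBLE) :: err
    err = maxval(abs(got - ref))
    print *, label, ': max abs difference =', err
    if (err > TOL) error stop 'result differs from reference'
  end subroutine check

end program reroll_workarounds
```

Building the same source once with -O0 and once with -O2 and comparing the printed values is a small-scale version of the optimized-versus-unoptimized check recommended above.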

Martyn,

Many thanks for the detailed reply.

I will try to follow your suggestion of working around this without reducing the optimization level. I really can't afford this deterioration in performance.

However, I still have some very specific questions related to this issue. Please let me know if the best way is to keep exchanging messages via the forum, by email or via private messages. Is there a way to exchange private messages with the Intel group?

Many thanks again,
Rafael

Rafael,

As a free, non-commercial license customer, these forums are what are available to you. Customers who purchase a license with support have access to Intel Premier Support.

Retired 12/31/2016

Quoting - Steve Lionel (Intel)

Rafael,

As a free, non-commercial license customer, these forums are what are available to you. Customers who purchase a license with support have access to Intel Premier Support.

Steve,

I actually have access to Intel Premier Support for my IVF w/ IMSL for Windows. I am registered as rafadix (not rafadix08). It's just that the process to submit issues seems a bit complicated, so I preferred to use the forum.

Anyway, I would be grateful if Martyn could look at the following question:

I am a bit confused since in module Parallel_Emax_MOD.f90 there is a subroutine called Emax_Coef with assignments very similar to the ones inside Emax_hat:
X(kk,14) = log(rsk(1))**2
X(kk,15) = log(rsk(2))**2
X(kk,16) = log(rsk(3))**2
X(kk,17) = log(rsk(4))**2
However, no problem was detected in Emax_Coef... Why is that?

Also, there is one additional thing that confused me.
Remember that the assignments inside Emax_hat are:
XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

That's when the error was detected. However, if I compile like that:
XR(14) = log(rsk(1))**2
print*
XR(15) = log(rsk(2))**2
print*
XR(16) = log(rsk(3))**2
print*
XR(17) = log(rsk(4))**2
print*

The code works... The assignments are correctly done... This is curious

Anyway, I did follow your suggestion and wrote something like that (avoiding the composition of log with **2):
XR(14:17) = log(rsk(1:4)) ; XR(14:17) = XR(14:17)**2
And that seems to be working - same results with and without optimization. So thank you very much for your help. I am just curious about why the above pieces of code worked. That may be useful for me in the future if I want to avoid this bug.

Many thanks,
Rafael

Rafael,
The compiler was trying to reconstitute a loop, in order to optimize it.
In your example with the 2D array, it is the second subscript that is varying, so the 4 elements of X are widely separated in memory. Optimization is most effective when the loop accesses contiguous memory locations. (That's why in Fortran you should try to write loops over the first array index, and the inner loop of a nest should normally be over the first index.) In this case, the compiler won't try to reconstitute a loop, since it's unlikely to be able to optimize it. Similarly, the print statements are effectively function calls, which would also prevent many loop optimizations.
It also depends on the context, but you might encounter the original problem if the 2D subscripts were in the opposite order: X(14,KK) = ..., X(15,KK) = ..., etc., because these 2D array elements would still be adjacent in memory, so rerolling a loop might be worthwhile.

Martyn
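
The memory-layout point can be checked with a short sketch like the one below. The array names and the size n are hypothetical, and the example relies on the Fortran 2008 intrinsic is_contiguous, so it needs a compiler that supports it.

```fortran
program layout_demo
  implicit none
  integer, parameter :: DOUBLE = kind(1.0d0)
  integer, parameter :: n = 1000        ! hypothetical number of rows
  real(kind=DOUBLE) :: X(n, 17)         ! X(kk,14:17): second subscript varies
  real(kind=DOUBLE) :: Y(17, n)         ! Y(14:17,kk): first subscript varies
  integer :: kk

  kk = 5
  X = 0.0_DOUBLE
  Y = 0.0_DOUBLE

  ! Column-major storage: consecutive values of the FIRST subscript are
  ! neighbours in memory; stepping the SECOND subscript jumps a whole column.
  print *, 'X(kk,14:17) contiguous? ', is_contiguous(X(kk, 14:17))
  print *, 'Y(14:17,kk) contiguous? ', is_contiguous(Y(14:17, kk))
  print *, 'elements between X(kk,14) and X(kk,15):', size(X, 1)
  print *, 'elements between Y(14,kk) and Y(15,kk):', 1
end program layout_demo
```

In the X layout the four assigned elements are a full column apart, while in the Y layout they are adjacent, which is the situation where rerolling the assignments into a loop could look worthwhile to an optimizer.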
