Why do WRITE statements fix crashes?

Hi.

Ever since I first migrated to the Intel compiler several years ago (from the Compaq compiler), I have noticed that my results seem slightly unstable.

However, I am now seeing some strange behavior in Composer XE 2013. Here is what I'm running:

Microsoft Visual Studio 2008
Version 9.0.30729.1 SP
Microsoft .NET Framework
Version 3.5 SP1

Intel(R) Visual Fortran Package ID: w_fcompxe_2013.0.089

Intel(R) Visual Fortran Composer XE 2013 Integration for Microsoft Visual Studio* 2008, 13.0.3588.2008, Copyright (C) 2002-2012 Intel Corporation
* Other names and brands may be claimed as the property of others.

Right now my program runs fine in Debug mode, but when I run it in Release mode (with full optimization), I get some strange crashes.

They are all "forrtl: severe (157): Program Exception - access violation". I can change when this crash occurs by inserting WRITE statements into my code, and can even eliminate it altogether.

(Since the Visual Studio debugger reports the wrong values for variables while debugging, I've been using a lot of WRITE statements.)

After a crash, I put WRITE statements around the problem line, like so:

WRITE(*,*) 'aa'

LRactual = NRM2(rtoCP_RFP) ! THIS LINE IS REPORTED TO CAUSE THE CRASH

WRITE(*,*) 'bb'

Then I run again, and a different line causes the same crash, so I insert more WRITE statements around the new problem line.

Then the program runs with no problems.

It all seems a bit random and chaotic, but the behavior is repeatable: removing and reinserting the WRITE statements reproduces it.

I have attached a Word document with the crash messages.

It sounds like this error could be due to an out-of-range array index, which wouldn't surprise me. I don't know how to track it down, though, given the unpredictable behavior.

Any ideas?

thanks,

rob

Attachment: crashreport.docx (150.18 KB)

I have uninstalled 2013 and installed 2011 Update 12, and so far I have not been able to reproduce the crash.
Unfortunately, VS still reports incorrect values in the Watch window.
rob

Try compiling a Debug build with full run-time checks as well as generated and checked interfaces (/gen-interfaces /warn:interfaces). This may expose a problem.

Jim Dempsey

www.quickthreadprogramming.com

Hi,
I agree with Jim.
I recently experienced the same strange behavior, and in the end it was caused by an attempt to access an unallocated memory location.
In my case the Debug version of the program ran but the Release version did not. Under Visual Studio, both ran. So before adding the checks in the compiler settings, I tried adding WRITE statements to find the location of the problem (because it was not crashing in Debug mode). Once these statements were added, the crash disappeared.
Phil.

What would be a good set of compiler options for bounds checking on array arguments? I think this is the source of the problem.
I'm sure that the source of this error is an array subscript that has not been defined.
This can occur in a failed validity test when you expect the loop not to be executed.
The following loop could fail for i <= 4 but may have worked for years with another compiler.
Do i = 1,n
   if (i > 4) num = fn(i)
   if (num < 1) cycle    ! num is still undefined here when i <= 4
   ...
End Do
John
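John's loop can be rewritten so that num is always assigned before it is tested. Below is a minimal sketch of one way to do that, assuming fn is an ordinary integer function; the module name, count_valid, and the body of fn are hypothetical placeholders, not code from the poster's program.

```fortran
! Hypothetical sketch: fn's body and count_valid are illustration only.
module loop_guard_demo
  implicit none
contains
  ! Stand-in for the user's function fn(i) (placeholder body).
  integer function fn(i)
    integer, intent(in) :: i
    fn = i - 5
  end function fn

  ! Counts iterations that pass the validity test. num is assigned
  ! before it is tested on every path, so it is never read undefined.
  integer function count_valid(n) result(nvalid)
    integer, intent(in) :: n
    integer :: i, num
    nvalid = 0
    do i = 1, n
      if (i <= 4) cycle        ! make the skip explicit instead of relying on num
      num = fn(i)              ! num is defined from here on
      if (num < 1) cycle
      nvalid = nvalid + 1
    end do
  end function count_valid
end module loop_guard_demo
```

The design choice is simply to skip the early iterations explicitly, so the validity test never depends on a value left over from a previous iteration (or from nothing at all).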

Thanks for the responses, guys.
Jim, I have always had full run-time checks set, with compile-time diagnostics set to "show all" (see attached screenshots).

RE: LRactual = NRM2(rtoCP_RFP) ! THIS LINE IS REPORTED TO CAUSE THE CRASH

What can happen here is that NRM2 is a dummy argument (a reference to an array) passed on the stack, and somewhere between the start of the subroutine and the statement above, a call is made or a statement is executed that corrupts the reference pointer. These types of errors are hard to detect, since adding or removing code alters the symptoms.
A similar situation can occur if your index gets corrupted. The following may help:

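subroutine YourSubroutine(...,NRM2,...)
use, intrinsic :: ISO_C_BINDING, only: C_INTPTR_T
...
INTEGER(C_INTPTR_T) :: NRM2loc  ! C_PTR is a derived type, not a kind; hold the LOC() address in an address-sized integer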
...
NRM2loc = LOC(NRM2) ! at top of subroutine
...
if(NRM2loc .ne. LOC(NRM2)) call YourBugStop()
if(rtoCP_RFP .le. 0) call YourBugStop()
if(rtoCP_RFP .gt. YourWorstCase) call YourBugStop()
LRactual = NRM2(rtoCP_RFP) ! THIS LINE IS REPORTED TO CAUSE THE CRASH

Here YourBugStop() is external to YourSubroutine(...,NRM2,...) and not susceptible to inlining; for example, it may contain a WRITE statement that you do not want inlined, because inlining it could remove the symptom.
You can also copy rtoCP_RFP (as well as NRM2loc) to a global variable, which may be easier to locate when breaking on the access violation.

Jim Dempsey

NRM2 is not an array; it is a BLAS function. Sorry for the confusion; I didn't provide enough code.
rtoCP_RFP is a 3-element array of REAL*8.
The documentation says to include "blas.f90" to use this function; however, I'm using BLAS95:
USE BLAS95
I read somewhere to use BLAS95, so I'm not sure which is right, or if it even matters...
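Since rtoCP_RFP has only three elements, one way to check whether the library call itself is involved is to compute the same norm without BLAS and compare the two results. A minimal sketch, assuming a compiler that supports the Fortran 2008 NORM2 intrinsic; the module name, norm_by_hand, norms_agree, and the tolerance you pass are hypothetical choices.

```fortran
! Sketch assuming Fortran 2008 NORM2 support; all names are hypothetical.
module norm_check_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! same precision as REAL*8
contains
  ! Plain sum-of-squares Euclidean norm, written out explicitly.
  real(dp) function norm_by_hand(v)
    real(dp), intent(in) :: v(:)
    norm_by_hand = sqrt(sum(v**2))
  end function norm_by_hand

  ! True when the intrinsic NORM2 agrees with a reference value (for
  ! example a library result) within tol, scaled by the reference's
  ! magnitude when that is larger than 1.
  logical function norms_agree(v, reference, tol)
    real(dp), intent(in) :: v(:), reference, tol
    norms_agree = abs(norm2(v) - reference) <= tol * max(1.0_dp, abs(reference))
  end function norms_agree
end module norm_check_demo
```

For example, right after the suspect line one might add `if (.not. norms_agree(rtoCP_RFP, LRactual, 1d-12)) call YourBugStop()`, where YourBugStop is the external stop routine from Jim's reply.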

Then the NRM2loc lines can be removed.

It is likely that the line number on the error report is not correct (due to compiler optimizations).

Where/what is rtoCP_RFP?

Is it local to the scope of the subroutine?
If so, is it SAVE? Is the subroutine compiled with OpenMP, declared RECURSIVE, or compiled with the automatic option?
Is it in a module?

Does the crash occur on the 1st call?

In place of WRITE(*,*) 'xx', try inserting

intDebug = nn

where nn is a sequence number and intDebug is a global variable that you can find after a crash.
Give intDebug the VOLATILE attribute to keep the compiler from reordering the code around intDebug = nn.

This does the same thing as your current WRITE statements, but it is less intrusive on compiler optimizations; the trade-off is that you only see the last value set.
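A minimal sketch of this breadcrumb approach, assuming intDebug lives in a module so that it is global and can be inspected in the debugger after the crash. The module names, suspect_routine, and the checkpoint numbers are hypothetical; the sqrt(sum(v**2)) line only stands in for the NRM2 call.

```fortran
! Hypothetical sketch; intDebug is the name from the post, the rest is illustration.
module debug_trace
  implicit none
  ! VOLATILE: every assignment must actually be stored to memory, so after
  ! a crash the debugger shows the last checkpoint that was reached.
  integer, volatile :: intDebug = 0
end module debug_trace

module trace_demo
  use debug_trace
  implicit none
contains
  subroutine suspect_routine(v, r)
    real(8), intent(in)  :: v(3)
    real(8), intent(out) :: r
    intDebug = 10            ! checkpoint: about to run the suspect line
    r = sqrt(sum(v**2))      ! stand-in for LRactual = NRM2(rtoCP_RFP)
    intDebug = 20            ! checkpoint: suspect line completed
  end subroutine suspect_routine
end module trace_demo
```

If the program dies with intDebug still equal to 10, the fault is between the two checkpoints; if it is 20, the reported line is probably not the real culprit.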

Are you running the Release build (or another optimized build) from VS when the problem occurs?
If so, with that configuration selected (and without trace statements), open the Breakpoints window, then use the Delete All Breakpoints button (do not delete them one by one). Close the Breakpoints window, save and exit the solution and VS, restart VS, then select and run your application. This might take two minutes of your time. (I'm fishing for a recurrence of an old VS bug.)

Jim Dempsey

Thanks again, Jim.
Yes, I'm thinking the line number is wrong too (I have seen this before).
rtoCP_RFP is a local variable defined as:
REAL*8, DIMENSION(3) :: rtoCP_RFP ! vector from r to CP in reference frame path
It is not in a module and it is not SAVE (I have actually never used SAVE), and I don't think I've ever done anything with OpenMP, RECURSIVE, or automatic.
Yes, the crash occurred on the first call.
Yes, the crash occurred with the Release build (with full optimization); I was not able to reproduce any crash with the Debug build (no optimization).
Here are my command-line settings:
Fortran:
/nologo /O3 /Qipo /I"C:\Program Files (x86)\Intel\Composer XE\redist\ia32\mkl" /Qdiag-enable:sc-include /warn:all /module:"Release\\" /object:"Release\\" /Fd"Release\vc90.pdb" /check:bounds /libs:dll /threads /Qmkl:sequential /c
Linker:
/OUT:"Release\MAIN.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\Intel\Composer XE\redist\ia32\mkl" /LIBPATH:"C:\Program Files (x86)\MATLAB\R2010b\extern\lib\win32\microsoft" /NODEFAULTLIB:"LIBCMTD.lib" /NODEFAULTLIB:"MSVCRTD.lib" /MANIFEST /MANIFESTFILE:"E:\LapSim\oldsolver\redo\MAIN\MAIN\Release\MAIN.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"E:\LapSim\oldsolver\redo\MAIN\MAIN\Release\MAIN.pdb" /SUBSYSTEM:CONSOLE /IMPLIB:"E:\LapSim\oldsolver\redo\MAIN\MAIN\Release\MAIN.lib" "libmat.lib" "libmx.lib" "mkl_lapack95.lib" "mkl_blas95.lib" "E:\LapSim\oldsolver\binaries\wrapper_c.obj"
I think I will set "Runtime Error Checking" to check:all from now on in the release build.
I have uninstalled 2013 and installed 2011 Update 12 (which doesn't crash), and I don't have time to go back to 2013 again, but thanks for the advice. If I run into this problem again in any version, I will try that next.
On another subject, I've been thinking about updating my Visual Studio. Is there a preferred version to use with the latest Composer? Will Composer work OK with VS2012?
thanks,
rob

>>It is not in a module, it is not SAVE (i have never used SAVE actually), i don't think i've ever done anything with OpenMP, recursive, or automatic
Then, under these circumstances, the local array is equivalent to SAVE (one static copy visible only to the subroutine). This will not cause an issue like the one you are observing.

You mentioned earlier that you have USE BLAS95.
My guess is that the .MOD file you USE for BLAS95 was not generated together with the .LIB file for BLAS95, so one side passes/uses an array descriptor and the other does not (it passes the address of the first element of the array). This could be due to library misuse.
Your command line is using mkl_blas95; check whether MATLAB also contains BLAS95 entry points. If so, this may be the issue.
Note that you are linking libmat.lib before mkl_blas95.lib, and for duplicate entries the first .lib containing the entry is used.
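To make the descriptor-versus-address point concrete, here is a minimal sketch of the two calling styles in plain Fortran. my_dnrm2 (F77 style: length, stride, address of the first element) and my_nrm2 (F95 style: an assumed-shape array passed with a descriptor) are hypothetical names; the sketch only mirrors the general idea of an F95 wrapper over an F77 routine and is not MKL's source.

```fortran
! Hypothetical illustration of the two calling conventions; not MKL code.
module nrm2_styles_demo
  implicit none
contains
  ! F77 style: the caller passes the address of the first element plus
  ! an explicit length n and stride incx; no array descriptor is involved.
  ! (Plain sum of squares; a real DNRM2 also scales to avoid overflow.)
  real(8) function my_dnrm2(n, x, incx)
    integer, intent(in) :: n, incx
    real(8), intent(in) :: x(*)
    integer :: i
    real(8) :: s
    s = 0d0
    do i = 1, n
      s = s + x(1 + (i - 1) * incx)**2
    end do
    my_dnrm2 = sqrt(s)
  end function my_dnrm2

  ! F95 style: x(:) is assumed-shape, so the caller passes a descriptor
  ! and the wrapper reads the length itself before calling the F77 routine.
  real(8) function my_nrm2(x)
    real(8), intent(in) :: x(:)
    my_nrm2 = my_dnrm2(size(x), x, 1)
  end function my_nrm2
end module nrm2_styles_demo
```

Because both routines live in one module here, the compiler sees both interfaces and can check every call. The failure Jim describes arises when the interface in a .MOD file and the routine actually linked from a .LIB disagree, which the compiler cannot see when it only has the .MOD file.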

Jim Dempsey

Yes, I definitely feel like I'm misusing libraries, because I don't know what I'm doing.
I'm not a programmer (though I've been trying to learn these things for years).
Those settings and all the .lib/.dll files I'm including come from lots of different sources, and I've basically added them by trial and error.
So should I put mkl_blas95.lib first, then?
Do you know of any good sources that would explain this? I see that there is some information in the documentation, which I will be reading soon.
Sorry for my ignorance; I should have read first and then asked questions.
rob

It would be easy enough to place mkl_blas95.lib before libmat and libmx and see what happens.

FWIW, when you link your main program and object files, this is done in a one-pass or n-pass process. After each pass, if there are undefined references, the libraries are searched in the order in which they appear on the command line. Thus, if multiple libraries contain the same function, subroutine, or library global variable, the first one found is loaded and used. In some cases you will be required to specify that duplicates be ignored. The link passes continue until there are no undefined references, or until a pass completes without finding at least one match.
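A toy model of the first-match rule described above: libraries are scanned in command-line order, and the first one that defines a symbol wins. This is only a sketch of that rule, not how LINK.EXE is implemented; the type library, the function resolve, and any symbol names you feed it are hypothetical.

```fortran
! Toy model of first-match library search; all names are hypothetical.
module link_order_demo
  implicit none
  ! One library on the link line: its name and the symbols it defines.
  type :: library
    character(len=32) :: name
    character(len=32), allocatable :: symbols(:)
  end type library
contains
  ! Scan the libraries in command-line order and return the name of the
  ! first one that defines sym; return blanks if none does, which would
  ! remain an undefined reference.
  function resolve(sym, libs) result(owner)
    character(len=*), intent(in) :: sym
    type(library), intent(in) :: libs(:)
    character(len=32) :: owner
    integer :: i
    owner = ''
    do i = 1, size(libs)
      if (.not. allocated(libs(i)%symbols)) cycle
      if (any(libs(i)%symbols == sym)) then
        owner = libs(i)%name
        return
      end if
    end do
  end function resolve
end module link_order_demo
```

In this model, if libmat.lib appears before mkl_blas95.lib and both defined the same symbol, resolve would return libmat.lib; swapping their order swaps the winner, which is the quick experiment of moving mkl_blas95.lib ahead of libmat and libmx.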

References:

Google: mkl blas link

First Intel site hit: http://software.intel.com/en-us/articles/intel-math-kernel-library-intel...

This might be a good starting point.

Second Intel site hit is good too: http://software.intel.com/en-us/articles/intel-math-kernel-library-intel...

Jim Dempsey

Thank you very much Jim for the assistance.
rob
