When a run-time error occurs the debugger is not stopping on a Fortran code line

When there is a run time error while I am debugging my application via Visual Studio 2013, I expect that the debugger should stop on the line that is causing the problem. However, the debugger unexpectedly stops on a C++ code line in another part of our application.

The details of what is occurring:

1. a traceback dialog box titled "Intel(r) Visual Fortran run-time error" appears. It has correct traceback source code and line information about where the error happens in the Fortran code. The dialog has a single "OK" button.
2. I click "OK"
3. Visual Studio produces a dialog box indicating that my application has triggered a breakpoint. It has two buttons, "Break" and "Continue"
4. I click "Break"
5. The debugger breaks in our C++ code in another thread.
6. I navigate through all of the threads in the Threads window and observe the Call Stack window for each thread to try to locate the Fortran thread, but it has already terminated.

What should I do to signal the debugger to stop on the Fortran source code line instead of producing the run-time error dialog box with the traceback information?

In VS2005 the behavior was for the debugger to stop in the Fortran source code and then when trying to continue the traceback information dialog box would be produced.
  
Our application is a C++ executable that calls Fortran dlls.

The Fortran compiler switches are as follows:

/nologo /debug:full /MP /Od /free /warn:interfaces /iface:cvf /module:"Debug\\" /object:"Debug\\" /Fd"Debug\vc120.pdb" /traceback /check:all /libs:dll /threads /dbglibs /c

The linker switches are as follows:

/OUT:"Debug\acerate.dll" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"acerate.dll.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"acerate.pdb" /SUBSYSTEM:WINDOWS /IMPLIB:"acerate.lib" /DLL HTRILogging.lib

My system info:

Microsoft Visual Studio Premium 2013
Version 12.0.30110.00 Update 1
Microsoft .NET Framework
Version 4.5.50938

Installed Version: Premium

Intel(R) Visual Fortran Composer XE 2013 SP1 Update 2 Integration for Microsoft Visual Studio* 2013, 14.0.0086.12

Hi, this isn't my field of expertise, but a few comments, since no one else has made any:

Which RTL error are you seeing? Are you able to set and break at breakpoints in Fortran code at all? How about breaking on floating-point exceptions, where there's no message from the RTL?

There is a section "Error Handling" in the Fortran Users Guide, with a subsection "Advanced Exception and Termination Handling", which might be helpful. There are RTL functions SIGNAL  and  SIGNALQQ, that allow changing how signals are handled. There's an environment variable, FOR_IGNORE_EXCEPTIONS, which when set to 'true'  suppresses exception handling by the Fortran RTL. There's a command line switch -fexceptions, that should be set in mixed language applications to make C++ exceptions work across Fortran calls, though that doesn't seem relevant here.

MSVS 2013 support has only just now been introduced in the 14.0 compiler update that was just posted; I haven't personally had experience with it yet. If this is still a problem, I'll try to find someone who has.

What is the actual line of source code for which you say you have "correct traceback source code and line information about where the error happens in the Fortran code"? The line may contain a call to a system or mathematical function, which could be why you are seeing the C++ code.

Or is the C++ code you are seeing YOUR code?

That might indicate a memory leak issue.

You might set a breakpoint in the Fortran Code at the location of the error (and to avoid it getting hit many times before being useful be sure to set a condition) that would be a short term solution while seeking a long term solution.

 

Thank you both for responding… I need some guidance! I’ll try to elaborate some more and answer the follow up questions you’ve posed. Again, I appreciate your ideas and comments.

Are you able to set breakpoints in the Fortran code at all?

Yes, I can set and hit breakpoints (with conditions) just fine. I can also watch the values of variables in the debugger when I hit these breakpoints. I can step into, out of, and over subroutines and functions as expected.

How about breaking on floating-point exceptions, where there's no message from the RTL?

No, I cannot yet break on this type of floating-point exception. I would love to be able to control when and how the debugger breaks on floating-point exceptions, and it is one of the reasons we are investigating upgrading our Fortran compiler and tools; we were not able to accomplish convenient breaking on floating-point exceptions in VS2005 and Fortran 9.1.

It is extremely difficult to backtrack NaNs and infinities in complex algorithms – it’d be great if the debugger would halt on the first NaN calculated. However, there are some hoops to jump through for setting the floating point exception handlers to get this behavior to occur when the main program is not Fortran. So unfortunately the /fpe:0 option is not sufficient to get the behavior we want for our application. However, I am guessing the breaking floating point exceptions issue is different from the problem I am having now.

Which run time library (RTL) error are you seeing?

It appears to me that no Fortran RTL error breaks into the debugger. An example of a Fortran RTL error that isn’t causing the debugger to stop on the offending Fortran code line is: forrtl: severe (408) fort: (3) Subscript #1 of the array MyArray has value 0 which is less than the lower bound of 1.
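For readers less familiar with what /check:all does in this case, the following C++ sketch mimics the kind of subscript test behind the message above. It is only an illustration: `checkedIndex`, `storeChecked`, the 1-based convention, and the upper-bound wording are hypothetical, and this is not how the Intel run-time library is implemented.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: maps a 1-based Fortran-style subscript onto a
// 0-based std::vector index, throwing if the subscript is out of bounds.
std::size_t checkedIndex(long subscript, std::size_t extent, const std::string& name) {
    if (subscript < 1) {
        throw std::out_of_range("Subscript #1 of the array " + name + " has value " +
                                std::to_string(subscript) +
                                " which is less than the lower bound of 1");
    }
    if (static_cast<std::size_t>(subscript) > extent) {
        throw std::out_of_range("Subscript #1 of the array " + name + " has value " +
                                std::to_string(subscript) +
                                " which is greater than the upper bound of " +
                                std::to_string(extent));
    }
    return static_cast<std::size_t>(subscript - 1);
}

// Stores value at the 1-based position index, mirroring MyArray(index) = value.
void storeChecked(std::vector<double>& myArray, long index, double value) {
    myArray[checkedIndex(index, myArray.size(), "MyArray")] = value;
}
```

In this sketch the check fails before the store happens, which is the point at which one would want the debugger to stop.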

What is the actual line/source code that has traceback information about the Fortran error?

My code is something like this:

In the c++ main process a thread is created to run the calculations

status = runCalcsInThread();

In runCalcsInThread(), my calculation dll entry point is called. The calculation dll is written in Fortran.

In the calculation dll the code looks something like this:

Subroutine EntryPoint()
   {line 1:} Call A
End Subroutine EntryPoint
Subroutine A()
   {line 1:} index = 0
   {line 2:} MyArray(index) = value
End Subroutine A

I can set and hit the breakpoint at line 2 in subroutine A. When I step over line 2 I expect the debugger to stop and signal that there is a problem (because an index of zero does not exist). Instead a Fortran run time error dialog box with traceback information is produced. The traceback information looks like this:

Image                PC               Routine            Line        Source
calculation.dll     {hex number}      _A@0                2           A.f90
calculation.dll     {hex number}      _EntryPoint@0       1           EntryPoint.f90
main.dll            {hex number}      Unknown             Unknown     Unknown

If I break program execution in the debugger at this point (with the traceback dialog box visible), then the debugger breaks in my c++ code on 

 status = runCalcsInThread();

If I bring up the Threads window and click through each thread while also looking at the Call Stack window, I observe that none of the threads have a Call Stack that includes the offending Fortran code. I do see a libifcoremdd.dll thread, but I think this is the thread that spawned the dialog box and it does not refer back to subroutine A.

In VS2005 the behavior was for the debugger to step over line 2 of subroutine A and generate the RTL dialog box with traceback information. When breaking program execution at this point (before clicking OK on the dialog) I view the Threads window. The location of one of the threads indicates subroutine A. If I click that thread and look at the call stack, I can see the Fortran call stack, navigate the call tree, and observe the state of the variables that are causing the error. If I double-click the top of the call stack (which in this example is subroutine A), the VS2005 debugger takes me to the offending line 2, and I can see that index is zero.

Review of Fortran Users Guide

Thank you for pointing me to the reference manual. I read through Intel® Fortran Compiler XE 13.1 User and Reference Guides\Compiler References\Error Handling\Handling Run-Time Errors (http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/fortran-mac/).

In the near future we may need to override the default signal handling (I think that may be what we will need to do to get the floating point exceptions to behave like we want), but I’m not sure this issue should require the default handling to be changed.

Our application is compiled on Windows so the –fexceptions switch is ignored.

I do not have the FOR_IGNORE_EXCEPTIONS environment variable defined, so I think the RTL exceptions are working according to the default behavior.

I’d appreciate any other thoughts/ideas you may have!

Let me make the easy comments first. It's true that /fpe:0 only works when you compile the main (Fortran) routine with it: a call to set the floating-point control word to unmask certain floating-point exceptions is placed in the main routine. However, current compilers support /fpe-all:0, which applies to individual subroutines/compilation units. This was probably not available in the 9.1 compiler. Note that you could also compile your C/C++ main with /Qfp-trap:common to unmask floating-point exceptions for the whole program. This gives you more control over individual exceptions than /fpe:0. There's also /Qfp-trap-all for individual C/C++ compilation units. Finally, the IEEE_EXCEPTIONS module from Fortran 2003 allows you to mask and unmask floating-point exceptions at run-time. So I don't think you should need to write your own handlers to debug floating-point exceptions.
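The status flags behind these exceptions can also be inspected from the C++ side with the standard `<cfenv>` header. The sketch below does not trap or halt the debugger; it only polls the "invalid" flag after a calculation, which can help narrow down where the first NaN is produced. `raisedInvalid` and `divide` are hypothetical names used for illustration only.

```cpp
#include <cfenv>
#include <functional>

// Runs calc and reports whether it raised the IEEE "invalid" flag
// (for example 0.0/0.0, which produces a NaN).
bool raisedInvalid(const std::function<void()>& calc) {
    std::feclearexcept(FE_ALL_EXCEPT);   // start from a clean set of flags
    calc();
    return std::fetestexcept(FE_INVALID) != 0;
}

// volatile operands keep the compiler from folding the division at
// compile time, so the operation really happens at run time.
double divide(double num, double den) {
    volatile double n = num;
    volatile double d = den;
    return n / d;
}
```

Polling like this is much coarser than halting on the offending line, so it is at best a fallback when unmasking the exceptions is not possible.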

For the immediate issue, I'll ask around. Maybe we'll need to construct a reproducer, and run it under different MSVS versions. Thanks for the additional detail.

 

A couple of things you left out: where is MyArray declared? Is it in a module? A pointer? Allocatable?

The break does point you to the correct line, and I am wondering whether it may be throwing the trace because of the way you declare/store that array.

Would you please attach a screenshot showing the whole screen (including desktop) when the error occurs? Or at least the Visual Studio window.

Because you are using threads, you may need to switch contexts to the thread where the error occurred.  http://msdn.microsoft.com/en-us/library/ms164746.aspx has some basic instructions.

Steve - Intel Developer Support

Martyn, the /fpe-all:0 compiler switch works! Holyhell. The code halts on the proper line when a floating point exception occurs. It seems some parts of our code slow way down with this switch enabled, so I think we've some refactoring to do in those bits of code. Anyway, we owe you a fruit basket or something.

bmchenry and Steve, I have attached a Word document with screen captures of the actual problem in practice to better describe what is going on.  Steve, you are correct about switching contexts. After breaking on the run time error, the debugger stops in the main thread. I am familiar with how to switch contexts to the thread with the Fortran code (I do that often in VS2005). The issue I am having is that even when I switch to the Fortran thread ID, the call stack seems to be just the call stack that creates and shows the run time error dialog box. The Word document elaborates on context switching.

If there's any more information I can provide, I'll happily do so. Before I can recommend upgrading our toolset we will need to be able to break in the code as expected when a run time error occurs.

Nathan,

Thanks for the Word document. Here's what's happening. When the Fortran code wants to report an array bounds error, it calls a run-time library routine to do so. The RTL routine, in libifcoremdd.dll, displays the error message, emits the traceback, and then calls DebugBreak, a Windows API that actually does the break. But it is inside the RTL when this happens so that's the active stack frame for which you don't have source code. All you should need to do is click on the stack frame dropdown and select the first frame that has your actual code.

In a non-threaded application, the debugger tends to show you the first calling frame that has debug info, but I guess it doesn't do that when you have to switch threads.

Steve - Intel Developer Support

Steve, perhaps I am missing something. Is the "stack frame dropdown" the same as the list of stack frames in the call stack window?

After breaking, I do not see any stack frames in any of the threads that have my offending code. In the Word document, if you look at Figure 4, the stack frames for 8680 (the thread where the Fortran code is executed) are shown on the right side in the Call Stack window. None of these stack frames contain my code. If I choose the dropdown in the Location column of the Threads window, I see exactly what is shown in the Call Stack window. Is there another "Active Stack Frame" dropdown I am missing?

When I cycle through switching context to all the other threads, I do not see my offending code in any of their stack frames either.

I was referring to the dropdown I see in your screenshot here. I admit that I have not used threads in the debugger this way so I am not sure if this will help.

Attachment: Capture_9.PNG (22.27 KB)

Steve - Intel Developer Support

Selecting that dropdown shows the same stack frames that are listed in the Call Stack window. None of these stack frames refer to the offending code unfortunately. 

Could you attach a ZIP of a small reproducer example for me to look at?

Steve - Intel Developer Support

I will work on a reproducer. I have attached a screenshot of what happens in VS2005, Fortran 9.1 for Figure 4 in the word document. This is the call stack I'm expecting to see in VS2013.

Attachment: CallStackVS2005.png (521.24 KB)

Great to hear that /fpe-all:0 worked :-)   There's presumably a call to set and then unset the FP control word at entry to and exit from all your routines compiled with /fpe-all:0, so with a lot of subroutine calls, that could be a significant overhead. (Which might not matter if you only compile like this for debugging). One alternative to avoid all those calls would be to USE IEEE_EXCEPTIONS and CALL IEEE_SET_HALTING_MODE(...)  at entry to your DLL. Another might be to compile your C++ main with /Qfp-trap:common as above.
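The per-routine save and restore described above can be pictured with a small C++ guard built on the standard `<cfenv>` functions. This is only an analogy for what a compiler-generated prologue and epilogue might do, not the code the Intel compiler actually emits; `FpEnvGuard` and `roundingModeInsideRoutine` are hypothetical names.

```cpp
#include <cfenv>

// Saves the floating-point environment on construction and restores it
// on destruction, so changes made inside a routine do not leak to the caller.
class FpEnvGuard {
public:
    FpEnvGuard() { std::fegetenv(&saved_); }
    ~FpEnvGuard() { std::fesetenv(&saved_); }
    FpEnvGuard(const FpEnvGuard&) = delete;
    FpEnvGuard& operator=(const FpEnvGuard&) = delete;

private:
    std::fenv_t saved_;
};

// Example routine: changes the rounding mode locally and reports it; the
// guard restores the caller's environment when the routine returns.
int roundingModeInsideRoutine() {
    FpEnvGuard guard;
    std::fesetround(FE_UPWARD);
    return std::fegetround();
}
```

Every call pays for one save and one restore, which is why a code path made of many short routine calls could slow down noticeably.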

Martyn, I will definitely try the other floating point exception handling options you mention and see what works best... but what an awesome relief it is to be able to halt on the first exception that occurs instead of having to work backwards (for days)!

Steve, I have uploaded a small reproducer. 

The c++ console application uses LoadLibrary() and GetProcAddress() to gain access to the Fortran Dll's entry point. It then uses std::thread to start a new thread for the entry point subroutine. If you build and run the solution, you should get an array subscript run time error and correct traceback information in the console window. The debugger will ask you if you want to break or continue. If you choose "Break" the debugger will stop but you will not be able to see the Fortran call stack that would allow you to navigate to the code that is having the problem.
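The structure of such a host can be sketched roughly as follows. This is not the attached reproducer: the helper names, the stand-in routine, and the calling-convention macro are hypothetical, and the macro assumes the Fortran side uses stdcall on 32-bit Windows, as /iface:cvf would suggest. The Windows-only loading code is guarded so that the threading part can be compiled elsewhere.

```cpp
#include <atomic>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#define CALC_CALL __stdcall
#else
#define CALC_CALL
#endif

// Pointer type for a Fortran subroutine that takes no arguments.
using EntryPointFn = void (CALC_CALL *)();

#ifdef _WIN32
// Loads the DLL and looks up the exported routine; returns nullptr on failure.
EntryPointFn loadEntryPoint(const char* dllPath, const char* symbol) {
    HMODULE lib = LoadLibraryA(dllPath);
    if (lib == nullptr) return nullptr;
    return reinterpret_cast<EntryPointFn>(GetProcAddress(lib, symbol));
}
#endif

// Runs the entry point on a new thread and waits for it to finish.
bool runCalcsInThread(EntryPointFn entry) {
    if (entry == nullptr) return false;
    std::thread worker([entry] { entry(); });
    worker.join();
    return true;
}

// Stand-in for the Fortran routine so the threading logic can be
// exercised without a DLL.
std::atomic<int> g_standInCalls{0};
void CALC_CALL standInEntryPoint() { ++g_standInCalls; }
```

On Windows the call would look something like `runCalcsInThread(loadEntryPoint(dllPath, symbol))`, where the path and the exported symbol name depend on how the Fortran DLL was built.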

Attachment: C_FortranConsoleApp_0.zip (5.9 KB)

I read through the error handling section and found the DECFORT_DUMP_FLAG setting.

I call l=SETENVQQ('DECFORT_DUMP_FLAG ') in my code. When a run-time error occurs, where can I find the dump file and what is its name?

 

Markus

I think that's a Linux thing - I'll ask.

Steve - Intel Developer Support

Indeed, that environment variable has no useful purpose on Windows. It does cause the run-time library to call abort(), but that's not very helpful. On Linux and OS X, it does create a core dump that can be debugged with gdb.

Steve - Intel Developer Support

Steve, I'm extremely curious... were you able to reproduce this issue with the debugger not locating the offending code when a runtime error occurs (using the reproducer project attached in Quote #16)?

I have not gotten to that yet - and it may yet be a while. But you've introduced a bunch more variables into the equation, what with mixed-language and LoadLibrary.  What I usually find with applications that do dynamic loading is that I have to start the application from within the DLL project, setting the commands in the Debug property page to run the executable. You also need to make sure that the DLL you're loading is the same one as the project produced.

Steve - Intel Developer Support

I understand that it may be a while before I hear back on this issue. I do appreciate the feedback though; I'm expected to report on this issue tomorrow because it is one of the issues preventing us from migrating forward.

I will try to eliminate variables to see if I can narrow down further where the problem occurs. I will report on what I find so I am not forgotten.

The reproducer project is the simplest version of how our software works: a c++ executable that dynamically loads and runs Fortran code in a new thread. I think the reproducer demonstrates there is in fact a problem and that I am not insane. Unless someone else cannot reproduce the problem.

I have had similar experiences dealing with incorrect dlls being loaded for a debugging session, so I have learned to verify that the correct libraries are loaded. Fortunately Visual Studio writes which dlls are loaded into the Output Window so it is easy to verify. In this reproducer case I see the correct dll and its symbols being loaded and I can debug through the c++ application and into the Fortran code okay.

In an effort to simplify the problem, I have tried debugging a single-threaded application instead of a multi-threaded one: a c++ executable, dynamically loading the Fortran dll, and running the Fortran single-threaded. In this scenario I found that I intermittently had the issue - sometimes I could see the Fortran call stack to the offending code correctly and sometimes I could not. It seemed that the debugger could get into a state where it would not produce the Fortran call stack to the offending code generating the runtime error. The debugger would stay in the state of not being able to show the Fortran call stack until I restarted Visual Studio and started a new debugging session. While I felt that debugging the run time error was flaky in this scenario, I was not able to figure out how to reliably reproduce the problem in a single-threaded mixed language application that dynamically loads a Fortran dll with a run time error.

Ok, I just tried your example solution. All I did was a rebuild and then start in the debugger. I got the array bounds error, and the dialog to do a break. I select Break, I am put into the Fortran code with the call stack in place. The Locals window has the correct variables. I don't see any problem. See screenshot attached.

Attachment: U505627.png (55.58 KB)

Steve - Intel Developer Support

That screenshot is exactly what I am expecting to see! Thank you very much for taking the time to try to reproduce the problem I am having.

I have attached a screenshot movie of what happens for me. It's probably something specific to my environment, but I'm not sure how to troubleshoot the issue. I will try to reproduce this problem on other development environments/machines. 

I think that we have narrowed down the reason for this issue -- there appears to be an interaction problem that occurs when certain symbols are loaded. I have attached a screen capture movie that demonstrates reproducing the problem using the reproducer solution/project attached to Quote #16.

Essentially, when I choose to load Microsoft Symbols for a debug session, then it appears I lose the ability of the debugger to halt in the offending code on a Fortran runtime error. I was able to reproduce this problem in both VS2013 and VS2012 with Intel Visual Fortran Composer XE 2013 SP1 Update 2 Integration for Microsoft Visual Studio* 2013, 14.0.0086.12.

So... if you happen to run into this problem, empty the symbol cache and load only specified symbols to track down run time errors in the Fortran code.

I'd be interested to learn if anyone else can reproduce or observes this problem (because I've invested a significant amount of time in it) and also if it is an issue that will be resolved. I do not often need to load Microsoft Symbols, but I'd like to have the option without compromising other debugging features! 

 

 
