Catastrophic debugger bug

This is painful.

I have a mixed-language program and ran the debugger. One of my Fortran subroutines has several entry points. I wanted to check the value of an input argument to be sure it was the same as the calling program's value. I held the mouse over the variable name, and nothing happened. (Usually, it shows the numeric value.) So I dragged that name into the Watch pane.

The system froze. Nothing responded to anything. Ctrl+Alt+Del did nothing. I tried it lots of times, and then got a blue screen.

Rebooted, and my desktop icons were in all the wrong places. My network did not work. Hours on the phone with three tech support guys, who tried everything. The restore points were corrupted.

They said I had to wipe my disk and reinstall everything. Ugh. Anyone who's used a computer knows bad news when it hears it. It's not nice to scream and weep in public -- but this was definitely the time.

I now have some of my stuff reinstalled, but the network still does not work. I'm not an expert in that, and it will take me several days of dumb trial and error before it works again. Fortunately, I had a backup copy of all my source files, so it could have been worse.

Be advised: if you use the debugger for a Fortran program, you have entered a minefield. Does the programming team know about this?

I don't think this is "a debugger bug". While I have seen issues with the debugger misbehaving, often when a Windows process called ctfmon.exe is running, you have much bigger problems, of which the debugger (and you) are just innocent victims. There is certainly nothing in the debugger that would be able to cause the amount of system damage you report.

From the symptoms you describe, I would suspect disk corruption, bad memory or perhaps even malware.

Steve - Intel Developer Support

Well, this is about the kind of response I was expecting: "It's not our fault". How do you know? Denying a problem is never the first step in solving it.

I have a brand-new system, very little software installed yet, little on-line time, and a good antivirus program installed (Zone Alarm). I go to the debugger, do a simple task, and it crashes. Sure, it's not your fault.

Have you tried the steps I described? If not, your reply sounds like wishful thinking.

Please do your job and give me a responsible answer.

Infant mortality of electronic components is not unheard of. In fact, I had a mouse failure a few weeks ago which, before I identified it as such, was very perplexing and made me do things I should not have.

The "steps I described" are too generic to do anything useful. I have for years used the VS debuggers (VS2005, 2008 and now 2010) to debug Fortran, C (and even assembler) with and without debug-symbols in the EXE. The debugger itself has never failed. Other components, such as the project converter and features such as "create a project from existing code" have not always worked, but never have I seen the debugger cause crashes all by itself.

It is possible to have such a nasty bug in one's own code that the debugger is not capable of catching in time. In such cases, a test case with full source code and instructions for building the EXE and reproducing the bug are nearly indispensable.

Here's a test that is not generic at all.

1. Create a project with C++ as the main language.
2. Add some Fortran subroutines.
3. Add a subroutine with several entry points, each with different arguments.
4. Call one of those entry points.
5. Look at the value of the incoming data in the debugger. (The variable is passed by reference.)
6. See if the debugger can display the value when you hold the mouse over it.
7. If not, try dragging the name into the Watch pane.
8. Oh, yes, back up your system first.

I did this sequence (which occurs in my code), twice. The first time, the drag made the system hang, but the task manager was able to stop the VS application. The second time, same place, the drag made the world implode. Your observation "...has never failed" is true but a distraction. It never failed for me with VS6 and CVF either. I'm talking about the time it did fail, not all the times it did not.

If a given tool fails twice doing the same thing, it's a sign that there's something wrong with the tool.

If you do the above tests and everything works, then we have to look elsewhere for the culprit. I would be pleased to learn that the debugger is not at fault, since I use it every day and I can't afford to waste about three days reinstalling everything, looking up license codes, remembering all my settings and preferences, and then waiting for several hours while all of the reinstalled software updates itself. (About a GB of updates over my DSL connection. Painful.)

But if the culprit is elsewhere, how can I sleep at night knowing what might explode tomorrow? In such a serious situation, it seems to me that the proper course for those denying the problem is first to thoroughly test the debugger. If it passes, then of course you can deny.

I am just another user who, in that capacity, shared his experience with you. I do not write C++ code, but there are others who regularly post here and use C++.

If the caller can be in C rather than C++ and still cause this problem, I'd like to try out a test case.

It would really help to have source code for the pathological case, even if you are forced to reconstruct it from memory (in order to avoid another system crash and HD restore at your end) rather than capture the source code as it was just before the crash.

I have used C++ and Fortran code intermixed for years without any of the problems you describe. Generally I have a Fortran project and a C++ project and include them both in one solution. Debugging is no problem in that scenario (and C++ sometimes has more tools available!)

The only different thing you mention is that you 'drag' a variable to the Watch window.
Since C++ has more/different tools, perhaps something unexpected is being dragged into the Fortran debug window, which wreaks havoc on the system?
I normally do not 'drag' things between applications. I simply copy the variable name and paste it into the Watch window (I expect that is what you do too?). (Or at least I make sure I save things before the drag in case things go horribly wrong!)
I have had too many lockups with the mother of all bugfests, Microsoft Word, where after I drag something (a picture from the web? a picture from another application?) into Word, it locks up the system and requires a hard reboot (unplug the computer, remove the battery and restart from there!)

I think it is all the interoperability of processes: sometimes, when things get dragged across applications, the system locks up as it tries to resolve the 'paste' of the 'something' that was inadvertently included in the 'drag'.
I recommend you select something, make sure you have what you want/need, copy it and then paste it.
Be wary of dragging things.

And another thing... (and please don't take offense at this) every time I hear of someone reinstalling the system, I think of possibly lazy tech support! It seems to be a typical response of some tech support departments these days (and Intel, I am not saying you ever do this, and you have never done this to me!)

What a waste of time!

How long was it between when you started the reinstall and when you were back to the point you were at when something went wrong? Days? Weeks?
Did you start up in safe mode and/or boot from a disk/CD/USB drive to check the disk for issues? Did you remove the disk, attach it by USB to another computer and scan it there for issues/defects?

Did you try windows repair?

You may have done all these things. I have heard of many other 'non-tech' folks who have a minor computer issue and then have to go through a 'reinstall' (e.g., when the 'Geeks' at certain support places, who don't know what they are doing, say it needs a reinstall!). My sister and another friend had this situation; I went and rescued the computer from the moronic 'Geeks', scanned the disk from another computer, removed some issues and voila! It restarted without any problem!

Hardware issues should NEVER require a reinstall until and unless you are CERTAIN it is the only way to resolve the issue.
And software issues should be resolvable with Windows Repair or other utilities.

Sorry to be so chatty

One other item: I would suggest you rid your code of ENTRY points. Could they be the source of your issues?

I am surprised that they haven't been ruled obsolescent, like alternate returns.

A simple way to rid your code of them is either:

1) Simply add an argument to the calling sequence, so that all calls go through the main entry point and are then routed to the former entry points.

OR

2) Put all local storage for the subroutine in a module and USE it in the separate subroutines you create out of each ENTRY point.

I have worked with a lot of older code which had many, many ENTRY points, and got rid of them all.

The ENTRY feature is, indeed, a problem (perhaps not as destructive as described). Here is a short example, all Fortran, that shows the defect in the debugger.

Using IFort 12.1.3, with /Zi /MD, and running under the VS2010-SP1 debugger, on entering the subroutine through its main entry point I can see the values of arguments a and b in the debugger by hovering the mouse cursor over the variables. If those variable names are inserted into the watch window, the values are shown correctly. As soon as the variable c acquires a value, its value is also displayed correctly. However, when the secondary ENTRYs are called, the debugger appears not to have the necessary and/or correct debug symbols. In fact, after entering EntryB the debugger shows for the first argument, p, not the value of p but the value of the now-inactive argument a from the prior call, and it does not show the value of r after this argument acquires a value. Only after returning to the caller can you see that the result is correct.

The same behavior occurs with IFort 11.1.70. I believe that this is not a consequence of a bug in the debugger itself, but is caused by the failure of the respective Fortran compilers to emit the proper debug symbols.

Curiously, when I used the IFort 7.0 compiler and ran under the VS2010SP1 debugger, everything worked correctly. This observation reinforces the conjecture of the preceding paragraph.

This program, compiled with CVF 6.6 and run under the CVF/VS6 debugger, does not display the problems just described.

    program UseEntry
        integer :: a, b, c, p, q, r, u, v
        a = 3
        b = 2
        call CRASH_N_BURN(a, b, c)
        print *, 'C = ', C
        p = 5
        q = 7
        call ENTRYB(p, q, r)
        print *, 'R = ', R
        u = 11
        call ENTRYC(u, v)
        print *, 'V = ', V
    end program UseEntry

    subroutine CRASH_N_BURN(a, b, c)
        implicit none
        integer :: a, b, c, p, q, r, u, v
        c = a + b
        return
    entry ENTRYB(p, q, r)
        r = p - q
        return
    entry ENTRYC(u, v)
        v = u*u
        return
    end subroutine crash_n_burn
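For comparison, a minimal sketch of option 2 from earlier in the thread applied to this example: each ENTRY becomes its own module procedure, and any storage the entries used to share moves to module level. The example has no shared locals, so the counter ncalls is a hypothetical stand-in, as is the module name crash_n_burn_mod. Since each routine now has a single entry, the debugger should see ordinary dummy arguments, as it already does for the main entry above; that is an expectation, not something tested here.

```fortran
! Sketch of option 2: one module procedure per former ENTRY point.
! crash_n_burn_mod and ncalls are hypothetical names.
module crash_n_burn_mod
  implicit none
  private
  public :: crash_n_burn, entryb, entryc, ncalls

  ! Stand-in for SAVEd locals that the ENTRY points used to share.
  integer, save :: ncalls = 0

contains

  subroutine crash_n_burn(a, b, c)   ! was the main entry
    integer, intent(in)  :: a, b
    integer, intent(out) :: c
    ncalls = ncalls + 1
    c = a + b
  end subroutine crash_n_burn

  subroutine entryb(p, q, r)         ! was ENTRY ENTRYB
    integer, intent(in)  :: p, q
    integer, intent(out) :: r
    ncalls = ncalls + 1
    r = p - q
  end subroutine entryb

  subroutine entryc(u, v)            ! was ENTRY ENTRYC
    integer, intent(in)  :: u
    integer, intent(out) :: v
    ncalls = ncalls + 1
    v = u * u
  end subroutine entryc

end module crash_n_burn_mod
```

The main program would then need USE crash_n_burn_mod, and its calls stay the same. Note that module procedures get compiler-specific decorated link names, so callers in C or C++ would need BIND(C, NAME=...) on these routines.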

I agree completely with the assessment that support guys can save their time by simply recommending a fresh install. I had one say exactly that when I had already done the install the day before! "Do it again" he said. Idiot. They are certainly not saving my time.

Before I started over, I ran sfc /scannow. It found no problems. But I'm not as expert as the CS people, so I followed orders. I simply did not know any other way to proceed. And since the restore files were all corrupted, it looked like the damage was pretty widespread -- and not reinstalling could be asking for even more trouble.

It took the better part of three days to get to where I could test code again. Now, I'm not claiming to be totally innocent. Suppose I called my entry point with a double-precision real variable, but the called program expects a REAL? It's not easy for the compiler engineers to think of every possible screwup like this, so maybe they missed one. But finding coding errors is precisely what the debugger is for, so one would assume they did their job. Also, the structure of the obj and other files depends on which compiler options are selected. There are way too many of those for a dummy like me to make sense of without serious research, and it is possible that some combinations work just fine. I sympathize with the debugger programmers, who have to test all possible screwups with all possible combinations. I'm actually not mad at them, since I can imagine being in their shoes.
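On the double-precision-versus-REAL scenario: one way to have the compiler catch that kind of mistake, before the debugger is ever involved, is to give every call an explicit interface, most simply by putting the called routine in a module. A minimal sketch, with the hypothetical names scale_mod and scale_value:

```fortran
! Sketch: a module gives callers an explicit interface, so the
! compiler checks argument type and kind at every call.
! scale_mod and scale_value are hypothetical names.
module scale_mod
  implicit none
  private
  public :: scale_value

contains

  ! Expects default (single-precision) REAL arguments.
  subroutine scale_value(x, factor, y)
    real, intent(in)  :: x, factor
    real, intent(out) :: y
    y = x * factor
  end subroutine scale_value

end module scale_mod

! In a caller that says "use scale_mod", a mismatched call such as
!     double precision :: d
!     call scale_value(d, 2.0, y)
! is a compile-time error. If scale_value were an external subroutine
! called with no interface, the same call would compile, and the
! routine would reinterpret the first 4 bytes of the 8-byte value
! as a REAL.
```

For legacy code that can't be moved into modules at once, Intel Fortran also has a /warn:interfaces option that checks calls to external routines against generated interfaces; check the documentation for your compiler version for its exact behaviour.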

I have removed the ENTRY points from the subroutine that crashed. I have a huge legacy code that worked with CVF, and there are tons of other ones elsewhere. It is not practical to remove them all.

I'm glad another poster found a debugger bug. I posted this thread mainly to convince the compiler programmers that their job was not finished yet. Evidently I'm not the only one who thinks so.

Quoting dondilworth: "...I have a brand-new system, very little software installed yet, little on-line time, and a good antivirus program installed (Zone Alarm). I go to the debugger, do a simple task, and it crashes..."

The first thing I would try when recovering a "broken" computer system is uninstalling the antivirus software. Did you try to reproduce the problem when your system doesn't have Zone Alarm installed?

Best regards,
Sergey

We know that the debugger does not properly show dummy arguments for ENTRY points. However, that is not to say that using the debugger with ENTRY causes Windows to become corrupted. I will try to reproduce any kind of misbehavior with dragging and dropping.

Steve - Intel Developer Support

I do not think that it was the debugger's fault. Your app was executing in user mode, and a misbehaving user-mode thread is simply terminated when no exception handler can be found for it. But we cannot rule out that one of the Native API functions in the call chain caused an exception while executing in kernel space, for example a call into the display driver when the debugger wanted to display some values from the debugged process.
Bear in mind that anti-virus software sometimes inserts hooks and patches function prologs inline to intercept Win API and Native API calls; when this is done unwisely in kernel space, the anti-virus can bring down the system. I have witnessed such behaviour with Kaspersky AV. Did you save the BSOD crash dump? It could be very helpful in pinpointing the problem. You can open it with WinDbg and use the command "!analyze -v" to inspect the crash dump.

This is a very cogent reply, thank you. When I reinstalled everything, of course I also lost everything, so I cannot analyze the crash dump.

I am reluctant to try to reproduce the problem, as you can imagine. I don't play Russian roulette either, and the two are closely related.

I did not know that an antivirus program could get in the way. Just now I have reinstalled Zone Alarm free version, and yesterday I wanted to remove it since I cannot get printer sharing to work now. But it won't uninstall! There should be a law against programs that install themselves in a way that makes it impossible to uninstall them. So I cannot test anything with and without that program running.

BTW: my program was actually not running when the crash occurred. It was halted by the debugger, and I was trying to see the value of a variable. So the debugger was in charge at that moment. My program did not crash. The answers were screwy, which is why I wanted to diagnose things. The crash occurred after dragging, as explained above. Does that narrow things down?

An important suggestion/recommendation:
Since your troubles seem to indicate that you are not connected to a server, and therefore do not have daily/weekly/monthly full backups of your system...
might I suggest:
1) Buy a USB external drive; they go for $120 for 1 TB.
2) Get some backup software; I use Acronis True Image ($50?).
3) At a minimum, back up your computer once a week or before adding anything major.
The Windows backup/restore is a bear, and it appears it didn't work for you.
Doing a full image backup lets you fully restore your computer if things hit the fan.

As for Zone Alarm, I have used it and still use it without incident.
I do not use the freebie version.
Go to their forum and ask them how to uninstall the freebie version.
It shouldn't be a problem.
You can also simply turn the program off (no anti-virus, no program control, etc.)

brian

dondilworth,
A crash dump file, or even the BSOD stop code, would have been very helpful in your situation. Without it, it is almost impossible to track down and find the problem.
As I stated earlier in my post, AV software can by its design sometimes bring down the system, because it implements kernel modules which patch various critical system structures such as the IDT and SSDT, or it installs filter drivers above function drivers to intercept the flow of IRPs. Even the mouse can be hooked, by SetWindowsHookEx() in user mode, by a filter driver, or even by an IDT handler for the mouse or keyboard. This means that the mouse events generated by moving your mouse can be intercepted and tracked.
Regarding the uninstalling problem, bear in mind that here too the AV can place so-called IAT hooks in msi.dll, which is responsible for installing/uninstalling apps.
Try to reproduce the problem, even if only on a virtual machine, because in order to understand what happened we need to see the debugger's kernel-mode stack.

In terms of backup, a pair of hard drives in raid mode used for data only has saved me on several occasions when the OS took a nose dive.

JMN

RAID mode is great, but does your 'data only' include ALL system data?
"Data only" (meaning your data and not all programs, settings, etc.) doesn't cure the issue: if Windows decides to self-implode, "data only" won't restore the system.
I recommend full and incremental backups of the entire system, done regularly.
First for data integrity, and then so you can step back a day or so in the event a virus, antivirus software or a system issue brings down the drive/system.
Now, having said all this, my system will probably self-immolate, aka crash n burn, just to demonstrate that it isn't properly archived! Ahhhh... the joys of modern computers.

Some may also recommend backing up 'to the cloud', but that reminds me too much of the old time-share days, which I thought PCs got us away from?
Current wireless is mucho faster than 300/1200 baud, but of course the amount of programs, settings and data needing to be backed up is exponentially larger too!

Perhaps this has strayed from the initial topic of this thread... then again, maybe not, since the original author could have avoided a lot of time wasted on reinstallation, etc. with proper and regular backups!

I have to agree with you. I would actually go so far as to recommend formatting your HD regularly (as in, once a year). Not only does this preserve the performance of your computer, getting rid of useless software and/or files that might have piled up in the meantime (including malware and such), but you will also be forced to back up at regular intervals.
As for cloud vs. external HD, I wouldn't know. I think the cloud is fairly reliable these days; when in doubt, I would probably use a mix of the two.

I tried the exact set of tasks you outlined earlier. Nothing untoward happened, other than the dummy arguments for the entry not being visible in the debugger. As I mentioned earlier, we know about that. I dragged the name into the Watch pane. It told me the variable was undefined (same issue), but otherwise everything behaved fine. The debugger was still responsive and Windows was behaving normally.

I maintain that the debugger neither crashed nor corrupted your Windows system - at least not with the software Intel and Microsoft provide. The more likely explanation is that you had some existing corruption that needed a triggering event. Disk corruption is likely.

Steve - Intel Developer Support

Backup:

I have two computers configured the same, if I lose one I move to the other while I fix the first one. Expensive, but a life saver.

JMN

Does the debugger use an r0 (ring-0, kernel-mode) driver to debug user-mode apps at all?

Not as far as I know. It is user-mode only.

Steve - Intel Developer Support

Every thread partly executes in kernel mode when it calls Native API functions (implemented in ntoskrnl.exe). So it is quite possible that some of that code will misbehave while executing in kernel mode and bring down the whole system.

Of course, but in that case the cause of the system fault and BSOD would be the OS's code, not the app's, wouldn't it? A user-mode app simply can't cause a BSOD by itself. So if the debugger is undoubtedly r3 (ring-3) only, there is nothing to debate.

Did I write that the BSOD was the app's fault? A misbehaving application will be terminated, but misbehaving kernel-mode code, even OS modules, will BSOD the system. Every user-mode thread has two stacks, one for user-mode function calls and one for kernel-mode function calls; when you debug an app you can see the stub KiFastSystemCallEntry, which is the kernel-mode entry point from the user-mode part of the thread.
Without a crash dump file, there is nothing to debate.

Did I write that I don't know basic Windows architecture? Why are you describing it?
There is nothing to debate at all, because an r3 debugger can't cause a BSOD.

Did I write that the user-mode debugger caused the BSOD? I have only described a possible chain of events that might have caused the crash.

Crash is one thing. I can easily believe that some odd combination of WinAPI calls might trigger a problem there. But corruption of Windows that renders the system unbootable? Nope.

Steve - Intel Developer Support

Well, I want to report that VTune Amplifier caused damage to my system as well. I was trying to debug a Fortran code in Visual Studio 2010 when the debugger crashed, and it has kept crashing since then. No matter what I try, I always have an unstable debugging process. I tried to uninstall everything and reinstall it, but it never worked! I am very frustrated with the damage VTune Amplifier has caused to my system. I got rid of VTune Amplifier (luckily I did not buy it; I just had it in trial mode). However, since then I have not been able to use VS2010 in debug mode, and I lost three days trying to work something out without any success.

I would never recommend this harmful software, "VTune Amplifier", to anybody!

Was it a system-wide fault like a BSOD, or a local process crash?

It is a local process crash. I am no longer able to debug with VS2010.

Do not use VS 2010 to investigate this; it is not suitable for the job. You should use the Windows debugger and collect a local crash dump file. Install WinDbg and set it as the post-mortem debugger (running windbg -I does this); you can also tweak the settings in gflags.exe, or in the debugger itself, to control which kinds of exceptions make the debugger break in.
Do you have Debugging Tools for Windows installed?
It seems like VTune Amplifier caused some kind of unhandled exception, probably an access violation, and was terminated.
