Stack corruption debugging tips

I need help debugging stack/heap corruption problems. I have the following code:

real(kind=8)    :: Offset(1,3), POffset(1,3), A(3,4), B(3,4)

...

do ii = 1,3
    Offset(1,ii) = (sum(A(ii,:))/8.d0 - sum(B(ii,:)))/8.d0
enddo
write(*,*)'A',offset
do ii = 1,3
    Offset(1,ii) = (sum(A(ii,:))-sum(B(ii,:)))/8.d0
    POffset(1,ii) = (sum(A(ii,:))+sum(B(ii,:)))/8.d0
enddo
write(*,*)'B',offset

A lot of other things happen in the subroutine before the block of code I've written here, but this is where the bug currently manifests itself, as a mismatch between the two write statements. Strange things happen, of course: if I add a third write statement within the second loop, the first two write statements match. If I delete the underlined line of code, the write statements also match.

Could someone point me in the right direction to debug this kind of error (online tutorial, guide, etc.)? It's pretty clear that the problem is not in those lines of code, so the stack must have been corrupted somewhere above? I am at a loss as to how to debug this. I've been programming Fortran for years and, embarrassingly, I still don't know how to debug this kind of bug. I'm using the compiler on a 32-bit Linux machine.

Nate

So I typed it into the forum slightly wrong. My problem is with the following code:

do ii = 1,3
    Offset(1,ii) = (sum(A(ii,:)) - sum(B(ii,:)))/8.d0
enddo
write(*,*)'A',offset
do ii = 1,3
    Offset(1,ii) = (sum(A(ii,:))-sum(B(ii,:)))/8.d0
    POffset(1,ii) = (sum(A(ii,:))+sum(B(ii,:)))/8.d0
enddo
write(*,*)'B',offset

It is likely that the problem happens elsewhere in your code. Only a complete reproducible test case would be helpful here.

What do you mean by "a mismatch"? What makes you think it is stack corruption? There is a run-time stack check option that might help if that is the case.

Steve - Intel Developer Support

Are Offset and/or POffset dummy arguments or pointers?

Jim Dempsey

www.quickthreadprogramming.com

Hi Jim and Steve, thank you for replying

Jim: No, none of the variables Offset, POffset, A or B are dummy variables or pointers.

Steve: By "mismatch" I mean the first write statement produces:

A 1.00000 -4.00000 3.00000

and the second produces

B 0.0000 0.00000 3.00000

Those aren't the exact results (I don't have them in front of me now), but that is the pattern: the two lines don't produce the same numbers. For some reason the first and second entries of the array are getting set to zero. It seems to me the only possibility is stack corruption earlier in the routine.

I haven't yet been able to reduce the routine to something tractable in size, and I didn't want to burden anyone with sorting through and compiling my code. I was just hoping that someone had assembled a nice guide to debugging stack/heap corruption. I think in my case it is heap corruption, because I have a lot of allocatable arrays and type definitions in modules, and I use pointers a lot.

One final note: I don't get the problem when I compile on Windows (I use Microsoft Visual Studio). I'm pretty new to compiling on Linux and am not very familiar with gdb.

More likely you have an uninitialized variable, an out-of-bounds access, or a type mismatch. If you are on Linux, the chance of stack corruption is lower than on Windows (because of the STDCALL mechanism there).
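To illustrate the out-of-bounds case, here is a minimal sketch of how a write past the end of one array can change a neighboring array. It is not taken from the original poster's routine: the array Scratch and the wrong loop bound n are hypothetical. The behavior is undefined, so Offset may or may not change depending on how the compiler lays out memory, which is also why such bugs can come and go when unrelated lines are added or removed.

```fortran
program oob_demo
    implicit none
    ! Offset mirrors the declaration in the thread; Scratch is a
    ! hypothetical neighbor used only for this illustration.
    real(kind=8) :: Scratch(3), Offset(1,3)
    integer :: i, n

    Offset = 1.d0
    n = 5                     ! hypothetical wrong bound: Scratch has only 3 elements
    do i = 1, n
        Scratch(i) = 0.d0     ! i = 4 and i = 5 write outside Scratch
    end do

    ! Undefined behavior: the stray writes may land in Offset, in some
    ! other variable, or nowhere visible, so no particular output is promised.
    write(*,*) 'Scratch(1):', Scratch(1)
    write(*,*) 'Offset    :', Offset
end program oob_demo
```

Building with run-time bounds checking enabled should stop the program at the first bad subscript instead of letting it continue silently. To my understanding the relevant options are `-check bounds` for Intel Fortran and `-fcheck=bounds` for gfortran, usually combined with `-g` so the report points at a source line.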

Since you have both good and bad versions, what I usually do is either run them in the debugger and stop periodically to see where results diverge, or add "instrumentation" (print statements) to write out intermediate calculations and see what is different.
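As a rough sketch of the instrumentation approach, a small helper can print a labeled summary of an array at chosen checkpoints, so the output of the good and bad builds can be compared line by line. The helper name dump, the checkpoint labels, and the values assigned to A and B are assumptions made for this illustration only.

```fortran
program trace_demo
    implicit none
    real(kind=8) :: Offset(1,3), A(3,4), B(3,4)
    integer :: ii

    ! Placeholder data; in the real routine A and B come from earlier code.
    A = 1.d0
    B = 2.d0

    call dump('A before loop', A)
    call dump('B before loop', B)
    do ii = 1, 3
        Offset(1,ii) = (sum(A(ii,:)) - sum(B(ii,:)))/8.d0
    end do
    call dump('Offset after loop', Offset)

contains

    ! Hypothetical helper: prints a label plus sum, min and max of a
    ! rank-2 array with full double precision digits.
    subroutine dump(tag, x)
        character(len=*), intent(in) :: tag
        real(kind=8), intent(in)     :: x(:,:)
        write(*,'(a,": sum=",es23.15," min=",es23.15," max=",es23.15)') &
            tag, sum(x), minval(x), maxval(x)
    end subroutine dump

end program trace_demo
```

Redirecting the output of each build to a file and comparing the two files shows the first checkpoint at which they diverge. Since the original poster observed that adding a write statement changed the symptom, the instrumentation itself may move or hide the problem; in that case, moving the checkpoints earlier in the routine is usually the next step.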

I am going to move this thread to our Linux Fortran forum.

Steve - Intel Developer Support
