One of the various responsibilities I have is for the compiler samples (both Fortran and C++). For Intel Visual Fortran, we have a lot of samples - for the other compilers, fewer. The Windows Fortran samples are a mixed lot; some came to us from Microsoft Fortran Powerstation (with or without extensive modification) and some were developed by us (mostly, yours truly.)
For version 11.0, I took one of the old Microsoft samples, a Win32 program called Angle, and tried to clean it up. The original coding style had used a series of external procedures and multiple copies of INTERFACE blocks to declare other routines. As I've written elsewhere, I consider this poor Fortran style - if you think you need to write an INTERFACE block for a Fortran routine, you're doing it wrong - so I rewrote this as a single WinMain entry function and a series of contained routines. This means that all the interfaces were explicit without the need for INTERFACE - great! It built fine on all platforms and ran ok, so it seemed, at least on IA-32, so I checked it in.
Since it is a "Windowing application", it needs to "register" with Windows the addresses of various routines that handle dialogs and Windows "messages", for example, "create a window", "repaint window", etc. This is done by passing LOC(routine-name) to the API routine, or in some cases, assigning that value to a member of a data structure. Here's where I ran into trouble.
In Fortran 2003, passing an internal procedure as an actual argument is not allowed, because the simple method of doing this, just passing the address of the routine's entry point, doesn't provide the context necessary for up-level references to the host's variables. The standard does say, however, that if an implementation supports this, it must do it so that the internal procedure uses the dynamic context of the "invocation" that passed the routine as an argument. Here's an example:
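subroutine callit
integer localvar
localvar = 1
call extsub(intsub) ! Extension!
localvar = 2
call extsub(intsub) ! Extension!
contains
subroutine intsub
print *, localvar ! up-level reference to callit's localvar
end subroutine intsub
end subroutine callit
subroutine extsub (subname)
external subname
call subname ! calls intsub in the context of the callit invocation
end subroutine extsub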
Intel Fortran (and DEC/Compaq Fortran before that) supports this as an extension, and this feature is part of the upcoming Fortran 2008 standard, so I felt it was ok to use the feature. Or at least I thought I was using it...
Now, you may be asking, how does this work? How can a single address contain both the actual entry point and the "dynamic context" needed to get at the specific invocation's variables? The answer is what is popularly called a "thunk".
The actual implementation of thunks varies - what I will describe here is a generic method that may or may not match what actually happens. The concept is the same, though.
First, the compiler has to know that it needs to set things up so that some routine called later can access its local variables. It arranges these variables in a section of stack (usually) and generates the address to the start of the section. Let's call this address SF (for Stack Frame). It then constructs a mini-procedure that does something like this (pseudocode ahead):
load SF into the context register
jump to the real routine's entry point
and then saves the address of this mini-procedure, the "thunk". It is this address that is passed as an actual argument, so when the routine is called through the reference, the stack frame context is loaded into a register and then the real routine executes, knowing that the context has been set up. (If it were being called directly, the caller would establish the context first.) The internal routine then uses the context to find the local variables of the "invocation" that created the thunk. Neat.
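To make the idea concrete, here is a hand-written imitation of a thunk in standard Fortran. This is only a sketch of the concept and not what any compiler actually generates: the names frame, thunk, entry_point and call_through are hypothetical, and the context is passed as an ordinary argument rather than in a register.

```fortran
module manual_thunk_sketch
  implicit none

  ! Hypothetical stand-in for the host's stack frame (the "SF" of the text).
  type :: frame
    integer :: localvar = 0
  end type frame

  abstract interface
    ! The "real routine": it expects its context to be supplied for it.
    subroutine real_routine(sf)
      import :: frame
      type(frame), intent(in) :: sf
    end subroutine real_routine
  end interface

  ! A hand-made thunk: the context address plus the real entry point.
  type :: thunk
    type(frame), pointer :: sf => null()
    procedure(real_routine), pointer, nopass :: entry_point => null()
  end type thunk

contains

  ! Calling through the thunk: supply SF, then transfer to the real routine.
  subroutine call_through(t)
    type(thunk), intent(in) :: t
    call t%entry_point(t%sf)
  end subroutine call_through

  ! Plays the role of the internal routine; it finds localvar via the context.
  subroutine intsub(sf)
    type(frame), intent(in) :: sf
    print *, sf%localvar
  end subroutine intsub

end module manual_thunk_sketch

program thunk_demo
  use manual_thunk_sketch
  implicit none
  type(frame), target :: host_frame
  type(thunk) :: t

  ! "Create the thunk": remember both the context and the entry point.
  t%sf => host_frame
  t%entry_point => intsub

  host_frame%localvar = 1
  call call_through(t)     ! prints 1
  host_frame%localvar = 2
  call call_through(t)     ! prints 2
end program thunk_demo
```

Note that the thunk is only usable while host_frame exists, which is the same lifetime constraint a compiler-generated thunk has.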
Historical note: in the past, these thunks were built on the stack, which was very convenient. However, the notion of executable code on the stack was also convenient for virus writers, so operating systems evolved to disallow this. Other solutions were found, a topic for someone else's blog.
Ok, back to the story. During the release process for 11.0, one of our team of crack Product Validation engineers discovered that the Angle program didn't work on the Intel® 64 platform. When run, the program just exited immediately, never producing any error message or display. I was chagrined, as I should have tested it on all three platforms. I spent quite a while trying to figure out what was wrong, getting nowhere. I put it aside. Luckily, a few months later, another of our engineers was playing with this and also noticed the problem. Even better, he was able to figure out what was going wrong, which enabled me to fix the problem.
Remember when I said that I "thought" I was using the extension allowing passing of internal procedures? Well, that wasn't really what I did - I used LOC(procedure), which is not the same thing! This simply provides the address of the actual routine's entry point; it does not create a thunk! So, if the internal procedure ever tried to reference an up-level local variable, it would be doing so through an uninitialized context pointer and trashing memory!
Ok, but were there any up-level references? Yes, as it turned out, though not deliberately. The unnamed Microsoft coder had been sloppy and not explicitly declared all local variables: in particular, variables named ret and lret used to hold return statuses. Of course, IMPLICIT NONE wasn't being used either. No real harm done, in the original code, since these values never got used. But when I moved these routines into internal procedures, it turned out that variables of these names were declared - in the host scope! Even the IMPLICIT NONE I added didn't help here - those assignments became up-level references automatically.
The fix was to add local declarations of the appropriate variables, eliminating all up-level references. I also asked that the compiler complain if LOC is applied to an internal procedure that contains up-level references - this should appear in a future update.
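The accidental up-level reference is easy to reproduce in a few lines. The following is a minimal sketch, not code from the Angle sample; the routine names careless and careful are made up for illustration, and only the variable name ret echoes the story above.

```fortran
program host_association_demo
  implicit none
  integer :: ret          ! host variable, as in the rewritten sample

  ret = 0
  call careless()
  print *, 'after careless:', ret    ! host's ret was changed to 42
  if (ret /= 42) error stop 'expected an up-level reference'

  ret = 0
  call careful()
  print *, 'after careful: ', ret    ! host's ret is still 0
  if (ret /= 0) error stop 'expected no up-level reference'

contains

  subroutine careless()
    ! No local declaration: IMPLICIT NONE is satisfied by the host's ret,
    ! so this assignment silently becomes an up-level reference.
    ret = 42
  end subroutine careless

  subroutine careful()
    integer :: ret        ! local declaration hides the host's ret
    ret = 42
  end subroutine careful

end program host_association_demo
```

The compiler accepts both routines without complaint, which is why IMPLICIT NONE alone does not catch this.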
But wait, there's more to this story. What if LOC had created a thunk rather than just a routine address? There's no reason why not, right? True. However, even if it had, I would have run into another problem which can affect any call where information about arguments is saved for later.
The structure of this code had two worker routines, MainWndProc and DlgProc. Both of these are called asynchronously by Windows to handle various events. At one point in MainWndProc, it passes LOC(DlgProc) to a Win32 routine (CreateDialogParam) and then returns. What happens to the thunk when MainWndProc returns? It's gone since the invocation that created it has returned. The place in memory where it resided may or may not still contain valid contents. If the thunk is then called through, there can be unexpected behavior including access violations or data corruption. Not nice.
This type of error affects many different kinds of programming where side-effects happen sometime after a call completes. Fortran 2003 introduces (and Intel Fortran supports) the concept of asynchronous I/O, where you begin an I/O operation and then return to the program flow, expecting the variable to be filled in (or written out) later. It is important that the variable remain "in scope" until the I/O completes, otherwise nasty errors can occur. You can see similar effects with the popular MPI distributed processing library where you "send" and "receive" data asynchronously between independent executions of code. Again, you must make sure that the variables used asynchronously stay defined until all operations are complete.
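For the asynchronous I/O case, a minimal sketch of the safe pattern looks like the following. The names buffer, check, unit_no and request_id are illustrative, and NEWUNIT= and ERROR STOP are Fortran 2008 features, so this assumes a reasonably recent compiler. The point is the WAIT statement: the buffer is left alone until the pending transfer is known to be complete.

```fortran
program async_io_demo
  implicit none
  integer, parameter :: n = 1000
  integer, asynchronous :: buffer(n)   ! variable involved in asynchronous I/O
  integer :: check(n)
  integer :: unit_no, request_id, i

  buffer = [(i, i = 1, n)]

  open(newunit=unit_no, status='scratch', form='unformatted', &
       action='readwrite', asynchronous='yes')

  ! Start the write; control may return before the data has been transferred.
  write(unit_no, asynchronous='yes', id=request_id) buffer

  ! ... other work could go here, as long as it does not modify buffer ...

  ! Block until the pending write has finished. Only after this point is it
  ! safe to change buffer or to let it go out of scope.
  wait(unit_no, id=request_id)

  ! Read the record back synchronously to confirm what was written.
  rewind(unit_no)
  read(unit_no) check
  if (any(check /= buffer)) error stop 'data mismatch'
  print *, 'asynchronous write completed and verified'

  close(unit_no)
end program async_io_demo
```

If the write were started inside a subroutine using a local buffer and the subroutine returned before the WAIT, the situation would be the same as the thunk that outlives its creator.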
Going back to Angle, what if I had wanted the address of a thunk rather than that of the actual routine? I could create a function to do that like this:
function thunkloc (proc)
integer(INT_PTR_KIND()) :: thunkloc
external proc ! proc is a procedure argument, so an internal procedure arrives as its thunk
thunkloc = loc(proc)
end function thunkloc
Angle has been corrected for Update 2 of Intel Visual Fortran 11.1 and I have learned my lesson (I hope) about not assuming too much and being sure to test thoroughly!