Previously tested F90 program not working using IFC

thegrayman's picture

Thursday, October 11, 2007 3:46:39 AM (Eastern Time)


Hello,


My name is Felipe, and I have a Fortran program that consists of several subroutines and a main.f file. The program has MPI capability. It compiles and links successfully with the Absoft Fortran 90 compiler on a Unix system, using a makefile that looks like this:



=======================


objects = main.o ---------"omitted names of object files-------
-----------------------
--------------------- initial.o


#
main.out: $(objects)
#
mpif77 -O2 -o main.out $(objects)


.f.o:
#
mpif77 -O2 -c $<



clean:
rm *~ *.o


==============


Please note that I have removed the names of the object files for the sake of simplicity, keeping only main.out and the last object file. To run the program, I can just type 'main.out' without invoking the mpirun command, because in my case the input data files do not require parallel processing.


The executable main.out works just fine when built with the Absoft compiler, which is installed on a Dell computer with an Intel processor and the following operating system:



Red Hat Enterprise Linux WS release 4 (Nahant)
Kernel 2.6.9-5.EL on an i686


On the system above, the program works fine when I give the command 'main.out' without any MPI-related options, such as 'mpirun -np # -filename main.out'.


Now I am trying to migrate this program to another Unix environment that has Intel Fortran compiler 9.2 (or above), and I found that it is installed on this system:



x86_64-redhat-linux/3.4.6


The two statements above are copied and pasted from the Unix screen. I have no more details about these two systems. However, when I type 'rpm -qa|grep fortran' on the system that has Absoft, it gives this:



absoft_32bitfortran95-9.0-1


Then, when I type the same command on the system that has Intel Fortran, it gives this:



libgfortran-4.0.2-14.EL4
libgfortran-4.0.2-14.EL4


I only have a user account on these two Unix systems. Is there any other Unix command to find out more about a system, such as the type of Unix software, the CPU, etc.? Maybe that would help in some way.


Anyway, I was told that I had to make some changes to the code to make it compatible with the Intel compiler. I found that the logical constants .T. and .F. have to be replaced with .TRUE. and .FALSE. I then ran the same 'make' command to compile and link the files, and I got main.out without any error messages. However, when I run 'main.out', I get this message from the Intel Fortran compiler.



Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/opt/openmpi/lib/libopal.so.0 [0x2a95950a4a]
[1] func:/lib64/tls/libpthread.so.0 [0x3c9290c420]
[2] func:/opt/openmpi/lib/libopal.so.0(free+0x72) [0x2a959562b2]
[3] func:main.out(for_deallocate+0x5f) [0x506143]
[4] func:main.out(for_dealloc_allocatable+0x70) [0x5060ac]
[5] func:main.out(volume_+0x1932) [0x43cda2]
[6] func:main.out(MAIN__+0x30e2) [0x40c122]
[7] func:main.out(main+0x32) [0x40902a]
[8] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3c9201c4bb]
[9] func:main.out [0x408f6a]
*** End of error message ***


Notice that there is a line with the word 'volume', which happens to be the name of a subroutine. I looked at it and found that a local allocatable array variable A is declared with one dimension and subsequently allocated with a single scalar integer variable called 'dim2', which has been assigned a number in the code. However, this array is accessed in a loop that runs from i = 0 to 'dim2-1', which makes dim2 iterations, matching the number of elements of the array A. So what I did was allocate the array as A(0:dim2). I then ran the 'make' command again with the Intel compiler and ran 'main.out', and I got another error message, this time without the word 'volume', as shown below.



Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/opt/openmpi/lib/libopal.so.0 [0x2a95950a4a]
[1] func:/lib64/tls/libpthread.so.0 [0x382420c420]
[2] func:[0x44293c]
[3] func:[0x449b75]
[4] func:[0x4561ba]
[5] func:[0x452eb9]
[6] func:[0x40d777]
*** End of error message ***



Is there a way to fix this problem? I figure that the main issue is that this program works well with the Absoft compiler but not with the Intel compiler. What are the Fortran language compatibility issues between these two compilers? Also, what does the above error message mean? Is there any debugging technique that would help me fix this problem? If so, please tell me which Unix options I could use. In particular, are there any options that I could put in the makefile to debug the program? I was thinking that there could be an option to configure the compiler so that the executable works correctly.


THANK YOU.


thegrayman's picture

I was able to figure something out. My program was actually compiled with an Absoft F90 compiler on an i-32 based system. As shown in my original post, the makefile that I used has some compiler options that seem to work on this system. My program works fine there.


Now I am trying to migrate my F90 program to an i-64 based system using Intel Fortran compiler 9.2. My program compiles, but it cannot run on this system. Are there any compiler options that I could add to the makefile? I found a government link that has a list of compiler options, and I am going through it to see if there is anything I could apply.


http://www.llnl.gov/computing/tutorials/linux_clusters/man/ifort.txt


I figure that my program was written in an F90 dialect with some rules about array variables that apply to i-32 based systems but not to i-64 based systems.


Please help, thank you.

Steve Lionel (Intel)'s picture

As you have not shown us what problems you're having, or any source code, it's very difficult to help you. For your first post, I'll comment that .T. and .F. are non-standard (and not a popular extension.) That you were able to build and run with compiler X does not guarantee that the program is correct.

In general, no source changes are needed to build a Fortran program on a 64-bit system. Unlike C, you have to go out of your way, using extensions, to expose the difference in address size in Fortran.

If you'll tell us about the specific problems you are encountering, we may be able to help. Trying to select options from a list someone else created is generally not worthwhile.

The SEGV fault most likely indicates a programming error. It appears to be happening during a DEALLOCATE and this suggests to me that the program is overwriting memory improperly. This could be due to an array bounds error, argument mismatch or something else. Try adding the -CB option and see if that tells you anything different when you run it.

Steve
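A minimal sketch of the kind of error that -CB is meant to catch, assuming the situation described in the first post: an array allocated as A(dim2) but indexed from 0. The program name, the value of dim2 and the file name in the comment are hypothetical, chosen only for illustration. Built without bounds checking, the write to a(0) may silently corrupt neighbouring memory; built with -CB, the run is expected to stop with a subscript-out-of-range report at that statement.

```fortran
! Hypothetical demo, not taken from the original program.
! Assumed build command: ifort -O0 -CB bounds_demo.f90
program bounds_demo
  implicit none
  integer :: dim2, i
  real, allocatable :: a(:)

  dim2 = 4                 ! illustrative size
  allocate(a(dim2))        ! valid subscripts are 1..dim2

  do i = 0, dim2 - 1       ! first iteration touches a(0): out of bounds
     a(i) = 0.0
  end do

  deallocate(a)            ! heap damage often surfaces only here
end program bounds_demo
```

The traceback in the first post fails inside a deallocate call, which is consistent with this pattern: the bad write happens earlier, and the damage is noticed only when the memory is freed.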
thegrayman's picture

I guess my question is whether there are any compatibility issues between using the Absoft Fortran compiler 9.0-1 on an i-32 based system and the Intel Fortran compiler 9.2 on an i-64 based system, because I have this Fortran 90 program that works well when I compile it with the Absoft compiler, but not with the Intel compiler.


You mentioned that the error message I get from the Intel compiler could be due to an array bounds error in my program. However, I do not get this error message with the Absoft compiler that I have been using to run the program. This leads me to wonder whether there is a particular compiler option that prevents this kind of problem between different compilers, as far as array bounds are concerned.


Lastly, could you explain how to use the -CB option in the makefile that I showed in my original post, which I am showing again below:


=======================


objects = main.o ---------"omitted names of object files-------
-----------------------
--------------------- initial.o


#
main.out: $(objects)
#
mpif77 -O2 -o main.out $(objects)


.f.o:
#
mpif77 -O2 -c $<



clean:
rm *~ *.o


==============


On which of these lines should I enter -CB?


Please help.

jimdempseyatthecove's picture

thegrayman,


It is unwise to assume that because a program runs seemingly without error, it is free of programming errors.


In an earlier post you stated:


>>A is defined with one dimension, and subsequently allocated with a single scalar integer variable called 'dim2', which has been assigned with a number in the code. However, this variable is called in loop which runs from i = 0 to 'dim2-1', which makes dim2 iterations, which corresponds to the number of elements of the array A. So, what I did is to allocate the array variable as A(0:dim2). <<


The allocation of A is incorrect. The correct allocation is A(0:dim2-1).


You might assume that if this array is one cell larger than it needs to be, this should not introduce a subsequent problem into the program.


Consider the effect if somewhere in the program "size(A)" is used to determine the number of elements in A. The result "dim2+1" is returned. Now you may have a situation where a different part of your program is potentially running off the end of some other array (or off the end of A).


Jim Dempsey


www.quickthreadprogramming.com
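A minimal sketch of the allocation point made above, assuming a loop that runs from 0 to dim2-1 as described in the thread. The program name, the array names a_ok and a_big, and the value of dim2 are hypothetical. It compares A(0:dim2-1), whose size is dim2, with A(0:dim2), whose size is dim2+1.

```fortran
! Hypothetical demo, not taken from the original program.
program alloc_bounds
  implicit none
  integer :: dim2, i
  real, allocatable :: a_ok(:), a_big(:)

  dim2 = 5                     ! illustrative size

  allocate(a_ok(0:dim2-1))     ! subscripts 0..dim2-1, dim2 elements
  allocate(a_big(0:dim2))      ! subscripts 0..dim2, dim2+1 elements

  do i = 0, dim2 - 1           ! every subscript is valid for a_ok
     a_ok(i) = real(i)
  end do

  print *, 'a_ok : lbound, ubound, size =', lbound(a_ok, 1), ubound(a_ok, 1), size(a_ok)
  print *, 'a_big: lbound, ubound, size =', lbound(a_big, 1), ubound(a_big, 1), size(a_big)

  deallocate(a_ok)
  deallocate(a_big)
end program alloc_bounds
```

Any code elsewhere that uses size(A) as a loop count or as the extent of another array would see dim2+1 with the larger allocation, which is how the extra element can turn into an overrun somewhere else.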
thegrayman's picture

Hi,



You are right about the allocation mistake. However, I made this change myself after getting this error message from the Intel compiler:


Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/opt/openmpi/lib/libopal.so.0 [0x2a95950a4a]
[1] func:/lib64/tls/libpthread.so.0 [0x3c9290c420]
[2] func:/opt/openmpi/lib/libopal.so.0(free+0x72) [0x2a959562b2]
[3] func:main.out(for_deallocate+0x5f) [0x506143]
[4] func:main.out(for_dealloc_allocatable+0x70) [0x5060ac]
[5] func:main.out(volume_+0x1932) [0x43cda2]
[6] func:main.out(MAIN__+0x30e2) [0x40c122]
[7] func:main.out(main+0x32) [0x40902a]
[8] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3c9201c4bb]
[9] func:main.out [0x408f6a]
*** End of error message ***


Originally, array variable A is allocated with one dimension of size dim2, as A(dim2). Then this array goes through a loop using an index from 0 to dim2-1, which equals the number of elements of A, or dim2. The change that I made was indeed "unwise". However, the original code works just fine with the Absoft Fortran compiler, but it does not work with the Intel compiler.


I noticed that if A is allocated with dim2 elements, as A(dim2), then its index should run from 1 to dim2. However, in the original code, this variable goes through a loop with an index from 0 to dim2-1. This seems to be no problem with the Absoft compiler, but apparently it causes problems with the Intel compiler. Is there any Intel compiler option that would allow arrays to be accessed with index values that are not aligned with the range 1 to dim2?


PLEASE HELP

Steve Lionel (Intel)'s picture

Your program is incorrect. That you did not notice an error with the Absoft compiler is simply luck. Most likely, you overwrote some other variable that you did not notice. There is no compiler option to make arrays zero-based; you must fix your code.

Steve
thegrayman's picture

Hello,


I have tried to use -CB to debug my program, and it found nothing wrong. Basically, I used -CB as shown below in the makefile.


=======================


objects = main.o ---------"omitted names of object files-------
-----------------------
--------------------- initial.o


#
main.out: $(objects)
#
mpif77 -O2 -o -CB main.out $(objects)


.f.o:
#
mpif77 -O2 -c $<



clean:
rm *~ *.o


==============


Is this right? Is there another method I could use to see if something is wrong? In particular, what other compiler options could be used in the makefile shown above to find mistakes in array bounds subscripts?


PLEASE HELP

Steve Lionel (Intel)'s picture

Did you run the program built with -CB?

Another option you can try is:

-diag-enable-sv:2

This will give you "link"-time diagnostics for the whole program. Be sure to specify this consistently on both the compile and link rules.

Steve
Steve Lionel (Intel)'s picture

Speaking of consistency, you added -CB to the rule that links the executable, which will do nothing. You need to add it to the compile rule instead.

Note that -diag-enable-sv will not give you an executable program, but it should, if it works, give you a lot of diagnostics about possible problems with the program.

Steve
thegrayman's picture

Sorry, but neither option is recognized by the Intel Fortran compiler. Below is what I changed in the makefile to debug:



#
main.out: $(objects)
#
mpif77 -O2 -o -diag -enable -sv main.out $(objects)


.f.o:
#
mpif77 -O2 -c -diag -enable -sv $<



clean:
rm *~ *.o

Then, when I press ENTER after 'make', the compiler starts to compile, but it ignores the additional options. Below are the first five lines of the compilation output.


mpif77 -O2 -c -diag -enable -sv main.f
ifort: Command line warning: ignoring option '-d'; argument is of wrong type
ifort: Command line warning: ignoring unknown option '-enable'
ifort: Command line warning: ignoring unknown option '-sv'
mpif77 -O2 -c -diag -enable -sv invert.f


Maybe these options are misplaced or something. Please show me where to put these options so that I can try again. It's hard to start fixing a program of 68 subprograms without knowing where to begin.


thanks.

Steve Lionel (Intel)'s picture

You split the single option into three. It is one continuous string, not three separate options.

-diag-enable-sv:2

Also, you must be using ifort 10 to use this.

Steve
thegrayman's picture

Following up on the last reply: I am using Intel Fortran 9, and I did enter -diag-enable-sv:2 as a single string, but the compiler does not recognize it. Maybe Intel Fortran 9 does not support it.



Also, I have tried to use -cb again, and it works now. I get a file named '-cb'.


I tried to open this file with the vi editor, but I get this:


% vi -cb
% VIM - Vi IMproved 6.3 (2004 June 7, compiled Aug 10 2005 18:49:40)
% Garbage after option: "-cb"
% More info with: "vim -h"


When executing 'make', it shows "multiple definition of `name of subroutine'..." many times during the build. What does this mean? That does not point me to an error. It does not tell me anything like which line of which subroutine to look at.


The last statement at the end of compilation is this:


: multiple definition of `mutk_'
main.out(.text+0xd903c): first defined here
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64/crt1.o(.dynamic+0x0): multiple definition of `_DYNAMIC'
ifort: error: Fatal error in ld, terminated by segmentation violation


Thanks


Steve Lionel (Intel)'s picture

As I wrote earlier, -diag-enable-sv requires version 10.

The option is spelled -CB, not -cb. Case matters.

Steve
Lorri Menard (Intel)'s picture

Hi -


You put the "-CB" in the wrong place. This is from your post about your makefile:


mpif77 -O2 -o -CB main.out $(objects)


There are options that take arguments, and "-o" is one of them. You replaced the original argument (main.out) with -CB. As a note, -o names the output file, and that is why your file was named "-CB".


Try this:


mpif77 -O0 -CB -o main.out $(objects)


Please note I also set the optimization level to 0 (by changing -O2 to -O0). You can experiment with that one too.


- Lorri

wim van hoydonck's picture

I just checked this with ifort v 10.0.026, and I think that the option consists of two parts, not one as you say:

-diag-enable-sv2 does not work
-diag-enable sv2 does work.

The man page also says something like this.

Steve Lionel (Intel)'s picture

Oops, yes, you are right.

Steve
