compiling ifort with -openmp flag

In Mac OS 10.6, I have a program that compiles and runs fine without the -openmp flag. As soon as I introduce that flag, it still compiles fine, but I get an immediate crash on the first line of the code with the error "illegal instruction". It doesn't matter whether or not I have any OpenMP directives in the code; I get the same result.
I've also run a couple of simple examples from the OpenMP book, and these compile and run fine. Any ideas?

Does the same thing happen if you use -auto instead of -openmp?

Steve - Intel Developer Support

In my version of ifort, I see no -auto option, other than -auto-scalar. Do you mean -parallel? It runs fine with the -parallel option (though the speedup is perhaps only 20% or so).

No, I don't mean -parallel. -auto is a valid option. If you find, somehow, that doesn't work, try -recursive.

Steve - Intel Developer Support

I tried both, separately, with the same result as before. While the compiler didn't complain, there is no -auto flag in the man pages for Mac OS X ifort.

Any other ideas? By the way, thanks for taking the time!

-Lee

Here is the Makefile, if that is of help. Also, in the Intel debugger, I get a failure on the first line of the program, "PROGRAM MAIN", with a "SIGBUS" error.

#
# Makefile for the 3D Shear Velocity inversion
#

FC = ifort
FFLAGS = -parallel -g -openmp -recursive -debug extended -O0 -m64 -openmp-report=1 -align all
EXTND=-extend_source 132

#FC = gfortran
#EXTND=-ffixed-line-length-none

CC = gcc
CFLAGS = -O -m64

#
# You need a sacio lib to link this code.
#

SACLIB = /usr/local/bin/sac/lib/libsacio.a

# Need dynamic libraries, but already set in shell
# DYLD_LIBRARY_PATH = /Developer/opt/intel/composer_xe_2011_sp1.7.246/compiler/lib

#
# Executable will be placed here.
#

BINDIR=.

OBJS = lovdsp.o lovmrx.o raydsp.o raymrx.o partials.o gett.o \
lprModel.o rayrkg.o lovrkg.o drgSubroutines.o \
buildGravitynew.o transform_r_gfortran.o filterGravity.o
#
# Small problems can be handled by an SVD approach
#

SVDOBJS = svdrs.o h12.o

#
# Larger problems require a conjugate gradient-based approach.
#

LSQROBJS = lsqr.o lsqrblas.o aprod.o row_io.o lsqrUtils.o

#
# dr indicates dispersion & rftn inversion
#

dr_3d_inv: $(OBJS) $(SVDOBJS) $(LSQROBJS) model3d.inc dr_3d_inv.o
	$(FC) $(FFLAGS) $(OBJS) $(SVDOBJS) $(LSQROBJS) -o $(BINDIR)/dr_3d_inv dr_3d_inv.o $(SACLIB) $(EXTND)

#
clean:
# rm *.o

#
# dependencies
#

dr_3d_inv.o: dr_3d_inv.f model3d.inc
	$(FC) $(FFLAGS) $(EXTND) $(OBJS) $(SVDOBJS) $(LSQROBJS) model3d.inc -c dr_3d_inv.f $(SACLIB)
#
lsqrUtils.o: lsqrUtils.f
	$(FC) $(FFLAGS) $(EXTND) -c lsqrUtils.f
#
drgSubroutines.o:drgSubroutines.f
	$(FC) $(FFLAGS) $(EXTND) -c drgSubroutines.f
#
buildGravitynew.o: buildGravitynew.f model3d.inc
	$(FC) $(FFLAGS) $(EXTND) -c buildGravitynew.f
#
filterGravity.o: filterGravity.f model3d.inc
	$(FC) $(FFLAGS) $(EXTND) -c filterGravity.f
#
transform_r_gfortran.o:transform_r_gfortran.c
#
svdrs.o: svdrs.f
h12.o: h12.f
lovdsp.o: lovdsp.f
lovmrx.o: lovmrx.f
raydsp.o: raydsp.f
raymrx.o: raymrx.f
partials.o: partials.f
gett.o: gett.f
sac2xy.o: sac2xy.f
srf2sac.o: srf2sac.f
lovrkg.o: lovrkg.f
rayrkg.o: rayrkg.f
lsqr.o: lsqr.f
lsqrblas.o:lsqrblas.f
aprod.o: aprod.c
row_io.o: row_io.c
#

I wanted to see if -auto caused the same problem as -openmp. Do I understand that it does? If so, then what is likely happening is that you are overflowing the stack. -openmp implies -auto (-recursive is an alias) - both are in the documentation (not sure about man pages, though - I don't think the man page is comprehensive.) This puts all local variables on the stack. OpenMP complicates the issue by having thread-specific stacks.

The first thing I suggest you try is to unlimit the stack. The syntax to do this varies by shell. In sh or bash it is:

ulimit -s unlimited

or, in csh or tcsh:

limit stacksize unlimited

This doesn't really make it "unlimited", but raises the size to the max configured for the kernel. You may then need to try setting the environment variable KMP_STACKSIZE to a larger value - try "16M" (without the quotes).

Steve - Intel Developer Support

Is this possibly a stack size issue? I've set OMP_STACKSIZE to sizes up to 1G with no luck. I've also used the csh built-in command ulimit to set it to the hard amount, which I think is 65M (up from 8M). Or are those values given in K?

Any ideas welcome.

Steve, do I need to compile with the stack increased or just run in a shell with it increased? Or both?

OMP_STACKSIZE defaults to 4MB (for 64-bit builds). The largest I have seen used is 32MB. This amount, times number of threads, comes out of your total stack, as seen by ulimit.
The build doesn't remember anything about what stack you had set at build time, and there's no stack size built in, such as you would set when using the Microsoft linker.

Somehow I didn't get yesterday's reply until today. Oh well. So, I've set OMP_STACKSIZE and KMP_STACKSIZE to 16M and used limit stacksize unlimited in both the run and compile shells. It still fails with the same error. I checked, and using -automatic gives me "option deprecated". I think it has been replaced with -parallel.

The code runs with -parallel, but fails with both -automatic and -openmp, with or without the stack size settings.

It is worth pointing out that the code fails regardless of whether or not I've included any openmp directives in the actual program. It is just the addition of "-openmp" to the compilation line that introduces the errors.

When I set OMP_NUM_THREADS to 1, 2 or 8 (with the stack sizes enlarged as before), I get the same error.

-auto has nothing to do with -parallel. I didn't say -automatic.

Steve - Intel Developer Support

It doesn't work with -auto either...

When I look in the Intel Software Development Tools/Intel Fortran Compiler User and Reference Guides, the option "auto" refers one to the option "automatic". Document number 304970-006US. Maybe it's an old version, who knows...

By the way, I should have mentioned this earlier, I'm using XE 12.1.

Documentation for -auto

If -auto behaves the same way as -openmp, then that confirms my theory that stack space is the issue.

What happens if you add -heap-arrays ?

Steve - Intel Developer Support

Yeah, it still fails in the same way. Thanks for the doc info. I'm slowly getting better at using the documentation...

Here is a summary of what doesn't work. I've set the stack size to unlimited (65336) and can check to be sure that is the case. I've set KMP_STACKSIZE to 16M. I've tried compiling with -auto, -openmp, -recursive and -heap-arrays. So far, none of this has made a difference. I still get an error on the first line when I run the code. In the debugger (IDB), I get a SIGBUS error. Outside of the debugger, I get "illegal instruction". Anybody have any further ideas? Thanks for your contributions so far!

-Lee

>>I get a failure on the first line of the program "PROGRAM MAIN", with the descriptor "SIGBUS" error.
...
#
transform_r_gfortran.o:transform_r_gfortran.c
#
...
lsqrblas.o:lsqrblas.f
aprod.o: aprod.c
...
<<

You apparently have a mixed-language program. Nothing inherently wrong with that; however, if your ".c" programs are C++, they may have a ctor being executed for a static object, which runs prior to entry to "PROGRAM MAIN". These ctors (if present) may require some initialization which has not yet been done.

Also, there may (or may not) be an issue with mixing OpenMP in Fortran and C/C++, depending on whether "main" is in C/C++ or in the Fortran "PROGRAM ...".

Jim Dempsey

www.quickthreadprogramming.com

Thanks for the ideas, Jim. I don't have any C++, and there is no "main" in the C. I've tried introducing "main" into the Fortran calling program, but with no change in run behavior. I've also attempted all the fixes suggested by Mr. Green in his 06/12/09 post (-traceback; -check [with numerous options]; and others) (article titled Determining root cause of segmentation faults...). All with the same result. In gdb, I get this error message: "EXC_BAD_ACCESS; could not access memory. Reason - KERN_INVALID_ADDRESS at ....". The debugger points to the first line of the code.

I do link an external library that could have "main" issues, and I could probably rewrite that code and compile it myself if push comes to shove.

The bottom line is: it runs fine with just the ifort -parallel flag (or no parallelization). How much improvement over the -parallel flag could I expect using OpenMP if I'm an okay programmer, but certainly nothing special?

Evidently, if your application isn't worth the investment of the effort on your part to get it running correctly, no one in their right mind would bet on the performance questions you posed.
If you call a library of unknown quality in a parallel region, that's certainly a possible source of problems.
Generally speaking, in the rare case where all your important code is parallelized by -parallel, that could give performance similar to OpenMP. As you've already seen, OpenMP may be more sensitive to correctness issues.

Hello, which compiler version do you use, and what is your Mac OS version? Is it 10.6.8 or earlier, on 32-bit or 64-bit hardware? Thanks.

--Vladimir

Vladimir, I'm running Mac OS 10.6.8 on 64-bit hardware, using ifort XE 12.1. With no OpenMP directives included but -openmp set in the compilation, the code crashes immediately on running, at the first line (confirmed with write statements). I get no files or info from -traceback or -check all, and no errors or warnings on compilation. When it crashes it says "core dumped", but no file gets written. The lack of a core file is not due to core file size limitations; that is set to unlimited. I'm looking into that issue today. Oh, and the problem isn't linking to the external library I mentioned in my last post. I removed the calls to that library and didn't link to it, and got the same problem.

thanks for any ideas!

-Lee
