Can't get Gr0 from UnwindContext

I'm getting the message in the title when I run my Fortran 90 code (an MPI code). I have almost no idea what it means, except that it has something to do with memory (the stack). The full message is:

getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0
getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0
getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0
getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0
MPI: MPI_COMM_WORLD rank 1 has terminated without calling MPI_Finalize()
MPI: aborting job

I'm using ifort 8.1 on an SGI Altix 3700 machine with Itanium 2 processors. This code has run without a problem on SGI MIPS-based machines, compiled with f90.

I've tried using -cxxlib-gcc, which hasn't helped. Does anyone have any ideas?


-cxxlib-gcc is an option of icpc, not Fortran, and it is the default for icpc 8.1.
You have left out a great deal of information.
Are you compiling and running against the same MPI installation? It seems unlikely that you would use the same MPI on a MIPS machine as on the Altix, unless you installed LAM or MPICH on both. If you did install one of those and its test programs work, check that you are setting up your own program the same way.
On out-of-date Altix installations that have the ifc 7.1 dynamic libraries, the problem may be worked around with the ifort -i-static link option, provided that the static libraries which come with your version of the compiler are used. If your MPI was built locally for ifc 7.1, you would likely need a copy rebuilt with ifort 8.1 to build your application correctly.

I have exactly the same problem.
My program runs fine on a Xeon cluster with the Intel 7.1 compiler and various MPI variants, among others the VMI variant.

I compiled my program on an SGI Altix with the 8.1 compiler and got a segmentation fault and the error message mentioned above.
I compiled with -traceback -C -g -zero -c to be on the safe side :)
The program crashes on
allocate (temp(0:nrows))
I checked whether the array was already allocated, but it is not.
My program is dynamically linked against:

ldd mlmsp
libguide.so => /opt/intel/mkl701/lib/64/libguide.so (0x2000000000040000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x20000000000c8000)
libc.so.6.1 => /lib/tls/libc.so.6.1 (0x20000000000f8000)
libm.so.6.1 => /lib/tls/libm.so.6.1 (0x2000000000364000)
libcxa.so.6 => /opt/intel_cc_80/lib/libcxa.so.6 (0x20000000003f8000)
libunwind.so.6 => /opt/intel_cc_80/lib/libunwind.so.6 (0x2000000000460000)
libdl.so.2 => /lib/libdl.so.2 (0x2000000000494000)
/lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)

I use MPIVERSION="1.2.6", distributed by SGI.
The allocate call is within a recursive subroutine.

cheers
Thomas
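A minimal sketch of the pattern described above (an ALLOCATE inside a recursive subroutine), assuming free-form Fortran 90/95; the module name recursion_demo and the routine level are hypothetical, not taken from the poster's code. Each invocation gets its own non-SAVEd local TEMP, the allocation uses STAT= so a failure is reported instead of aborting, and the array is freed before returning. Note that this only guards against allocation failure: a SIGSEGV inside malloc itself often means the heap was already damaged by an earlier out-of-bounds write, which is what compiling with -C is meant to catch.

```fortran
MODULE recursion_demo
  IMPLICIT NONE
CONTAINS
  ! Hypothetical recursive routine. TEMP has no SAVE attribute, so every
  ! recursion level owns a separate copy; a SAVEd TEMP would be shared by
  ! all levels and the inner ALLOCATE would fail on an allocated array.
  RECURSIVE SUBROUTINE level(nrows, depth, ierr)
    INTEGER, INTENT(IN)  :: nrows, depth
    INTEGER, INTENT(OUT) :: ierr
    DOUBLE PRECISION, ALLOCATABLE :: temp(:)

    ierr = 0
    ! Should never be true for a non-SAVEd local on entry.
    IF (ALLOCATED(temp)) THEN
      ierr = -1
      RETURN
    END IF

    ! STAT= turns an allocation failure into an error code we can report.
    ALLOCATE (temp(0:nrows), STAT=ierr)
    IF (ierr /= 0) THEN
      WRITE (*, *) 'allocate failed at depth', depth, 'nrows', nrows, 'stat', ierr
      RETURN
    END IF

    temp = 0.0d0
    ! Recurse on a smaller problem; an inner failure propagates via ierr.
    IF (depth > 0) CALL level(nrows / 2, depth - 1, ierr)

    ! Free this level's array before returning so the heap does not grow
    ! with recursion depth (F90 leaves the status of unsaved locals undefined).
    DEALLOCATE (temp)
  END SUBROUTINE level
END MODULE recursion_demo
```

Calling CALL level(1702, 7, ierr) from a program that uses the module should return with ierr equal to 0 on a healthy heap.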

Is there any reason why you run against the libguide.so that came with MKL rather than the one that came with your ifort 8.1? You may have avoided the more typical Altix problem of attempting to run a build made with the 8.1 compiler on an Altix system that is set up to support the 7.1 compilers.

People sometimes go to the trouble of over-riding the libguide dynamic link scheme which SGI insisted upon, forcing a static link, to assure that the libguide is the one which comes with the compiler. In simple cases, this might be done with the ifort -i-static link option.

"The" 8.1 compiler is not a sufficient identification. Recent updates are much better than earlier ones.

OK, first things first:
Version 8.1 Build 20040922 Package ID: l_fc_pc_8.1.019

Linking with -i-static did not work, and I don't know why:
ifort -i-static -o mlmsp main.o some other .o files
-L/opt/intel_fc_80/lib/ -lguide -L/usr/local/mpi_intel/lib -lmpich -lmpichf90 -L/opt/intel/mkl701/lib/64 -lmkl_lapack -L/opt/intel/mkl701/lib/64 -lmkl_ipf -lpthread -lc -lm

and then I get
ld: --relax and -r may not be used together

I got the right libguide by putting /opt/intel_fc_80/lib/ as the first entry in my LD_LIBRARY_PATH, but the problem is still there:
getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0
forrtl: severe (174): SIGSEGV, segmentation fault occurred

ldd mlmsp
libguide.so => /opt/intel_fc_80/lib/libguide.so (0x2000000000040000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x20000000000c8000)
libc.so.6.1 => /lib/tls/libc.so.6.1 (0x20000000000f8000)
libm.so.6.1 => /lib/tls/libm.so.6.1 (0x2000000000364000)
libcxa.so.6 => /opt/intel_fc_80/lib/libcxa.so.6 (0x20000000003f8000)
libunwind.so.6 => /opt/intel_fc_80/lib/libunwind.so.6 (0x2000000000464000)
libdl.so.2 => /lib/libdl.so.2 (0x2000000000498000)
/lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)

cheers
Thomas

I ran my program under gdb with
mpirun -dbg=gdb -np 4

getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0
getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2305843009213885552 (LWP 11177)]
_int_malloc (av=0x200000000035f5e8, bytes=80) at malloc.c:4056
4056 malloc.c: No such file or directory.
in malloc.c
(gdb) forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libc.so.6.1 20000000001D5721 Unknown Unknown Unknown
libc.so.6.1 20000000001D3390 Unknown Unknown Unknown
mlmsp 40000000001E1DE0 Unknown Unknown Unknown
mlmsp 40000000001E1F30 Unknown Unknown Unknown
mlmsp 40000000000726F0 Unknown Unknown Unknown
mlmsp 4000000000047AE0 innersol_f90_ 328 multilevel.f90
mlmsp 400000000005A380 solforward_f90_ 1008 multilevel.f90
mlmsp 400000000004F8C0 multilevelpsapply 657 multilevel.f90
mlmsp 4000000000052A80 multilevelpsapply 679 multilevel.f90
mlmsp 4000000000052A80 multilevelpsapply 679 multilevel.f90
mlmsp 4000000000077AE0 Unknown Unknown Unknown
mlmsp 40000000000053B0 Unknown Unknown Unknown
mlmsp 4000000000004610 Unknown Unknown Unknown
libc.so.6.1 2000000000122990 Unknown Unknown Unknown
mlmsp 4000000000004000 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libc.so.6.1 20000000001D53C0 Unknown Unknown Unknown
libc.so.6.1 20000000001D3390 Unknown Unknown Unknown
mlmsp 40000000001E1DE0 Unknown Unknown Unknown
mlmsp 40000000001E1F30 Unknown Unknown Unknown
mlmsp 4000000000059860 solforward_f90_ 1002 multilevel.f90
mlmsp 400000000004F8C0 multilevelpsapply 657 multilevel.f90
mlmsp 4000000000052A80 multilevelpsapply 679 multilevel.f90
mlmsp 4000000000052A80 multilevelpsapply 679 multilevel.f90
mlmsp 4000000000077AE0 Unknown Unknown Unknown
mlmsp 40000000000053B0 Unknown Unknown Unknown
mlmsp 4000000000004610 Unknown Unknown Unknown
libc.so.6.1 2000000000122990 Unknown Unknown Unknown
mlmsp 4000000000004000 Unknown Unknown Unknown
Quit
(gdb) where
#0 _int_malloc (av=0x200000000035f5e8, bytes=80) at malloc.c:4056
#1 0x20000000001d3390 in __libc_malloc (bytes=1528) at malloc.c:3295
#2 0x40000000001e1de0 in for_allocate ()
#3 0x40000000001e1f30 in for_alloc_allocatable ()
#4 0x400000000004c150 in multilevelpsapply_f90 (ps_base=0x6000000002c9a500,
ntype=2, b={1.491668147410681e-154}, x={2.6815616147071866e+154},
nrows=190, splitlevel=7, id=0) at multilevel.f90:635
#5 0x4000000000052a80 in multilevelpsapply_f90 (ps_base=0x6000000002c9a500,
ntype=2, b={66934.575417086817}, x={67405.233804015545}, nrows=568,
splitlevel=7, id=0) at multilevel.f90:679
#6 0x4000000000052a80 in multilevelpsapply_f90 (ps_base=0x6000000002c9a500,
ntype=2, b={0.0067068525168363655}, x={66308.28309822407}, nrows=1702,
splitlevel=7, id=0) at multilevel.f90:679
#7 0x4000000000077ae0 in fgmres_parasails_f90_ ()
#8 0x40000000000053b0 in MAIN__ ()
#9 0x4000000000004610 in main ()

I do need some help with this because I have no idea where to start looking for a solution. Is this compiler-related, or does it have something to do with the MPI implementation on SGI? What does "Can't get Gr0 from UnwindContext, using 0" mean?

cheers
Thomas

I would suggest submitting a test case to Intel Premier Support.

Steve - Intel Developer Support

Hi,

I too have started to see this error on an Altix using the v8.1 compiler. Was this problem ever resolved after it was submitted to the support line?

The code in question is written in Fortran and uses SGI's MPI. I had not seen this error before we upgraded to ProPack 3, which uses more recent versions of glibc.

I had also seen it in a serial code, but sadly I fixed the problem and did not keep a copy of the broken code.

Thanks,

Kevin.

Nope, I have not fixed it yet. And since I have no idea what causes this problem, I cannot make a simple test case.
What versions of ifort and MPICH are you using? Do you have a working mpirun_dbg.idb on your system? Maybe that will give us some more clues: mpirun -dbg=idb -np 2 yourprogram
Does this also occur in a recursive routine? Do you use a lot of pointers? On what system, and with which compiler/MPICH combination, does your program run without problems?

Maybe if we give the Intel people this kind of input, they can provide us with some answers.

cheers
Thomas Geenen

The Gr0 errors occur because a problem in the code causes it to crash; the runtime then uses libunwind to work out where in the code it crashed. libunwind also fails, which means that you cannot attach a debugger.

I've now managed to create a simple test case which shows this problem. I'll submit it using Premier Support.

Could you attach the example program? Does this program also crash on other systems with other compilers?

thanks
Thomas

I don't really have access to other systems; in fact, I only have the 8.1.023 and 8.1.025 compilers. If you have any others, it would be useful to see the results.

The following code runs until you run out of quota (so make sure you have one, otherwise it will fill up the entire system!).

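PROGRAM mymain
  ! Reproducer: writes an 8192-element integer record to unit 50 forever.
  ! With no OPEN statement, unit 50 is implicitly opened for unformatted
  ! sequential output (typically as a file named fort.50). The WRITE
  ! eventually fails once the disk quota is exhausted, and that runtime
  ! error is presumably what triggers the Gr0 message.
  IMPLICIT NONE
  INTEGER :: a(8192)
  a = 0   ! define the record contents; the values do not matter here
  DO
    WRITE (50) a
  END DO
END PROGRAM mymain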

I ran my program with idb and still got the same crash, but without the "Can't get Gr0 from UnwindContext" message.
So, as I understand it, this message has nothing to do with the crash itself but with the crash handling? Is that right?

cheers
Thomas

I got a little closer to why my program crashes:
(idb) down
>0 0x20000000001d57f0 in _int_malloc(av=0x200000000035f208, bytes=512) "malloc.c":4056
Source file not found or not readable, tried...
./malloc.c
/.automount/homeserv/home/students/geenen/mlmsp/malloc.c
(Cannot find source file malloc.c)
(idb) print av
0x200000000035f208
(idb) print *av
struct malloc_state {
mutex = 3532840;
stat_lock_direct = 2305843009217226792;
stat_lock_loop = 0;
stat_lock_wait = 0;
pad0_ = [0] = 0;
max_fast = 0;
fastbins = [0] = 0x0,[1] = 0x0,[2] = 0x0,[3] = 0x0,[4] = 0x0,[5] = 0x0,[6] = 0x0,[7] = 0x0,[8] = 0x0,[9] = 0x0,[10] = 0x0;
top = 0x0;
last_remainder = 0x0;
bins = [0]..[198] = 0x0,[199] = 0x200000000003bd70,[200] = 0x0,[201] = 0x1,[202] = 0x200000000035f8c0,[203] = 0x200000000003b8f0,[204] = 0x200000000003b900,[205] = 0x200000000003b910,[206] = 0x200000000035d438,[207] = 0x1,[208]..[255] = 0x0;
binmap = [0] = 0,[1] = 0,[2] = 0,[3] = 0;
next = 0x0;
system_mem = 0;
max_system_mem = 0;
}
system_mem = 0 seems a little low... Is this normal?
What else could I print in this routine to get useful debugging info?

I am getting the following message when I run my Fortran 77 code on an SGI Altix 3700 using the intel-8.1-latest compiler:

getRegFromUnwindContext: Can't get Gr0 from UnwindContext, using 0
forrtl: severe (60): infinite format loop, unit 6, file /dev/pts/43
Image PC Routine Line Source
a.out 4000000000118880 Unknown Unknown Unknown
a.out 40000000001179D0 Unknown Unknown Unknown
a.out 40000000000B5060 Unknown Unknown Unknown
a.out 40000000000602A0 Unknown Unknown Unknown
a.out 4000000000061220 Unknown Unknown Unknown
a.out 4000000000095910 Unknown Unknown Unknown
a.out 40000000000921D0 Unknown Unknown Unknown
a.out 400000000001C6E0 Unknown Unknown Unknown
a.out 4000000000003410 Unknown Unknown Unknown
libc.so.6.1 20000000001AE990 Unknown Unknown Unknown
a.out 4000000000002E00 Unknown Unknown Unknown

I have no idea what the error means. The code ran with no problems under Compaq Visual Fortran, so I have no idea why I am getting this error.
Any suggestion to help me solve this will be greatly appreciated.
cheers
Madhu

If I understood it correctly, the "Can't get Gr0 from UnwindContext" message has nothing to do with the actual error.
Compile your program with -g -C and run it under idb; that should get you started with debugging. It seems that unit 6 is pointing to this /dev/ thing; is that OK?

cheers
Thomas

Yes, you need to switch on the debug flags; that should give you the entry points in your program.

Look for writes to unit 6, or to standard output using write(*,*) or print *; these all go to your standard output, i.e. your login session. Type something like who at the command line to check that this pts number does indeed refer to one of your own login sessions. This should give you a good start in finding the problem.

Hi!
Thanks for your help. I figured out where the problem occurs. I checked the values of the variables after each iteration with both compilers (Intel 8 and CVF). The problem is that one of my variables becomes NaN after 3 iterations, but I don't have this problem when I run the code under CVF. As I said before, I am running F77 code with the intel-8.1-latest (Unix) compiler.

The F77 code uses IMPLICIT DOUBLE PRECISION. Does the definition of arrays change? I don't know what the error could be. The one thing I know is that I got the previous error because one of my variables becomes NaN.

The command I use to compile the code is: ifort -FI filename

Do I have to include more options in my compile command?

Any suggestion will be appreciated.

thanks
cheers
madhu
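On the IMPLICIT DOUBLE PRECISION question: arrays typed by an IMPLICIT statement get the same type as scalars whose names fall in the same letter range, so the array definitions themselves do not change between compilers. Literal constants are not affected by IMPLICIT, though: 0.1 is still a default (single-precision) REAL constant unless written 0.1D0. A minimal sketch, assuming the common (A-H, O-Z) range and free-form source; the module implicit_demo and the function literal_error are hypothetical:

```fortran
MODULE implicit_demo
  ! Assumed typing rule, as in many F77 codes: names starting with A-H or
  ! O-Z are DOUBLE PRECISION. Contained procedures inherit this rule.
  IMPLICIT DOUBLE PRECISION (A-H, O-Z)
CONTAINS
  ! Hypothetical helper: returns the gap between a single-precision literal
  ! and a double-precision literal, both stored in DOUBLE PRECISION variables.
  DOUBLE PRECISION FUNCTION literal_error()
    DIMENSION ARR(3)        ! ARR is a DOUBLE PRECISION array by the rule above
    X = 0.1                 ! default REAL constant widened to double (about 1.5E-9 off)
    Y = 0.1D0               ! double-precision constant
    ARR = (/ X, Y, X - Y /)
    literal_error = ABS(ARR(3))
  END FUNCTION literal_error
END MODULE implicit_demo
```

Differences of this size are harmless on their own, but in an iterative code they are one place where results can drift between compilers.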

Hi,

Glad you have got closer to your answer. I assume from your statements that you have a working version of the code compiled under CVF (is this IA-32?) but no success on the Altix (IA-64). In that case it will be difficult to track down the error, since these are very different platforms: one uses 32-bit addressing on IA-32 processors, the other 64-bit addressing on IA-64 processors in an SGI environment.

When I hit these problems, I tend to get my hands dirty and start debugging the code (tracking this NaN to its source). There are certainly many differences between the compilers for the different architectures, which do not make the path clear, but try looking at the compiler options related to floating-point errors. Hopefully one of these will catch the operations that produce an illegal floating-point value (ideally the first one).
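One way to track the NaN to its source is to check key variables after each iteration and stop at the first NaN. A minimal sketch, assuming free-form Fortran 90/95; the module nan_check and the helpers is_nan and assert_finite are hypothetical. The check relies on the IEEE rule that a NaN compares unequal to itself, which aggressive floating-point optimization can break; where the compiler supports the Fortran 2003 IEEE_ARITHMETIC module, IEEE_IS_NAN is the standard alternative. Compiler options that trap floating-point exceptions (for example ifort's -fpe0; check ifort -help for your version) can complement this by stopping at the first invalid operation.

```fortran
MODULE nan_check
  IMPLICIT NONE
CONTAINS
  ! Returns .TRUE. if x is a NaN, using the IEEE rule that NaN /= NaN.
  ! Aggressive floating-point optimization may fold this test to .FALSE.,
  ! so compile the file containing it without such options.
  LOGICAL FUNCTION is_nan(x)
    DOUBLE PRECISION, INTENT(IN) :: x
    is_nan = (x /= x)
  END FUNCTION is_nan

  ! Hypothetical helper: stop with a message naming the variable and the
  ! iteration at which a NaN first appeared; otherwise return silently.
  SUBROUTINE assert_finite(name, x, iter)
    CHARACTER(LEN=*), INTENT(IN) :: name
    DOUBLE PRECISION, INTENT(IN) :: x
    INTEGER, INTENT(IN) :: iter
    IF (is_nan(x)) THEN
      WRITE (*, '(A, A, A, I0)') 'NaN detected in ', name, ' at iteration ', iter
      STOP 1
    END IF
  END SUBROUTINE assert_finite
END MODULE nan_check
```

For example, a call such as CALL assert_finite('x', x, iter) after each pass through the main loop (with x and iter standing in for whatever the program actually uses) would stop the run at the first iteration where x becomes NaN.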
