dstegr function causes segmentation fault

My colleague and I tested a very simple program that calls 'dstegr' from MKL, and the function does not seem to work correctly. The code is below.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mkl.h>

#define N 2000

int main(int argc, char *argv[])
{
 char jobz='V', range='I';
 MKL_INT n=N;
 double d[N], e[N];
 double vl=0.0, vu=0.1, abstol=0.000001;
 MKL_INT il=1, iu=10, m=0, ldz=N;
 double w[N], z[N*N], work[18*N];
 MKL_INT isuppz[2*N], iwork[10*N];
 MKL_INT liwork=10*N, lwork=18*N;
 MKL_INT info=0;

 double duration;
 MKL_INT i;
 clock_t start, finish;

 for(i=0;i<N;i++) {
 d[i]=4.0;
 e[i]=1.0;
 }

 start = clock();

 dstegr_(&jobz,&range,&n,d,e,&vl,&vu,&il,&iu,&abstol,&m,w,z,&ldz,isuppz,work,&lwork,iwork,&liwork,&info);

 finish = clock();
 duration = (double)(finish - start) / CLOCKS_PER_SEC;
 printf( "%f seconds\n", duration );

 for(i=0;i<10;i++)
 printf("%.8f:%f:%.8f\n",d[i],e[i],w[i]);

 return(0);
}

With dynamic linking, using either the ilp64 or the lp64 interface, a segmentation fault always occurs.
With static linking and the ilp64 interface, it seems to be OK.
With static linking and the lp64 interface, the behavior differs between platforms:

On a Xeon 5450 machine (32 GB, RHEL 5.1) with the Intel 11.0.083 C++ compiler and MKL 10.1.1.019, the lp64 build runs, but a segmentation fault occurs on return from the function that called "dstegr".

Gdb shows:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()

and valgrind shows:
==3255== Invalid write of size 8
==3255== at 0x40198A: main (tri.c:9)
==3255== Address 0x7FEFFF0F8 is on thread 1's stack
==3255==
==3255== Invalid write of size 8
==3255== at 0x4019A5: main (tri.c:13)
==3255== Address 0x7FEFFF0E0 is on thread 1's stack
==3255==
==3255== Invalid write of size 8
==3255== at 0x4019B3: main (tri.c:13)
==3255== Address 0x7FEFFF0E8 is on thread 1's stack
==3255==
==3255== Invalid write of size 8
==3255== at 0x4019BF: main (tri.c:13)
==3255== Address 0x7FEFFF0F0 is on thread 1's stack
==3255==
==3255== Invalid write of size 8
==3255== at 0x401A05: main (tri.c:26)
==3255== Address 0x7FEFFB260 is on thread 1's stack
==3255== Stack overflow in thread 1: can't grow stack to 0x7FEFFB260
==3255==
==3255== Process terminating with default action of signal 11 (SIGSEGV)
==3255== Access not within mapped region at address 0x7FEFFB260
==3255== at 0x401A05: main (tri.c:26)
==3255==
==3255== Invalid write of size 8
==3255== at 0x48022D8: _vgnU_freeres (vg_preloaded.c:56)
==3255== Address 0x7FD111158 is on thread 1's stack

On two other machines, one a Xeon 5504 with RHEL 5.3, the Intel 11.1 compiler and MKL 10.2, the other a Xeon E7-4870 with RHEL 6 and Intel Parallel Studio XE 2011, lp64 also causes a segmentation fault, and the function does not seem to run at all. Gdb gave the information below:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000411b57 in mkl_lapack_dlar1v ()

The code runs fine with the netlib LAPACK library, so we think the function is not correctly implemented in MKL. Could this issue be confirmed and fixed soon?

I don't think this is MKL's problem. You can try to allocate all the working arrays dynamically (for example, double* w = (double*) malloc( N * sizeof(double) ); instead of double w[N], z[N*N]) and check how it works. --Gennady

You are running out of stack. With N=2000, you require a stack of about 40 Mbytes.

If you compile with the /F40000000 option, your code will run without errors on Windows. Find out how to adjust the maximum run-time stack for your OS.

If you change N, you will have to recompute how much stack is required, and specify that to the compiler.

Gennady's suggestion is better in the long run.
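
The stack estimate above can be checked by summing the arrays declared in the first program. The sketch below does only that arithmetic. The function name dstegr_stack_bytes is made up for illustration, and int_size stands for sizeof(MKL_INT), assumed here to be 4 with the lp64 interface and 8 with ilp64.

```c
#include <stddef.h>

/* Bytes of automatic (stack) storage used by the arrays declared in the
   first listing, for a matrix of order n.
   int_size is an assumption standing in for sizeof(MKL_INT). */
unsigned long long dstegr_stack_bytes(unsigned long long n,
                                      unsigned long long int_size)
{
    unsigned long long doubles = n          /* d[N]       */
                               + n          /* e[N]       */
                               + n          /* w[N]       */
                               + n * n      /* z[N*N]     */
                               + 18 * n;    /* work[18*N] */
    unsigned long long ints    = 2 * n      /* isuppz[2*N] */
                               + 10 * n;    /* iwork[10*N] */

    return doubles * sizeof(double) + ints * int_size;
}
```

With 8-byte doubles and N=2000 this comes to 32,432,000 bytes for 4-byte integers and 32,528,000 bytes for 8-byte integers, nearly all of it from z[N*N]. That is the same order of magnitude as the figure quoted above, and far more than a stack limit of a few megabytes would allow.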

Quoting Gennady Fedorov (Intel)
I don't think this is MKL's problem. You can try to allocate all the working arrays dynamically (for example, double* w = (double*) malloc( N * sizeof(double) ); instead of double w[N], z[N*N]) and check how it works. --Gennady

In fact, we used dynamic memory allocation at first. When we hit the segmentation fault, we changed the code to static arrays. So we don't think changing the allocation mode will help.

In that case, which version of MKL do you use, and could you give us your link line? We will try to reproduce the problem with dynamically allocated arrays. --Gennady

Quoting mecej4
You are running out of stack. With N=2000, you require a stack of about 40 Mbytes.

If you compile with the /F40000000 option, your code will run without errors on Windows. Find out how to adjust the maximum run-time stack for your OS.

If you change N, you will have to recompute how much stack is required, and specify that to the compiler.

Gennady's suggestion is better in the long run.

So I have changed my code again to use dynamic allocation, like this:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mkl.h>

#define N 2000

int main(int argc, char *argv[])
{
    char jobz='V', range='I';
    double *d, *e, *w, *z, *work;
    double vl=0.0, vu=0.1, abstol=0.000001;
    MKL_INT n=N, il=1, iu=10, m=0, ldz=N, info=0;
    MKL_INT *isuppz, *iwork, liwork=10*N, lwork=18*N;

    double duration;
    MKL_INT i;
    clock_t start, finish;

    d=(double*)malloc(N*sizeof(double));
    e=(double*)malloc(N*sizeof(double));
    w=(double*)malloc(N*sizeof(double));
    z=(double*)malloc(N*N*sizeof(double));
    work=(double*)malloc(18*N*sizeof(double));
    iwork=(MKL_INT *)malloc(10*N*sizeof(MKL_INT));
    isuppz=(MKL_INT *)malloc(2*N*sizeof(MKL_INT));

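    for(i=0;i<N;i++) {
        d[i]=4.0;
        e[i]=1.0;
    }

    start = clock();

    dstegr_(&jobz,&range,&n,d,e,&vl,&vu,&il,&iu,&abstol,&m,w,z,&ldz,isuppz,work,&lwork,iwork,&liwork,&info);

    finish = clock();
    duration = (double)(finish - start) / CLOCKS_PER_SEC;
    printf( "%f seconds\n", duration );

    for(i=0;i<10;i++)
        printf("%.8f:%f:%.8f\n",d[i],e[i],w[i]);

    return(0);
}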

Compile the code:
icc -g -o tri tri.c -I$MKLROOT/include -L$MKLROOT/lib/em64t -Wl,-Bstatic -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

Then run it again:
0.010000 seconds
1.99999754:0.500001:2.00000246
1.49999692:0.666668:2.00000986
1.33332950:0.750002:2.00002218
1.24999538:0.800003:2.00003944
1.19999458:0.833337:2.00006162
1.16666044:0.857147:2.00008874
1.14285010:0.875005:2.00012078
1.12499215:0.888895:2.00015775
1.11110244:0.900007:2.00019966
1.09999051:0.909099:2.00024649
Segmentation fault

Even when I add a printf just before the return statement of the program, its output IS printed correctly.
I guess the function modified something on the stack, so the calling function cannot return normally.
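
The eigenvalues printed in the third column above can be sanity-checked without MKL. A minimal sketch, assuming the matrix from the listing (constant diagonal 4.0, constant off-diagonal 1.0): a Sturm-sequence count, that is, counting the negative pivots of the LDL^T factorization of T - x*I, gives the number of eigenvalues below x. The function name count_eigs_below is hypothetical and is not part of MKL or LAPACK.

```c
/* Number of eigenvalues smaller than x for the symmetric tridiagonal
   matrix of order n with constant diagonal d and constant off-diagonal e.
   Uses the pivot recurrence q_1 = d - x, q_i = (d - x) - e*e / q_(i-1);
   each negative pivot corresponds to one eigenvalue below x. */
int count_eigs_below(int n, double d, double e, double x)
{
    const double tiny = 1e-300;  /* guard against division by zero */
    double q = 1.0;              /* unused on the first step */
    int count = 0;
    int i;

    for (i = 0; i < n; i++) {
        q = (d - x) - (i > 0 ? e * e / q : 0.0);
        if (q > -tiny && q < tiny)
            q = -tiny;           /* nudge an exact zero pivot off zero */
        if (q < 0.0)
            count++;
    }
    return count;
}
```

If the ten printed values really are the ten smallest eigenvalues for N=2000, then count_eigs_below(2000, 4.0, 1.0, 2.0) should return 0 and count_eigs_below(2000, 4.0, 1.0, 2.00025) should return 10. That would suggest the results were computed correctly before the crash.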

Quoting Gennady Fedorov (Intel)
In that case, which version of MKL do you use, and could you give us your link line? We will try to reproduce the problem with dynamically allocated arrays. --Gennady

We have 3 platforms.

Xeon 5450 / 32G / RHEL5.1 / Intel 11.0.083 compiler / 10.1.1.019 MKL
On this machine, the code produces a computation result when using the static library.

Xeon 5504 / 4G / RHEL 5.3 / Intel 11.1 compiler / 10.2 MKL
Xeon E7-4870 / 128G / RHEL 6 / Intel parallel studio XE 2011
On these two machines, the code does not produce any result and reports a segmentation fault immediately.

Before we ran the program, we had already set the memory limit and the stack limit to "unlimited".
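
Whether the "unlimited" setting is actually in effect for the process can be checked from inside the program. A minimal sketch, assuming a POSIX system (getrlimit and RLIMIT_STACK from <sys/resource.h>); the helper name stack_limit_allows is made up for illustration.

```c
#include <sys/resource.h>

/* Returns 1 if the soft stack limit is unlimited or at least `needed`
   bytes, 0 if it is smaller, and -1 if the limit could not be read. */
int stack_limit_allows(unsigned long long needed)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;                      /* query failed */
    if (lim.rlim_cur == RLIM_INFINITY)
        return 1;                       /* "unlimited" is in effect */
    return (unsigned long long)lim.rlim_cur >= needed ? 1 : 0;
}
```

Calling this at the start of main with the number of bytes the local arrays need shows whether the limit seen by the running process matches what was set in the shell.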

Hi,

I've reproduced your problem even with N=200 (the same problem occurs with both the static and the dynamic MKL libraries).

Program received signal SIGSEGV, Segmentation fault.
0x0000000000411e57 in mkl_lapack_dlar1v ()
(gdb) bt
#0 0x0000000000411e57 in mkl_lapack_dlar1v ()
#1 0x0000000000407883 in mkl_lapack_dlarrv ()
#2 0x0000000000404ab0 in mkl_lapack_dstemr ()
#3 0x00000000004029b9 in mkl_lapack_dstegr ()
#4 0x000000000040252c in dstegr_ ()
#5 0x0000000000402176 in main (argc=1, argv=0x7fff985e4068) at dsteqr+.c:35

But your link line is not fully correct:

icc -g -o tri tri.c -I$MKLROOT/include -L$MKLROOT/lib/em64t -Wl,-Bstatic -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

Please try the MKL Link Line Advisor.

Thanks,
-- Victor

Yes. It's the same problem we hit on the latter two machines described in my last post. It occurs with MKL 10.2 or newer.

With 10.1, things look fine until the calling function returns.

The MKL Link Line Advisor is quite a good and interesting tool, thank you!

Legendary intelligence officer Drozdov was nicknamed «Fabergé» owing to his unique capability to work with information, to get information, and to convert it into the most precious treasures.

Quote: "It occurs in 10.2 or newer MKL...." I checked the problem with the latest 10.3 Update 3 version and couldn't reproduce the segmentation fault. --Gennady

Quoting ?? ?

#define N 2000
......................

double ...., z[N*N], ....

mecej4 offered the correct solution: ".../F40000000 ..."
Intel MKL is not guilty.

Quoting yuriisig
Quoting ?? ?

#define N 2000
......................

double ...., z[N*N], ....

mecej4 offered the correct solution: ".../F40000000 ..."
Intel MKL is not guilty.

1. The /Fn option does not exist in icc on Linux.

2. Stack size is not the key point. Dynamic allocation of memory also led to the seg fault.

Quoting Gennady Fedorov (Intel)
Quote: "It occurs in 10.2 or newer MKL...." I checked the problem with the latest 10.3 Update 3 version and couldn't reproduce the segmentation fault. --Gennady

I updated MKL to 10.3 Update 3 last night (on the E7-4870 machine with RHEL 6 x64), and the seg fault is still there.

Is it possible that the problem is related to the compiler, glibc, the kernel, the Linux distribution, or other libraries?

Just now I tried installing MKL 10.3 Update 3 on the Xeon 5450 RHEL 5.1 machine, and it didn't work either...

Hello, we have finally reproduced the problem and have already found its cause. The problem has been escalated and will be fixed soon. I will let you know when the fix is available. --Gennady

Quoting ?? ?
1. /Fn parameter does not exist under Linux icc

2. Stack size is not the key point. Dynamic allocation of memory also made seg fault.

OK.

But why do you use this routine? It is unreliable, and I have already written about that.

Quoting yuriisig
Quoting ?? ?
1. /Fn parameter does not exist under Linux icc

2. Stack size is not the key point. Dynamic allocation of memory also made seg fault.

OK.

But why do you use this routine? It is unreliable, and I have already written about that.

It is used in a solver package named PSEPS (Parallel Symmetric Eigenvalue Package of Solver), which is developed by my colleague. I really know nothing about the scientific meaning, or how and why he uses the function. My duty is to maintain the machines and the system environments. Anyway, I have told him about your suggestion. Thank you!

Quoting Gennady Fedorov (Intel)
Hello, we have finally reproduced the problem and have already found its cause. The problem has been escalated and will be fixed soon. I will let you know when the fix is available. --Gennady

That's really good news for me. I'll wait for the solution.

Thanks all!
