-O0 -openmp generates a segfault

Hello,

I am trying to run a simple OpenMP matrix multiplication code with array sizes 200*200. I found that when my code is compiled with optimizations disabled, as in "icc -openmp -O0 matmul.c", a segfault occurs when it is executed. However, when compiled simply with "icc -openmp matmul.c", the code works properly. I would like to disable all optimizations when running the OpenMP code, which is why I need -O0. Can anyone help with this problem?

Thank you,


Is it a stack overflow segfault? If it is, you can use "ulimit -s 999999999" or some other large number to increase the stack size.
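For example, from the shell before running the program (a sketch; the limit value and the OMP_STACKSIZE setting are illustrative, not prescriptive):

```shell
# Show the current per-thread stack limit for this shell (in KiB).
ulimit -s

# Raise the main thread's stack limit for this shell session
# (may be capped by the hard limit on some systems).
ulimit -s 999999999 2>/dev/null || true

# OpenMP worker threads have their own stacks; with the Intel runtime
# their size can be raised via the OMP_STACKSIZE environment variable.
export OMP_STACKSIZE=16M

# Then rerun the program from this same shell, e.g.:
#   ./a.out
```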

If not, what is the segfault and where is it happening? Can you get a backtrace from gdb?

Brandon Hewitt Technical Consulting Engineer For 1:1 technical support: http://premier.intel.com Software Product Support info: http://www.intel.com/software/support

I doubt that it is a stack overflow segfault, because the executable runs correctly without the -O0 flag. Below is the backtrace from gdb:

[New Thread 0x7f1194d79710 (LWP 9376)]
[New Thread 0x7f1193f70710 (LWP 9377)]
[New Thread 0x7f1192f6f710 (LWP 9378)]
[New Thread 0x7f1191f6e710 (LWP 9379)]
[New Thread 0x7f118bfff710 (LWP 9380)]
[New Thread 0x7f118affe710 (LWP 9381)]
[New Thread 0x7f1189ffd710 (LWP 9382)]
[New Thread 0x7f1188ffc710 (LWP 9383)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f1189ffd710 (LWP 9382)]
0x0000000000400ef8 in L_main_68__par_loop0_2_27 () at matmul_p.c:74
74        c[i][j] += a[i][k] * b[k][j];
(gdb) bt
#0  0x0000000000400ef8 in L_main_68__par_loop0_2_27 () at matmul_p.c:74
#1  0x00007f1194c323d3 in __kmp_invoke_microtask () from /opt/intel/Compiler/11.1/072/lib/intel64/libiomp5.so
#2  0x00007f1194c0f796 in __kmpc_invoke_task_func () from /opt/intel/Compiler/11.1/072/lib/intel64/libiomp5.so
#3  0x00007f1194c108e3 in __kmp_launch_thread () from /opt/intel/Compiler/11.1/072/lib/intel64/libiomp5.so
#4  0x00007f1194c38347 in ?? () from /opt/intel/Compiler/11.1/072/lib/intel64/libiomp5.so
#5  0x00007f11944dba4f in start_thread () from /lib64/libpthread.so.0
#6  0x00007f119424582d in clone () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()

Thanks,

It looks like you need at least to link with -traceback, if not build with -g, to get a useful traceback.
One thing that sometimes happens is that optimization has an effect similar to an OpenMP private specification for a variable. Of course, it is technically better to declare all necessary privates explicitly rather than depend on optimization.

Actually, I compiled with -g to get the above backtrace. Perhaps someone could try to run the matrix multiply code with the given flags to reproduce the problem. Below is the code:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NRA 200

int main (int argc, char *argv[])
{
  int tid, nthreads, i, j, k, chunk;
  double a[NRA][NRA],  /* matrix A to be multiplied */
         b[NRA][NRA],  /* matrix B to be multiplied */
         c[NRA][NRA];  /* result matrix C */

  /*** Initialize matrices ***/
  for (i=0; i<NRA; i++)
    for (j=0; j<NRA; j++) {
      a[i][j] = i+j;
      b[i][j] = i*j;
      c[i][j] = 0;
    }

  /*** Do matrix multiply sharing iterations on outer loop ***/
  #pragma omp parallel for
  for (j=0; j<NRA; j++)
    for (i=0; i<NRA; i++)
      for (k=0; k<NRA; k++)
        c[i][j] += a[i][k] * b[k][j];

  printf("The %f \n", c[3][2]);
  printf("******************************************************\n");
  return 0;
}

Thanks,

Hi,

In your code you should make i and k private variables; otherwise i and k will be shared across the different threads and your inner loops will not execute correctly. With optimization the compiler hides this from you, because it keeps i and k in registers.

For j, you don't need to do anything, since OpenMP defines that the loop counter of the loop associated with the "parallel for" is automatically privatized.

So your code snippet should look like this:

#pragma omp parallel for private(i, k)
for (j=0; j<NRA; j++)
  for (i=0; i<NRA; i++)
    for (k=0; k<NRA; k++)
      c[i][j] += a[i][k] * b[k][j];
Cheers,
-michael

OpenMP specifies that the outer for loop index defaults to private, but the inner ones must be given local scope explicitly, as Michael showed you. If you are able to use C99 or C++, you have the alternative:
#pragma omp parallel for
for (j=0; j<NRA; j++)
  for (int i=0; i<NRA; i++)
    for (int k=0; k<NRA; k++)
      c[i][j] += a[i][k] * b[k][j];
Fortran OpenMP rules make all the inner loop indices private automatically. The compiler optimizer may happen to treat C for loops in a similar way without barfing over the programming error. Evidently, you have no assurance of the detailed behavior in such a case.
