Segmentation fault when trying to run the code compiled with -O0 on Xeon Phi

We are trying to debug code on Xeon Phi that contains the following construct:

#include <iostream>
#include <memory.h>
#include <immintrin.h>
using namespace std;
void f( float* _amatr)
{
    __m512 a;
    a = _mm512_load_ps(_amatr+1);
    _mm512_store_ps(_amatr+1, a);
}
int main(int argc, char* argv[])
{
    __attribute__((aligned(64))) float _amatr[256];
    for(int i=0; i<256; i++)
        _amatr[i] = i+1;
    f(_amatr);
    cout<<"It works\n";
    return 0;
}

This code builds successfully with any compiler flags.
The application runs normally only when it is built with optimization (with no additional flags, or with any of the optimization flags -O1, -O2, -O3):

icpc -mmic PhiFunc.cpp
scp a.out mic0:~
ssh mic0
./a.out
It works

but a segmentation fault occurs when we run the same code compiled with -O0:
icpc -mmic -O0 PhiFunc.cpp
scp a.out mic0:~
ssh mic0
./a.out
Segmentation fault

The main problem is that we cannot debug our more complicated code, because debugging requires the -O0 flag.

Hi Vitaly,

_amatr[0] is 64-byte aligned, so _amatr[1], which sits 4 bytes further on, is not. You attempted to load from it with _mm512_load_ps (at line 8), which requires a 64-byte-aligned address. The debugger indicates that too; see below. The reason you didn't get a seg fault at -O2 is that icc removed the call to f() in its dead-code-removal phase.

$ idbc_mic -tco -rconnect=tcpip:mic0:2000

(idb) idb file-remote /root/a.out
(idb) file ~/temp/a.out
Reading symbols from /root/temp/a.out...done.
(idb) run
Starting program: /root/temp/a.out
[New Thread 17611 (LWP 17611)]
Program received signal SIGSEGV
f (_amatr=0x7fff0b86a3c0) at /root/temp/func.cpp:8
8 a = _mm512_load_ps(_amatr+1);
(idb)

Thanks.

Feilong H,

Thank you for the reply. We now use idb with Eclipse integration.

Vitaly, please do not post the same question in multiple forums. For others who are reading this, use zmmintrin.h in place of xmmintrin.h and use the unaligned load and store (_mm512_loadu_ps).

Jim Dempsey

www.quickthreadprogramming.com
