Intel® Many Integrated Core Architecture (Intel MIC Architecture)

MPSS 3.4.3 fails to start on 31S1, RHEL 7

I cannot get MPSS to start:

mpss.service - Intel(R) MPSS control service
   Loaded: loaded (/etc/systemd/system/mpss.service; disabled)
   Active: failed (Result: exit-code) since Tue 2015-04-14 12:47:36 BST; 9min ago
  Process: 6230 ExecStart=/etc/init.d/mpss start (code=exited, status=1/FAILURE)

Starting Intel(R) MPSS control service...
Starting Intel(R) MPSS: [FAILED]
mpss.service: control process exited, code=exited status=1
Failed to start Intel(R) MPSS control service.
Unit mpss.service entered failed state.

How to transfer parts of a big matrix to and from Phi cards

I have a host with three Phi cards and a big matrix that is too large to be copied directly to a single card. The matrix needs to be divided into three parts, each part offloaded to a Phi card for some processing, and then each part transferred back to the host.

How could I implement this using C++?
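A minimal sketch of the partitioning step, assuming a row-major matrix that is split into contiguous row blocks. The names `RowBlock`, `split_rows`, `process_block` and `process_in_blocks` are hypothetical, and `process_block` is only a host-side stand-in for the copy-in, compute, copy-out round trip to one card. It does not use any offload API, and the blocks are handled one after another here rather than on three cards at once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous block of rows, meant to be small enough for a single card.
struct RowBlock {
    std::size_t first_row;  // index of the block's first row in the full matrix
    std::size_t num_rows;   // number of rows in the block
};

// Split `rows` rows into `parts` contiguous blocks of nearly equal size.
std::vector<RowBlock> split_rows(std::size_t rows, std::size_t parts) {
    assert(parts > 0);
    std::vector<RowBlock> blocks;
    const std::size_t base = rows / parts;
    const std::size_t extra = rows % parts;  // the first `extra` blocks get one more row
    std::size_t start = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t n = base + (p < extra ? 1 : 0);
        blocks.push_back({start, n});
        start += n;
    }
    return blocks;
}

// Hypothetical stand-in for the work done on one card (assumption, not a real API):
// copy the block in, process it, and hand the result back.
std::vector<double> process_block(const double* src, std::size_t count) {
    std::vector<double> local(src, src + count);  // "copy in"
    for (double& v : local) v *= 2.0;             // placeholder processing
    return local;                                 // "copy out"
}

// Process a row-major rows x cols matrix block by block, writing results back in place.
void process_in_blocks(std::vector<double>& matrix, std::size_t rows,
                       std::size_t cols, std::size_t parts) {
    assert(matrix.size() == rows * cols);
    for (const RowBlock& b : split_rows(rows, parts)) {
        double* block_start = matrix.data() + b.first_row * cols;
        const std::vector<double> result = process_block(block_start, b.num_rows * cols);
        std::copy(result.begin(), result.end(), block_start);
    }
}
```

Calling `process_in_blocks(matrix, rows, cols, 3)` would correspond to the three-card case. On real hardware each block would be dispatched to its own card, ideally asynchronously so the three transfers and computations overlap.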

prefetch keyword misnomer?

Dear forum,

Looking at the assembly code, the prefetch intrinsic with the _MM_HINT_T2 hint is compiled to vprefetch2, which, according to the MIC instruction set manual, is a non-temporal L2 prefetch. This seems to contradict the temporal nature implied by the "T". Or did I miss anything?
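For reference, a minimal sketch of the kind of call being discussed, assuming an x86 compiler that provides `<xmmintrin.h>`. The function name `sum_with_prefetch` and the prefetch distance are hypothetical. Which instruction the hint maps to depends on the target and compiler, so the generated assembly has to be inspected separately.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T2

// Sum an array while issuing a software prefetch a fixed distance ahead.
float sum_with_prefetch(const float* data, std::size_t n) {
    const std::size_t ahead = 64;  // hypothetical prefetch distance, in elements
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n) {
            // The hint must be a compile-time constant.
            _mm_prefetch(reinterpret_cast<const char*>(data + i + ahead), _MM_HINT_T2);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch only affects caching, never the result, so the function returns the same sum with or without it.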

MIC Fortran Error : could not find 'k1om-mpss-linux-ld.exe'

I am using the 64-bit Intel Visual Fortran compiler through Visual Studio 2013 on Windows 7 Professional and am encountering an error that prevents compilation. I stepped back to the beginning Xeon Phi labs and used the provided code as the simplest case that I knew should work, and the error was still present (the code used is included for reference).

Intel Xeon Phi - MPI application


I created a simple "Hello world" application. I tried to run the program as shown in this article:

But as a result I got the following error:

# bash: /opt/intel//impi/ cannot execute binary file

I use a system with the HOME directory shared between host and card.

What is the problem? Thanks for your help.


No speedup with TBB and Cilk Plus sorting algorithms

I cannot get any speedup with TBB and Cilk Plus sorting algorithms on Xeon Phi, namely tbb::parallel_sort(), cilkpub::cilk_sort_in_place(), and cilkpub::cilk_sort(). I have tried to use 2, 4, 16, 61, 122 threads. With the very same program, the speedups on the 16-core Xeon host are excellent. The compiler is the same (Intel 15.0.2), the only difference is the -mmic command line argument and linking against MIC libraries.

_mm256_add_ps crashes program

Hello,

My code contains something like this:

int x , y;

float * TempD = (float*) _mm_malloc( N * sizeof(*TempD) ,64 );
__m256  * SIMDTempD = (__m256*) TempD;
__m256  * theX = (__m256*) X;
__m256  * theY = (__m256*) Y;
__m256i * theV = (__m256i*) V;
__m256i * theVoronoi = (__m256i*) Vor;

__m256 Xd ,Yd ,XdSquared ,YdSquared;


and then in a loop:
