Cluster Computing

Trouble with Updating MPSS

My server has 4x Intel Xeon Phi 5110P accelerator cards. It runs CentOS 6.5 with kernel version 2.6.32-431.29.2.el6.x86_64.

When updating MPSS from 2.1 to 3.3.4 and 3.4.3, I receive the following error:

[root@XXXXX mpss-3.3.4]# /usr/bin/micflash -update -device all -smcbootloader
Error getting SCIF driver version
failed to open mic'0': /sys/class/mic/mic0/family: Knights Corner: not supported: Operation canceled

failed to open mic'1': /sys/class/mic/mic1/family: Knights Corner: not supported: Operation canceled

MPSS 3.5

Please note that the new MPSS 3.5 has just been released at


This new version supports the following operating systems:


- Linux: RHEL* 6.4, 6.5, 6.6, 7.0 and 7.1 & SuSE SLES* 11 SP3 and SuSE 12.

- Microsoft Windows*: Windows* 7 Enterprise SP1, 8/8.1 Enterprise, Server 2008 R2 SP1, Server 2012 and Server 2012 R2.


Performance scale of the Intel Phi MIC


Attached is a plot of execution time on the Intel Phi with a varying number of threads. The same program runs in native and offload modes.

The Phi device has 60 cores.

1) Why don't the timing steps occur at multiples of the number of cores (i.e., multiples of 60)?

2) Why does the time drop substantially around 248 threads and then increase again (i.e., beyond 4x60)?

Intel MPI Benchmarks Archives

I wish to use the Intel MPI Benchmarks for performance analysis of my mpich2 implementation. However, my cluster uses mpich2 version 1.4.1. Where can I download the appropriate benchmarks for this version of mpich2? The latest version of the benchmarks is not compatible with the mpich2 version I use.

Please help.

Adding offload pragma, performance drops

Hello,

I am running OpenMP code that looks like this:


#pragma omp parallel for default(none) shared(X, Y, V, V, H, W, N) private(i, x, y, Kx, Ky, initD, T)

		for ( y = 0; y < H; y++ )
			for ( x = 0; x < W; x++ )

				initD = aValue;
				for ( i = 0; i < N; i++ )
				V[ x + y * Width ] = T;


Now, I want to run it on the MIC card, so when I just add the line:

internal error: bad pointer

My code is this:

#include
class TEST {
public:
	double *A;
public:
	TEST(double * _A) {
		A = _A;
		#pragma offload_transfer target(mic:0) nocopy(this : alloc_if(1) free_if(0)) in(A:length(2*3) alloc_if(1) free_if(0))
	}
	void run() {
		A[1] = 0;
		// double *B = A;
		std::cout<
