集群计算

Using MKL to generate random data on Xion Phi

Hi,

I try to use MKL to generate lots of random data every time on Xeon phi, but the performance is very bad comparing the performance on Xeon CPU.(E5620) .

The attachment is the original code, and the compile option for Xeon Phi is -O3  -mkl -mmic. and it takes about 115 seconds, however when I run it on Xeon CPU,it only takes 3.5 seconds. I do not know why the difference is so much. Is the way in which I use the Xeon Phi  wrong or the real  performance on Xeon Phi is bad? 

Thank you!

Qiang

 

not vectorizing for no reason

Hi all,

I have isolated a small section of a loop in my code to vectorize and test for other kinds of optimization a well(like alignment etc)

Here is the actual code.

WORK1(:,:,kk) =  KAPPA_THIC(:,:,kbt,k,bid)  * SLX(:,:,kk,kbt,k,bid) * dz(k)

The optrpt says this 

LOOP BEGIN at loop.F90(91,13)
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive
   remark #25436: completely unrolled by 8

Are there any instructions in k1om can replace lfence instruction in x86_64

I'm compiling Supersonic, an opensource database of google on Intel Phi using icc with option -mmic

but I find some lfence in the source code, but it seems that Phi doesn't support lfence instruction, so I want to replace lfence by some other instructions in Phi.

Is it practicable? for example,

Problem with Intel MPI on >1023 processes

I have been testing code using Intel MPI (version 4.1.3  build 20140226) and the Intel compiler (version 15.0.1 build 20141023) with 1024 or more total processes. When we attempt to run on 1024 or more processes we receive the following error: 

MPI startup(): ofa fabric is not available and fallback fabric is not enabled 

Anything less than 1024 processes does not produce this error, and I also do not receive this error with 1024 processes using OpenMPI and GCC.

订阅 集群计算