New Jim Dempsey article: Elusive Algorithms – Parallel Scan


Since I haven't seen a notification of this elsewhere: the ever-knowledgeable Jim Dempsey just published one of his great technical articles, entitled "Elusive Algorithms – Parallel Scan".

I believe this was an outgrowth of another discussion on the forums, "how to perform inclusive scan in C cilk".
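For anyone unfamiliar with the term, an inclusive scan replaces each element with the sum of itself and everything before it. The sketch below is not taken from Jim's article or from the forum thread; it is a minimal illustration, with function names of my own choosing, of a serial reference and of the blocked two-pass decomposition that parallel versions are commonly built on. It is written as plain serial C so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference: out[i] = in[0] + in[1] + ... + in[i]. */
void inclusive_scan_serial(const int *in, int *out, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += in[i];
        out[i] = sum;
    }
}

/* Blocked two-pass variant (hypothetical helper, for illustration only).
 * Pass 1 scans each block on its own; those block iterations do not depend
 * on each other, so they are the part a parallel version would distribute.
 * Pass 2 adds the total of all preceding blocks to each later block. Here
 * the offset is carried serially from block to block; a parallel version
 * would typically compute all block offsets first and then apply them
 * independently. */
void inclusive_scan_blocked(const int *in, int *out, size_t n, size_t nblocks)
{
    assert(nblocks > 0);
    size_t chunk = (n + nblocks - 1) / nblocks;  /* ceiling division */

    /* Pass 1: independent local scans. */
    for (size_t b = 0; b < nblocks; ++b) {
        size_t lo = b * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        int sum = 0;
        for (size_t i = lo; i < hi; ++i) {
            sum += in[i];
            out[i] = sum;
        }
    }

    /* Pass 2: fix up every block after the first. */
    for (size_t b = 1; b < nblocks; ++b) {
        size_t lo = b * chunk;
        if (lo >= n)
            break;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        int offset = out[lo - 1];  /* final, already corrected value of the previous block */
        for (size_t i = lo; i < hi; ++i)
            out[i] += offset;
    }
}
```

Both functions expect `in` and `out` to be distinct arrays of at least `n` elements. The blocked form does roughly twice the additions of the serial one, which is the usual price of exposing independent work.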


Error getting OFED to compile when mic is selected.

Compiling OFED with Phi support and --all fails when compiling compat-rdma.
Compiling without the "--with-xeon-phi" option works.

./ --with-xeon-phi --all

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.1 (Maipo)

# uname -r

# rpm -qi mpss-sdk-k1om-3.5-1.x86_64
Name        : mpss-sdk-k1om
Version     : 3.5
Release     : 1
Architecture: x86_64
Install Date: Thu 09 Apr 2015 10:00:28 PM CDT
Group       : base
Size        : 484359036
License     : various
Signature   : DSA/SHA1, Thu 02 Apr 2015 06:57:59 AM CDT, Key ID

Using MKL to generate random data on Xeon Phi


I am trying to use MKL to generate a large amount of random data each time on the Xeon Phi, but the performance is very poor compared with the performance on the Xeon CPU (E5620).

The attachment is the original code. The compile options for the Xeon Phi are -O3 -mkl -mmic, and the run takes about 115 seconds; on the Xeon CPU, however, it takes only 3.5 seconds. I do not know why the difference is so large. Am I using the Xeon Phi incorrectly, or is the real performance of the Xeon Phi simply this poor?

Thank you!



not vectorizing for no reason

Hi all,

I have isolated a small section of a loop in my code to vectorize and to test other kinds of optimization as well (alignment, etc.).

Here is the actual code.

WORK1(:,:,kk) =  KAPPA_THIC(:,:,kbt,k,bid)  * SLX(:,:,kk,kbt,k,bid) * dz(k)

The optimization report (optrpt) says this:

LOOP BEGIN at loop.F90(91,13)
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive
   remark #25436: completely unrolled by 8

Are there any instructions in k1om that can replace the lfence instruction in x86_64?

I'm compiling Supersonic, an open-source database from Google, for the Intel Xeon Phi using icc with the -mmic option.

However, I found some lfence instructions in the source code, and it seems that the Phi doesn't support lfence, so I want to replace lfence with some other instructions on the Phi.

Is that practicable? For example,
