Parallel Computing

Ivy Bridge, counting cycles stalled due to LLC cache load misses?


On Ivy Bridge there are the following counters:


but no CYCLE_ACTIVITY.CYCLES_LLC_PENDING. I have done some profiling, and my results suggest that you cannot simply subtract the first two counters from the third to get the LLC value. There are three counters for the number of cache misses, but I want to measure the cycles lost to stalling.

Unable to Compile GCC

I was following the instructions from, and I consistently got an error saying that "tm.h": no file or directory was found. I'm not sure what tm.h is, or what package it is included with.

I installed the following packages: GNAT (probably doesn't help), GMP, MPFR, MPC, ISL, Flex, Bison.

My configuration options were `./configure --prefix=/usr/lib/gcc/cilkplus --enable-languages="c,c++" --disable-multilib`

This is on Ubuntu 14.10.

Inspector fails on most of OmpScr benchmarks


I am running Inspector on the OmpScr benchmark. The benchmark allows you to specify a problem size for each of the programs. When I run Inspector with a small problem size, it is able to finish the verification process most of the time, but for some of the programs (e.g. c_loopB.badSolution1.par) I get the error:

Error: Internal error. Please contact Intel customer support team.

Compiler bug?

I am a PhD student working on the classification of code fragments at the binary level. For that, I used the Intel compiler to compile several open source projects. While analyzing the code, I stumbled upon an interesting code snippet for which I cannot find any explanation other than a potential bug.

something wrong with the offload out?

When I use offload like this:

#pragma offload target(mic:0)           \
        L2 = tv.tv_sec*1000*1000 + tv.tv_usec;


        L2couple = tv.tv_sec*1000*1000 + tv.tv_usec;

I get the error report below:

offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)

and sometimes the error report is a different one
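The timestamp expression in the snippet above (`tv.tv_sec*1000*1000 + tv.tv_usec`) can be sketched on the host side as follows. This is a minimal sketch, assuming `tv` is a `struct timeval` filled in by `gettimeofday()`; the helper names `timeval_to_usec` and `now_usec` are hypothetical and do not come from the original post. It only illustrates the microsecond arithmetic and is not a diagnosis of the SIGSEGV.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: convert a struct timeval to microseconds.
   The cast to int64_t happens before the multiplication, so the
   product cannot overflow where time_t or long is 32 bits wide. */
static int64_t timeval_to_usec(const struct timeval *tv)
{
    return (int64_t)tv->tv_sec * 1000 * 1000 + (int64_t)tv->tv_usec;
}

/* Hypothetical helper: current wall-clock time in microseconds.
   Returns -1 if gettimeofday() fails. */
static int64_t now_usec(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    return timeval_to_usec(&tv);
}
```

Taking the timestamp on the host before and after the offload region, and subtracting the two `now_usec()` values, gives the elapsed time in microseconds without passing the `timeval` itself to the device.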

A weird linker error with _mm512_storenr_ps intrinsic in offload mode

Hi guys, I am facing a weird linker error with the _mm512_storenr_ps() intrinsic when programming in offload mode. I am posting the issue here in the hope that someone can offer advice.

I successfully implemented a Xeon Phi program in native mode and then changed it to offload mode.

There are 3 files, and the code is summarized like this:

file main.cpp

#include "myfunction.h"

void main()


// CPU code


What is the correct way to load the Library Path?


I have some code which I compiled like this on my host:
$ ifort -openmp -mmic -o test.phi test.f90 -O4

I copied it to the MIC and tried to run it:

mic0$ ./test.phi
./test.phi: error while loading shared libraries: cannot open shared object file: No such file or directory

Oh! I read about this in the documentation: the library path is missing. Simple fix, right? I NFS-mounted /opt/intel on the MIC, so it should go smoothly.
mic0$ source /opt/intel/composer_xe_2015.2.164/bin/ intel64

multi-threaded matrix multiplication

Dear Intel MKL developers,

I am integrating the MKL subroutine mkl_zcsrmultcsr into my MPI code. I tested a case with 16 processors, in which mkl_zcsrmultcsr is called on every processor in parallel. Once it is called, multi-threaded computing is automatically activated.

The problem I encountered is that 12 of the processors work fine while the other 4 give memory corruption errors. Moreover, which 4 processors fail can vary from test to test. I am not sure what the problem could be. Your advice is much appreciated.




