Intel® Many Integrated Core Architecture

First touch time greater than parallel time

Hi all,

I was looking to parallelize my code for speedup.

Since the Xeon Phi is a NUMA device, I used first-touch placement of the data.

While the Xeon Phi no doubt performs better than the Xeon, the problem is that the total time (first-touch time + loop time) is greater.

How do I resolve this issue?

This code, when integrated into the main code (which I cannot post here), will call the state function many times from various places. So is it possible that, even if I don't first-touch the data as I do in the attached code, this overhead is just a one-time cost?
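The one-time-cost question can be illustrated with a minimal sketch. It assumes the data buffer can be kept alive between calls. The names `state`, `g_buf` and `first_touch` are hypothetical stand-ins, not the poster's actual code. The buffer is allocated and first-touched only on the first call, so later calls pay only the loop time. The OpenMP pragmas take effect only when compiled with OpenMP enabled (for example `-fopenmp` or `-qopenmp`); without that flag the code runs serially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds using the C11 timespec_get call. */
static double now_sec(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical persistent buffer: allocated and first-touched once. */
static double *g_buf = NULL;
static size_t g_len = 0;
static int g_touch_count = 0;   /* how many times first touch actually ran */

/* First touch: each thread writes the part of the array it will use later,
   with the same static schedule as the compute loop. */
static void first_touch(double *a, size_t n) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        a[i] = 0.0;
}

/* Hypothetical stand-in for the state function. Reports the time spent in
   allocation plus first touch and the time spent in the loop separately. */
double state(size_t n, double *touch_time, double *loop_time) {
    assert(n > 0);
    double t0 = now_sec();
    if (g_buf == NULL || g_len != n) {      /* only on first use or resize */
        free(g_buf);
        g_buf = malloc(n * sizeof *g_buf);
        assert(g_buf != NULL);
        g_len = n;
        first_touch(g_buf, n);
        ++g_touch_count;
    }
    double t1 = now_sec();

    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        g_buf[i] = (double)i;               /* placeholder work */
        sum += g_buf[i];
    }
    double t2 = now_sec();

    *touch_time = t1 - t0;
    *loop_time = t2 - t1;
    return sum;
}
```

Calling `state` repeatedly with the same `n` and comparing `touch_time` on the first call against later calls would show whether the overhead is paid only once in a given setup. If the real code allocates a fresh array on every call, the page-fault cost can recur each time.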

Intel MIC MPI symmetric job profiling using Vtune


I want to profile my MPI application running on HOST+MIC in symmetric mode. I used the following command, but it says "cannot execute binary". I sourced the [...] and then used the following:

mpirun -host test -n 2 amplxe-cl -collect hotspots -r result-dir1 ./hello : -host test-mic0 -n 4 amplxe-cl -collect hotspots -r result-dir1 ./hello.mic

Can someone help me profile my MPI application in symmetric mode?

As a second option I tried

Compiling for Xeon Phi co-processor

Can somebody help me build the Aerospike database server for the Intel Xeon Phi coprocessor? A step-by-step guide would be appreciated, as I am new to Intel MIC. I am able to build the database server in the host environment, but I am completely lost when it comes to native execution of the server on the Xeon Phi coprocessor. Thank you in advance.


Hey Guys, 

I am playing with the CPU_MASK mechanism in COI (in both MPSS 3.5.2 and MPSS 3.6). However, I found that it does not work as I expected. Suppose that we have 224 threads (on a Phi) and we divide them into 4 partitions, so each partition has 56 threads.

Partition 1: thread 1 -- thread 56
Partition 2: thread 57 -- thread 112
Partition 3: thread 113 -- thread 168
Partition 4: thread 169 -- thread 224
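
The partition arithmetic above can be sketched as follows. This does not call COI. The `thread_range` type, the helpers `partition_range` and `build_mask`, and the mask layout (thread t stored as bit t-1 of a flat array of 64-bit words) are assumptions made for illustration; the actual COI_CPU_MASK layout and its setter functions should be checked against the MPSS headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MASK_WORDS 4   /* 4 x 64 bits = 256 bits, enough for 224 threads */

/* 1-based, inclusive range of thread numbers (hypothetical helper type). */
typedef struct { int first; int last; } thread_range;

/* Range of threads belonging to partition p (1-based), assuming the
   threads divide evenly among the partitions. */
thread_range partition_range(int total_threads, int num_partitions, int p) {
    assert(num_partitions > 0 && total_threads % num_partitions == 0);
    assert(p >= 1 && p <= num_partitions);
    int per = total_threads / num_partitions;
    thread_range r = { (p - 1) * per + 1, p * per };
    return r;
}

/* Set one bit per thread in the range; thread t maps to bit t-1. */
void build_mask(thread_range r, uint64_t mask[MASK_WORDS]) {
    assert(r.first >= 1 && r.first <= r.last && r.last <= 64 * MASK_WORDS);
    memset(mask, 0, MASK_WORDS * sizeof mask[0]);
    for (int t = r.first; t <= r.last; ++t) {
        int bit = t - 1;
        mask[bit / 64] |= (uint64_t)1 << (bit % 64);
    }
}
```

For example, `partition_range(224, 4, 2)` gives threads 57 to 112, which under this assumed layout occupy the top 8 bits of word 0 and the low 48 bits of word 1. Printing the words for each partition is a quick way to check that neighbouring masks do not overlap.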

controlling MPI-3 shared memory allocation target


I'd like to request a feature for using Intel MPI on KNL.

MPI-3 allows memory to be allocated and shared among the MPI tasks on the same node through MPI_Win_allocate_shared. I'd like to control where this chunk of memory resides, either in MCDRAM or in DDR, when I use the flat mode of MCDRAM. I need something similar to hbwmalloc, which complements malloc for targeting MCDRAM, or at least a way to control the target.

Thanks. Ye

Can't install Intel Parallel Studio XE 2016 for Windows

Hi,

I ran into a problem installing Intel Parallel Studio (after applying for a serial number). During installation, a "rolling back failed components" message appears, and the installation ultimately fails.

PS: When I first used the evaluation version, there was no problem.

How can I solve this problem?

