Intel® Many Integrated Core Architecture (Intel MIC Architecture)

Efficiently Use KNC Instructions on Unaligned Data

MIC requires strict 64-byte data alignment to use the VPU, but why? I found that SPARC has a similar requirement, whereas other multi-core CPUs can handle unaligned data.

Since the compiler can automatically vectorize a for loop over data on MIC (with optimization enabled), what happens if the data is unaligned? Will auto-vectorization still work? If yes, how?


I would like to clarify my problem here.
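
For readers following the alignment question above, here is a minimal sketch in C of two ideas that usually come up in this context: allocating buffers on a 64-byte boundary, and computing how many leading elements a vectorizing compiler would have to handle in a scalar "peel" loop before it reaches the first aligned element. The helper names (alloc_aligned_floats, is_aligned, peel_count, scale) are hypothetical and chosen for illustration only. Loop peeling is one common strategy, not necessarily what the Intel compiler emits for any particular loop.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <stdint.h>
#include <stdlib.h>

#define VEC_ALIGN 64  /* bytes in one 512-bit vector register */

/* Allocate n floats on a 64-byte boundary; returns NULL on failure.
   Release the buffer with free(). */
float *alloc_aligned_floats(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, VEC_ALIGN, n * sizeof(float)) != 0)
        return NULL;
    return (float *)p;
}

/* Nonzero if p sits on a 64-byte boundary. */
int is_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_ALIGN) == 0;
}

/* Number of leading floats before the first 64-byte boundary.
   Assumes p is at least 4-byte aligned, as any valid float pointer is. */
size_t peel_count(const float *p)
{
    size_t misalign = (uintptr_t)p % VEC_ALIGN;
    if (misalign == 0)
        return 0;
    return (VEC_ALIGN - misalign) / sizeof(float);
}

/* A simple loop that is a typical candidate for auto-vectorization. */
void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```

Calling peel_count on a pointer one float past an aligned buffer shows how many scalar iterations would be needed before aligned vector accesses become possible for that array.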

Knights Landing + Java

Dear Intel Staff,

I just learned some details from your great presentation on Knights Landing (KNL) at Hot Chips this year. Information about KNL on the website is still sparse. From your slides I understand that there will be a version of KNL that is socketed and can be used as the primary CPU in a rack. However, this raises quite a few questions to which I cannot find satisfying answers.

Questions about SCIF Driver

I have a system with two Xeon Phi cards installed, running Red Hat 7.0. I am able to run code on the cards as pure offload, and I can ssh into the cards. I am trying to get symmetric mode to work.

1) Does symmetric mode require OFED, or is OFED only required when there is a physical InfiniBand card?

2) What are the proper steps to verify that the SCIF driver is properly loaded? mic shows up as a driver, but there is no indication of anything named SCIF.

iconv issue

Hi all,

I'm trying to build something for the Phi that depends on iconv; the library routines are present, but the following application fails when run on the Phi:

#include <stdlib.h>
#include <iconv.h>

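int main(void) {
  iconv_t cd;
  /* Request a converter from UTF-8 to latin1 */
  cd = iconv_open("latin1", "UTF-8");
  /* iconv_open returns (iconv_t)(-1) on failure */
  if (cd == (iconv_t)(-1)) exit(1);
  iconv_close(cd);
  return 0;
}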

If I build this using "icc -o iconv_test iconv_test.c" and run it on the host, it returns no error (exit code 0).
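
When chasing a failure like this, it can help to report errno instead of exiting silently, since iconv_open sets errno to EINVAL when the requested conversion is not supported by the implementation. The following is a minimal sketch; probe_conversion is a hypothetical helper name, not part of the original test program.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Try to open a converter from `from` to `to`.
   Returns 0 on success, otherwise the errno value set by iconv_open. */
int probe_conversion(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)(-1)) {
        int err = errno;  /* save before any other library call */
        fprintf(stderr, "iconv_open(\"%s\", \"%s\") failed: %s\n",
                to, from, strerror(err));
        return err;
    }
    iconv_close(cd);
    return 0;
}
```

Calling probe_conversion("latin1", "UTF-8") on both the host and the coprocessor would show whether the two environments differ in the conversions they support.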

Xeon Phi and offload from MATLAB MEX file


I am having a really hard time figuring out how to use the Xeon Phi offload mode from within MATLAB MEX files under Linux. I have managed to force MATLAB to use icc for compilation and verified that the mex files run fine. The problems start when using the offload pragma - as far as I can tell, nobody has tried that yet and I suspect this is some (fixable?) issue with libraries. Can someone here help me with this?

Consider the following simple code

How to allocate MICs to all the MPI processes equally for AO?

Could you please take a look at this problem? My machine has 16 CPUs and 4 MICs (47 coprocessors each), and I run my program with 8 MPI processes (mpi_comm_size = 8) and want to use MKL routines in automatic offload (AO) mode. As you can see in the attached test code, I tried three different methods.
METHOD-1: I allocate one of the 4 MICs to each of the first 4 CPUs and let the other CPUs run without a MIC. In this case the program works as expected, and I got the following performance test result when solving zgemm for 5k*5k complex dense matrices.
