Intel® Many Integrated Core Architecture (Intel MIC Architecture)

Xeon Phi and offload from MATLAB MEX file


I am having a really hard time figuring out how to use the Xeon Phi offload mode from within MATLAB MEX files under Linux. I have managed to force MATLAB to use icc for compilation and verified that the mex files run fine. The problems start when using the offload pragma - as far as I can tell, nobody has tried that yet and I suspect this is some (fixable?) issue with libraries. Can someone here help me with this?

Consider the following simple code

How to allocate MICs equally among all the MPI processes for AO?

Could you please take a look at this problem? My machine has 16 CPUs and 4 MICs (47 coprocessors each). I run my program with 8 MPI processes (mpi_comm_size = 8) and want to use MKL routines in automatic offload (AO) mode. As the attached test code shows, I tried three different methods.
METHOD-1: I assign each of the 4 MICs to one of the first 4 CPUs and let the other CPUs run without a MIC. In this case the program works as expected, and I got the following performance test result when solving zgemm for dense complex matrices of size 5k*5k.
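
As a rough illustration of the allocation question (this is not the poster's attached test code, which is not shown here), an even split of 4 MICs over 8 MPI ranks could be computed round-robin. The function names `assign_mics` and `ranks_per_mic` are hypothetical helpers introduced only for this sketch, and the sketch only computes the mapping; it does not call MPI or MKL.

```python
def assign_mics(num_ranks, num_mics):
    """Map each MPI rank to one coprocessor index, round-robin."""
    return {rank: rank % num_mics for rank in range(num_ranks)}


def ranks_per_mic(assignment, num_mics):
    """Count how many ranks share each coprocessor."""
    counts = [0] * num_mics
    for mic in assignment.values():
        counts[mic] += 1
    return counts


# Values taken from the post: 8 MPI ranks, 4 MICs.
assignment = assign_mics(8, 4)
counts = ranks_per_mic(assignment, 4)

for rank, mic in sorted(assignment.items()):
    print(f"rank {rank} -> MIC {mic} (shared by {counts[mic]} ranks)")
```

Each rank would then need to restrict itself to its assigned card before the first offloaded call. One possible way, stated here as an assumption rather than a verified recipe, is to set the offload runtime's `OFFLOAD_DEVICES` environment variable per rank; whether two ranks sharing one card perform well under AO is exactly what the post is asking about.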

31S1P problems (MSI-X Enable-, or 4G Decoding, probably)

Hello, everyone.  I've been lurking on the forums for a few days now while I schemed up a cooling solution for my shiny new 31S1P. 

I'm pretty sure I've conquered the cooling requirements.  Check!

However, I cannot get the card to work correctly. I'm using a Z97-WS motherboard with "4G Decoding" enabled in the BIOS settings. The CPU is a Celeron G1820, a cheap little LGA1150 CPU that seemed to be enough for this rig. I'm running the latest BIOS (2403, I believe from 2015-06-18 or thereabouts) and the latest version of CentOS 7.1, which is 7.1.1503 (Core).

Phi does not seem to fully support AVX512? Any way to do a matrix transpose?

I found in past topics that mm512_unpacklo_* is not supported on Phi. In my own implementation, mm512_permute* and mm512_shuffle* also seem to be unsupported. So far, all matrix transpose operations in past posts seem to be implemented with the mm512_swizzle* and mm512_blend* instructions. However, using these two operations requires twice as much element movement, which seems inefficient. Are there any other ways to do a matrix transpose?


Not seeing mic device in OpenCL devices

I am trying to run a Xeon Phi card under Ubuntu Server 14.04. I have installed mpss-3.3.5, opencl_runtime_14.2_x64_4.5.0.8, and intel_code_builder_for_opencl_2014_4.6.0.178_x64.

miccheck says everything is "OK". micinfo seems to work. At this point OpenCL sees a CPU device and I can run OpenCL programs on it. I have set up an ICD of /opt/intel/opencl-1.2- However, I am still not seeing the MIC device as an OpenCL device.

Any ideas on debugging this?


(repost) TLS definition in section .tbss mismatches non-TLS definition in section .bss

Please see this post for the problem that I am facing while compiling NAMD for MIC.
I am using icc version 15.0.0 (on a CentOS 6.5 machine with a MIC) to compile the NAMD source, and I get:

OFFLOAD_REPORT explanation

Hi, where can I get an explanation of the OFFLOAD_REPORT values? I am using OFFLOAD_REPORT=2 and getting:

[MKL] [MIC --] [AO Function] DPOTRF
[MKL] [MIC --] [AO DPOTRF Workdivision] -1.00 -1.00
[MKL] [MIC 00] [AO DPOTRF CPU Time] 2.950591 seconds
[MKL] [MIC 00] [AO DPOTRF MIC Time] 0.404681 seconds
[MKL] [MIC 00] [AO DPOTRF CPU->MIC Data] 276480112 bytes
[MKL] [MIC 00] [AO DPOTRF MIC->CPU Data] 199680000 bytes

I don't understand the Workdivision values of -1 and -1.
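
For anyone who wants to compare such reports across runs, here is a minimal sketch that splits the lines quoted in the post into (device, field, values) records. The regular expression is inferred only from the six lines shown, so it is an assumption about the format rather than a documented grammar, and the sketch does not try to interpret what the Workdivision values of -1.00 mean.

```python
import re

# The report lines exactly as quoted in the post.
REPORT = """\
[MKL] [MIC --] [AO Function] DPOTRF
[MKL] [MIC --] [AO DPOTRF Workdivision] -1.00 -1.00
[MKL] [MIC 00] [AO DPOTRF CPU Time] 2.950591 seconds
[MKL] [MIC 00] [AO DPOTRF MIC Time] 0.404681 seconds
[MKL] [MIC 00] [AO DPOTRF CPU->MIC Data] 276480112 bytes
[MKL] [MIC 00] [AO DPOTRF MIC->CPU Data] 199680000 bytes
"""

# Pattern inferred from the sample: "[MKL] [MIC <id>] [AO <field>] <values>".
LINE_RE = re.compile(r"\[MKL\] \[MIC (\S+)\] \[AO ([^\]]+)\] (.+)")


def parse_report(text):
    """Return a list of (device, field, values) tuples for matching lines."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            device, field, value = match.groups()
            records.append((device, field, value.split()))
    return records


records = parse_report(REPORT)
fields = {field: values for _, field, values in records}

cpu_time = float(fields["DPOTRF CPU Time"][0])
mic_time = float(fields["DPOTRF MIC Time"][0])
to_mic = int(fields["DPOTRF CPU->MIC Data"][0])
from_mic = int(fields["DPOTRF MIC->CPU Data"][0])

print(f"CPU time {cpu_time} s, MIC time {mic_time} s")
print(f"total data moved: {to_mic + from_mic} bytes")
```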
