I cannot get any speedup with the TBB and Cilk Plus sorting algorithms on Xeon Phi, namely tbb::parallel_sort(), cilkpub::cilk_sort_in_place(), and cilkpub::cilk_sort(). I have tried 2, 4, 16, 61, and 122 threads. With the very same program, the speedups on the 16-core Xeon host are excellent. The compiler is the same (Intel 15.0.2); the only differences are the -mmic command-line argument and linking against the MIC libraries.
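When comparing sort speedups between host and coprocessor, it can help to time every sorter through the same harness. The following is only a minimal sketch under stated assumptions: `make_input` and `time_sort` are hypothetical helper names that do not appear in the original post, the element type is assumed to be `float`, and only the standard library is used so the block builds without TBB or Cilk Plus.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n pseudo-random floats from a fixed seed, so every
// run and every sorter sees exactly the same input.
std::vector<float> make_input(std::size_t n) {
    std::mt19937 gen(12345);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> v(n);
    for (float& x : v) x = dist(gen);
    return v;
}

// Hypothetical helper: runs `sorter(begin, end)` on a private copy of `input`
// and returns the elapsed wall-clock time in seconds. The copy is made before
// the clock starts, so only the sort itself is measured.
template <typename Sorter>
double time_sort(const std::vector<float>& input, Sorter sorter,
                 std::vector<float>* out = nullptr) {
    std::vector<float> work(input);
    auto t0 = std::chrono::steady_clock::now();
    sorter(work.begin(), work.end());
    auto t1 = std::chrono::steady_clock::now();
    // Sanity check after the clock has stopped: the sorter must really sort.
    assert(std::is_sorted(work.begin(), work.end()));
    if (out != nullptr) out->swap(work);
    return std::chrono::duration<double>(t1 - t0).count();
}
```

A serial baseline can be obtained by passing a lambda that calls `std::sort(b, e)`; a parallel sorter such as `tbb::parallel_sort(b, e)` can be wrapped the same way, and the ratio of the two times gives the speedup for a given thread count.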
I am using in my code something like:
int x, y;
float *TempD = (float*) _mm_malloc(N * sizeof(*TempD), 64);
__m256 *SIMDTempD = (__m256*) TempD;
__m256 *theX = (__m256*) X;
__m256 *theY = (__m256*) Y;
__m256i *theV = (__m256i*) V;
__m256i *theVoronoi = (__m256i*) Vor;
__m256 Xd, Yd, XdSquared, YdSquared;
and then in a loop:
I am using intrinsics in my code.
If I compile with:
icc -std=c99 -g -openmp -qopt-report=2 -o mycode mycode.c
I get an "Illegal instruction" error at the line:
__m512 D = _mm512_set1_ps( FLT_MAX );
If I compile with:
icc -std=c99 -g -mavx -openmp -qopt-report=2 -o mycode mycode.c
I get an "Illegal instruction" error at the line:
I'm using mpss-3.4.3, with External bridge configuration.
Is it possible to synchronize time between mic and NTP-server? Or between mic and host?
Intel® MKL 11.3 Beta (released in April 2015) contains significant performance and scalability improvements for the direct sparse solver (a.k.a. Intel MKL PARDISO) on SMP systems. These improvements particularly benefit Intel Xeon Phi coprocessors and Intel Xeon processors with large core counts. As an example, the chart below shows a 1.7x to 2.5x speedup of Intel MKL 11.3 Beta over Intel MKL 11.2 when using PARDISO to solve various sparse matrices on an Intel Xeon Phi coprocessor with 61 cores.
I'm running RHEL 7.0 and the system seems to have a problem talking to the Phi card.
This is what I see in lspci:
03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: mic
I've attached the micdebug log.
The upcoming next-generation Intel Omni-Path Architecture addresses lessons learned, good and bad, from Intel True Scale Architecture and standard InfiniBand*. In an effort to avoid observed pitfalls, Intel approached the architecture of an HPC fabric from a different perspective. The architectures for current products and Intel Omni-Path systems were explicitly developed from the ground up for MPI HPC clusters to bring out the best possible performance.
My server has 4x Intel Xeon Phi 5110P accelerator cards. It runs CentOS 6.5 with kernel version 2.6.32-431.29.2.el6.x86_64.
When updating MPSS from 2.1 to 3.3.4 and 3.4.3, I receive the following error:
[root@XXXXX mpss-3.3.4]# /usr/bin/micflash -update -device all -smcbootloader
Error getting SCIF driver version
failed to open mic'0': /sys/class/mic/mic0/family: Knights Corner: not supported: Operation canceled
failed to open mic'1': /sys/class/mic/mic1/family: Knights Corner: not supported: Operation canceled
Please note that the new MPSS 3.5 has just been released at https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss
This new version supports the following OS:
- Linux: RHEL* 6.4, 6.5, 6.6, 7.0 and 7.1 & SuSE SLES* 11 SP3 and SuSE 12.
- Microsoft Windows*: Windows* 7 Enterprise SP1, 8/8.1 Enterprise, Server 2008 R2 SP1, Server 2012 and Server 2012 R2.
Attached is a plot of execution time on Intel Xeon Phi with a varying number of threads. The same program runs in native and offload modes.
The Phi device has 60 cores.
1) Why don't the timing steps occur at multiples of the number of cores (i.e., multiples of 60)?
2) Why does the time drop substantially around 248 threads and then increase again (i.e., beyond 4x60)?
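The expectation behind question 1 can be written down as a small placement model. The sketch below assumes that threads are spread as evenly as possible across the cores and that run time follows the busiest core; both are assumptions for illustration, not measured behaviour, and `busiest_core_load` and `step_expected_at` are hypothetical names. Under that model a step would appear just past each multiple of the core count, which is the pattern the plot is being compared against.

```cpp
#include <cassert>

// Idealized model (an assumption): threads are distributed as evenly as
// possible over the cores, so the busiest core holds ceil(threads / cores).
int busiest_core_load(int threads, int cores) {
    assert(threads >= 0 && cores > 0);
    return (threads + cores - 1) / cores;  // integer ceiling division
}

// True when going from (threads - 1) to threads raises the busiest-core
// load, i.e. where this model predicts a step in execution time.
bool step_expected_at(int threads, int cores) {
    assert(threads >= 1 && cores > 0);
    return busiest_core_load(threads, cores) >
           busiest_core_load(threads - 1, cores);
}
```

For 60 cores, `step_expected_at(61, 60)` and `step_expected_at(121, 60)` are true while `step_expected_at(60, 60)` is false. At 248 threads the busiest-core load in this model is 5, one more than the 4 hardware threads each core of this coprocessor provides, so the region in question 2 lies beyond what the simple model covers.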