Professors

Offload compilation for Intel® Xeon Phi™ coprocessors using Intel® Composer XE

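Offload compilation means adding pragmas or certain new keywords to working host code so that sections of that code run on a coprocessor based on the Intel Many Integrated Core Architecture (Intel MIC Architecture). The programming style is similar to adding parallelism to serial code with OpenMP* directives or Intel Cilk™ Plus keywords.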

 

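When compiling, the Intel® compiler generates code for both target architectures at once. The resulting program can run on systems with a coprocessor installed and on systems without one. Programmers therefore do not need to worry about whether the target system is able to run coprocessor code, which simplifies programming. This style of compilation is also called "offload compilation" or "heterogeneous compilation".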
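
A minimal sketch of how one source file can serve both targets, assuming the Intel compiler's `#pragma offload target(mic)` syntax and its predefined macros `__INTEL_OFFLOAD` (offload compilation enabled) and `__MIC__` (code being compiled for the coprocessor). The function name `ran_on_coprocessor` is hypothetical. With any other compiler the pragma is skipped and the block simply runs on the host.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the marked block executed on the
 * coprocessor, 0 if it executed on the host. */
int ran_on_coprocessor(void)
{
    int on_mic = 0;

    /* Assumption: __INTEL_OFFLOAD is defined only when the Intel compiler
     * builds with offload support, so other compilers never see the pragma. */
#ifdef __INTEL_OFFLOAD
    #pragma offload target(mic) inout(on_mic)
#endif
    {
        /* Assumption: __MIC__ is defined only in the coprocessor-side
         * compilation of this block. */
#ifdef __MIC__
        on_mic = 1;
#else
        on_mic = 0;
#endif
    }
    return on_mic;
}

/* Prints where the block ran; safe to call on any system. */
void report_target(void)
{
    printf("offload block ran on the %s\n",
           ran_on_coprocessor() ? "coprocessor" : "host");
}
```

Calling `report_target()` on a machine without a coprocessor reports the host, which is the fallback behaviour the paragraph above describes.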

 

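The host CPU and the coprocessor based on the Intel Many Integrated Core Architecture (Intel MIC Architecture) do not share physical or virtual memory in hardware, so the Intel compiler inserts code during compilation that automatically transfers data between the host and the coprocessor (the programmer does not need to write extra code). Two data transfer models are currently available: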

 

Explicit copy

The programmer specifies, in the offload pragma/directive, the variables that need to be copied between the host and the coprocessor card.

For example:
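
A minimal sketch of explicit copy in C, assuming the Intel compiler's `#pragma offload` with `in`/`out` clauses and the `length()` modifier for pointer data. The function `scale_array` and its parameters are hypothetical and not taken from the original article. The pragma is guarded by `__INTEL_OFFLOAD` (assumed to be defined only when offload compilation is enabled), so the same code runs on the host elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: multiplies n doubles from src by factor into dst.
 * The clauses name exactly which arrays cross the host/coprocessor boundary:
 *   in(src : length(n))  copies n elements to the coprocessor before the block
 *   out(dst : length(n)) copies n elements back to the host after the block
 * The scalars n and factor need no clause here; they are copied with the
 * block. */
void scale_array(double *src, double *dst, int n, double factor)
{
    assert(src != NULL && dst != NULL && n > 0);

#ifdef __INTEL_OFFLOAD
    #pragma offload target(mic) in(src : length(n)) out(dst : length(n))
#endif
    {
        for (int i = 0; i < n; i++) {
            dst[i] = src[i] * factor;
        }
    }
}
```

Listing `src` as `in` only and `dst` as `out` only avoids copying each array in both directions, which is the main reason to spell the transfers out explicitly.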

  • Professors
  • Linux*
  • Business Client
  • C/C++
  • Fortran
  • Intermediate
  • Intel® Parallel Studio XE
  • Intel Xeon Phi Coprocessor
  • Cluster Computing
  • Development Tools
  • Intel® Many Integrated Core Architecture
  • Loop tiling without adding overhead

    I have a question. I want to parallelize an algorithm, but I found that I get a lot of cache misses, so I decided to use loop tiling. The problem is that tiling makes the work per thread coarser, and the overhead of the extra loops makes the code less efficient. Is there any way to decrease the number of cache misses without loop tiling? I cannot tile the loops because of race conditions.

    The code looks like this:

    #pragma omp for collapse(2)

    for(z...){

     for (y...){

       for(nAtoms){

    Preventing a double-precision number from being written to memory

    In a scientific application, I need to avoid the cost of writing data to memory. I want to prevent an array of double-precision numbers from being written to memory. The array should reside in L2 cache as long as possible. The size of the array is about 64 kilobytes. The array may be read or written by other threads. At the end of execution, the array can be written to memory. Is this achievable? Are there any pragmas or functions to enforce this constraint?

    Overheating Xeon Phi 7110P

    Hi,

    We have built a workstation with two Xeon Phi 7110P cards on an Intel W2600CR2 motherboard. Our accelerators are passively cooled. We have noticed that just after the mpss service is started, micsmc shows a temperature of around 100 °C and rising. At around 140 °C (which takes a few seconds) micctrl shows "node lost" and we can do nothing except switch the host off and on. A reboot doesn't work: the Xeon Phis are not visible in lspci until the host has been completely powered off and turned on again manually.

    Trouble installing MPSS-3.1.1

    I followed chapter 2.3 (steps to install Intel MPSS with OFED support with Mellanox* InfiniBand) of MPSS_Users_Guide.pdf to install MPSS. At step 5, I get the following errors:

    warning: dapl-2.0.36.12-r0.glibc2.12.2.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID ab22bbe5: NOKEY

    warning: libibscif-3.1.1-r0.glibc2.12.2.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 25a28f50: NOKEY

    warning: ofed-all-3.1.1-1.glibc2.12.2.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 8ca98407: NOKEY

    Optimizing Hadoop Deployments

    This paper provides guidance, based on extensive lab testing conducted at Intel, to help IT organizations plan an optimized infrastructure for deploying Apache Hadoop*.  It includes:

    • Best practices for establishing server hardware specifications
    • level software guidance regarding the operating system (OS), Java Virtual Machine (JVM), and Hadoop version
    • Configuration and tuning recommendations to provide optimized performance with reduced effort
  • Partners
  • Professors
  • Students
  • Linux*
  • Cloud Services
  • Server
  • Intermediate
  • Big Data
  • Cloud Computing
  • Cluster Computing
  • Enterprise
  • Open Source
  • Power Efficiency
  • Return pointer from MIC malloc

    In my project, I need to pass a pointer between different offload functions. However, I do not want to use global variables. For example:

    I want to allocate an array on the accelerator in the new_array function and have it return an accelerator-side address, so that I can pass that address to the next function, exe_array. However, the following code does not work.

    Is there any solution for this case? Again, I do not want to use global variables. Thanks!

    #include <stdio.h>
    #include <stdlib.h>
