Understanding Gather-Scatter instructions and the -gather-scatter-unroll compiler switch

Gather-scatter instructions are often not the fastest choice when you are tuning for performance on the Intel® Xeon Phi™ coprocessor. However, if your code uses indirect addressing or performs non-unit-stride memory accesses, gather-scatter instructions may be the best option.
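As an illustration of the two access patterns named above, here is a minimal C sketch. The function names are hypothetical and not taken from the article, and whether a given compiler actually emits gather or scatter instructions for these loops depends on the compiler, target, and options used.

```c
#include <stddef.h>

/* Indirect addressing: every load goes through an index array, so a
   vectorizing compiler cannot use one contiguous vector load and may
   fall back to a gather. (Hypothetical helper for illustration.) */
void gather_indirect(float *dst, const float *src, const int *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Non-unit stride: consecutive iterations store to addresses that are
   `stride` elements apart, which a vectorizer may implement as a scatter.
   (Hypothetical helper for illustration.) */
void scatter_strided(float *dst, const float *src, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        dst[i * stride] = src[i];
}
```

With `stride` equal to 1 the second loop becomes an ordinary contiguous copy, which is the case where plain vector loads and stores are normally preferable.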

    Setting up one common NFS server for mic on multiple hosts

    I have four servers, each hosting two Phi cards. So far, each server exports a volume using NFS, which is mounted on its Phis. This works quite well, except that I would like one NFS server to be common to all the Phi cards. With the default routing, mic0 and mic1 can only ping each other and the host. Requests to other servers are not routed. How do I set up the mics and the server so that all 8 Phis can mount a common server?

    [olews@compute-19-20-mic0 olews]$ route

    Kernel IP routing table

    MPI fabric "dapl" works between mic0 and mic1, but not between localhost and mic0

    I am trying to execute the Intel MPI benchmark in the following configuration: CentOS 6.5 with Intel MPI version and MPSS 3.1.2, and OFED installed from source. My network configuration is default (static pair produced by "micctrl --initdefaults"), and I have 1 node with two 3120A Xeon Phi coprocessors.

    The MPI benchmark works just fine with fabrics "tcp" or "shm:tcp". Namely, I am able to run the benchmark between localhost and mic0, and between mic0 and mic1. However, with fabric "dapl", I cannot run IMB between localhost and mic0:

    performance of SCIF RMA

    I tried to use SCIF RMA to exchange a large amount of data between two MICs, but I found it almost impossible to align the memory addresses on both cards to a 4K page.

    Exchanging memory through the host gives me a 1.7x speedup over a single card. How much more performance would SCIF RMA give if I can get it to work? I want to decide whether I should continue working in this direction.

    Thanks a lot.
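For the alignment part of the question above, one common way to obtain a buffer that starts on a 4K boundary and spans whole pages is C11 `aligned_alloc`. The sketch below is only that: the helper names and the `PAGE_SIZE_4K` constant are hypothetical, the 4096-byte page size is taken from the "4K" in the post, and no SCIF calls are shown, so it does not say anything about the resulting RMA performance.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_4K 4096u  /* assumed page size, from the "4K" in the post */

/* Round a byte count up to a whole number of 4K pages. */
size_t round_up_to_page(size_t nbytes)
{
    return (nbytes + PAGE_SIZE_4K - 1) & ~(size_t)(PAGE_SIZE_4K - 1);
}

/* Allocate a buffer whose start address is 4K-aligned and whose length
   is a multiple of 4K. Stores the padded length in *padded_len when it
   is non-NULL. Returns NULL on failure; release the buffer with free(). */
void *alloc_page_aligned(size_t nbytes, size_t *padded_len)
{
    size_t len = round_up_to_page(nbytes);
    if (len == 0)
        return NULL;
    /* aligned_alloc requires the size to be a multiple of the alignment,
       which the rounding above guarantees. */
    void *p = aligned_alloc(PAGE_SIZE_4K, len);
    if (p != NULL && padded_len != NULL)
        *padded_len = len;
    return p;
}
```

The same helper would have to run on each card so that both sides hold page-aligned buffers of the same padded length.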

    micflash hangs on Ubuntu 12.04

    With slight modifications to the Makefiles, I was able to compile most of the MPSS 3.1.1 and 3.1.2 components from source on Ubuntu 12.04 and have them working. However, micflash -update is giving me problems, and so does micflash -getversion. micflash doesn't seem to be able to read the flash from the mic card. Below are my steps:

    micctrl -rw
    micctrl -s
     Shows the ready state for both the devices

    micflash -vv -update -device all

    remote process, dlopen() failed undefined symbol:

    Full message:

    On the remote process, dlopen() failed. The error message sent back from the sink is /var/volatile/tmp/coi_procs/1/5414/load_lib/ifortoutjAzgEs: undefined symbol: cdata_
    offload error: cannot load library to the device 0 (error code 20)
    On the sink, dlopen() returned NULL. The result of dlerror() is "/var/volatile/tmp/coi_procs/1/5414/load_lib/ifortoutjAzgEs: undefined symbol: cdata_"

    I don't know why dlopen should be involved.  The source code fragment:
