Running Xeon Phi using Docker

Hello, I am trying to configure and access a Xeon Phi from inside a Linux container running a CentOS image.

Host OS: 4.2.0-coreos-r1, with the CentOS image running as a Linux container. When I try to install the MPSS library, it breaks in the build phase with the following error message. Initially it could not find a value for $(DESTDIR), so I assigned a folder, but the build still breaks and I cannot find the reason.
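For what it's worth, DESTDIR is a make variable rather than an environment variable the MPSS build sets for you, so it is usually passed on the make command line. A minimal sketch, assuming a hypothetical staging path and source directory name:

```shell
# Hypothetical staging directory for the MPSS install step.
DESTDIR=/tmp/mpss-staging
mkdir -p "$DESTDIR"

# Pass DESTDIR on the make command line so every sub-make sees it
# (the mpss-src directory name is illustrative):
# make -C mpss-src install DESTDIR="$DESTDIR"
echo "staging into $DESTDIR"
```

This stages the install under a writable prefix, which also sidesteps permission problems inside an unprivileged container.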

Sequential Performance on the Xeon Phi

Hi, I have been running different benchmarks on the Xeon Phi. Compared with an E5-2620 Xeon processor running at 2.00 GHz, I noticed a large difference in sequential performance (almost 10x across the cases I measured).

Can we conclude that the Xeon Phi always shares a core's frequency between its hardware threads, even for sequential code? In other words, is the 1.053 GHz clock effectively divided by 4 (if the core switches between its threads in round-robin fashion)?

If that is true, would it be possible to take advantage of the core's full frequency at all?

code producing segmentation fault on offload with -openmp option

Hi all

My code has a module state_test which makes a call to state.

On offloading the call to state I get a segfault.

Here is the call to state

!dir$ offload begin target(mic:0) in(TRCR) out(RHOK1,RHOK2,RHOK3,RHOK4)
      call state(k,kk,TRCR(:,:,:,1), TRCR(:,:,:,2), this_block,RHOK1,RHOK2,RHOK3,RHOK4)
!dir$ end offload

However, when the code is compiled without the -openmp option, the offload works fine.

Here is the compile line that segfaults.

ifort state_test.F90 state_mod.F90 -openmp

Using Read-Write Images in OpenCL™ 2.0

While image convolution does not benefit much from the new read-write images functionality, any image-processing technique that must be done in place may benefit from read-write images. One example of a process that can use them effectively is image composition. In OpenCL 1.2 and earlier, images were qualified with the "__read_only" and "__write_only" qualifiers. In OpenCL 2.0, images can be qualified with a "__read_write" qualifier, so a kernel can read from and write to the same image instead of copying the output back to the input buffer. This reduces the number of resources that are needed.
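A minimal sketch of an in-place kernel using the 2.0 qualifier (the kernel name and the operation are illustrative; building it requires passing -cl-std=CL2.0 to the compiler):

```c
// OpenCL 2.0 kernel: darken an image in place.
// The read_write access qualifier is only valid in OpenCL C 2.0, and
// reads/writes to the same image are safe here because each work-item
// touches only its own pixel (no cross-work-item fence is needed).
__kernel void darken_inplace(read_write image2d_t img)
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    float4 pixel = read_imagef(img, pos);   // samplerless read (OpenCL 1.2+)
    write_imagef(img, pos, pixel * 0.5f);   // write back to the same image
}
```

In OpenCL 1.2 the same operation would need a second __write_only image plus a copy back, which is exactly the resource saving described above.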
First touch time greater than parallel time

Hi all,

I was looking to parallelize my code for speedup.

Since the Xeon Phi is a NUMA machine, I used first-touch placement of the data.

While the Xeon Phi no doubt performs better than the Xeon, the problem is that the total time (time for the first touch + loop time) is greater.

How do I resolve this issue?

When this code is integrated into the main code (which I cannot post here), it will call the state function many times from various places. So is it possible that, even if I don't first-touch as I have in the code attached below, this overhead is just a one-time problem?

Intel MIC MPI symmetric job profiling using VTune


I want to profile my MPI application executing on HOST+MIC using symmetric-mode execution. I used the following command, but it says it cannot execute the binary. I sourced amplxe-vars.sh and then ran:

mpirun -host test -n 2 amplxe-cl -collect hotspots -r result-dir1 ./hello : -host test-mic0 -n 4 amplxe-cl -collect hotspots -r result-dir1 ./hello.mic

Can someone help me profile my MPI application in symmetric-mode execution?

As a second option I tried

Compiling for the Xeon Phi coprocessor

Can somebody help me build the Aerospike database server for the Intel Xeon Phi coprocessor? A step-by-step guide would be appreciated, as I am new to the Intel MIC. I am able to build the database server in the host environment, but it is native execution of the server on the Xeon Phi coprocessor where I am completely lost. Thank you in advance.


Hey guys,

I am playing with the CPU_MASK mechanism in COI (in both MPSS 3.5.2 and MPSS 3.6). However, I found it is not working as I expected. Suppose we have 224 threads (on a Phi) and we divide them into 4 partitions, so each partition has 56 threads:

Partition 1: thread 1 -- thread 56

Partition 2: thread 57 -- thread 112

Partition 3: thread 113 -- thread 168

Partition 4: thread 169 -- thread 224

controlling MPI-3 shared memory allocation target


I'd like to request a feature for using Intel MPI on KNL.

MPI-3 allows memory shared among the MPI tasks on the same node to be allocated through MPI_Win_allocate_shared. I'd like to control where this chunk of memory sits, either in MCDRAM or in DDR, when I use the flat mode of MCDRAM. I need something similar to hbwmalloc, which complements malloc for targeting MCDRAM, or at least a way to control the target.
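For context, a minimal sketch of the MPI-3 call in question. MPI_Win_allocate_shared does take an MPI_Info argument, but the standard defines no key for steering the allocation toward MCDRAM versus DDR, which is exactly the gap this request asks Intel MPI to fill:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Communicator spanning the ranks that share a node. */
    MPI_Comm shm_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shm_comm);

    int rank;
    MPI_Comm_rank(shm_comm, &rank);

    /* Each rank contributes 1 MiB to a node-local shared segment.
       MPI_INFO_NULL is where a placement hint (MCDRAM vs. DDR)
       would have to go -- hence the feature request. */
    double *base;
    MPI_Win win;
    MPI_Win_allocate_shared(1 << 20, sizeof(double), MPI_INFO_NULL,
                            shm_comm, &base, &win);

    base[0] = (double)rank;  /* visible to all ranks on the node */

    MPI_Win_free(&win);
    MPI_Comm_free(&shm_comm);
    MPI_Finalize();
    return 0;
}
```

Today the whole segment lands wherever the MPI library's backing allocation goes; a per-window info key or an environment variable analogous to hbw_malloc's policies would give the requested control.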

Thanks. Ye
