Beginning Xeon Phi

Compile OpenMP or MPI Fortran code for Intel Phi

Hi everyone,

Here is my problem:

I have two different programs:

  • One in Fortran / MPI
  • One in Fortran / OpenMP

I would like to compile them so that they run on an Intel Xeon Phi.

I just installed the free academic version of Parallel Studio Cluster Edition 2016 on my server.

Here are my questions:

Running OpenCV Computer Vision Programs on Xeon Phi


I would like to know how to run OpenCV programs on the Intel Xeon Phi coprocessor card. What are my options? Also, how can I use the Transparent API released by OpenCV with the Xeon Phi? Has Intel developed any module or support for running computer vision programs on the Xeon Phi? Any suggestions or thoughts in this regard will be greatly appreciated.

performance difference in different MPSS versions


I ran a very simple benchmark code on two Xeon Phi cards with different MPSS versions and got different performance results in terms of FLOPS. Briefly, the program running on mpss-3.1.2 reached 1984 GFLOP/s in single precision, which is 98.2% of the peak performance; however, the same program running on mpss-3.3 reached only 1580 GFLOP/s. I repeated the runs several times to make sure I had not done anything incorrectly.
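To put the two figures side by side, here is a small arithmetic sketch in Python. It uses only the numbers quoted above (1984 GFLOP/s, 98.2% of peak, 1580 GFLOP/s); the variable names are illustrative, and the peak value is back-calculated from the quoted percentage rather than taken from a card specification.

```python
# Figures quoted in the post; nothing here is measured independently.
gflops_mpss_312 = 1984.0        # single precision, mpss-3.1.2
gflops_mpss_33 = 1580.0         # single precision, mpss-3.3
fraction_of_peak_312 = 0.982    # "98.2% of the peak performance"

# Peak implied by the quoted percentage (an inference, not a spec value).
implied_peak = gflops_mpss_312 / fraction_of_peak_312

# Where the mpss-3.3 result sits relative to that same peak.
fraction_of_peak_33 = gflops_mpss_33 / implied_peak

# Relative slowdown between the two runs.
relative_drop = (gflops_mpss_312 - gflops_mpss_33) / gflops_mpss_312

print(f"implied peak: {implied_peak:.0f} GFLOP/s")
print(f"mpss-3.3 fraction of peak: {fraction_of_peak_33:.1%}")
print(f"relative drop: {relative_drop:.1%}")
```

Under these assumptions the mpss-3.3 run reaches roughly 78% of the implied peak, a drop of about 20% relative to the mpss-3.1.2 run.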

Does anybody have an idea what could cause this performance difference?


The benchmark code is as follows:

Mounting filesystem on a MIC using NFS

Dear forumers,

I'm trying to mount some directories from the host on the MIC cards using NFS. 

I followed the guide there step by step.

My aim was to mount:

(host side)   (mic side)
/opt/intel -> /opt/intel
/home      -> /host

page table


I am curious to know how page tables are implemented on MIC as well as on the Xeon architecture, or whether this is completely OS dependent.

Is there a single page table that is accessed by all the cores, or does each core hold a part of the page table?


Xeon Phi Performance Question

We have a software application of which we currently run only a single instance at a time. It requires no user input; it simply runs for a predefined amount of time, essentially navigating through a flowchart and emitting data. However, we would like to run many (100+) instances of it at once. It is a relatively complicated program, with a fair amount of branching as it decides what to do at each node it reaches. There are currently no memory bandwidth issues, either. However, it is technically running the same code in parallel...
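As a rough illustration of the "many independent instances" pattern described here, the following Python sketch launches several copies of a command with a cap on how many run at once. It is not specific to the Xeon Phi and says nothing about how the real application would perform there; `run_instance`, `run_many`, and the demo command are hypothetical names, and the demo command is only a stand-in for the actual program.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_instance(instance_id, command):
    """Run one independent instance; return (id, exit code, stdout)."""
    result = subprocess.run(
        command + [str(instance_id)],   # pass the instance id as an argument
        capture_output=True,
        text=True,
        check=False,
    )
    return instance_id, result.returncode, result.stdout.strip()


def run_many(command, n_instances, max_concurrent):
    """Launch n_instances copies, at most max_concurrent at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map keeps results in instance-id order.
        return list(pool.map(lambda i: run_instance(i, command),
                             range(n_instances)))


if __name__ == "__main__":
    # Stand-in for the real application (assumption): prints its instance id.
    demo_command = [sys.executable, "-c",
                    "import sys; print('instance', sys.argv[1])"]
    for instance_id, code, out in run_many(demo_command,
                                           n_instances=8, max_concurrent=4):
        print(instance_id, code, out)
```

Because each instance is a separate process with no shared state, the cap `max_concurrent` is the main tuning knob; a sensible value depends on the core count and memory of the target machine.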
