Intel® Many Integrated Core Architecture (Intel MIC Architecture)

Host shared memory?

Hi,

One application I have requires the solution of a complex PDE that uses significant amounts of memory (up to 1 GB per process). On standard MPI clusters this is not a problem, because nodes typically have many GB of memory, so I can run as many 1 GB processes as there are cores. However, the Intel Xeon Phi 5110P card, which functions as a single MPI node, has only ~8 GB of memory, so I can't run 60 processes at 1 GB each.
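To make the memory arithmetic concrete, here is a minimal C sketch. `max_ranks` and its `reserved_bytes` parameter are hypothetical: the reserve stands for whatever the card's own OS and MPI buffers consume, which is an assumed, user-supplied value rather than a measured one.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/* Hypothetical helper: how many ranks of a fixed memory footprint fit on
   one card after setting aside an assumed reserve for the card's OS and
   MPI buffers. Integer division discards any partial rank. */
static uint64_t max_ranks(uint64_t card_bytes, uint64_t reserved_bytes,
                          uint64_t per_rank_bytes)
{
    assert(per_rank_bytes > 0);
    if (reserved_bytes >= card_bytes)
        return 0;                      /* nothing left for application ranks */
    return (card_bytes - reserved_bytes) / per_rank_bytes;
}
```

With no reserve at all, `max_ranks(8 * GiB, 0, GiB)` is 8, so at most 8 of the card's 60 cores could each host a 1 GB rank; any real reserve only lowers that number.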

Intel Releases SDK with OpenCL* 1.2 support for Intel® Xeon Phi™ Coprocessors

The new Intel® SDK for OpenCL* Applications XE 2013 includes certified OpenCL 1.2 support for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors using Linux* operating systems. This SDK is targeted at developers of highly parallel applications including High Performance Computing (HPC), workstations, and data analytics, to name just a few. OpenCL broadens the parallel programming options on Intel® architecture and allows developers to maximize data parallel application performance on Intel Xeon Phi coprocessors.

Host-to-MIC bandwidth using MPI

Hi, does anyone have results from using MPI to test host<->MIC bandwidth? I tried it on my machine, and the bandwidth is quite low (~0.4 GB/s). I just send data from the host to the MIC card using a blocking call and measure the time. The download-speed test in the SHOC benchmark reaches up to 10 GB/s. Any idea why the bandwidth is so low with MPI? Thanks a lot!
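One factor worth ruling out when timing a single blocking send is fixed start-up cost. The C sketch below converts a timed transfer to GB/s and evaluates a simple latency-plus-bandwidth model. The model and its parameters (`latency_s`, `peak_bytes_per_s`) are assumptions for illustration, not measured properties of Intel MPI or the PCIe link, and they are not offered as the diagnosis of this particular result.

```c
#include <assert.h>

/* Effective bandwidth of a timed transfer, in GB/s (1 GB = 1e9 bytes). */
static double gb_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1e9;
}

/* Assumed model: a transfer costs a fixed start-up latency plus
   bytes / peak_bytes_per_s. Small messages are dominated by the latency
   term; large messages approach the peak rate. */
static double modeled_gb_per_s(double bytes, double latency_s,
                               double peak_bytes_per_s)
{
    assert(peak_bytes_per_s > 0.0);
    double t = latency_s + bytes / peak_bytes_per_s;
    return gb_per_s(bytes, t);
}
```

Under this model, a 1 MB message with 100 µs of latency on a 10 GB/s link reaches only about 5 GB/s, while a 1 GB message gets close to 10 GB/s. Timing repeated, large messages (as benchmarks typically do) reduces the weight of the fixed cost.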

Xeon Phi compatibility with Dell workstation

I am trying to buy an Intel Xeon Phi 5110P card for a research project. It is very difficult to find information on the compatibility of the Xeon Phi with specific workstations. I am interested in the Dell Precision T5600 workstation and am trying to find out whether the Xeon Phi is compatible with it. Although Dell appears in Intel's "Where to buy" list for Xeon Phi, I have not been able to find this information from Dell (although the T5600 can be configured online with an NVIDIA Tesla K20C).

Complex Division Performance Issue

I have noticed a performance issue with complex division on the MIC. Dividing two complex numbers with the division operator is about 22x slower than coding the operation explicitly using the complex conjugate (see attached source file). I passed the -fcode-asm flag to the ifort compiler to dump the assembly code and noticed an unexpected difference: in the former case a call is made to an SVML subroutine named __svml_cdiv8, while in the latter the code is inlined. On the CPU, inlined code is always used (i.e., no calls to the external SVML library).
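The explicit conjugate form the post describes can be sketched in C99 as below. This is a C analogue written for illustration, not the attached Fortran source, and `cdiv_conj` is a hypothetical name. A library division routine generally has to guard against overflow and underflow, which this textbook formula does not; that may be one reason the two versions are not treated as interchangeable, though the post does not say so.

```c
#include <assert.h>
#include <complex.h>

/* Divide x = a+bi by y = c+di using the complex conjugate:
   x / y = x * conj(y) / |y|^2
         = ((a*c + b*d) + (b*c - a*d) i) / (c*c + d*d).
   No operand rescaling is done, so c*c + d*d can overflow or underflow
   for very large or very small |y|, and inf/NaN get no special handling. */
static double complex cdiv_conj(double complex x, double complex y)
{
    double a = creal(x), b = cimag(x);
    double c = creal(y), d = cimag(y);
    double denom = c * c + d * d;      /* |y|^2 */
    assert(denom != 0.0);
    return (a * c + b * d) / denom + ((b * c - a * d) / denom) * I;
}
```

For example, (1+2i)/(3+4i) = (11 + 2i)/25 = 0.44 + 0.08i, matching the result of the `/` operator for these well-scaled inputs.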

Able to use fabric dapl but not ofa

I'm able to use I_MPI_FABRICS=dapl but not I_MPI_FABRICS=ofa on my system.

For example, I'm using IMB to test the performance with this command:

mpiexec.hydra -genv I_MPI_FABRICS=shm:tcp -n 1 -host bio-xinyi ~/tmp/imb/imb/3.2.4/src/IMB-MPI1 -off_cache 12,64 -npmin 64 -msglog 24:28 -time 10 -mem 1 PingPong Exchange : -n 1 -host mic0 /tmp/IMB-MPI1.mic

When using I_MPI_FABRICS=ofa, it shows:
