Hi,
I tried reading the COREFREQ SBox MMIO register (address offset = 0x4100) to get the current CPU frequency of a Xeon Phi, and the value obtained was 0x80010416. How can I derive the current CPU frequency from the value of the COREFREQ register?
Hi,
One application I have requires the solution of a complex PDE that uses a significant amount of memory (up to 1 GB). On standard MPI clusters, the large memory requirement is not a problem, as nodes typically have many GB of memory, so I can run as many 1 GB processes as there are cores on an HPC cluster. However, the Intel 5110P Phi card, which functions as a single MPI node, only has ~8 GB of memory, so I can't run 60 processes at 1 GB each.
The new Intel® SDK for OpenCL* Applications XE 2013 includes certified OpenCL 1.2 support for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors using Linux* operating systems. This SDK is targeted at developers of highly parallel applications including High Performance Compute (HPC), workstations, and data analytics, to name just a few. OpenCL broadens the parallel programming options on Intel® architecture and allows developers to maximize data parallel application performance on Intel Xeon Phi coprocessors.
Does anybody have any experience with, or any idea, whether the Dell Precision T5600 workstation and the Xeon Phi 5110P are compatible?
Hi, does anyone have results from using MPI to test the host <-> MIC bandwidth? I tried it on my machine and the bandwidth is quite low (~0.4 GB/sec). I just send data from the host to the MIC card using a blocking function and measure the time. The download speed test in the SHOC benchmark can reach up to 10 GB/sec. Any idea why the bandwidth is so low using MPI? Thanks a lot!
I am trying to buy an Intel Xeon Phi 5110P card for a research project. It is very difficult to find information on the compatibility of the Xeon Phi with specific workstations. I am interested in the Dell Precision T5600 workstation and I am trying to find out whether the Xeon Phi is compatible with it. Although Dell appears in Intel's "Where to buy" list for Xeon Phi, I have not been able to find this information from Dell (but it is possible to configure the T5600 online with an Nvidia Tesla K20C).
I am very interested in building a native application for the Intel Xeon Phi coprocessor. As you know, an embedded Linux operating system runs on Intel Xeon Phi coprocessors. My question is as follows:
I have noticed a performance issue with complex division on the MIC. Dividing two complex numbers with the division operator is about 22x slower than coding the operation explicitly using the complex conjugate (see attached source file). I passed the -fcode-asm flag to the ifort compiler to dump the assembly code and noticed an unexpected difference. In the former case a call is made to an SVML subroutine named __svml_cdiv8, but in the latter the code is inlined. On the CPU, inlined code is always used (meaning no calls to the external VML library).
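For readers unfamiliar with the conjugate form mentioned in this post, here is a minimal sketch of the identity a / b = a * conj(b) / |b|^2, written in Python rather than the poster's Fortran. The function name `divide_via_conjugate` is made up for illustration; it is not taken from the attached source file, which is not shown here.

```python
def divide_via_conjugate(a: complex, b: complex) -> complex:
    """Compute a / b as a * conj(b) / |b|^2 using only real arithmetic.

    Hypothetical helper for illustration; not the poster's code.
    """
    # |b|^2 is a real number, so the division below is real-by-real.
    denom = b.real * b.real + b.imag * b.imag
    # Real and imaginary parts of a * conj(b), each divided by |b|^2.
    re = (a.real * b.real + a.imag * b.imag) / denom
    im = (a.imag * b.real - a.real * b.imag) / denom
    return complex(re, im)


if __name__ == "__main__":
    a, b = 1 + 2j, 3 - 4j
    print(divide_via_conjugate(a, b), a / b)
```

This straightforward form can overflow or underflow in `denom` when `b` has very large or very small components. Library division routines typically guard against that, which may account for part of the cost difference, though the post does not confirm it.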
I'm able to use I_MPI_FABRICS=dapl but not I_MPI_FABRICS=ofa on my system.
For example, I'm using IMB to test the performance with the command:
mpiexec.hydra -genv I_MPI_FABRICS=shm:tcp -n 1 -host bio-xinyi ~/tmp/imb/imb/3.2.4/src/IMB-MPI1 -off_cache 12,64 -npmin 64 -msglog 24:28 -time 10 -mem 1 PingPong Exchange : -n 1 -host mic0 /tmp/IMB-MPI1.mic
When using I_MPI_FABRICS=ofa, it shows:
Hi,
I am seeing a severe performance imbalance on our Xeon Phi (5110P, latest MPSS): a few (1-3) random CPU cores are 10-20% slower than all the other cores. I created a minimal example which demonstrates this (see below).
observations: