Intel® Many Integrated Core Architecture (Intel MIC Architecture)

PHI w/ MPICH (3.0.3) ch3:nemesis:scif

I'm trying to get MPICH (3.0.3) and SCIF working.

I'm using the tests from osu_benchmarks (from the mvapich2 tarball) as a set of sanity checks, and I'm running into some unexpected errors.

One example: running osu_mbw_mr works sometimes, and then fails on the next try. The printout from two successive runs as well as the hosts file are below.
The compiler is the latest icc (13.1.1), with the latest MPSS (2-2.1.5889-14) on CentOS 6.4.

SCIF connection refused

For some reason the SCIF interface in my compute nodes is refusing connections. Any ideas on what's wrong or where to start investigating?

The node has a Mellanox ConnectX-3 HCA with the latest Gold Update 2 MPSS and everything else set up "by the book". All the IB services and modules load nicely and seem to work and I can ssh into the MIC and run natively.

However, if I try to run an offload (LEO or OpenCL) application it hangs. Doing an strace reveals the following:

Kernel Panic on MIC boot

After an upgrade of a node from MPSS Gold Update 1 to Update 2 I have had issues with the frontend node in our cluster crashing on boot. I tried to downgrade back to Update 1 but it still keeps happening.

We have upgraded the compute nodes successfully. They have identical hardware and a bridged network configuration. The frontend has the default configuration in /etc/sysconfig/mic.

The host OS is CentOS 6.3 and the card model is 5110P (B1)

On the host side we get the following error during boot:

On compilation getting rpath error

Hello,

I am writing this test code:

#include <stdio.h>
#include <unistd.h>   /* getcwd */

#include "offload.h"

int main()
{
    char cdir[128];
    int ndevices, devnum;

    getcwd(cdir,sizeof(cdir));
    ndevices = _Offload_number_of_devices();
    devnum = _Offload_get_device_number();
    printf("\n Hello...%s %d %d \n",cdir,ndevices,devnum);
    return 0;
}

and compiling it with

icc -o hello hello.c -loffload

it compiles successfully.

However, when I compile it as

icc -o hello hello.c -loffload -mmic

MIC performance-single threaded

To get a better idea of MIC's single-core, single-threaded performance, I tried the following simple experiment:

The following is a simple, unvectorized code, where I take two vectors "arr1" and "arr2" of length=LENGTH and multiply their corresponding elements with each other, LOOP number of times. I have kept LENGTH short enough that both vectors fit in the L1 cache, so this shouldn't be memory bound. For example: LOOP = 1000000 and LENGTH < 256 (should fit within L1 cache).

I compiled without using any optimization flags.

Problem with _Cilk_shared and big sized data

Dear All,

I am facing a problem using the _Cilk_shared keyword. It does not work for bigger arrays. The same program works well with a small array size/data.

I am getting either the following error or an error suggesting to increase memory map area.

CARD--ERROR:1 myoiPageFaultHandler: 0x7fffff22a788 Out of Range!
CARD--ERROR:1 _myoiPageFaultHandler: 0x7fffff22a788 switch to default signal handle
CARD--ERROR:1 Segment Fault!

What can we do so that we can use _Cilk_shared with big data (larger arrays)?

Thanks,

Jesmin

Some figures in "Intel® Xeon Phi™ Coprocessor System Software Developers Guide" are of very low quality

As the title says: for example, in section "2.1.12 Host and Intel® MIC Architecture Physical Memory Map" the figure is unrecognizable, and the list below the figure also has a broken hierarchy. By the way, the overall quality of this document is not as good as Intel's three-volume software developer's manual. Do you have plans to fix this document? Thanks!

Out values from coprocessor garbage

#include <stdlib.h>
#include <malloc.h>

#pragma offload_attribute(push, target(mic))
#include <stdio.h>
float *h;// *t;
int bytes, x, y, z;
#pragma offload_attribute(pop)

__attribute__((target (mic))) float *t;
__declspec(target (mic)) void memTest();

__declspec(target (mic)) void memTest() {
    int j;
    for(j=0; j<bytes; j++)
        t[j] = h[j] + 1.0;
}

int main()
{
    int i;

    x = y = z = 2;
    bytes = x*y*z;

MKL automatic offload

I started to play with AO, using an example code dgemm_with_timing.F (attached). With MKL_MIC_ENABLE=1, OFFLOAD_REPORT=2, and matrix size M/N=4000 being large enough for AO, the code should automatically offload and provide the offload info, but I didn't see the report. Isn't OFFLOAD_REPORT=2 supposed to provide the offload profiling report level for any offload, including Intel MKL AO? Or is it possible that the code is not offloaded at all? The timing does not vary much with the different MIC_OMP_NUM_THREADS values I specified, so that may be the case. What did I miss?

I compiled with
