Parallel Computing

Poor speed on MIC

Dear All:

For learning purposes, I tried to write a program that counts the prime numbers in a given range. The isprime function tests whether a number is prime, and I added !$omp declare simd to vectorize it. I do not know why, but the program runs about three times slower on the Intel Xeon Phi than on the host.

Info:

Host: 16 sec

MIC: 43 sec
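
For reference, a minimal C sketch of the pattern the post describes; the original is presumably Fortran (given the !$omp declare simd directive), so this is only a translation of the idea, and the range, function names, and loop structure are illustrative rather than the poster's code:

    /* isprime is marked for SIMD so the compiler can build a vector variant
       that the counting loop may call. */
    #include <stdio.h>

    #pragma omp declare simd
    int isprime(int n)
    {
        if (n < 2) return 0;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return 0;
        return 1;
    }

    int main(void)
    {
        const int range = 1000000;   /* assumed range; the post does not give one */
        int count = 0;

        /* Count primes up to range; the simd clause lets the compiler call
           the vectorized variant of isprime. */
        #pragma omp parallel for simd reduction(+:count)
        for (int i = 2; i <= range; i++)
            count += isprime(i);

        printf("primes up to %d: %d\n", range, count);
        return 0;
    }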

MODULEFILE creation the easy way

If you use Environment Modules (from SourceForge, SGI, Cray, etc.) to set up and control your shell environment variables, we've created a new article on how to quickly and correctly create a modulefile. The technique is fast and produces a correct modulefile for any Intel Developer Products tool.

The article is here:  https://software.intel.com/en-us/articles/using-environment-modules-with...

 

Rebuilding ofed-driver-3.6.1-1.src.rpm: MPSS installation issues

Hello,

I'm installing MPSS 3.6.1 on two Xeon Phi nodes in a cluster connected to InfiniBand. The nodes run CentOS 6.6 with kernel 2.6.32-504.8.1.el6.x86_64, so I updated kernel-headers and kernel-devel and rebuilt the MPSS host drivers as the user guide says, and so far so good. The problem comes when I try to rebuild the OFED drivers with rpmbuild --rebuild ofed-driver-3.6.1-1.src.rpm; I get the following error message:

Profiling MPI application with VTune

Hi, folks

I'd like to profile my MPI application with VTune.

In order to see the inter-node behavior, I definitely need to use the '-gtool' option to aggregate the profiling results into one file (a sketch of the '-gtool' form is given after the commands below).

1) When I run the application without profiling, the following command works perfectly:

  • $ mpiexec.hydra -genvall -n 8 -machinefile /home/my_name/machines ARGS1 ARGS2 ...

2) The following command also does the job (running multiple MPI processes on a single machine), and I can see the aggregated results from them.
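
For reference, a hedged sketch of how the documented '-gtool' option can attach the VTune command-line collector (amplxe-cl) to the launch in 1); the analysis type (hotspots) and result directory name (my_result) are illustrative choices, not taken from the post:

  • $ mpiexec.hydra -genvall -n 8 -machinefile /home/my_name/machines -gtool "amplxe-cl -collect hotspots -r my_result:all" ARGS1 ARGS2 ...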

Using InfiniBand network fabrics to allocate globally shared memory for processes on different nodes

Dear Colleagues,

My MPI program implements globally shared memory for processes on multiple nodes (hosts) using the MPI_Win_allocate_shared and MPI_Comm_split_type function calls. Unfortunately, the allocated address space is not actually shared between processes on different nodes. I'm wondering what would happen if I ran the program on a cluster with an InfiniBand network and changed the network fabrics to I_MPI_FABRICS=shm:dapl or something like that. Could this be a solution to the problem?

Thanks in advance.

Cheers, Arthur.
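
For context, a minimal C sketch of the two calls named in the post; MPI_Comm_split_type with MPI_COMM_TYPE_SHARED groups only ranks that can actually share memory (in practice, ranks on the same node), so a window allocated on the resulting communicator is node-local by construction. The buffer size and variable names below are illustrative:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Split COMM_WORLD into per-node communicators. */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);

        /* Allocate a shared window; only rank 0 of each node contributes memory. */
        int node_rank;
        MPI_Comm_rank(node_comm, &node_rank);
        MPI_Aint size = (node_rank == 0) ? 1024 * sizeof(double) : 0;
        double *base;
        MPI_Win win;
        MPI_Win_allocate_shared(size, sizeof(double), MPI_INFO_NULL,
                                node_comm, &base, &win);

        /* Query rank 0's segment so every rank on the node gets a usable pointer. */
        MPI_Aint qsize;
        int disp_unit;
        double *node_base;
        MPI_Win_shared_query(win, 0, &qsize, &disp_unit, &node_base);

        printf("node rank %d sees a segment of %ld bytes\n", node_rank, (long)qsize);

        MPI_Win_free(&win);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }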
