Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems

Chapter 1 – Introduction

This document is designed to help users get started writing code and running Message Passing Interface (MPI) applications (using the Intel® MPI library) on a development platform that includes the Intel® Xeon Phi™ Coprocessor.

More specifically, this whitepaper uses the Intel MPI Library 5.0 (Update 3) for Linux* OS, which supports both Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.

1.1 – Overview

The Intel MPI Library for Linux OS is a multi-fabric message passing library based on ANL* MPICH2* and OSU* MVAPICH2*.

The Intel MPI Library for Linux OS implements the Message Passing Interface, version 3.0 (MPI-3) specifications.

It currently supports the Intel® C++ Compiler for Linux OS and the Intel® Fortran Compiler for Linux OS, version 15.0 and higher. Users can write their code in C, C++, Fortran 77, and Fortran 90.

1.2 – Compatibility

The Intel MPI Library for Linux OS supports a variety of operating systems, including the following distributions:

  • Red Hat* Enterprise Linux 64-bit 6.0 kernel 2.6.32-71
  • Red Hat Enterprise Linux 64-bit 6.1 kernel 2.6.32-131
  • Red Hat Enterprise Linux 64-bit 6.2 kernel 2.6.32-220
  • Red Hat Enterprise Linux 64-bit 6.3 kernel 2.6.32-279
  • Red Hat Enterprise Linux 64-bit 6.4 kernel 2.6.32-358
  • Red Hat Enterprise Linux 64-bit 6.5 kernel 2.6.32-431
  • Red Hat Enterprise Linux 64-bit 7.0 kernel 3.10.0-123
  • Red Hat Enterprise Linux 64-bit 7.1 kernel 3.10.0-229
  • SUSE* Linux Enterprise Server 11 SP1 kernel 2.6.32.12-0.7-default
  • SUSE Linux Enterprise Server 11 SP2 kernel 3.0.13-0.27-default
  • SUSE Linux Enterprise Server 11 SP3 kernel 3.0.76-0.11-default
  • SUSE Linux Enterprise Server 12 kernel 3.12.28-4-default

Note that the Intel MPI Library 5.0 (Update 3) for Linux OS supports multiple Intel® Xeon Phi™ coprocessors.

The first part of this whitepaper shows how to install the Intel MPI Library 5.0 (Update 3) on a system running MPSS 3.4.2. The second part shows how to run sample MPI code on the Intel® Xeon Phi™ coprocessor.

Chapter 2 – Installing the Intel® MPI Library

2.1 – Installing the Intel MPI Library

To start, follow the appropriate directions to install the latest versions of the Intel C/C++ Compiler and the Intel Fortran Compiler; this paper uses the 2015 versions.

You can purchase these software development tools from http://software.intel.com/en-us/linux-tool-suites. These instructions assume that you have the Intel MPI Library tar file l_mpi_p_5.0.3.048.tgz, the latest stable release of the library at the time of writing. To check whether a newer version exists, log into the Intel® Registration Center. The instructions below are valid for all current and subsequent releases.

Untar the tar file l_mpi_p_5.0.3.048.tgz:

# tar -xzvf l_mpi_p_5.0.3.048.tgz
# cd l_mpi_p_5.0.3.048
# ls
cd_eject.sh   
INSTALL.html  
Install_GUI.sh
install.sh 
license.txt   
pset 
PUBLIC_KEY.PUB         
Release_Notes.txt
rpm
silent.cfg
sshconnectivity.exp
third-party-programs.txt

Run the install.sh script and follow the instructions. For a non-root user, the library is installed into $HOME/intel/impi/5.0.3.048; when installed with root permission, it is placed in /opt/intel/impi/5.0.3.048.

# sudo ./install.sh
# ls -l /opt/intel/impi/5.0.3.048/
total 172
lrwxrwxrwx. 1 root root    11 Apr 27 12:14 bin64 -> intel64/bin
drwxr-xr-x. 2 root root  4096 Apr 27 12:14 binding
drwxr-xr-x. 5 root root  4096 Apr 27 12:14 doc
-rw-r--r--. 1 root root  9960 Jan 30 02:30 Doc_Index.html
lrwxrwxrwx. 1 root root    11 Apr 27 12:14 etc64 -> intel64/etc
lrwxrwxrwx. 1 root root    15 Apr 27 12:14 include64 -> intel64/include
drwxr-xr-x. 6 root root  4096 Apr 27 12:14 intel64
lrwxrwxrwx. 1 root root    11 Apr 27 12:14 lib64 -> intel64/lib
drwxr-xr-x. 3 root root  4096 Apr 27 12:14 man
drwxr-xr-x. 6 root root  4096 Apr 27 12:14 mic
-rw-r--r--. 1 root root 35803 Jan 30 02:31 mpiEULA.txt
-rw-r--r--. 1 root root 35803 Jan 30 02:31 mpi-rtEULA.txt
-rw-r--r--. 1 root root   491 Jan 30 02:31 mpi-rtsupport.txt
-rw-r--r--. 1 root root   283 Jan 30 02:31 mpisupport.txt
-rw-r--r--. 1 root root 10180 Jan 30 02:30 README.txt
-rw-r--r--. 1 root root  2800 Jan 30 02:31 redist-rt.txt
-rw-r--r--. 1 root root  1553 Jan 30 02:31 redist.txt
drwxr-xr-x. 2 root root  4096 Apr 27 12:14 test
drwxr-xr-x. 3 root root  4096 Apr 27 12:08 testOLD
-rw-r--r--. 1 root root   217 Jan 30 02:31 third-party-programs.txt
drwxr-xr-x. 6 root root  4096 Apr 27 12:14 uninstall
-rwxr-xr-x. 1 root root   182 Jan 30 02:35 uninstall_GUI.sh
-rwxr-xr-x. 1 root root 19559 Jan 30 02:35 uninstall.sh

2.2 – Preparation

Before the first run of an MPI application on the Intel® Xeon Phi™ coprocessors, we need to copy the MPI binaries and libraries to the appropriate directories (shown in the commands below) on each coprocessor installed in the system. Another approach is to NFS-mount a host directory on each coprocessor so that the coprocessors can access the MPI libraries from there; one advantage of NFS mounts is that they save RAM on the coprocessors. The details on how to set up NFS mounts can be found in the MPSS User's Guide.
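
If you prefer the NFS approach, the sketch below illustrates the general idea: the host exports /opt/intel read-only to the coprocessor network, and each coprocessor mounts it at the same path. The exported path, the 172.31.0.0/16 network range, the nfs service name, and the host-side address 172.31.1.254 are assumptions for illustration only; consult the MPSS User's Guide for the values that match your configuration.

# echo "/opt/intel 172.31.0.0/16(ro,no_root_squash)" >> /etc/exports
# exportfs -a
# service nfs restart
# ssh mic0 "mkdir -p /opt/intel; mount -t nfs 172.31.1.254:/opt/intel /opt/intel"

Repeat the mount on mic1, using the host address visible from that card.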

In this example, we copy the files to two coprocessors: the first coprocessor is reachable at the IP address 172.31.1.1 and the second at 172.31.2.1. Each coprocessor has a unique IP address because it is treated as just another uniquely addressable machine. You can refer to the first coprocessor as mic0 or by its IP address; similarly, you can refer to the second coprocessor as mic1 or by its IP address.
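
Before copying any files, it is worth confirming that the host can reach both coprocessors. A quick check, assuming the mic0 and mic1 names resolve on the host (for example through /etc/hosts entries created during MPSS setup):

# ping -c 1 mic0
# ping -c 1 mic1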

# sudo scp /opt/intel/impi/5.0.3.048/mic/bin/* mic0:/bin/
cpuinfo                                       100%   67KB  66.9KB/s   00:00
hydra_nameserver                              100%  617KB 616.9KB/s   00:00
IMB-MPI1                                      100%  153KB 153.1KB/s   00:00
IMB-NBC                                       100%  178KB 177.8KB/s   00:00
IMB-RMA                                       100%  153KB 153.3KB/s   00:00
mpicc                                         100% 2965     2.9KB/s   00:00
mpicleanup                                    100%   12KB  12.0KB/s   00:00
mpicxx                                        100% 3009     2.9KB/s   00:00
...
# sudo scp /opt/intel/impi/5.0.3.048/mic/lib/* mic0:/lib64/
/opt/intel/impi/5.0.3.048/mic/lib/debug: not a regular file
/opt/intel/impi/5.0.3.048/mic/lib/debug_mt: not a regular file
libmpi.a                                      100%   60MB   6.7MB/s   00:09
libmpicxx.a                                   100%  532KB 532.2KB/s   00:00
libmpicxx.so                                  100%  324KB 323.9KB/s   00:00
libmpicxx.so.12                               100%  324KB 323.9KB/s   00:00
libmpicxx.so.12.0                             100%  324KB 323.9KB/s   00:00
libmpi.dbg                                    100%   21MB   6.8MB/s   00:03
libmpi_dbg.a                                  100%   64MB   6.4MB/s   00:10
libmpi_dbg_mt.a                               100%   64MB   6.4MB/s   00:10
libmpi_dbg_mt.so                              100% 9298KB   9.1MB/s   00:01
...
# sudo scp /opt/intel/composer_xe_2015.2.164/compiler/lib/mic/* mic0:/lib64/
for_main.o                                    100% 1696     1.7KB/s   00:00
init.o                                        100% 1584     1.6KB/s   00:00
libbfp754.a                                   100%  235KB 234.5KB/s   00:00
libchkp.so                                    100%   35KB  34.6KB/s   00:00
libchkpwrap.a                                 100%  113KB 112.7KB/s   00:00
libchkpwrap_h.a                               100%  113KB 112.8KB/s   00:00
...
# sudo scp /opt/intel/impi/5.0.3.048/mic/bin/* mic1:/bin/
cpuinfo                                       100%   67KB  66.9KB/s   00:00
hydra_nameserver                              100%  617KB 616.9KB/s   00:00
IMB-MPI1                                      100%  153KB 153.1KB/s   00:00
IMB-NBC                                       100%  178KB 177.8KB/s   00:00
IMB-RMA                                       100%  153KB 153.3KB/s   00:00
mpicc                                         100% 2965     2.9KB/s   00:00
mpicleanup                                    100%   12KB  12.0KB/s   00:00
mpicxx                                        100% 3009     2.9KB/s   00:00
...
# sudo scp /opt/intel/impi/5.0.3.048/mic/lib/* mic1:/lib64/
/opt/intel/impi/5.0.3.048/mic/lib/debug: not a regular file
/opt/intel/impi/5.0.3.048/mic/lib/debug_mt: not a regular file
libmpi.a                                      100%   60MB   6.7MB/s   00:09
libmpicxx.a                                   100%  532KB 532.2KB/s   00:00
libmpicxx.so                                  100%  324KB 323.9KB/s   00:00
libmpicxx.so.12                               100%  324KB 323.9KB/s   00:00
libmpicxx.so.12.0                             100%  324KB 323.9KB/s   00:00
libmpi.dbg                                    100%   21MB   6.8MB/s   00:03
libmpi_dbg.a                                  100%   64MB   6.4MB/s   00:10
libmpi_dbg_mt.a                               100%   64MB   6.4MB/s   00:10
libmpi_dbg_mt.so                              100% 9298KB   9.1MB/s   00:01
...
# sudo scp /opt/intel/composer_xe_2015.2.164/compiler/lib/mic/* mic1:/lib64/
for_main.o                                    100% 1696     1.7KB/s   00:00
init.o                                        100% 1584     1.6KB/s   00:00
libbfp754.a                                   100%  235KB 234.5KB/s   00:00
libchkp.so                                    100%   35KB  34.6KB/s   00:00
libchkpwrap.a                                 100%  113KB 112.7KB/s   00:00
libchkpwrap_h.a                               100%  113KB 112.8KB/s   00:00
...

Instead of copying the MPI libraries manually, you can also run the script below:

#!/bin/sh

# Coprocessors to provision, and the host-side source directories.
export COPROCESSORS="mic0 mic1"
export BINDIR="/opt/intel/impi/5.0.3.048/mic/bin"
export LIBDIR="/opt/intel/impi/5.0.3.048/mic/lib"
export COMPILERLIB="/opt/intel/composer_xe_2015.2.164/compiler/lib/mic"

for coprocessor in $COPROCESSORS
do
   # MPI launcher binaries needed on each coprocessor.
   for prog in mpiexec mpiexec.hydra pmi_proxy mpirun
   do
      sudo scp $BINDIR/$prog $coprocessor:/bin/$prog
   done

   # MPI runtime libraries.
   for lib in libmpi.so.12 libmpifort.so.12 libmpicxx.so.12
   do
      sudo scp $LIBDIR/$lib $coprocessor:/lib64/$lib
   done

   # Compiler runtime libraries that the MPI libraries depend on.
   for lib in libimf.so libsvml.so libintlc.so.5
   do
      sudo scp $COMPILERLIB/$lib $coprocessor:/lib64/$lib
   done
done
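
If you save the script above as, for example, copy_mpi_runtime.sh (the file name is only illustrative), make it executable and run it once from the host:

# chmod +x copy_mpi_runtime.sh
# ./copy_mpi_runtime.sh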

For multi-card runs, enable peer-to-peer communication between the coprocessors by turning on IP forwarding on the host:

# sudo /sbin/sysctl -w net.ipv4.ip_forward=1
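
Note that this sysctl setting does not persist across reboots. One way to make it permanent, using the standard sysctl configuration file, is:

# echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
# sudo /sbin/sysctl -p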

Chapter 3 – Compiling and Running the Sample MPI Program

This section includes a sample MPI program written in C. We will show how to compile and run the program for the host and also for the Intel® Xeon Phi™ Coprocessor.

Intel® MPI Library supports three programming models:

  • Coprocessor-only model: in this native mode, the MPI ranks reside solely on the coprocessors. The application can be launched from either the host or the coprocessor.
  • Symmetric model: in this mode, the MPI ranks reside on both the host and the coprocessors.
  • MPI offload model: in this mode, the MPI ranks reside solely on the host and use the offload capabilities of the Intel® C/C++ Compiler or Intel® Fortran Compiler to offload some of their workload to the coprocessors.

For illustration purposes, the following example shows how to build and run an MPI application in the symmetric model.

The sample program estimates Pi (π) using a Monte Carlo method. Consider a sphere centered at the origin and circumscribed by a cube: the sphere’s radius is r and the cube edge length is 2r. The volumes of the sphere and the cube are given by

    V_sphere = (4/3) π r^3
    V_cube = (2r)^3 = 8 r^3

The first octant of the coordinate system contains one eighth of the volumes of both the sphere and the cube; the volumes in that octant are given by:

    V_s = (1/8)(4/3) π r^3 = (π/6) r^3
    V_c = (1/8) 8 r^3 = r^3

If we generate Nc points uniformly and randomly in the cube within this octant, we expect that about Ns points will be inside the volume of the sphere according to the following ratio:

    Ns / Nc ≈ V_s / V_c = π / 6

Therefore, the estimated Pi (π) is calculated by

    π ≈ 6 Ns / Nc

where Nc is the number of points generated in the portion of the cube residing in the first octant, and Ns is the total number of points found inside the portion of the sphere residing in the first octant.

In the implementation, rank 0 is responsible for dividing the work among the n ranks: it divides the x-axis into n equal segments and assigns one segment to each rank. Each rank generates (Nc/n) points in its assigned segment and counts how many of them fall inside the first octant of the sphere; the counts are then summed to estimate Pi.



Figure 1 – Each rank handles a separate slice in the first octant.

The pseudo code is shown below:

Rank 0 generates n random seeds
Rank 0 broadcasts the random seeds to the n ranks
For each rank i in [0, n-1]
	receive the corresponding seed
	set num_inside = 0
	For j = 0 to Nc/n
		generate a point with coordinates
			x in [i/n, (i+1)/n]
			y in [0, 1]
			z in [0, 1]
		compute the squared distance d = x^2 + y^2 + z^2
		if d <= 1, increment num_inside
	Send num_inside back to rank 0
Rank 0 sets Ns to the sum of all num_inside
Rank 0 computes Pi = 6 * Ns / Nc

Before compiling the program, called montecarlo.c, you need to establish the proper environment settings for the compiler and for the Intel® MPI Library for Intel® Xeon Phi™ Coprocessor.

# source /opt/intel/composer_xe_2015.2.164/bin/compilervars.sh intel64
# source /opt/intel/impi/5.0.3.048/bin64/mpivars.sh

Build the application montecarlo.mic for the coprocessor:

# mpiicc -mmic montecarlo.c -o montecarlo.mic

Build the application for the host:

# mpiicc montecarlo.c -o montecarlo.host

Upload the application montecarlo.mic to the /tmp directory on the coprocessors using the scp command. In this example, we issue the copy to two coprocessors.

# sudo scp ./montecarlo.mic mic0:/tmp/montecarlo.mic
montecarlo.mic                                100%   16KB  15.6KB/s   00:00
# sudo scp ./montecarlo.mic mic1:/tmp/montecarlo.mic
montecarlo.mic                                100%   16KB  15.6KB/s   00:00

Enable MPI communication between the host and the coprocessors:

# export I_MPI_MIC=enable

The mpirun command starts the application. The flag -n specifies the number of MPI processes and the flag -host specifies the machine name:

# mpirun -n <# of processes> -host <hostname> <application>

We can run the application on multiple hosts by separating the host specifications with “:”. The first MPI rank (rank 0) always starts on the host given in the first part of the command:

# mpirun -n <# of processes> -host <hostname1> <application> : -n <# of processes> -host <hostname2> <application>

This starts rank 0 on hostname1.

Now launch the application from the host. The mpirun command shown below starts the application with 2 ranks on the host, 3 ranks on the coprocessor mic0, and 5 ranks on the coprocessor mic1:

# mpirun -n 2 -host knightscorner0 ./montecarlo.host \
: -n 3 -host mic0 /tmp/montecarlo.mic \
: -n 5 -host mic1 /tmp/montecarlo.mic
Hello world: rank 0 of 10 running on knightscorner0
Hello world: rank 1 of 10 running on knightscorner0
Hello world: rank 2 of 10 running on knightscorner0-mic0
Hello world: rank 3 of 10 running on knightscorner0-mic0
Hello world: rank 4 of 10 running on knightscorner0-mic0
Hello world: rank 5 of 10 running on knightscorner0-mic1
Hello world: rank 6 of 10 running on knightscorner0-mic1
Hello world: rank 7 of 10 running on knightscorner0-mic1
Hello world: rank 8 of 10 running on knightscorner0-mic1
Hello world: rank 9 of 10 running on knightscorner0-mic1
Elapsed time from rank 0:      13.91 (sec)
Elapsed time from rank 1:      14.01 (sec)
Elapsed time from rank 2:     193.42 (sec)
Elapsed time from rank 3:     193.48 (sec)
Elapsed time from rank 4:     193.70 (sec)
Elapsed time from rank 5:     193.36 (sec)
Elapsed time from rank 6:     193.27 (sec)
Elapsed time from rank 7:     192.97 (sec)
Elapsed time from rank 8:     192.47 (sec)
Elapsed time from rank 9:     191.95 (sec)
Out of 4294967295 points, there are 2248803344 points inside the sphere => pi=  3.141541957855

A shorthand way of doing this in symmetric mode is to use the -machinefile option of mpirun together with the I_MPI_MIC_POSTFIX environment variable. In this case, make sure all executables are in the same location on the host and on the mic0 and mic1 cards.

The I_MPI_MIC_POSTFIX environment variable simply tells the library to add the .mic postfix when running on the cards (since the executables there are called montecarlo.mic).

# export I_MPI_MIC_POSTFIX=.mic

Now set the rank mapping in your hosts file (by using the <host>:<#_ranks> format):

# cat hosts_file
knightscorner0:2
mic0:3
mic1:5

And run your executable:

# cp ./montecarlo.host /tmp/montecarlo
# mpirun -machinefile hosts_file /tmp/montecarlo

The nice thing about this syntax is that you only need to edit hosts_file when you change the number of ranks or add more cards.

Alternatively, from the host you can launch the application so that it runs only on the coprocessors mic0 and mic1:

# mpirun -n 3 -host mic0 /tmp/montecarlo.mic : -n 5 -host mic1  /tmp/montecarlo.mic
Hello world: rank 0 of 8 running on knightscorner0-mic0
Hello world: rank 1 of 8 running on knightscorner0-mic0
Hello world: rank 2 of 8 running on knightscorner0-mic0
Hello world: rank 3 of 8 running on knightscorner0-mic1
Hello world: rank 4 of 8 running on knightscorner0-mic1
Hello world: rank 5 of 8 running on knightscorner0-mic1
Hello world: rank 6 of 8 running on knightscorner0-mic1
Hello world: rank 7 of 8 running on knightscorner0-mic1
Elapsed time from rank 0:     255.25 (sec)
Elapsed time from rank 1:     241.74 (sec)
Elapsed time from rank 2:     245.78 (sec)
Elapsed time from rank 3:     241.82 (sec)
Elapsed time from rank 4:     256.23 (sec)
Elapsed time from rank 5:     241.42 (sec)
Elapsed time from rank 6:     240.81 (sec)
Elapsed time from rank 7:     240.10 (sec)
Out of 4294967295 points, there are 2248825514 points inside the sphere => pi=  3.14157295227

As an alternative, you can ssh to the coprocessor mic0 and launch the application from there:

# ssh mic0
# mpirun -n 3 /tmp/montecarlo.mic 
Hello world: rank 0 of 3 running on knightscorner0-mic0
Hello world: rank 1 of 3 running on knightscorner0-mic0
Hello world: rank 2 of 3 running on knightscorner0-mic0
Elapsed time from rank 0:     644.62 (sec)
Elapsed time from rank 1:     659.67 (sec)
Elapsed time from rank 2:     642.24 (sec)
Out of 4294967295 points, there are 2248853797 points inside the sphere => pi=  3.141612529755

This section showed how to compile and run a simple MPI application in the symmetric model. In a heterogeneous computing system, each computational unit performs differently, and this leads to load imbalance. The Intel® Trace Analyzer and Collector (ITAC) can be used to analyze and understand the behavior of a complex MPI program running on such a system: it lets users quickly identify bottlenecks, evaluate load balancing, analyze performance, and identify communication hotspots. This powerful tool is essential for debugging and improving the performance of an MPI program running on a cluster with multiple computational units. For more details on using ITAC, read the whitepaper “Understanding MPI Load Imbalance with Intel® Trace Analyzer and Collector” available at http://software.intel.com/mic-developer. For more details, tips and tricks, and known workarounds, visit the Intel® Cluster Tools and the Intel® Xeon Phi™ Coprocessors page.
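
As a rough sketch of how such a trace might be collected (the itacvars.sh path below is a placeholder, and the ITAC runtime libraries must also be reachable from the coprocessors, for example over an NFS mount as described in Section 2.2), you can rerun the application with the -trace option of mpirun and then open the resulting trace file, by default named after the executable, in the Trace Analyzer GUI:

# source /opt/intel/itac/<version>/bin/itacvars.sh
# mpirun -trace -machinefile hosts_file /tmp/montecarlo
# traceanalyzer ./montecarlo.stf &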

Appendix

The code of the sample MPI program is shown below:

/*
// Copyright 2003-2012 Intel Corporation. All Rights Reserved.
// 
// The source code contained or described herein and all documents related 
// to the source code ("Material") are owned by Intel Corporation or its
// suppliers or licensors.  Title to the Material remains with Intel Corporation
// or its suppliers and licensors.  The Material is protected by worldwide
// copyright and trade secret laws and treaty provisions.  No part of the
// Material may be used, copied, reproduced, modified, published, uploaded,
// posted, transmitted, distributed, or disclosed in any way without Intel's
// prior express written permission.
// 
// No license under any patent, copyright, trade secret or other intellectual
// property right is granted to or conferred upon you by disclosure or delivery
// of the Materials, either expressly, by implication, inducement, estoppel
// or otherwise.  Any license under such intellectual property rights must
// be express and approved by Intel in writing.


#******************************************************************************
# Content: (version 0.5)
#      Based on a Monte Carlo method, this MPI sample code uses volumes to
#      estimate the number PI.
#      
#*****************************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>
#include <math.h>

#include "mpi.h"

#define MASTER 0
#define TAG_HELLO 4
#define TAG_TEST 5
#define TAG_TIME 6

int main(int argc, char *argv[])
{
  int i, id, remote_id, num_procs;
   
  MPI_Status stat;
  int namelen;
  char name[MPI_MAX_PROCESSOR_NAME];

  // Start MPI.
  if (MPI_Init (&argc, &argv) != MPI_SUCCESS)
    {
      printf ("Failed to initialize MPIn");
      return (-1);
    }
  // Create the communicator, and retrieve the number of processes.
  MPI_Comm_size (MPI_COMM_WORLD, &num_procs);

  // Determine the rank of the process.
  MPI_Comm_rank (MPI_COMM_WORLD, &id);

  // Get machine name
  MPI_Get_processor_name (name, &namelen);
  
  if (id == MASTER)
    {
      printf ("Hello world: rank %d of %d running on %sn", id, num_procs, name);

      for (i = 1; i<num_procs; i++) 
	{	
	  MPI_Recv (&remote_id, 1, MPI_INT, i, TAG_HELLO, MPI_COMM_WORLD, &stat);	
	  MPI_Recv (&num_procs, 1, MPI_INT, i, TAG_HELLO, MPI_COMM_WORLD, &stat);  		
	  MPI_Recv (&namelen, 1, MPI_INT, i, TAG_HELLO, MPI_COMM_WORLD, &stat);			
	  MPI_Recv (name, namelen+1, MPI_CHAR, i, TAG_HELLO, MPI_COMM_WORLD, &stat);
			
	  printf ("Hello world: rank %d of %d running on %sn", remote_id, num_procs, name);
	}
}
  else   
    {	    
      MPI_Send (&id, 1, MPI_INT, MASTER, TAG_HELLO, MPI_COMM_WORLD);
      MPI_Send (&num_procs, 1, MPI_INT, MASTER, TAG_HELLO, MPI_COMM_WORLD);
      MPI_Send (&namelen, 1, MPI_INT, MASTER, TAG_HELLO, MPI_COMM_WORLD);
      MPI_Send (name, namelen+1, MPI_CHAR, MASTER, TAG_HELLO, MPI_COMM_WORLD);
}

   // Rank 0 distributes seeds randomly to all processes.
  double startprocess, endprocess;

  int distributed_seed = 0;
  int *buff;

  buff = (int *)malloc(num_procs * sizeof(int));
	
  unsigned int MAX_NUM_POINTS = pow (2,32) - 1;
  unsigned int num_local_points = MAX_NUM_POINTS / num_procs;

  if (id == MASTER)
    {		  
      srand (time(NULL));
  
      for (i=0; i<num_procs; i++)    
	{           
	  distributed_seed = rand();
	  buff[i] = distributed_seed;
	}
    }
  // Broadcast the seed to all processes
  MPI_Bcast(buff, num_procs, MPI_INT, MASTER, MPI_COMM_WORLD);

  // At this point, every process (including rank 0) has a different seed. Using its seed,
  // each process generates Nc/n points randomly in the box [i/n, (i+1)/n] x [0, 1] x [0, 1]
  startprocess = MPI_Wtime();

  srand (buff[id]);

  unsigned int point = 0;
  unsigned int rand_MAX = 128000;
  float p_x, p_y, p_z;
  float temp, temp2, pi;
  double result;
  unsigned int inside = 0, total_inside = 0;

  for (point=0; point<num_local_points; point++)
    {
      temp = (rand() % (rand_MAX+1));
      p_x = temp / rand_MAX;
      p_x = p_x / num_procs;
      
      temp2 = (float)id / num_procs;	// id belongs to 0, num_procs-1
      p_x += temp2;
      
      temp = (rand() % (rand_MAX+1));
      p_y = temp / rand_MAX;
      
      temp = (rand() % (rand_MAX+1));
      p_z = temp / rand_MAX;

      // Compute the number of points residing inside of the 1/8 of the sphere
      result = p_x * p_x + p_y * p_y + p_z * p_z;

      if (result <= 1)
	  {
		inside++;
	  }
    }

  double elapsed = MPI_Wtime() - startprocess;

  MPI_Reduce (&inside, &total_inside, 1, MPI_UNSIGNED, MPI_SUM, MASTER, MPI_COMM_WORLD);


#if DEBUG 
  printf ("rank %d counts %u points inside the spheren", id, inside);
#endif
  if (id == MASTER)
    {
      double timeprocess[num_procs];

      timeprocess[MASTER] = elapsed;
      printf("Elapsed time from rank %d: %10.2f (sec) n", MASTER, timeprocess[MASTER]);
      for (i=1; i<num_procs; i++)
	{
	  // Rank 0 waits for elapsed time value 
	  MPI_Recv (&timeprocess[i], 1, MPI_DOUBLE, i, TAG_TIME, MPI_COMM_WORLD, &stat); 
	  printf("Elapsed time from rank %d: %10.2f (sec) n", i, timeprocess[i]);
	}

      temp = 6 * (float)total_inside;
      pi = temp / MAX_NUM_POINTS;   
      printf ( "Out of %u points, there are %u points inside the sphere => pi=%16.12fn", MAX_NUM_POINTS, total_inside, pi);
    }
  else
    {
      // Send back the processing time (in second)
      MPI_Send (&elapsed, 1, MPI_DOUBLE, MASTER, TAG_TIME, MPI_COMM_WORLD);
    }

  free(buff);

  // Terminate MPI.
  MPI_Finalize();
  
  return 0;
}

About the Author

Loc Q Nguyen received an MBA from University of Dallas, a master’s degree in Electrical Engineering from McGill University, and a bachelor's degree in Electrical Engineering from École Polytechnique de Montréal. He is currently a software engineer with Intel Corporation's Software and Services Group. His areas of interest include computer networking, computer graphics, and parallel processing.

