Intel® Advisor on Cray* Systems

Introduction

Intel® Advisor provides two analysis workflows to ensure that C, C++, and Fortran applications make the most of today's processor architectures: a Vectorization Workflow and a Threading Workflow.

Vectorization Workflow: This workflow lets you identify loops that will benefit most from vectorization, identify issues preventing vectorization, estimate the benefit of alternative data reorganizations, and analyze instruction mix in the application.

Threading Workflow: This workflow is designed to analyze, design, tune, and check threading design options without disrupting your normal development.

This article shows how to collect and analyze data on Cray* systems, but does not go into the details of all the available capabilities. For a detailed description of those, please check the Intel® Advisor Documentation Site.

Build the application binary

Make sure the Intel® programming environment is loaded by default:

$ module list

If the Cray programming environment is loaded instead, proceed to swap them:

$ module swap PrgEnv-cray PrgEnv-intel

If you use the Intel® compilers directly, you can build your application in the usual manner, with full optimization enabled. For example, for a Fortran code using MPI and OpenMP*:

$ mpiifort -g -O3 -qopenmp source.F90 -o app

If you use the Cray ftn or cc compiler wrappers instead, you must add the -dynamic flag, since these wrappers build applications statically by default. Note that ftn automatically detects OpenMP* directives, so no OpenMP* flag is required:

$ ftn -g -dynamic -O3 source.F90 -o app
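
A C code could be built analogously with the cc wrapper; the command below is only a sketch of the equivalent invocation (add an OpenMP* flag explicitly if your wrapper does not enable it automatically):

$ cc -g -dynamic -O3 source.c -o app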

Collecting data in Interactive mode

Different systems provide different ways of running jobs interactively, so you should check how that is achieved on the particular system you are working on. It is common for modern Cray* systems to use the Slurm scheduler, and to enable interactive sessions via the salloc command. As an example, to launch a 30-minute session on a single node, in a partition (queue) named debug, you would execute the following command:

$ salloc -N 1 -t 30:00 -p debug

Once the allocation is granted, the system should place you on the appropriate node and the prompt should change to reflect the new node name. Once on the allocated node, the environment must be set up properly:

$ module swap PrgEnv-cray PrgEnv-intel
$ module load advisor

There is, at the time of writing, no uniform naming scheme for the Intel® Advisor module on Cray* systems, so please check your system documentation for the correct name. We will use advisor as a convenient placeholder, since several centers have adopted that name.
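
If in doubt, one quick way to look for the module (assuming your system uses environment modules) is to search the output of module avail, which writes to standard error:

$ module avail 2>&1 | grep -i advisor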

Serial or threaded code without MPI

Simply use the regular syntax to collect data:

$ advixe-cl [options] -- <application> [arguments]

For example, to perform a survey collection for an OpenMP* enabled code you could run the following commands:

$ export OMP_NUM_THREADS=16
$ advixe-cl --collect survey --project-dir ./results_dir -- ./app

Once the run completes, exit the interactive session and analyze the results using the GUI or a command line report. You may also analyze the results on a different system.

Using a single MPI rank

In this case we must simply prepend the appropriate MPI launcher to the collection line:

$ <mpi_launcher> [options] advixe-cl [options] -- <application> [arguments]

Continuing with our assumption of a modern Cray* configuration with Slurm, a hybrid application using both MPI and OpenMP* could be launched in the following way:

$ export OMP_NUM_THREADS=16
$ srun -n 1 -c 16 advixe-cl --collect survey --project-dir ./results_dir -- ./app

And just as in the non-MPI case, you would exit this node once the run is complete, and analyze the results using the GUI or a command line report. You may also analyze the results in a different system.

Using multiple MPI ranks

When running a code on multiple MPI ranks, it is recommended to profile only one of them with Intel® Advisor. This section describes how to achieve this when using the Cray* MPICH MPI library and runtime.

You will be running in multiple-program-multiple-data (MPMD) mode. Start by creating a multi-program configuration file. Let’s call this file config.txt. Assuming you will use 4 MPI ranks, the file contents could look like this:

0 advixe-cl --collect survey --project-dir ./results_dir -- ./app
1-3 ./app

Then simply launch srun with the configuration file passed to the --multi-prog option:

$ export OMP_NUM_THREADS=4
$ srun -n 4 --multi-prog ./config.txt

And just as in the non-MPI case, you would exit this node once the run is complete, and analyze the results using the GUI or a command line report. You may also analyze the results in a different system.

For additional information on how to run Intel® Advisor with MPI programs, check out the introductory article Using Intel® Advisor and VTune™ Amplifier with MPI or the more in-depth Analyzing Intel® MPI applications using Intel® Advisor.

Collecting data in batch mode

Sometimes it is convenient to simply submit a batch job to collect the performance data. In that case all the information above is still valid; the only change is that a submission script containing all the configuration details is created and the job is submitted to the scheduler. Let's call our submission script run.slurm. An example of its contents (which may vary depending on the specific site) would be:

#!/bin/bash
#SBATCH --job-name=run_name
#SBATCH -N 1
#SBATCH -p debug
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00
cd $SLURM_SUBMIT_DIR
export OMP_NUM_THREADS=16
module swap PrgEnv-cray PrgEnv-intel
module load advisor
srun -n 1 -c 16 advixe-cl --collect survey --project-dir ./results_dir -- ./app

This would be submitted from the login node with the sbatch command:

$ sbatch ./run.slurm

And again, upon completion you can analyze the results from a login node using the GUI or a command line report, or move the results to analyze them on a local workstation.

Analyzing results using the GUI

Follow your data center's advice regarding profiling best practices. Typically, data collection and analysis should be performed on a high-performance parallel file system such as Lustre or GPFS. Some centers do not allow program execution on the login nodes, so check which rules apply to your system.

Once data collection is completed, you should exit your interactive session if applicable, and use the GUI to load the results from a login or post-processing node (this requires an X11 server running on your local system):

$ module swap PrgEnv-cray PrgEnv-intel
$ module load advisor
$ advixe-gui ./results_dir
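
If you connect to the system over SSH, X11 forwarding can usually be enabled when opening the session; for example (the user and host names below are placeholders):

$ ssh -X <user>@<login_node>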

Another option is to pack the collected data, transfer it to a local system using scp or similar, and use the Linux*, Windows*, or Mac OS* clients to analyze the data locally. For data collections of moderate size this is the most responsive and convenient way to examine the results.

Analyzing results with a command line report

While Intel® Advisor offers an advanced GUI that helps organize and interpret the profiling results, it is also possible to obtain thorough command line text reports. For example, to obtain a summary report we would simply execute:

$ module swap PrgEnv-cray PrgEnv-intel
$ module load advisor
$ advixe-cl --report summary --project-dir ./results_dir --format text

This prints the summary to the screen. Options to save the output directly to a file and to produce other formats (csv, xml), among many others, are also available. For details on the full set of command line options type advixe-cl -help or consult the Intel® Advisor Documentation Site.
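
As an illustration, and assuming the project directory from the examples above, a survey report could be written directly to a CSV file as shown here (verify the exact option names for your Intel® Advisor version with advixe-cl -help):

$ advixe-cl --report survey --project-dir ./results_dir --format csv --report-output ./survey_report.csv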

Copying results to a local system

Results may be transferred to a local Linux*, Windows*, or Mac OS* system for further analysis. There are two main ways of achieving this: generating a read-only snapshot, or manually copying the necessary data.

Collections may record large amounts of data. Before starting the packaging process you should check the size of the results directory with du -h to see how much data you will be handling.
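
For example, to get a single summary figure for the project directory used in the examples above:

$ du -sh ./results_dir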

Generating a read-only snapshot

This option ensures all necessary data is packaged for further investigation. Following our example above, the command we would use to pack and compress the results together with the necessary source and binary files would be:

$ advixe-cl --snapshot --project-dir ./results_dir --pack --cache-sources --cache-binaries -- ./adv_snapshot

This will generate a file named adv_snapshot.advixeexpz that can be copied to a local system using secure copy (scp) or a similar method. It can then be analyzed with Intel® Advisor by double-clicking the file itself or by using the "File - Open Result" option in the GUI.

You can also generate a snapshot from the GUI by using the "File - Create Data Snapshot" menu option, or by clicking the camera icon on the toolbar.

Copying results manually

If you opt to move the results manually, you will also need to copy the application source code and binary in order to have full functionality. You can simply create a subdirectory inside the results project directory and copy the files there:

$ mkdir ./results_dir/src
$ cp -r <application_source_dir>/* ./results_dir/src
$ cp ./app ./results_dir/src

Then package this directory using your favorite tool (tar, zip, etc.), transfer it to your local system using secure copy (scp) or similar, and unpack it.
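
For example, using tar and scp (the user, host, and path names below are placeholders), you could run the following on the Cray* system:

$ tar czf results_dir.tar.gz ./results_dir

and then, from your local system:

$ scp <user>@<login_node>:<remote_path>/results_dir.tar.gz .
$ tar xzf results_dir.tar.gz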

Once you open the results file, you will have to add the location of the source code and the binary files under "File - Project Properties" in order to see source and assembly listings within the Intel® Advisor GUI.

Known issues

Slow finalization

On some architectures the finalization step that Intel® Advisor executes after the performance data collection can be very time-consuming. The flag -no-auto-finalize skips the automatic execution of the finalization stage after collection; finalization then runs when you open the results on a local workstation or a dedicated post-processing node.
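
For example, the survey collection from the single-rank MPI case above could defer finalization like this:

$ srun -n 1 -c 16 advixe-cl --collect survey -no-auto-finalize --project-dir ./results_dir -- ./app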

MPI applications on Cray* systems

Basic setup

When using Cray* MPICH, it is necessary to set the PMI_NO_FORK variable in the environment for Intel® Advisor to complete its collection and analysis correctly:

$ export PMI_NO_FORK=1

Network timeouts

Software updates to Cray* MPICH may change timeout values which are relevant to running Intel® Advisor. If timeouts are observed, it is recommended to increase the default values of the variables PMI_MMAP_SYNC_WAIT_TIME and PMI_CONNECT_RETRIES. Testing shows the following values are safe in most cases:

$ export PMI_MMAP_SYNC_WAIT_TIME=1800
$ export PMI_CONNECT_RETRIES=1000

*Other names and brands may be claimed as the property of others.

For more complete information about compiler optimizations, see our Optimization Notice.