Intel® Parallel Studio XE 2018 Professional Edition for Linux*

Published: 04/14/2015   Last Updated: 09/12/2017

Product tour with videos and samples

Learn when and how to use the Intel Parallel Studio XE components in a typical software development workflow. You can apply the principles learned to your own application:

  1. Identify the performance hotspots in your application using Intel® VTune™ Amplifier.
  2. Leverage performance libraries to speed up the hotspots.
  3. Compile and optimize using the Intel® C++ Compiler, and link the optimized binary into your application.
  4. Check for issues using Intel® Inspector.

Sample Applications

Some of the sample applications referred to by this article are available in your default installation directory, while others can be downloaded. To download the samples:

  1. Click on the download link below.
  2. Download and untar the file to a local folder.
  3. Open the sample folder.
  4. Build the sample using the provided Makefile: 
    $ make
  5. Run the application using its application-specific executable name, or a.out, as appropriate:
    $ ./a.out

Identify performance hotspots

As a first step, use the Intel® VTune™ Amplifier to identify the functions, loops, and files that have the biggest impact on your application’s performance.

The following video and sample application demonstrate how to find the hotspots in a 3D rendering application called Tachyon, implement code changes to improve performance, and verify the performance improvements.

The sample used in the video can be found in the default installation location:

    /opt/intel/vtune_amplifier_2018.x.x.xxx/samples/en/C++/tachyon_amp_xe.tgz

Leverage Existing Performance Libraries

Pick one strategic loop or function that consumes a significant portion of your application runtime. Explore performance libraries, such as Intel® Threading Building Blocks, Intel® Math Kernel Library, or Intel® Integrated Performance Primitives, to discover already tuned algorithms that you can drop in and build into your application at link time.

Optimize a loop centric application using Intel® Threading Building Blocks

An application that spends a lot of time in a serial “for” loop can improve performance by replacing the serial “for” loop with a parallel alternative.

This sample speeds up a “for” loop that computes a Taylor series by using the Intel® Threading Building Blocks parallel_for construct.

Download the TBB Sample
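
The downloadable sample is not reproduced here. As a rough illustration of the idea, the sketch below uses a hypothetical workload (approximating e^x with a truncated Taylor series for each element of an array) and splits the loop's index range into chunks that run on separate threads. It uses std::thread from the C++ standard library as a stand-in so that it builds without extra libraries; with Intel® Threading Building Blocks, the hand-written chunking would typically be replaced by a call to tbb::parallel_for over a tbb::blocked_range. All function names here are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical workload: approximate e^x with the first `terms` Taylor terms.
double taylor_exp(double x, int terms) {
    double sum = 1.0;   // term for n = 0
    double term = 1.0;
    for (int n = 1; n < terms; ++n) {
        term *= x / n;  // builds x^n / n! incrementally
        sum += term;
    }
    return sum;
}

// Serial baseline: a single "for" loop over all inputs.
std::vector<double> taylor_serial(const std::vector<double>& xs, int terms) {
    std::vector<double> out(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) {
        out[i] = taylor_exp(xs[i], terms);
    }
    return out;
}

// Parallel variant: contiguous chunks of the index range, one thread each.
std::vector<double> taylor_parallel(const std::vector<double>& xs, int terms,
                                    unsigned workers) {
    assert(workers > 0);
    const std::size_t n = xs.size();
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    std::vector<double> out(n);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        pool.emplace_back([&xs, &out, terms, begin, end] {
            // Each iteration writes only out[i], so no locking is needed.
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = taylor_exp(xs[i], terms);
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return out;
}
```

The loop parallelizes cleanly because its iterations are independent of one another. On some systems, building code that uses std::thread requires the -pthread compiler flag.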

Optimize matrix operations using the Intel® Math Kernel Library

Suppose you discover that matrix multiplication is your chief hotspot. See how to use the Intel® Math Kernel Library (Intel® MKL) to improve performance.

Download the MKL sample
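
For orientation, the sketch below shows the kind of hand-written loop nest that tends to show up as a matrix multiplication hotspot. It is not taken from the downloadable sample; the function name and the row-major layout are assumptions made for this example. A tuned library routine, such as the standard BLAS dgemm operation that Intel® MKL provides, computes the same product and is the usual drop-in replacement for a loop nest like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C (m x n) = A (m x k) * B (k x n).
// All matrices are stored row-major in contiguous vectors.
std::vector<double> matmul_naive(const std::vector<double>& A,
                                 const std::vector<double>& B,
                                 std::size_t m, std::size_t k, std::size_t n) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<double> C(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const double a = A[i * k + p];
            // Innermost loop walks a row of B and a row of C contiguously.
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[p * n + j];
            }
        }
    }
    return C;
}
```

For example, multiplying the 2 x 3 matrix [1 2 3; 4 5 6] by the 3 x 2 matrix [7 8; 9 10; 11 12] yields [58 64; 139 154]. Keeping a small reference implementation like this is also a convenient way to check the results of the library call after the switch.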

Leverage the Intel® C++ Compiler

When no existing performance library fits your algorithm, consider targeting loop-centric code with the Intel® compiler, and take advantage of the Intel® compiler’s Application Binary Interface (ABI) compatibility to link with object files already compiled with your legacy compiler.

Optimize a classic graph algorithm using Intel C++ Compiler

Explore how to increase the performance of this classic graph algorithm using thread-based and explicit vector programming techniques. Optimize the C++ sample of Dijkstra’s shortest path graph algorithm using the Intel compiler:

Download the Dijkstra Sample
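
As a reminder of what the algorithm computes, here is a minimal serial sketch of Dijkstra’s shortest path algorithm using a binary heap. It is a plain baseline, not the code from the downloadable sample, and it contains none of the threading or vectorization that the sample explores. The Edge and Graph types and the function name are assumptions made for this example, and edge weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical adjacency-list representation of a directed, weighted graph.
struct Edge {
    std::size_t to;
    double weight;  // assumed non-negative
};
using Graph = std::vector<std::vector<Edge>>;

// Returns the shortest distance from `source` to every vertex.
// Unreachable vertices keep a distance of infinity.
std::vector<double> dijkstra(const Graph& g, std::size_t source) {
    assert(source < g.size());
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(g.size(), inf);

    // Min-heap of (distance, vertex) pairs.
    typedef std::pair<double, std::size_t> Item;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[source] = 0.0;
    pq.push(Item(0.0, source));
    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        const double d = top.first;
        const std::size_t u = top.second;
        if (d > dist[u]) continue;  // stale entry; a shorter path was found
        for (const Edge& e : g[u]) {
            const double nd = d + e.weight;
            if (nd < dist[e.to]) {  // relax the edge u -> e.to
                dist[e.to] = nd;
                pq.push(Item(nd, e.to));
            }
        }
    }
    return dist;
}
```

A correct serial version like this gives you reference results to compare against once you start threading or vectorizing the algorithm.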

Leverage Parallelism

Modern architectures provide ample CPU cores to compute with. Before you implement parallelism, use Intel® Advisor to get guidance and explore “what if” scenarios to see where you should focus the parallelism design effort.

Identifying Candidates for Parallelization using Intel® Advisor 

This Getting Started video introduces the workflow for Threading Advisor and briefly demonstrates the usage, purpose, and interpretation of each analysis type using the nqueens_Advisor sample code.

After installation on your system, the default sample location is:

     /opt/intel/advisor_2018/samples/en/C++/nqueens_Advisor.tgz

Check Correctness

At some point in your optimization process, you will want to verify that your application computes correct results and avoids memory and resource leaks, threading deadlocks, and race conditions. Use Intel® Inspector to check your application for such issues.

Checking Correctness using Intel® Inspector

Explore how to use Intel® Inspector to check your application for memory and resource issues, such as leaks, and for thread correctness issues, such as race conditions and deadlocks.

After installation on your system, the default sample location is:

     /opt/intel/inspector_2018.x.x.xxx/samples/en/C++/tachyon_insp_xe.tgz
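
To make the memory side of these checks concrete, the sketch below shows one common kind of defect that a memory checker is meant to report: an early return that skips a delete. It is followed by a variant that avoids the leak by letting std::unique_ptr own the allocation. The Buffer type and both function names are hypothetical and are not taken from the tachyon sample.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource that counts how many instances are currently alive.
struct Buffer {
    static int live;
    Buffer() { ++live; }
    ~Buffer() { --live; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
};
int Buffer::live = 0;

// Leaky version: the early return skips the delete.
bool process_leaky(bool fail_early) {
    Buffer* buf = new Buffer();
    if (fail_early) {
        return false;  // leak: buf is never freed on this path
    }
    delete buf;
    return true;
}

// Fixed version: unique_ptr frees the buffer on every exit path.
bool process_fixed(bool fail_early) {
    std::unique_ptr<Buffer> buf = std::make_unique<Buffer>();
    assert(Buffer::live > 0);  // the buffer exists while we work with it
    if (fail_early) {
        return false;  // buf is released automatically here
    }
    return true;       // and here
}
```

After any call to process_fixed, Buffer::live returns to its previous value, whereas process_leaky(true) leaves one Buffer allocated for the rest of the run.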

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804