Intel® Parallel Studio XE 2018 Professional Edition for Linux*

Product tour with videos and samples

Learn when and how to use the Intel Parallel Studio XE components in a typical software development workflow. You can apply the principles learned to your own application:

  1. Identify the performance hotspots in your application using the Intel® VTune™ Amplifier.
  2. Leverage performance libraries to speed up the hotspots.
  3. Compile and optimize using the Intel® C++ Compiler, and link the optimized binary into your application.
  4. Check for issues using Intel® Inspector.

Sample Applications

Some of the sample applications referred to by this article are available in your default installation directory, while others can be downloaded. To download the samples:

  1. Click on the download link below.
  2. Download and untar the file to a local folder.
  3. Open the sample folder.
  4. Build the sample using the provided Makefile:
    $ make
  5. Run the application using its application-specific executable name, or a.out, as appropriate:
    $ ./a.out

Identify performance hotspots

As a first step, use the Intel® VTune™ Amplifier to identify the functions, loops, and files that have the biggest impact on your application’s performance.

The following video and sample application demonstrate how to find the hotspots in a 3D rendering application called Tachyon, implement code changes to improve performance, and verify the performance improvements.

The sample used in the video can be found in the default installation location:

    /opt/intel/vtune_amplifier_2018.x.x.xxx/samples/en/C++/tachyon_amp_xe.tgz

Leverage Existing Performance Libraries

Pick one strategic loop or function that consumes a significant portion of your application's runtime. Explore performance libraries such as Intel® Threading Building Blocks, Intel® Math Kernel Library, or Intel® Integrated Performance Primitives to find already-tuned algorithms that you can drop in and link into your application.

Optimize a loop-centric application using Intel® Threading Building Blocks

An application that spends much of its time in a serial “for” loop may run faster if that loop is replaced with a parallel alternative.

This sample uses the Intel® Threading Building Blocks parallel_for construct to speed up a “for” loop in a Taylor series algorithm.

Download the TBB Sample

Optimize matrix operations using the Intel® Math Kernel Library

Suppose profiling identifies a matrix multiplication as your chief hotspot. See how to use the Intel® Math Kernel Library (MKL) to improve its performance.

Download the MKL Sample

Leverage the Intel® C++ Compiler

When no existing performance library fits your algorithm, consider targeting loop-centric code with the Intel® compiler. Because the Intel® compiler is Application Binary Interface (ABI) compatible with your legacy compiler, you can link the newly compiled code with object files already compiled by the legacy compiler.

Optimize a classic graph algorithm using the Intel® C++ Compiler

Explore how to increase the performance of this classic graph algorithm using thread-based and explicit vector programming techniques. Optimize the C++ sample of Dijkstra’s shortest-path algorithm using the Intel compiler:

Download the Dijkstra Sample

Leverage Parallelism

Modern architectures provide ample CPU cores to compute with. Before you implement parallelism, use Intel® Advisor XE to get guidance and play “what if” scenarios to see where you should focus your parallelism design effort.

Identifying Candidates for Parallelization using Intel® Advisor 

This Getting Started video introduces the workflow for Threading Advisor and briefly demonstrates the usage, purpose, and interpretation of each analysis type using the nqueens_Advisor sample code.

After installation on your system, the default sample location is:

     /opt/intel/advisor_2018/samples/en/C++/nqueens_Advisor.tgz

Check Correctness

At some point in your optimization process, you will want to verify that your application computes correctly and is free of memory and resource leaks, threading deadlocks, and race conditions. Use Intel® Inspector to check your application for such issues.

Checking Correctness using Intel® Inspector

Explore how to check your application for memory and resource issues and how to check its threading. Use Intel® Inspector to detect memory leaks as well as threading correctness issues such as race conditions and deadlocks.

After installation on your system, the default sample location is:

     /opt/intel/inspector_2018.x.x.xxx/samples/en/C++/tachyon_insp_xe.tgz
For more complete information about compiler optimizations, see our Optimization Notice.

Sahil S. commented:

I was watching the "Finding Application Hotspots on a Linux* System with Intel® VTune™ Amplifier XE" video, but it kept stopping at random places, so I searched for another copy and found one on YouTube: https://www.youtube.com/watch?v=4vAuwk3bj4s
