Intel® Parallel Studio XE 2015 Composer Edition on Linux*

Intel® Parallel Studio XE 2016 Professional Edition: Product tour with videos and samples

Learn when and how to use the Intel Parallel Studio XE components in a typical software development workflow. You can apply the principles learned to your own application:

  1. Identify the performance hotspots in your application
  2. Leverage performance libraries to speed up the hotspots
  3. Compile and optimize using the Intel® C++ or Fortran compilers, and link the optimized binary into your application

Sample Applications

Some of the sample applications referred to by this article are available in your default installation directory, while others can be downloaded. To download the samples:

  1. Click the download link below.
  2. Download and untar the file to a local folder.
  3. Open the sample folder.
  4. Build the sample using the provided Makefile:
        $ make
  5. Run the application using its application-specific name, or a.out, as appropriate:
        $ ./a.out

Identify Performance Hotspots

As a first step, identify the functions, loops, and files that have the biggest impact on your application’s performance. The Intel® Parallel Studio XE Cluster and Intel® Parallel Studio XE Professional Editions provide the Intel® VTune™ Amplifier tool for this purpose. Since Intel® VTune™ Amplifier is not included in the Intel® Parallel Studio XE Composer Edition, you will need to use other means to identify the hot functions and loops in your code.
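
Without a profiler, one simple way to find hot code is to time candidate regions by hand. The following is a minimal sketch of that idea using only the C++ standard library; the names `time_ms` and `sum_of_squares` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical candidate hotspot: sums the squares of 1..n.
double sum_of_squares(long n) {
    assert(n >= 0);
    double total = 0.0;
    for (long i = 1; i <= n; ++i) {
        total += static_cast<double>(i) * static_cast<double>(i);
    }
    return total;
}
```

Wrapping each suspect region in a call such as `time_ms([&] { result = sum_of_squares(n); })` and comparing the results gives a rough ranking of where the time goes. Repeat each measurement several times, since a single run can be noisy.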

Leverage Existing Performance Libraries

Pick one strategic loop or function that consumes a significant portion of your application runtime. Explore performance libraries, such as Intel® Threading Building Blocks, Intel® Math Kernel Library, or Intel® Integrated Performance Primitives to identify already tuned algorithms that you can simply drop in and build into your application at link time.

Optimize a loop-centric application using Intel® Threading Building Blocks

An application that spends a lot of time in a serial “for” loop can improve performance by replacing the serial “for” loop with a parallel alternative.

This sample speeds up a “for” loop that evaluates a Taylor series by using the Intel® Threading Building Blocks parallel_for construct.
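
To illustrate the serial-to-parallel transformation, here is a minimal sketch that is not taken from the sample. It assumes the Taylor series in question is the one for e^x, and it uses `std::thread` from the standard library as a stand-in for TBB so that the code has no external dependencies. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Approximates e^x with the first `terms` terms of its Taylor series.
double taylor_exp(double x, int terms) {
    assert(terms > 0);
    double sum = 0.0;
    double term = 1.0;  // x^0 / 0!
    for (int k = 0; k < terms; ++k) {
        sum += term;
        term *= x / (k + 1);  // next term: x^(k+1) / (k+1)!
    }
    return sum;
}

// Serial baseline: one "for" loop over all inputs.
std::vector<double> exp_serial(const std::vector<double>& xs, int terms) {
    std::vector<double> out(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) out[i] = taylor_exp(xs[i], terms);
    return out;
}

// Parallel version: the index range is split into contiguous chunks, one per
// thread. Each iteration writes only out[i], so no locking is needed.
std::vector<double> exp_parallel(const std::vector<double>& xs, int terms,
                                 unsigned num_threads) {
    std::vector<double> out(xs.size());
    num_threads = std::max(1u, num_threads);
    const std::size_t chunk = (xs.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(xs.size(), t * chunk);
        const std::size_t end = std::min(xs.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&xs, &out, terms, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = taylor_exp(xs[i], terms);
            }
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

With TBB, the manual chunking and thread management in `exp_parallel` would be replaced by a single `parallel_for` call over the index range, and the library would handle load balancing.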

Download the sample

Optimize matrix operations using the Intel® Math Kernel Library

Suppose you discover that a matrix multiplication is your chief hotspot. See how to use Intel® Math Kernel Library (MKL) to improve performance.
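
For context, the hotspot in such a case often looks like the hand-written triple loop below. This is a minimal sketch, not code from the sample, and `matmul_naive` is a hypothetical name. It assumes dense, row-major matrices of doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C = A * B for row-major matrices, where A is m x k and B is k x n.
// This is the kind of loop nest a tuned library routine is meant to replace.
std::vector<double> matmul_naive(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<double> c(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const double aip = a[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    return c;
}
```

MKL provides the BLAS routine `cblas_dgemm`, which computes the more general C = alpha*A*B + beta*C. Called with alpha = 1 and beta = 0, it yields the same product as the loop nest above, so a function like this one is a natural candidate for a drop-in replacement.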

Download the sample

Leverage the Intel® C++ Compiler

When no existing performance library fits your algorithm, consider targeting loop-centric code with the Intel® compiler, and take advantage of the Intel® compiler’s Application Binary Interface (ABI) compatibility to link with object files already compiled with your legacy compiler.

Optimize a classic graph algorithm using the Intel® C++ Compiler

Explore how to increase the performance of this classic graph algorithm using thread-based and explicit vector programming techniques. Optimize the C++ sample of Dijkstra’s shortest path graph algorithm using the Intel compiler:

Download the sample
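
As a reminder of what the algorithm does before any optimization, here is a minimal serial sketch of Dijkstra’s shortest path algorithm. It is not the code from the sample: the `Edge` and `Graph` types and the adjacency-list representation are assumptions made for illustration, and edge weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical graph representation: graph[u] lists the edges leaving u.
struct Edge {
    std::size_t to;
    double weight;  // assumed non-negative
};
using Graph = std::vector<std::vector<Edge>>;

// Returns the shortest distance from `source` to every vertex;
// unreachable vertices keep a distance of infinity.
std::vector<double> dijkstra(const Graph& graph, std::size_t source) {
    assert(source < graph.size());
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(graph.size(), inf);

    // Min-heap of (distance, vertex) pairs.
    using Item = std::pair<double, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> frontier;

    dist[source] = 0.0;
    frontier.push(Item(0.0, source));
    while (!frontier.empty()) {
        const Item top = frontier.top();
        frontier.pop();
        const double d = top.first;
        const std::size_t u = top.second;
        if (d > dist[u]) continue;  // stale entry; a shorter path was found
        for (const Edge& e : graph[u]) {
            const double candidate = d + e.weight;
            if (candidate < dist[e.to]) {
                dist[e.to] = candidate;
                frontier.push(Item(candidate, e.to));
            }
        }
    }
    return dist;
}
```

In a sketch like this, the inner loop over outgoing edges and the repeated single-source runs over many sources are the places where threading and vectorization would typically be explored.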

Leverage the Intel® Fortran Compiler

Optimize a Pythagorean prime number finder using OpenMP* with the Intel® Fortran Compiler

Test drive Intel® Fortran with OpenMP* to optimize a Pythagorean prime number finder.
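
The sample itself is written in Fortran. To show the same idea in the language used elsewhere in this article, here is a minimal C++ sketch that is not taken from the sample. It assumes “Pythagorean prime” means a prime of the form 4n + 1, and the function names are hypothetical. The OpenMP* directive is ignored when OpenMP is not enabled at compile time, in which case the loop simply runs serially.

```cpp
#include <cassert>

// Trial-division primality test; adequate for a small illustration.
bool is_prime(long n) {
    if (n < 2) return false;
    for (long d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Counts the primes p < limit with p % 4 == 1 (5, 13, 17, 29, ...).
long count_pythagorean_primes(long limit) {
    assert(limit >= 0);
    long count = 0;
    // Each iteration is independent; the reduction clause gives every thread
    // a private counter and sums them when the loop finishes.
    #pragma omp parallel for reduction(+ : count) schedule(dynamic)
    for (long p = 5; p < limit; p += 4) {
        if (is_prime(p)) ++count;
    }
    return count;
}
```

The dynamic schedule is a design choice for this sketch: testing larger candidates takes longer, so handing out iterations on demand tends to balance the work better than fixed, equal-sized chunks.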

Watch the video

Download the sample

For more complete information about compiler optimizations, see our Optimization Notice.