Product tour with videos and samples
Learn when and how to use the Intel Parallel Studio XE components in a typical software development workflow. You can apply the principles learned to your own application:
- Identify the performance hotspots in your application
- Leverage performance libraries to speed up the hotspots
- Compile and optimize using the Intel® C++ or Fortran compilers, and link the optimized binary into your application
Some of the sample applications referred to by this article are available in your default installation directory, while others can be downloaded. To download the samples:
- Click the download link below.
- Download the file and untar it to a local folder.
- Open the sample folder.
- Build the sample using the provided Makefile.
- Run the application by invoking the sample's executable, which is either an application-specific name or a.out, depending on the sample.
Identify Performance Hotspots
As a first step, identify the functions, loops, and files that have the biggest impact on your application’s performance. The Intel® Parallel Studio XE Cluster and Intel® Parallel Studio XE Professional Editions provide the Intel® VTune™ Amplifier tool for this purpose. Since Intel® VTune™ Amplifier is not included in the Intel® Parallel Studio XE Composer Edition, you will need to use other means to identify the hot functions and loops in your code.
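When no profiler is available, one low-tech option is to time suspect functions by hand. The sketch below is only an illustration of that idea, not a replacement for Intel® VTune™ Amplifier: `sum_of_squares` is a hypothetical stand-in for a function you suspect is hot, and `mean_seconds` is a hypothetical helper built on the standard `std::chrono::steady_clock`.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical workload standing in for a suspected hot function.
double sum_of_squares(const std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) total += x * x;
    return total;
}

// Runs `fn` `repeats` times and returns the mean wall-clock seconds per call.
template <typename Fn>
double mean_seconds(Fn&& fn, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) fn();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}
```

When calling `mean_seconds`, store the timed function's result in a `volatile` variable inside the callable so the compiler cannot discard the work. Comparing the mean times of a few candidate functions gives a rough ranking of where the runtime goes.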
Leverage Existing Performance Libraries
Pick one strategic loop or function that consumes a significant portion of your application runtime. Explore performance libraries, such as Intel® Threading Building Blocks, Intel® Math Kernel Library, or Intel® Integrated Performance Primitives to identify already tuned algorithms that you can simply drop in and build into your application at link time.
Optimize a loop-centric application using Intel® Threading Building Blocks
An application that spends a lot of time in a serial “for” loop can improve performance by replacing the serial “for” loop with a parallel alternative.
This sample speeds up a “for” loop that computes a Taylor series by using the Intel® Threading Building Blocks parallel_for construct.
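The following sketch is not the sample itself. It illustrates the kind of loop that benefits: every iteration evaluates an independent Taylor approximation, so the iteration range can be divided among threads. To stay buildable without the library installed, it uses a hand-rolled helper, `chunked_parallel_for` (a hypothetical name), on top of `std::thread`. The choice of `exp(x)` as the series and the function names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Taylor approximation of exp(x) from the first `terms` terms (hypothetical workload).
double exp_taylor(double x, int terms) {
    double sum = 1.0;   // term for k = 0
    double term = 1.0;
    for (int k = 1; k < terms; ++k) {
        term *= x / k;  // now equals x^k / k!
        sum += term;
    }
    return sum;
}

// Stand-in for a parallel-for: splits [0, n) into contiguous chunks and runs
// body(begin, end) for each chunk on its own thread.
template <typename Body>
void chunked_parallel_for(std::size_t n, unsigned chunks, Body body) {
    assert(chunks > 0);
    const std::size_t step = (n + chunks - 1) / chunks;  // ceiling division
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += step) {
        const std::size_t end = std::min(n, begin + step);
        workers.emplace_back(body, begin, end);
    }
    for (std::thread& w : workers) w.join();
}

// The serial loop body is unchanged; only the iteration range is divided.
std::vector<double> exp_all(const std::vector<double>& xs, int terms, unsigned chunks) {
    std::vector<double> out(xs.size());
    chunked_parallel_for(xs.size(), chunks, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = exp_taylor(xs[i], terms);
    });
    return out;
}
```

With Intel® Threading Building Blocks, the same lambda body would be passed to `tbb::parallel_for` over a `tbb::blocked_range<std::size_t>`, and the library's scheduler would choose chunk sizes and balance the work instead of the fixed split used here.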
Optimize matrix operations using the Intel® Math Kernel Library
Suppose a matrix multiplication turns out to be your chief hotspot. See how to use the Intel® Math Kernel Library (Intel® MKL) to improve its performance.
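As a rough picture of what gets replaced, here is a minimal reference implementation of a dense matrix product, assuming row-major storage in `std::vector<double>`. The function name `matmul_reference` is hypothetical. The closing comment shows the single BLAS call that would stand in for the loops when MKL is available; it is left as a comment so the sketch builds without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference C = A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
std::vector<double> matmul_reference(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<double> c(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const double aip = a[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];  // unit-stride inner loop
            }
        }
    }
    return c;
}

// Assuming <mkl.h> is included and the program is linked against MKL, the
// loops above correspond to one call (dimensions converted to MKL_INT):
//   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
//               m, n, k, 1.0, a.data(), k, b.data(), n, 0.0, c.data(), n);
```

Keeping a small reference version like this is also handy for checking that the library call produces the same results on a test input.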
Leverage the Intel® C++ Compiler
When no existing performance library fits your algorithm, consider targeting loop-centric code with the Intel® compiler, and take advantage of the Intel® compiler’s Application Binary Interface (ABI) compatibility to link with object files already compiled with your legacy compiler.
Optimize a classic graph algorithm using the Intel® C++ Compiler
Explore how to increase the performance of Dijkstra’s shortest-path algorithm, a classic graph algorithm, using thread-based and explicit vector programming techniques. The C++ sample is optimized using the Intel® C++ Compiler.
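For orientation, the sketch below shows a plain serial Dijkstra over a dense weight matrix. It is not the code from the sample; the matrix layout and the name `dijkstra_dense` are assumptions chosen to make the loop structure visible. The relaxation loop walks one matrix row with unit stride, which is the shape of loop that vector programming targets, and runs from different source vertices are independent, which is one place threads could be applied.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr double kInf = std::numeric_limits<double>::infinity();

// Dense Dijkstra: `w` is an n x n row-major weight matrix with non-negative
// weights, and kInf marks "no edge". Returns the shortest distance from
// `source` to every vertex (kInf for unreachable vertices).
std::vector<double> dijkstra_dense(const std::vector<double>& w, std::size_t n,
                                   std::size_t source) {
    assert(w.size() == n * n && source < n);
    std::vector<double> dist(n, kInf);
    std::vector<char> done(n, 0);
    dist[source] = 0.0;
    for (std::size_t round = 0; round < n; ++round) {
        // Select the closest vertex that is not yet finalized.
        std::size_t u = n;
        for (std::size_t v = 0; v < n; ++v) {
            if (!done[v] && (u == n || dist[v] < dist[u])) u = v;
        }
        if (u == n || dist[u] == kInf) break;  // the rest is unreachable
        done[u] = 1;
        // Relaxation: unit-stride pass over row u, a candidate for vectorization.
        const double du = dist[u];
        const double* row = &w[u * n];
        for (std::size_t v = 0; v < n; ++v) {
            const double candidate = du + row[v];
            if (candidate < dist[v]) dist[v] = candidate;
        }
    }
    return dist;
}
```

For a four-vertex graph with edges 0→1 (weight 1), 0→2 (weight 4), 1→2 (weight 2), and 2→3 (weight 1), calling `dijkstra_dense(w, 4, 0)` yields the distances 0, 1, 3, and 4.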
Leverage the Intel® Fortran Compiler
Optimize a Pythagorean prime number finder using OpenMP* with the Intel® Fortran Compiler
Test drive Intel® Fortran with OpenMP* to optimize a Pythagorean prime number finder.
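The sample itself is written in Fortran. The sketch below shows the same OpenMP* pattern in C++ and is only an illustration under stated assumptions: a Pythagorean prime is taken to be a prime of the form 4k + 1, and the function names are hypothetical. The Fortran counterpart of the directive used here is `!$omp parallel do reduction(+:count)`.

```cpp
#include <cassert>

// Trial-division primality test (simple, not tuned).
bool is_prime(long n) {
    if (n < 2) return false;
    for (long d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Counts Pythagorean primes (primes of the form 4k + 1) up to `limit`.
long count_pythagorean_primes(long limit) {
    assert(limit >= 0);
    long count = 0;
    // Each candidate is tested independently, so iterations can be shared
    // among threads. The reduction clause gives every thread a private
    // counter and sums them at the end. If OpenMP is not enabled at compile
    // time, the pragma is ignored and the loop runs serially.
#pragma omp parallel for reduction(+ : count) schedule(dynamic, 1024)
    for (long p = 5; p <= limit; p += 4) {
        if (is_prime(p)) ++count;
    }
    return count;
}
```

The dynamic schedule is a design choice for this workload: larger candidates take longer to test, so handing out blocks of iterations on demand tends to balance the threads better than one fixed split. OpenMP must be enabled with the compiler's OpenMP option for the loop to run in parallel.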