This article gives advice and background information on typical issues that may arise when threading an application with the Intel Fortran Compiler and other software tools, whether through OpenMP, automatic parallelization, or threaded libraries.
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
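As a rough illustration of merging nested loops, the sketch below treats a two-level loop as one flat iteration space and divides that space among workers. When the outer loop has only a few iterations, parallelizing it alone leaves most workers idle; the merged loop offers more iterations to distribute. The names `split_range` and `merged_loop_sum` are hypothetical, and Python threads are used only to show the decomposition: under the CPython global interpreter lock this arithmetic will not actually run faster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(total, parts):
    """Split range(total) into `parts` contiguous, near-equal (start, stop) pairs."""
    base, extra = divmod(total, parts)
    bounds, start = [], 0
    for p in range(parts):
        # The first `extra` parts each take one additional iteration.
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def merged_loop_sum(grid, workers=4):
    """Sum a rectangular 2-D grid by iterating one merged (flattened) loop."""
    n_outer, n_inner = len(grid), len(grid[0])

    def work(bounds):
        start, stop = bounds
        total = 0
        for k in range(start, stop):
            # Recover the original (outer, inner) indices from the flat index.
            i, j = divmod(k, n_inner)
            total += grid[i][j]
        return total

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, split_range(n_outer * n_inner, workers)))
```

For a 3 x 5 grid and four workers, the outer loop alone could occupy at most three workers, while the merged loop yields 15 iterations split into pieces of 4, 4, 4, and 3.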
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
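The effect of granularity can be sketched by changing how much work goes into each task. In the hypothetical `sum_of_squares` below, `chunk_size` is the granularity knob: a value of 1 creates one task per element, so scheduling and result-passing overhead dominates, while a larger value creates a few substantial tasks. This is an illustrative sketch, not a benchmark, and Python threads stand in for whatever threading runtime an application actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, chunk_size):
    """Yield consecutive slices of `items`, each at most `chunk_size` long."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def sum_of_squares(items, chunk_size, workers=4):
    """Return (sum of x*x over items, number of tasks submitted)."""
    chunks = list(chunked(items, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is one task; smaller chunks mean more tasks and
        # more per-task overhead for the same total amount of real work.
        partials = pool.map(lambda c: sum(x * x for x in c), chunks)
        return sum(partials), len(chunks)
```

Calling `sum_of_squares(data, 1)` and `sum_of_squares(data, 250)` on 1000 elements gives the same total, but the first submits 1000 tasks and the second only 4.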
Explains how to configure OpenMP in the Intel IPP library to maximize the multithreaded performance of the Intel IPP primitives.
A toolkit that presents six steps to increase performance through vectorization in your application.
This article discusses parallelization and provides links that will help you understand your programming environment and evaluate the suitability of your app.
Cache Blocking Techniques Overview
Memory Layout Transformations Overview
This article is part of the Intel® Modern Code Developer Community documentation, which helps developers improve application performance through a systematic, step-by-step optimization framework. This article addresses thread-level parallelization.
Get an overview of parallelization using the Intel® MPI Library and links to additional documentation.