This article is part of the Intel® Modern Code Developer Community documentation, which helps developers improve application performance through a systematic, step-by-step optimization framework. This article addresses thread-level parallelization.
Reference Link and Download
Intel Vectorization Tools
No longer does Moore’s Law result in higher frequencies and improved scalar application performance; instead, higher transistor counts lead to increased parallelism, both through more cores and thr
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
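As an illustration of merging nested loops, the sketch below scales a row-major matrix two ways. The function names `scale_nested` and `scale_merged` are hypothetical, and OpenMP is assumed as the threading model; if the code is built without OpenMP support, the pragmas are ignored and it runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiply every element of a rows x cols
   row-major matrix by factor, parallelizing only the outer loop.
   Only `rows` units of work exist, so with few rows and many
   threads some threads may have nothing to do. */
void scale_nested(double *a, size_t rows, size_t cols, double factor)
{
    assert(a != NULL);
    #pragma omp parallel for
    for (long i = 0; i < (long)rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            a[(size_t)i * cols + j] *= factor;
}

/* Same computation with the two loops merged into one.
   There are now rows * cols independent iterations, which gives
   the runtime more freedom to balance work across threads. */
void scale_merged(double *a, size_t rows, size_t cols, double factor)
{
    assert(a != NULL);
    long n = (long)(rows * cols);
    #pragma omp parallel for
    for (long k = 0; k < n; ++k)
        a[k] *= factor;
}
```

Both functions produce identical results; which one performs better depends on the matrix shape and thread count, so it is worth measuring both. OpenMP's `collapse` clause can achieve a similar merge without rewriting the loop by hand.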
With automatic parallelization, the compiler detects loops that can be safely and efficiently executed in parallel and generates multithreaded code.
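To make "safely" concrete, the sketch below contrasts a loop whose iterations are independent with one that carries a dependence between iterations. The function names are hypothetical, and whether a given compiler actually parallelizes the first loop depends on its options and heuristics (the classic Intel compilers, for example, expose a `-parallel` option; check your compiler's documentation for the exact flag and reporting options).

```c
#include <assert.h>
#include <stddef.h>

/* Candidate for automatic parallelization: each c[i] depends only on
   a[i] and b[i]. The restrict qualifiers (C99) promise the compiler
   that the arrays do not overlap, which removes one common obstacle
   to proving the loop safe. */
void add_arrays(const double *restrict a, const double *restrict b,
                double *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Not safe to parallelize as written: iteration i reads x[i - 1],
   which iteration i - 1 has just written (a loop-carried dependence).
   Running iterations out of order would change the result. */
void prefix_sum(double *x, size_t n)
{
    assert(x != NULL);
    for (size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```

A compiler report, where available, is the most reliable way to confirm which loops were parallelized and why others were skipped.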
The key to performance measurement is two-fold: know exactly what you are measuring, and collect your baseline data. Next, profile your application and identify a specific and realistic performance goal based on the profiling data. Follow these steps to optimize your software.
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
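One way to experiment with granularity is to control how many loop iterations are handed to a thread at a time. The sketch below assumes OpenMP and uses a hypothetical function, `chunked_sum`, whose `chunk` parameter sets that block size; without OpenMP support the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum n values, distributing the loop to threads
   in blocks of `chunk` iterations.
   - A small chunk (e.g. 1) means many tiny tasks, so scheduling
     overhead can dominate the real work.
   - A very large chunk means few tasks, so some threads may sit idle
     while others finish. */
double chunked_sum(const double *x, size_t n, int chunk)
{
    assert(x != NULL && chunk > 0);
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:total)
    for (long i = 0; i < (long)n; ++i)
        total += x[i];
    return total;
}
```

Timing calls such as `chunked_sum(x, n, 1)` and `chunked_sum(x, n, 4096)` on the same data is a simple way to see the effect; the best value depends on the cost of each iteration and on the machine, so treat any particular chunk size as a starting point. Note that a parallel floating-point reduction may add values in a different order than the serial loop, so results can differ in the last bits.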
This article discusses parallelization and provides links that will help you understand your programming environment and evaluate the suitability of your application.