With automatic parallelization, the compiler detects loops that can be safely and efficiently executed in parallel and generates multithreaded code.
With nested loops, the granularity of the computations assigned to threads directly affects performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
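As a rough illustration of merging nested loops, the sketch below flattens a two-level loop into a single loop over `rows * cols` iterations, so that a threading runtime can divide the whole iteration space rather than only the outer loop. The function names `fill_nested` and `fill_merged` and the `i * j` workload are hypothetical, chosen only for this example. The `#pragma omp parallel for` line takes effect only when the code is compiled with OpenMP enabled; otherwise it is ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workload: fill a rows x cols table (row-major) with i * j.
// Original form: two nested loops. Parallelizing only the outer loop
// gives at most `rows` units of work to distribute.
std::vector<long long> fill_nested(long long rows, long long cols) {
    std::vector<long long> out(static_cast<std::size_t>(rows * cols));
    for (long long i = 0; i < rows; ++i) {
        for (long long j = 0; j < cols; ++j) {
            out[static_cast<std::size_t>(i * cols + j)] = i * j;
        }
    }
    return out;
}

// Merged form: one loop over rows * cols iterations. Each iteration
// writes a distinct element, so iterations are independent.
std::vector<long long> fill_merged(long long rows, long long cols) {
    std::vector<long long> out(static_cast<std::size_t>(rows * cols));
    const long long total = rows * cols;
    #pragma omp parallel for
    for (long long k = 0; k < total; ++k) {
        const long long i = k / cols;  // recover the outer index
        const long long j = k % cols;  // recover the inner index
        out[static_cast<std::size_t>(k)] = i * j;
    }
    return out;
}
```

Merging helps most when the outer loop has fewer iterations than there are threads. OpenMP also offers a `collapse` clause that asks the compiler to perform this transformation for perfectly nested loops.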
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
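To make the idea of granularity concrete, here is a minimal sketch that sums a vector using a configurable number of tasks, one thread per contiguous chunk. The helper name `chunked_sum` and its `num_tasks` parameter are assumptions made for this illustration, not taken from any of the articles listed here. Each thread accumulates into a local variable and writes its result once, so threads do not share data while they work.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` by splitting it into `num_tasks`
// contiguous chunks and running one thread per chunk.
// A small num_tasks means coarse granularity (much work per thread);
// a large num_tasks means fine granularity (little work per thread).
long long chunked_sum(const std::vector<int>& data, std::size_t num_tasks) {
    if (data.empty()) return 0;
    // Clamp the task count to the range [1, data.size()].
    num_tasks = std::max<std::size_t>(1, std::min(num_tasks, data.size()));

    std::vector<long long> partial(num_tasks, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_tasks);

    // Ceiling division; trailing tasks may receive an empty range.
    const std::size_t chunk = (data.size() + num_tasks - 1) / num_tasks;
    for (std::size_t t = 0; t < num_tasks; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            long long local = 0;  // thread-private accumulator
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;   // single write per thread
        });
    }
    for (auto& w : workers) w.join();

    // Combine the per-thread results serially.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling `chunked_sum(v, 4)` and `chunked_sum(v, 64)` on the same vector returns the same total, but as `num_tasks` approaches `v.size()` each thread does very little real work, and the cost of starting and joining threads is likely to dominate.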
This article explains how to configure OpenMP in the Intel IPP library to maximize the multithreaded performance of the Intel IPP primitives.
Avoiding Heap Contention Among Threads (PDF 256KB)
Detecting Memory Bandwidth Saturation in Threaded Applications (PDF 23
Download this guide for developing multithreaded applications, which also includes general topics such as application threading and synchronization.
Part one of a five-part series, this article teaches a methodology to interpret statistics gathered during test runs and use those interpretations to improve parallel code.
By Jim Dempsey
In my last article we left off with
By Jim Dempsey