We are continuing the discussion of optimizing multithreaded applications. In this episode, we talk about improving application performance by properly placing logical (software) threads relative to the machine's hardware threads, cores, sockets, and so on.
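The C sketch below is a rough illustration of what "placement relative to hardware threads, cores and sockets" means. It maps software thread numbers onto a hypothetical machine described by three numbers and supports two placement policies. A "compact" policy packs neighbouring threads onto the same core and socket. A "scatter" policy spreads them across sockets and cores first. The names `topology`, `placement`, `place_thread`, `PLACE_COMPACT` and `PLACE_SCATTER` are hypothetical. The model also ignores how a real operating system numbers its logical CPUs.

In practice, placement is requested through the threading runtime or the OS rather than computed by hand. Examples are OpenMP's `OMP_PROC_BIND=close` or `spread` together with `OMP_PLACES=cores`, or Linux `sched_setaffinity`.

```c
#include <assert.h>

/* Hypothetical machine topology, used only for illustration. */
typedef struct {
    int sockets;          /* number of CPU packages */
    int cores_per_socket; /* physical cores in each package */
    int threads_per_core; /* hardware threads (hyper-threads) per core */
} topology;

/* Where one software thread ends up on the hardware. */
typedef struct {
    int socket;   /* socket index */
    int core;     /* core index within its socket */
    int hwthread; /* hardware-thread index within its core */
} placement;

typedef enum { PLACE_COMPACT, PLACE_SCATTER } policy;

/* Map software thread `tid` to a hardware location.
 * PLACE_COMPACT: fill every hardware thread of a core, then the next core,
 *   then the next socket, so consecutive threads share caches.
 * PLACE_SCATTER: round-robin over sockets first, then cores, and only then
 *   use extra hardware threads, so each thread gets more of a core and of
 *   each socket's memory bandwidth.
 * Thread ids beyond the number of hardware threads wrap around
 * (oversubscription). */
placement place_thread(int tid, topology t, policy p) {
    assert(tid >= 0);
    assert(t.sockets > 0 && t.cores_per_socket > 0 && t.threads_per_core > 0);

    int total = t.sockets * t.cores_per_socket * t.threads_per_core;
    int i = tid % total; /* wrap when oversubscribed */
    placement r;

    if (p == PLACE_COMPACT) {
        r.hwthread = i % t.threads_per_core;
        r.core = (i / t.threads_per_core) % t.cores_per_socket;
        r.socket = i / (t.threads_per_core * t.cores_per_socket);
    } else { /* PLACE_SCATTER */
        r.socket = i % t.sockets;
        r.core = (i / t.sockets) % t.cores_per_socket;
        r.hwthread = i / (t.sockets * t.cores_per_socket);
    }
    return r;
}
```

Take a machine with 2 sockets, 4 cores per socket and 2 hardware threads per core. Under the compact policy, thread 1 lands on the second hardware thread of core 0 on socket 0. Under the scatter policy, it lands on core 0 of socket 1. Compact placement tends to suit threads that share data through cache. Scatter placement tends to suit bandwidth-bound code that benefits from using every socket and every physical core.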
Videos Within This Chapter:
Part 1: Optimization Roadmap
Part 2: Scalar Tuning and General Optimization
Part 3: Optimization of Vectorization: Data Structures
Part 4: Optimization of Vectorization: Alignment and Hints
Part 5: Optimization of Vectorization: Regularizing Pattern
Part 6: Strip-Mining for Vectorization
Part 7: Vectorization Tuning Knobs
Part 8: Optimization of Synchronization in Multithreaded Applications
Part 9: Elimination of False Cache Line Sharing
Part 10: Do You Have Enough Parallelism in Your Code?
Part 11: Thread Affinity Control
Part 12: Optimization of Memory Access
Part 13: Example of Loop Tiling
Part 14: Example of Cache-Oblivious Recursion
Part 15: NUMA and Allocation on First Touch
Part 16: Optimization of Communication: Offload
Part 17: Optimization of Communication: MPI
Part 18: Additional Topic: Load Balancing in Heterogeneous Systems
Part 19: Closing Words
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804