As the last topic on optimization of vector calculations, we will discuss a programming technique that allows you to express vectorization opportunities to the compiler in complex situations.
This technique is called strip-mining. We will use it now to steer the compiler toward vectorizable calculations. Later in the course, we will use it again to re-balance parallelism between vectors and threads, and as a basis for memory optimization techniques.
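As a rough illustration of the idea, a strip-mined loop might look like the sketch below. The function names, the array `a`, and the strip width `STRIP = 8` are assumptions made for this example and are not taken from the course material.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strip width; in practice it would be tuned, often to a
   multiple of the hardware vector length. */
enum { STRIP = 8 };

/* Reference version: a single loop over all n elements. */
void scale_plain(float *a, size_t n, float factor) {
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        a[i] *= factor;
}

/* Strip-mined version: the outer loop steps from strip to strip, and the
   inner loop, with its short constant trip count, visits the same
   elements in the same order as the reference loop. */
void scale_strip_mined(float *a, size_t n, float factor) {
    assert(a != NULL || n == 0);
    size_t n_full = n - n % STRIP;  /* elements covered by whole strips */

    for (size_t ii = 0; ii < n_full; ii += STRIP)
        for (size_t i = ii; i < ii + STRIP; i++)
            a[i] *= factor;

    /* Remainder loop for the last n % STRIP elements. */
    for (size_t i = n_full; i < n; i++)
        a[i] *= factor;
}
```

Both functions compute the same result. The strip-mined form only restructures the iteration space into an outer loop over strips and an inner loop within each strip, which is the shape that the later uses of the technique build on.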
Videos Within This Chapter:
5.1 - Optimization roadmap
5.2 - Scalar Tuning and General Optimization
5.3 - Optimization of Vectorization: Data Structures
5.4 - Optimization of Vectorization: Alignment and Hints
5.5 - Optimization of Vectorization: Regularizing Pattern
5.6 - Strip-Mining for Vectorization
5.7 - Vectorization Tuning Knobs
5.8 - Optimization of Synchronization in Multithreaded Applications
5.9 - Elimination of False Cache Line Sharing
5.10 - Do you have enough parallelism in your code?
5.11 - Thread affinity control
5.12 - Optimization of Memory Access
5.13 - Example of Loop Tiling
5.14 - Example of Cache-Oblivious Recursion
5.15 - NUMA and Allocation on First Touch
5.16 - Optimization of Communication: Offload
5.17 - Optimization of Communication: MPI
5.18 - Additional Topic: Load Balancing in Heterogeneous Systems
5.19 - Closing words