In Episode 9 of the “Hands-On Workshop (HOW) series on parallel programming and optimization with Intel® architectures” we discuss memory traffic optimization.
We explain why data accesses need to be local in space and time, and demonstrate techniques for achieving that locality:
- Loop tiling
- Cache-oblivious recursion
- Loop fusion
- Parallel first touch
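As an illustration of the first technique, here is a minimal sketch of loop tiling applied to a matrix-vector product. It is not the code from the episode: the function names `matvec_naive` and `matvec_tiled`, the row-major layout, and the `tile` parameter are assumptions made for this example. The idea is that a block of `x` of length `tile` is reused across all rows while it is still in cache, rather than streaming the whole vector once per row.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: y = A * x, with A stored row-major (m rows, n columns). */
void matvec_naive(size_t m, size_t n, const double *A, const double *x, double *y) {
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Tiled version (hypothetical helper): the column loop is split into blocks
 * of `tile` elements, and each block of x is reused for every row before
 * the next block is touched. */
void matvec_tiled(size_t m, size_t n, size_t tile,
                  const double *A, const double *x, double *y) {
    assert(tile > 0);
    for (size_t i = 0; i < m; i++)
        y[i] = 0.0;
    for (size_t jj = 0; jj < n; jj += tile) {
        /* Clamp the last block when n is not a multiple of tile. */
        size_t jend = (jj + tile < n) ? jj + tile : n;
        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t j = jj; j < jend; j++)
                sum += A[i * n + j] * x[j];
            y[i] += sum;
        }
    }
}
```

A suitable `tile` value depends on the cache sizes of the target processor and is best found by measurement; the sketch makes no claim about which value is optimal.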
We also review a sample application that performs matrix-vector multiplication and see how these techniques optimize it.
In the hands-on part of the episode, we apply the discussed methods to the matrix-vector multiplication code.
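To give a flavor of how one of these methods might look on a matrix-vector code, here is a minimal sketch of parallel first touch with OpenMP. It is an assumption-laden illustration rather than the episode's code: the helper names `alloc_first_touch` and `matvec_parallel` are hypothetical, and the benefit relies on the operating system placing each memory page on the NUMA node of the thread that first writes to it (the default behavior on Linux). Without OpenMP enabled at compile time, the pragmas are ignored and the code runs serially.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate an m-by-n row-major matrix and initialize it in parallel.
 * The initialization loop uses the same static row partitioning as the
 * compute loop below, so each thread first-touches the rows it later reads. */
double *alloc_first_touch(size_t m, size_t n, double value) {
    double *A = malloc(m * n * sizeof *A);
    if (A == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)m; i++)
        for (size_t j = 0; j < n; j++)
            A[(size_t)i * n + j] = value;
    return A;
}

/* y = A * x, with rows distributed across threads in the same way as above. */
void matvec_parallel(size_t m, size_t n,
                     const double *A, const double *x, double *y) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += A[(size_t)i * n + j] * x[j];
        y[i] = sum;
    }
}
```

The design choice worth noting is that both loops use `schedule(static)` over the same row range, which is what keeps the thread-to-row mapping consistent between initialization and computation.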