Memory Traffic Optimization


In Episode 9 of the “Hands-On Workshop (HOW) series on parallel programming and optimization with Intel® architectures”, we discuss memory traffic optimization.

We explain why data accesses need locality in space and time, and demonstrate techniques for achieving it:

  • Loop tiling
  • Cache-oblivious recursion
  • Loop fusion
  • Parallel first touch
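As a rough illustration of the first technique in the list, here is a minimal sketch of loop tiling applied to a matrix-vector product. It is not the code from the episode: the function name `matvec_tiled`, the row-major layout, and the `tile` parameter are assumptions made for this example. The idea is that a slice of `x` is reused across all rows while it is still in cache, instead of streaming the whole vector once per row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the episode): y = A * x for a row-major
 * m-by-n matrix A. The column loop is split into tiles of `tile` elements
 * so that the slice x[jj .. jend) is reused for every row before the code
 * moves on to the next slice. */
void matvec_tiled(size_t m, size_t n, const double *A,
                  const double *x, double *y, size_t tile)
{
    assert(tile > 0);

    /* Partial sums are accumulated tile by tile, so start from zero. */
    for (size_t i = 0; i < m; i++)
        y[i] = 0.0;

    for (size_t jj = 0; jj < n; jj += tile) {
        /* Clamp the last tile when n is not a multiple of tile. */
        size_t jend = (jj + tile < n) ? jj + tile : n;

        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t j = jj; j < jend; j++)
                sum += A[i * n + j] * x[j];
            y[i] += sum;
        }
    }
}
```

A suitable value for `tile` depends on the cache sizes of the target processor and is usually found by experiment; a slice of `x` that fits comfortably in a core's cache is a reasonable starting point.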

We also review a sample application that performs matrix-vector multiplication and see how these techniques optimize it.
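To make parallel first touch concrete for a matrix-vector product, the sketch below initializes the matrix with the same static OpenMP row distribution that the multiplication uses later. It assumes an operating system with a first-touch page placement policy (the Linux default on NUMA systems), where a memory page is placed near the thread that first writes to it. The names `alloc_first_touch` and `matvec_parallel` are hypothetical and are not taken from the episode's code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a row-major m-by-n matrix and fill it with
 * `value`. malloc typically only reserves address space for a large block;
 * physical pages are assigned when they are first written, so the parallel
 * initialization loop decides where each row's pages end up. */
double *alloc_first_touch(size_t m, size_t n, double value)
{
    double *A = malloc(m * n * sizeof *A);
    if (A == NULL)
        return NULL;

    /* Same schedule(static) row distribution as the compute loop below. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)m; i++)
        for (size_t j = 0; j < n; j++)
            A[(size_t)i * n + j] = value;

    return A;
}

/* y = A * x. With the same thread count as during initialization, each
 * thread reads the rows it touched first. */
void matvec_parallel(size_t m, size_t n, const double *A,
                     const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += A[(size_t)i * n + j] * x[j];
        y[i] = sum;
    }
}
```

Without an OpenMP compiler flag the pragmas are ignored and the code runs serially with the same result. The benefit, if any, shows up only on multi-socket or otherwise NUMA systems and for matrices much larger than a page.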

In the hands-on part of the episode, we apply the discussed methods to the matrix-vector multiplication code.
