In this episode, we will talk about allocation on first touch, a memory traffic optimization technique that is important in NUMA systems.
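As a rough illustration of the idea, the sketch below shows one common way to exploit a first-touch policy in C with OpenMP. It is not taken from the episode. It assumes an operating system that places each memory page on the NUMA node of the thread that first writes to it (the default behavior on Linux), and it assumes threads are pinned to cores. The function names `alloc_first_touch` and `sum_scaled` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and initialize them in parallel.
   Under a first-touch policy, each page is expected to be placed on the
   NUMA node of the thread that writes to it first.
   Returns NULL if the allocation fails. */
double *alloc_first_touch(size_t n)
{
    /* For large requests, malloc typically reserves virtual address space
       only; physical pages are assigned when they are first written. */
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* First touch: use the same static schedule as the compute loop so
       that each thread initializes the block it will later process. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = 1.0;

    return a;
}

/* Hypothetical compute kernel: scale every element and return the sum.
   The identical static schedule means each thread works on the block
   that it initialized in alloc_first_touch. */
double sum_scaled(double *a, size_t n, double factor)
{
    double sum = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        a[i] *= factor;
        sum += a[i];
    }
    return sum;
}
```

A caller would use `alloc_first_touch(n)` in place of a serial initialization loop, pass the array to `sum_scaled`, and release it with `free`. The code must be built with OpenMP enabled (for example `-fopenmp` with GCC); without it the pragmas are ignored and the code runs serially. If the initialization loop were serial, all pages would likely end up on one NUMA node, and threads on other nodes would pay for remote accesses.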
Videos Within This Chapter:
Part 1: Optimization Roadmap
Part 2: Scalar Tuning and General Optimization
Part 3: Optimization of Vectorization: Data Structures
Part 4: Optimization of Vectorization: Alignment and Hints
Part 5: Optimization of Vectorization: Regularizing Pattern
Part 6: Strip-Mining for Vectorization
Part 7: Vectorization Tuning Knobs
Part 8: Optimization of Synchronization in Multithreaded Applications
Part 9: Elimination of False Cache Line Sharing
Part 10: Do You Have Enough Parallelism in Your Code?
Part 11: Thread Affinity Control
Part 12: Optimization of Memory Access
Part 13: Example of Loop Tiling
Part 14: Example of Cache-Oblivious Recursion
Part 15: NUMA and Allocation on First Touch
Part 16: Optimization of Communication: Offload
Part 17: Optimization of Communication: MPI
Part 18: Additional Topic: Load Balancing in Heterogeneous Systems
Part 19: Closing Words