At SC13 (Supercomputing 2013)*, someone commented that Intel seems to have a super-secret bag of tricks in its pocket that lets us optimize “far beyond those of mortal man”+. We don’t have any super-secret tricks. Even if we did, we wouldn’t use them. We want mortal man (you) to be able to reproduce whatever we do. That is also in our business interest. We sell hardware, so we want you to know that Intel hardware is the best thing since the invention of peanut butter (or hazelnut chocolate spread for you Euro folk). That gives us a vested interest in ensuring that you can optimize your code on Intel hardware to the fullest extent possible, including reproducing any optimizations we do.
Thus, I’m writing this blog series on performance measurement and analysis.
Unlike my previous blog series, I will (hopefully) not be alone. There should be at least two of us, and hopefully more, bending your ear. I hope it will be an ongoing, never-ending series of best known methods (BKMs) that pass our knowledge on to you. The focus will be the Intel® Xeon Phi™ coprocessor, but the techniques we discuss will apply to all architectures, including the hardware of others whom we shall not mention by name.
Where possible, we will direct you to other references rather than reproduce the work of others. For example, David Mackay and Shannon Cepeda wrote two great articles on the essentials of optimization for the coprocessor: “Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors - Part 1: Optimization Essentials” (Mackay, 2012) and “Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors, Part 2: Understanding and Using Hardware Events” (Cepeda, 2012).
One last note: optimization involves identifying opportunities and then applying tuning techniques to them. This series will concentrate on using tools and analysis to find optimization opportunities, not on the optimization techniques themselves. For example, we’ll talk about how to use gprof, compiler reports and Intel® VTune™ Amplifier XE 2013, not how to unroll loops or use OpenMP*.
We will try to cover the following topics, not necessarily in this order:
- Basic load-based performance analysis
- Power performance issues (of course)
- Memory hierarchy and identifying bottlenecks
- Detecting load imbalances
- And much more
NEXT: Performance Tools and BKM References (probably)
+Iconic reference to a quote from the Superman* comics.
Mackay, David (2012) “Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors - Part 1: Optimization Essentials,” http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization, downloaded on March 27th, 2014.
Cepeda, Shannon (2012) “Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors, Part 2: Understanding and Using Hardware Events,” http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding, downloaded on March 27th, 2014.