After building the sample application and collecting baseline performance data for it, rerun it under Intel® VTune™ Amplifier to discover which parts of the code are executed most heavily. Hotspots analysis collects event and IP (Instruction Pointer) information to reveal a basic set of hardware issues, induced by the application code, that may be affecting its performance.
The Hotspots predefined configuration opens on the right. To choose which cards are used for collection, set the List of Intel Xeon Phi coprocessor cards option. By default, the data is collected on card 0.
VTune Amplifier starts the bat script that runs the matrix.mic application on the Intel Xeon Phi coprocessor card. The application calculates a large matrix multiply before exiting. When the application exits or after a predefined interval, depending on how the collection run was configured, collection is completed and the VTune Amplifier enters its finalization process, where data are coalesced, symbols are reconnected to their addresses, and certain data are cached to speed the display of results.
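To give a feel for why a matrix multiply makes a good Hotspots target, here is a minimal sketch of a naive triple-loop multiply. It is not the source of the matrix.mic sample; the function name `matmul_naive` and the list-of-rows representation are assumptions made purely for illustration. In a kernel of this shape, nearly all of the run time accumulates in the innermost loop, which is the kind of concentration a hotspot profile is meant to expose.

```python
def matmul_naive(a, b):
    """Multiply two matrices stored as lists of rows (illustrative only)."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            # Innermost loop: executed n * p * m times, so it dominates run time.
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c
```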
To make sure the performance of the application is repeatable, go through the entire tuning process on the same system with a minimal amount of other software executing.
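One way to check repeatability before tuning is to time the same workload several times and look at how much the runs differ. The sketch below is an assumption-laden illustration, not part of VTune Amplifier: `time_runs` and `relative_spread` are hypothetical helper names, and the workload is any callable you supply (for example, a function that launches your application).

```python
import statistics
import time


def time_runs(workload, repeats=5):
    """Return the wall-clock duration in seconds of each call to workload."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def relative_spread(durations):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    mean = statistics.mean(durations)
    return statistics.stdev(durations) / mean if mean > 0 else 0.0
```

A small relative spread suggests the baseline is stable enough to compare against later runs. What counts as small is a judgment call that depends on the size of the improvements you expect to measure.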