Tutorial

Summary

You have completed the Finding Hotspots tutorial. Here are some important things to remember when using the Intel® VTune™ Profiler to analyze your code for hotspots and hardware issues:
For each step, a tutorial recap is followed by the key tutorial take-aways:
1. Find hotspots
You launched the Hotspots data collection, which analyzes function calls and the CPU time spent in each program unit of your application, and did the following:
  • Identified the function that took the most CPU time and could be a good candidate for algorithm tuning.
  • Identified the code section that took the most CPU time to execute.
  • Start analyzing the performance of your application from the Summary window to explore the performance metrics for the whole application. Then, move to the Bottom-up window to analyze the performance per function. Focus on the hotspots, the functions that took the most CPU time. By default, they are located at the top of the table.
  • Double-click the hotspot function in the Bottom-up pane or Call Stack pane to open its source code and identify the code line that took the most CPU time.
2. Discover hardware usage bottlenecks
You ran the Microarchitecture Exploration analysis, which monitors how your application performs against a set of event-based hardware metrics, and did the following:
  • Analyzed the data provided in the Microarchitecture Exploration viewpoint, explored the event-based metrics, identified the areas where your sample application had hardware issues, and found the exact function whose metrics indicated poor performance and that could be a good candidate for further analysis.
  • Analyzed the code for the hotspot function identified in the Bottom-up window and located the hotspot line that generated a high number of CPU Clockticks.
See the Details section of the Microarchitecture Exploration configuration section to get the list of processor events used for this analysis type.
3. Resolve detected issues
You resolved the memory access issue in the sample application by interchanging the loops, which reduced the execution time. You also considered using the Intel C++ Compiler to enable instruction vectorization.
  • Start analyzing the performance of your application from the Summary window to explore the event-based performance metrics for the whole application. Mouse over the help icons to read the metric descriptions. Use the Elapsed time value as your performance baseline.
  • Move to the Bottom-up window and analyze the performance per function. Analyze the hardware issues detected for the hotspot functions (functions with the highest Clockticks). Hardware issues are highlighted in pink. Mouse over a highlighted value to read the issue description and see the threshold formula.
  • Double-click the hotspot function in the Bottom-up pane to open its source code and identify the code line with the highest Clockticks event count.
  • Consider using the Intel C++ Compiler to vectorize instructions. Explore the compiler documentation for more details.
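The loop interchange mentioned in this step can be illustrated with a minimal sketch. This is not the tutorial's sample code: the function names multiply_ijk and multiply_ikj, the Matrix alias, and the row-major storage layout are all assumptions made for illustration. The idea is that swapping the two inner loops of a matrix multiplication turns a strided walk through memory into a unit-stride one, which is the kind of change that typically eases a memory access bottleneck.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Original loop order (i, j, k): the innermost loop reads b[k * n + j] with
// k varying, so consecutive iterations touch elements n doubles apart.
void multiply_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Interchanged loop order (i, k, j): the innermost loop now walks rows of
// b and c with unit stride, so neighboring iterations share cache lines.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];  // invariant in the inner loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Both functions accumulate into c, so the caller is expected to zero-initialize it first. The interchanged version performs the same multiplications and additions for every output element; only the order in which memory is visited changes. The unit-stride inner loop is also generally easier for a compiler to vectorize, though whether that happens depends on the compiler and its options.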
4. Check your work
You ran the Microarchitecture Exploration analysis on the optimized code and compared the results before and after optimization using the Compare mode of the VTune Profiler. Compare analysis results regularly to look for regressions and to track how incremental changes to the code affect its performance.
Perform regular regression testing by comparing analysis results before and after optimization. From the GUI, click the Compare Results button on the VTune Profiler toolbar. From the command line, use the vtune command.
Next step:
Prepare your own application(s) for analysis. Then use the VTune Profiler to find and eliminate performance problems.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
