CPU / Memory Roofline Insights Perspective from Command Line
- Collect OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
- Measure the hardware limitations and collect floating-point and integer operations data using the Characterization analysis with GPU profiling. Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, MATH. Intel Advisor automatically determines the data type of the collected operations using the dst register.
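The weighted sum described above can be written as a formula. This is for illustration only. The per-group weights w_g are not given in this document, so treat them as placeholders for the values Intel Advisor applies internally. N_g is the number of executed instructions in group g, and OPS stands for either FLOP or INTOP.

```latex
\text{OPS} = \sum_{g \in G} w_g \, N_g, \qquad
G = \{\text{BASIC COMPUTE},\ \text{FMA},\ \text{BIT},\ \text{DIV},\ \text{POW},\ \text{MATH}\}
```

Counting an FMA as two operations (one multiply and one add) is a common convention in Roofline analysis. It is not a weight stated in this document.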
Plot a CPU Roofline Chart
- Run the Roofline analysis for CPU with one of the following methods:
- Using the shortcut command line action:
  advisor --collect=roofline --project-dir=<project-dir> [--stacks] [--enable-cache-simulation] -- <target-application> [<target-options>]
- Using two separate commands:
  advisor --collect=survey --project-dir=<project-dir> -- <target-application> [<target-options>]
  advisor --collect=tripcounts --flop --project-dir=<project-dir> [--stacks] [--enable-cache-simulation] -- <target-application> [<target-options>]
  Use this method to analyze an MPI application. See Analyze MPI Workloads for details.
where:
- --stacks is an option to enable advanced collection of call stack data. Use this option to generate a CPU Roofline chart with call stacks, which extends the basic model with total data. Total data includes data from the loop/function itself and from its inner loops/functions.
- --enable-cache-simulation is an option to model multiple levels of cache and evaluate the data transfers between the different memory layers available on your system. Use this option to generate a Memory-Level CPU Roofline chart.
Without these two options, Intel Advisor generates a basic CPU Roofline chart based on the Cache-Aware Roofline Model (CARM).
- Optional: Check memory access patterns to get detailed information about memory usage. Run the Memory Access Patterns analysis for the marked loops:
  advisor --collect=map --project-dir=<project-dir> [--enable-cache-simulation] --select=<string> -- <target-application> [<target-options>]
This analysis does not add information to the CPU Roofline chart. Its results are added to the Refinement report, which you can view from the GUI or from the CLI. Use it to understand the Memory-Level Roofline chart better and to get more detailed optimization recommendations.
where:
- --enable-cache-simulation is an option to model accurate memory footprints, miss information, and cache line utilization. Use this option for the Memory Access Patterns analysis if you also used it for the Roofline.
- --select=<string> is an option to select loops for the analysis by loop IDs, by source locations, or by criteria such as scalar, has-issue, or markup=<markup-mode>. For example, use --select=has-issue to analyze loops that have the Possible Inefficient Memory Access Pattern issue. For more information about markup options, see Loop Markup to Minimize Overhead.
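For example, the following command runs the Roofline analysis with call stacks and cache simulation for myApplication:
advisor --collect=roofline --project-dir=./advi --stacks --enable-cache-simulation -- myApplication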
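If you rerun the two-command method often, you can wrap it in a small script. The following is a minimal Bash sketch, not an Intel-provided tool. It uses only the advisor options shown above. The function names run and cpu_roofline and the environment variables STACKS, CACHE_SIM, MAP_SELECT and DRY_RUN are hypothetical names chosen for this example. With DRY_RUN=1, the script prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Minimal sketch: CPU Roofline via two separate collections (Survey, then
# Trip Counts with FLOP), plus an optional Memory Access Patterns run.
# STACKS, CACHE_SIM, MAP_SELECT, and DRY_RUN are hypothetical variables
# of this sketch; they are not Intel Advisor settings.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: cpu_roofline <project-dir> <target-application> [<target-options>...]
cpu_roofline() {
  local project_dir="$1"
  shift
  local -a trip_opts=() map_opts=()
  if [[ "${STACKS:-0}" == "1" ]]; then
    trip_opts+=(--stacks)
  fi
  if [[ "${CACHE_SIM:-0}" == "1" ]]; then
    # Use cache simulation for MAP too if the Roofline used it.
    trip_opts+=(--enable-cache-simulation)
    map_opts+=(--enable-cache-simulation)
  fi

  # Step 1: Survey collection into the project directory.
  run advisor --collect=survey --project-dir="$project_dir" -- "$@" || return
  # Step 2: Trip Counts with FLOP data into the same project directory.
  run advisor --collect=tripcounts --flop --project-dir="$project_dir" \
    "${trip_opts[@]}" -- "$@" || return
  # Optional step: Memory Access Patterns for the selected loops.
  if [[ -n "${MAP_SELECT:-}" ]]; then
    run advisor --collect=map --project-dir="$project_dir" \
      "${map_opts[@]}" --select="$MAP_SELECT" -- "$@" || return
  fi
}
```

For example, DRY_RUN=1 STACKS=1 CACHE_SIM=1 cpu_roofline ./advi ./myApplication prints the Survey and Trip Counts commands with --stacks and --enable-cache-simulation. Drop DRY_RUN to actually run them. Keeping both collections in one function ensures that they write to the same project directory.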
View the Results
- Roofline chart that plots an application's achieved performance and arithmetic intensity against the CPU maximum achievable performance
- Additional information about your application in the Advanced View pane under the chart, including source code, detailed code analytics for trip counts and FLOP/INTOP data, optimization recommendations, and compiler diagnostics. Select a dot on the Roofline chart to see details for the selected loop in all tabs of the Advanced View pane.
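The roofs on the chart follow the standard Roofline model. The formula below shows how a roof bounds a dot. A loop with arithmetic intensity I (FLOP per byte) can reach at most the lower of two values: the compute peak P_peak, or the memory roof I·B, where B is the bandwidth of a memory level. Under the Cache-Aware Roofline Model, bytes are counted as traffic between the core and the memory hierarchy. The values in the second line are hypothetical and only illustrate the arithmetic.

```latex
P_{\text{attainable}}(I) = \min\bigl(P_{\text{peak}},\; I \cdot B\bigr),
\qquad I^{*} = \frac{P_{\text{peak}}}{B}
% Hypothetical values: P_peak = 50 GFLOP/s, B = 20 GB/s, I = 0.25 FLOP/byte
P_{\text{attainable}} = \min(50,\; 0.25 \times 20) = 5~\text{GFLOP/s},
\qquad I^{*} = 50 / 20 = 2.5~\text{FLOP/byte}
```

A dot with I < I* sits under the sloped memory roof and is memory-bound. A dot to the right of the ridge point I* is bounded by the compute roof.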
To save the CPU Roofline chart as an HTML report, run:
  advisor --report=roofline --project-dir=<project-dir> --report-output=<path> [--with-stack] [--data-type=<type>] [--memory-level=<string>]
where:
- --report-output=<path> is a path and a name for an HTML file to save the report to, for example, /home/roofline.html. This option is required to generate an HTML report.
- --with-stack is an option to enable call stack data in the HTML report. Use it if you generated the CPU Roofline results with call stack data using the --stacks option.
- --data-type=<type> is the type of data to show in the HTML report. Available types are float (default), int, and mixed. You cannot change the data type after the report is generated.
- --memory-level=<string> is the memory level or levels to show in the HTML report by default. Available memory levels are L1 (default), L2, L3, and DRAM. You can combine several memory levels with an underscore (for example, L1_L2).
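Because the data type cannot be changed once the HTML report exists, it can help to validate the report options before generating it. The following Bash sketch does this. It assumes your Intel Advisor version supports the advisor --report=roofline action with the options listed above. The function name roofline_html and the variables WITH_STACK, DATA_TYPE, MEMORY_LEVEL and DRY_RUN are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch: validate options and generate the CPU Roofline HTML report.
# Assumes the `advisor --report=roofline` action; WITH_STACK, DATA_TYPE,
# MEMORY_LEVEL, and DRY_RUN are hypothetical variables of this sketch.

# Usage: roofline_html <project-dir> <report-output.html>
roofline_html() {
  local project_dir="$1" output="$2"
  local -a cmd=(advisor --report=roofline --project-dir="$project_dir"
                --report-output="$output")

  if [[ "${WITH_STACK:-0}" == "1" ]]; then
    cmd+=(--with-stack)   # only useful if the Roofline ran with --stacks
  fi

  if [[ -n "${DATA_TYPE:-}" ]]; then
    case "$DATA_TYPE" in
      float|int|mixed) cmd+=(--data-type="$DATA_TYPE") ;;
      *) echo "unsupported data type: $DATA_TYPE" >&2; return 2 ;;
    esac
  fi

  if [[ -n "${MEMORY_LEVEL:-}" ]]; then
    # Levels can be combined with underscores, for example L1_L2.
    local -a levels
    local level
    IFS=_ read -ra levels <<< "$MEMORY_LEVEL"
    for level in "${levels[@]}"; do
      case "$level" in
        L1|L2|L3|DRAM) ;;
        *) echo "unsupported memory level: $level" >&2; return 2 ;;
      esac
    done
    cmd+=(--memory-level="$MEMORY_LEVEL")
  fi

  # Print instead of running when DRY_RUN=1.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 WITH_STACK=1 MEMORY_LEVEL=L1_L2 roofline_html ./advi /home/roofline.html prints the full command line without running it. A typo such as MEMORY_LEVEL=L4 is rejected before any report is written.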
- Expand thePerformance Metrics Summarydrop-down to view the summary performance characteristics for your application.
- Double-click a dot on the chart to see roof rulers that point to the exact roofs that bound the dot.
- Hover over a dot to see a detailed tooltip with performance metrics.
- From the filter drop-down list on the chart, select the memory levels to show dots for.
- Double-click a dot on the chart to expand it for other memory levels and see roof rulers.
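When you expand a dot across memory levels, the roof rulers show which roof limits it. The following Bash/awk sketch reproduces that reading under a simplified model with a single compute roof. Each level's bound is min(peak, AI_level * BW_level), and the lowest bound is the roof that limits the dot. The function name tightest_roof and all input values are hypothetical. Intel Advisor computes the real values from your collected data and measured roofs.

```bash
#!/usr/bin/env bash
# Simplified sketch of a roof ruler reading for one dot expanded across
# memory levels. Each level is bounded by min(peak, AI_level * BW_level);
# the lowest bound is the limiting roof. All inputs are hypothetical.

# Usage: tightest_roof <peak-GFLOPS> <level>:<AI FLOP/byte>:<BW GB/s> ...
# Prints the limiting roof (a memory level or "compute") and its GFLOP/s.
tightest_roof() {
  local peak="$1"
  shift
  if [[ $# -eq 0 ]]; then
    echo "usage: tightest_roof <peak> <level>:<ai>:<bw> ..." >&2
    return 1
  fi
  printf '%s\n' "$@" | awk -F: -v peak="$peak" '
    BEGIN { cap = peak + 0 }
    {
      bound = $2 * $3              # memory roof at this level: AI * bandwidth
      name = $1
      if (bound >= cap) {          # memory roof lies above the compute roof
        bound = cap
        name = "compute"
      }
      if (NR == 1 || bound < best) {
        best = bound
        level = name
      }
    }
    END { printf "%s %.2f\n", level, best }'
}
```

For example, tightest_roof 100 L1:0.5:400 L2:0.8:150 DRAM:1.0:30 prints DRAM 30.00. The L1 and L2 memory roofs (200 and 120 GFLOP/s) lie above the 100 GFLOP/s compute roof, so the DRAM roof at 30 GFLOP/s is the one that bounds the dot.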
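To save the results as a read-only snapshot, run:
  advisor --snapshot --project-dir=<project-dir> --pack [--cache-sources] [--cache-binaries] -- <snapshot-path>
where:
- --pack is an option to pack the snapshot into an archive file (.advixeexpz).
- --cache-sources is an option to add application source code to the snapshot.
- --cache-binaries is an option to add application binaries to the snapshot.
- <snapshot-path> is a path and a name for the snapshot. For example, if you specify /tmp/new_snapshot, the snapshot is saved in the /tmp directory as new_snapshot.advixeexpz. You can omit the path to save the snapshot to the current directory as snapshot.XXX.advixeexpz.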
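To share the results, you can pack them into a snapshot from a script. The following minimal Bash sketch assumes the advisor --snapshot action with the --pack option, which produces the .advixeexpz archive described above. The function name make_snapshot and the DRY_RUN variable are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch: pack Roofline results into a read-only snapshot.
# Assumes `advisor --snapshot ... --pack`; DRY_RUN is a hypothetical
# variable of this sketch.

# Usage: make_snapshot <project-dir> [<snapshot-path>]
make_snapshot() {
  local project_dir="$1" snapshot_path="${2:-}"
  # --cache-sources and --cache-binaries are optional; drop them if you
  # do not want sources or binaries in the snapshot.
  local -a cmd=(advisor --snapshot --project-dir="$project_dir" --pack
                --cache-sources --cache-binaries)
  # Without a path, the snapshot goes to the current directory
  # as snapshot.XXX.advixeexpz.
  if [[ -n "$snapshot_path" ]]; then
    cmd+=(-- "$snapshot_path")
  fi
  # Print instead of running when DRY_RUN=1.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, make_snapshot ./advi /tmp/new_snapshot should produce /tmp/new_snapshot.advixeexpz, as described above.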