Examine Bottlenecks on CPU Roofline Chart
- By dot size and color, identify loops that take most of the total program time and/or are located very low in the chart. For example:
- Small, green dots take up relatively little time, so they are likely not worth optimizing.
- Large, red dots take up the most time, so the best candidates for optimization are the large, red dots with a large amount of space between them and the topmost roofs.
You can switch between coloring the dots by execution time and coloring the dots by type (scalar or vectorized) in the roof view menu on the right.
- Depending on the dot's position, identify what bounds the loop. Intel® Advisor marks the roofline zones on the chart to help you identify which roofs bound the loop:
- Loop is bounded by memory roofs.
- Loop is bounded by compute roofs.
- Loop is bounded by both memory and compute roofs.
- Select a dot on the chart, open the Code Analytics tab, and refer to the Roofline pane for more details about the specific roof that bounds the loop.
- In the Recommendations tab, scroll down to the Roofline Guidance section, which provides hints on the next optimization steps for the selected loop/function.
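The way the chart is read in the steps above can be sketched numerically. This is a minimal sketch of the generic roofline model, not of how Intel® Advisor computes its zones: a loop with arithmetic intensity AI (FLOP/byte) can attain at most min(compute peak, AI × memory bandwidth). The roof values, loop names, and measurements below are hypothetical placeholders, as are all function and class names.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    name: str
    seconds: float     # self time of the loop (dot size and color on the chart)
    gflops: float      # measured performance (vertical position of the dot)
    intensity: float   # arithmetic intensity in FLOP/byte (horizontal position)


# Hypothetical roof values; real ones are measured on the target machine.
COMPUTE_PEAK_GFLOPS = 100.0   # topmost compute roof
DRAM_BANDWIDTH_GBS = 20.0     # one memory roof


def attainable(intensity: float) -> float:
    """Roofline ceiling (GFLOPS) at a given arithmetic intensity."""
    return min(COMPUTE_PEAK_GFLOPS, intensity * DRAM_BANDWIDTH_GBS)


def limiting_roof(intensity: float) -> str:
    """Name the roof that is lower at this intensity (two-roof model only)."""
    ridge = COMPUTE_PEAK_GFLOPS / DRAM_BANDWIDTH_GBS
    return "memory" if intensity < ridge else "compute"


def potential_saving(loop: Loop) -> float:
    """Seconds that could be saved if the loop reached its ceiling."""
    gap = 1.0 - loop.gflops / attainable(loop.intensity)
    return loop.seconds * max(0.0, gap)


def rank(loops):
    """Put long-running loops that sit far below their roof first."""
    return sorted(loops, key=potential_saving, reverse=True)


loops = [
    Loop("init", seconds=0.2, gflops=1.0, intensity=0.1),      # small dot
    Loop("matmul", seconds=5.0, gflops=20.0, intensity=10.0),  # large dot
    Loop("stencil", seconds=8.0, gflops=2.0, intensity=0.25),  # large dot
]

for loop in rank(loops):
    print(loop.name, limiting_roof(loop.intensity), round(potential_saving(loop), 2))
```

The ranking multiplies time by the gap to the roof, which mirrors the advice to prioritize large dots with a lot of space above them. A real chart has several memory and compute roofs, so the two-roof classification here is only a first approximation.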
Below a memory roof (DRAM Bandwidth, L1 Bandwidth, and so on)
The loop/function uses memory inefficiently.
Run a Memory Access Patterns analysis for this loop.
Below Vector Add Peak
The loop/function under-utilizes available instruction sets.
Check the Traits column in the Survey report to see if FMAs are used.
Just above Scalar Add Peak
The loop/function is undervectorized.
Check vectorization efficiency and performance issues in the Survey. Follow the recommendations to improve it if it's low.
Below Scalar Add Peak
The loop/function is scalar.
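The guidance above can be read as a lookup from a dot's position to a hint. The following is a rough sketch of that lookup, assuming a single memory roof and a hypothetical `just_above_factor` threshold for what counts as "just above" Scalar Add Peak. Neither the function name nor the threshold comes from Intel® Advisor, and the tool's own Roofline Guidance may apply different rules.

```python
def roofline_guidance(gflops, intensity, memory_bandwidth,
                      scalar_add_peak, vector_add_peak,
                      just_above_factor=1.25):
    """Return (position, description, recommendation) rows matching a dot.

    just_above_factor is an illustrative assumption, not a tool setting.
    """
    rows = []
    # The memory roof is a sloped line: bandwidth (GB/s) * intensity (FLOP/byte).
    if gflops < intensity * memory_bandwidth:
        rows.append(("Below a memory roof",
                     "uses memory inefficiently",
                     "Run a Memory Access Patterns analysis"))
    if gflops < scalar_add_peak:
        # The guidance above lists no recommendation for this position.
        rows.append(("Below Scalar Add Peak", "is scalar", None))
    elif gflops <= scalar_add_peak * just_above_factor:
        rows.append(("Just above Scalar Add Peak",
                     "is undervectorized",
                     "Check vectorization efficiency in the Survey"))
    if gflops < vector_add_peak:
        rows.append(("Below Vector Add Peak",
                     "under-utilizes available instruction sets",
                     "Check the Traits column to see if FMAs are used"))
    return rows


# Hypothetical dot: 3 GFLOPS at 0.5 FLOP/byte on a machine with a 20 GB/s
# memory roof, a 4 GFLOPS Scalar Add Peak and a 32 GFLOPS Vector Add Peak.
for position, description, recommendation in roofline_guidance(
        gflops=3.0, intensity=0.5, memory_bandwidth=20.0,
        scalar_add_peak=4.0, vector_add_peak=32.0):
    print(position, "|", description, "|", recommendation)
```

Several rows can match the same dot, because every roof above a dot is a potential limit on its performance.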