Get Started Guide

  Version 2020 | Published 09/09/2020

Identify Performance Bottlenecks Using Roofline

This section shows how to get started using all Vectorization Advisor analyses, starting with the Roofline analysis. The main advantage of using this multi-analysis Vectorization Advisor workflow is the potential to generate an ideal roadmap of optimization steps. The main disadvantage is high runtime overhead. For example:
  • Roofline analysis runtime overhead can be 3x - 8x greater than native target application runtime.
  • Memory Access Patterns (MAP) analysis runtime overhead can be 5x - 20x greater.
  • Dependencies analysis runtime overhead can be 5x - 100x greater.
Intel Advisor Typical Workflow: Identify Performance Bottlenecks Using Roofline
Roofline analysis - Helps visualize actual performance against hardware-imposed performance ceilings and determine the main limiting factor (memory bandwidth or compute capacity). When you run a Roofline analysis, the Intel Advisor:
  • Measures the hardware limitations of your machine and collects loop/function timings using the Survey analysis.
  • Collects floating-point and integer operations data, and memory data, using the Trip Counts and FLOP analysis.
Dependencies analysis - Checks for real data dependencies in loops the compiler did not vectorize because of assumed dependencies.
Memory Access Patterns (MAP) analysis - Checks for various memory issues, such as non-contiguous memory accesses and unit-stride vs. non-unit-stride accesses.
Learn More About Roofline Charts
The Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:
  • Arithmetic intensity (x axis) - measured in floating-point operations (FLOPs) and/or integer operations (INTOPs) per byte transferred between the CPU/VPU and memory, based on the loop/function algorithm (see the worked example after this list)
  • Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) and/or billions of integer operations per second (GINTOPS)
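For example (an illustrative hand calculation, not Intel Advisor output): a loop that computes a[i] = b[i] + c[i] over double-precision arrays performs 1 FLOP per iteration while transferring roughly 24 bytes (two 8-byte loads and one 8-byte store), so its arithmetic intensity is 1/24 ≈ 0.04 FLOP/byte. If such a loop sustains 2 billion iterations per second, its achieved performance is about 2 GFLOPS.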
In general:
  • The size and color of each Roofline chart dot represent the relative execution time of each loop/function. Large red dots take the most time, so they are the best candidates for optimization. Small green dots take less time, so they may not be worth optimizing.
  • Roofline chart diagonal lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The L1 Bandwidth roofline represents the maximum amount of work that can get done at a given arithmetic intensity if the loop always hits L1 cache. A loop does not benefit from L1 cache speed if a dataset causes it to miss L1 cache too often; instead it is subject to the limitations of the lower-speed L2 cache it is hitting. So a dot representing a loop that misses L1 cache too often but hits L2 cache is positioned somewhere below the L2 Bandwidth roofline.
  • Roofline chart horizontal lines indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The Scalar Add Peak represents the peak number of add instructions that can be performed by a scalar loop under these circumstances. The Vector Add Peak represents the peak number of add instructions that can be performed by a vectorized loop under these circumstances. So a dot representing a loop that is not vectorized is positioned somewhere below the Scalar Add Peak roofline.
  • A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.
  • The greater the distance between a dot and the highest achievable roofline, the more opportunity exists for performance improvement.
In the following Roofline chart representation, loops A and G (large red dots), and to a lesser extent B (yellow dot far below the roofs), are the best candidates for optimization. Loops C, D, and E (small green dots) and H (yellow dot) are poor candidates because they do not have much room to improve or are too small to have significant impact on performance.
This is a visual model, not an actual screenshot, of the Roofline Chart
The Intel Advisor basic Roofline model, the Cache-Aware Roofline Model (CARM), offers self data capability. The Intel Advisor Roofline with Callstacks feature extends the basic model with total data capability:
  • Self data = memory accesses, FLOPs, and duration related only to the loop/function itself, excluding data originating in other loops/functions called by it
  • Total data = data from the loop/function itself and its inner loops/functions
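For example, if an outer loop spends nearly all of its time in a hot inner loop that it calls, the outer loop's self data may be close to zero, while its total data includes the inner loop's FLOPs, memory traffic, and time.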
The total-data capability in the Roofline with Callstacks feature can help you:
  • Investigate the source of loops/functions instead of just the loops/functions themselves.
  • Get a more accurate view of loops/functions that behave differently when called under different circumstances.
  • Uncover design inefficiencies higher up the call chain that could be the root cause of poor performance by smaller loops/functions.
The following Roofline chart representation shows some of the added benefits of the Roofline with Callstacks feature, including:
  • A navigable, color-coded Callstack pane that shows the entire call chain for the selected loop/function, but excludes its callees
  • Visual indicators (caller and callee arrows) that show the relationship among loops and functions
  • The ability to simplify dot-heavy charts by collapsing several small loops into one overall representation
    Loops/functions with no self data are grayed out when expanded and in color when collapsed. Loops/functions with self data display at the coordinates, size, and color appropriate to that data when expanded, but have a gray halo of the size associated with their total time. When such loops/functions are collapsed, they change to the size and color appropriate to their total time and, if applicable, move to reflect the total performance and total arithmetic intensity.
Intel Advisor: Roofline with Callstacks
For more information on how to produce, display, and interpret the Roofline with Callstacks extension to the Roofline chart, see Roofline with Callstacks.
There are several controls to help you show/hide the Roofline chart:
Intel Advisor: Roofline Chart & Survey Report
1
Click to toggle between Roofline chart view and Survey Report view.
2
Click to toggle to and from the side-by-side Roofline chart and Survey Report view.
3
Drag to adjust the dimensions of the Roofline chart and Survey Report.
There are several controls to help you focus on the Roofline chart data most important to you, including the following.
Intel Advisor: Roofline controls
1
  • Select Loops by Mouse Rect: Select one or more loops/functions by tracing a rectangle with your mouse.
  • Zoom by Mouse Rect: Zoom in and out by tracing a rectangle with your mouse. You can also zoom in and out using your mouse wheel.
  • Move View By Mouse: Move the chart left, right, up, and down.
  • Undo or Redo: Undo or redo the previous zoom action.
  • Cancel Zoom: Reset to the default zoom level.
  • Export as x: Export the chart as a dynamic and interactive HTML or SVG file that does not require the Intel Advisor viewer for display. Use the arrow to toggle between the options. (See the command-line sketch below.)
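If you prefer to script the export, an interactive HTML Roofline chart can also be generated from the command line. This is a minimal sketch, not a complete reference; the project directory and output path are placeholders, and the exact report options may differ between Intel Advisor versions:
  advixe-cl --report=roofline --project-dir=./advi_results --report-output=./roofline.html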
2
Use the Cores drop-down toolbar to:
  • Adjust rooflines to see practical performance limits for your code on the host machine.
  • Build roofs for single-threaded applications (or for multi-threaded applications configured to run single threaded, such as one thread-per-rank for MPI applications). (You can use Intel Advisor filters to control the loops displayed in the Roofline chart; however, the Roofline chart does not support the Threads filter.)
Choose the appropriate number of CPU cores to scale roof values up or down:
  • 1 – if your code is single-threaded
  • Number of cores equal or close to the number of threads – if your code has fewer threads than available CPU cores
  • Maximum number of cores – if your code has more threads than available CPU cores
By default, the number of cores is set to the number of threads used by the application (even values only).
You’ll see the following options if your code is running on a multisocket system:
  • Choose Bind cores to 1 socket (default) if your application binds memory to one socket. For example, choose this option for MPI applications structured as one rank per socket. This option may be disabled if you choose a number of CPU cores exceeding the maximum number of cores available on one socket.
  • Choose Spread cores between all n sockets if your application binds memory to all sockets. For example, choose this option for non-MPI applications.
3
  • Toggle the display between floating-point operations, integer operations, and mixed operations (floating-point and integer).
  • Enable the display of Roofline with Callstacks additions to the Roofline chart.
  • Select the Memory Level(s) to show for each loop/function in the chart (L1, L2, L3, DRAM).
    This feature requires that you set the ADVIXE_EXPERIMENTAL=int_roofline environment variable. Also be sure to enable the For All Memory Levels checkbox under Run Roofline in the Vectorization Workflow tab.
  • Select the Memory Operation Type(s) to display data for in the Roofline chart: Loads, Stores, or Loads and Stores.
4
Display Roofline chart data from other Intel Advisor results or non-archived snapshots for comparison purposes.
Use the drop-down toolbar to:
  • Load a result/snapshot and display the corresponding filename in the Compared Results region.
  • Clear a selected result/snapshot and move the corresponding filename to the Ready for comparison region. Note: Click a filename in the Ready for comparison region to reload the result/snapshot.
  • Save the comparison itself to a file. The arrowed lines showing the relationship among loops/functions do not reappear if you upload the comparison file.
Click a loop/function dot in the current result to show the relationship (arrowed lines) between it and the corresponding loop/function dots in loaded results/snapshots.
Intel Advisor: Roofline Comparison
5
Add visual indicators to the Roofline chart to make the interpretation of data easier, including performance limits and whether loops/functions are memory bound, compute bound, or both.
Use the drop-down toolbar to:
  • Show a vertical line from a loop/function to the nearest and topmost performance ceilings by enabling the Display roof rulers checkbox. To view the ruler, hover the cursor over a loop/function. Where the line intersects each roof, labels display hardware performance limits for the loop/function.
  • Visually emphasize the relationships among displayed memory levels and roofs for a selected loop/function dot by enabling the Show memory level relationships checkbox.
    This feature requires that you set the ADVIXE_EXPERIMENTAL=int_roofline environment variable. Also be sure to enable the For All Memory Levels checkbox under Run Roofline in the Vectorization Workflow tab.
    To examine a dot, double-click it. You can also select a dot and press SPACE or ENTER. When you highlight the dot:
    • Labeled dots are displayed, representing memory levels for the selected loop/function; lines connect the dots to indicate that they correspond to the selected loop/function. If you have chosen to display only some memory levels in the chart using the Memory Level option, unselected memory levels are displayed with X marks.
    • An arrowed line is displayed, pointing to the memory level roofline that bounds the selected loop. If the arrowed line cannot be displayed, a message pops up with instructions on how to fix it.
    Once you have a loop/function’s dots highlighted as described above, you can zoom and fit the Roofline chart to the dots for the selected loop/function by double-clicking the loop/function again, or by pressing SPACE or ENTER with the loop/function selected. Repeat this action to return to the original Roofline chart view.
    To hide the labeled dots, select another loop/function, or double-click an empty space in the Roofline chart.
  • Color the roofline zones to make it easier to see whether enclosed loops/functions are fundamentally memory bound, compute bound, or bound by both compute and memory roofs by enabling the Show Roofline boundaries checkbox.
The preview picture is updated as you select guidance options, allowing you to see how changes will affect the Roofline chart’s appearance. Click Apply to apply your changes, or Default to return the Roofline chart to its original appearance.
6
  • Roofline View Settings: Adjust the default scale setting to show:
    • The optimal scale for each Roofline chart view
    • A scale that accommodates all Roofline chart views
  • Roofs Settings: Change the visibility and appearance of roofline representations (lines):
    • Enable calculating roof values based on single-threaded benchmark results instead of multi-threaded.
    • Click a Visible checkbox to show/hide a roofline.
    • Click a Selected checkbox to change roofline appearance: display a roofline as a solid or a dashed line.
    • Manually fine-tune roof values in the Value column to set hardware limits specific to your code.
  • Loop Weight Representation: Change the appearance of loop/function weight representations (dots):
    • Point Weight Calculation: Change the Base Value for a loop/function weight calculation.
    • Point Weight Ranges: Change the Size, Color, and weight Range (R) of a loop/function dot. Click the + button to split a loop weight range in two. Click the - button to merge a loop weight range with the range below.
    • Point Colorization: Color loop/function dots by weight ranges or by type (vectorized or scalar). You can also change the color of loops with no self time.
You can save your Roofs Settings or Point Weight Representation configuration to a JSON file or load a custom configuration.
7
Zoom in and out using numerical values.
8
Hover your mouse over an item to display metrics for it.
Click a loop/function dot to:
  • Outline it in black.
  • Display metrics for it.
  • If Roofline with Callstacks is enabled, display the corresponding, navigable, color-coded callstack.
  • Display corresponding data in other window tabs.
You can also click an item in the Callstack pane to flash the corresponding loop/function dot in the Roofline chart.
If Roofline with Callstacks is enabled, click the collapse control on a loop/function dot to collapse descendant dots into the parent dot, or click the expand control on a loop/function dot to show descendant dots and their relationship to the parent dot via visual indicators.
Right-click a loop/function dot or a blank area in the Roofline chart to perform more functions, such as:
  • Further simplify the Roofline chart by filtering out (temporarily hiding a dot), filtering in (temporarily hiding all other dots), and clearing filters (showing all originally displayed dots).
  • Copy data to the clipboard.
If For All Memory Levels is enabled, double-click a loop/function dot to display labeled dots representing the loop/function at selected memory levels (L1, L2, L3, DRAM). For details, see the Memory Level and Show memory level relationships options described above.
9
If Roofline with Callstacks is enabled, show/hide the Callstack pane.
10
Display the number and percentage of loops in each loop weight representation category.
Set Up Environment
Environment
Set-Up Tasks
Intel® Parallel Studio XE / Linux* OS
  • Do one of the following:
    • Run one of the following source commands:
      • For csh/tcsh users:
        source <advisor-install-dir>/advixe-vars.csh
      • For bash users:
        source <advisor-install-dir>/advixe-vars.sh
      The default installation path, <advisor-install-dir>, is inside:
      • /opt/intel/ for root users
      • $HOME/intel/ for non-root users
    • Add <advisor-install-dir>/bin32 or <advisor-install-dir>/bin64 to your path.
    • Run the <parallel-studio-install-dir>/psxevars.csh or <parallel-studio-install-dir>/psxevars.sh command. The default installation path, <parallel-studio-install-dir>, is inside:
      • /opt/intel/ for root users
      • $HOME/intel/ for non-root users
  • Set the VISUAL or EDITOR environment variable to identify the external editor to launch when you double-click a line in an Intel Advisor source window. (VISUAL takes precedence over EDITOR.)
  • Set the BROWSER environment variable to identify the installed browser used to display Intel Advisor documentation.
  • If you are using Intel® Threading Building Blocks (Intel® TBB), set the TBBROOT environment variable so your compiler can locate the installed Intel TBB include directory.
  • Make sure you run your application in the same Linux* OS environment as the Intel Advisor. (A combined setup sketch appears at the end of this section.)
Intel Parallel Studio XE / Windows* OS
Setting up the Windows* OS environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.
Do one of the following:
  • Run the <advisor-install-dir>\advixe-vars.bat command.
    The default Intel Advisor installation path, <advisor-install-dir>, is inside C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).
  • Run the <parallel-studio-install-dir>\psxevars.bat command.
    The default installation path, <parallel-studio-install-dir>, is inside C:\Program Files (x86)\IntelSWTools\.
Intel® System Studio
Setting up the environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.
Run the <advisor-install-dir>\advixe-vars.bat command to set up your environment. The default installation path, <advisor-install-dir>, is inside C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).
Intel® Advisor Beta / Linux* OS
Do one of the following:
  • If you installed the product as part of a standalone installation, run the following source command:
    • For bash users:
      source <advisor-install-dir>/env/vars.sh
    The default installation path, <advisor-install-dir>, is inside:
    • /opt/intel/ for root users
    • $HOME/intel/ for non-root users
  • If you installed the product as part of an Intel® oneAPI Base Toolkit installation, run the following source command:
    • For bash users:
      source <oneapi-install-dir>/env/vars.sh
    The default installation path, <oneapi-install-dir>, is inside:
    • /opt/intel/inteloneapi for root users
    • $HOME/intel/inteloneapi for non-root users
Intel® Advisor Beta / Windows* OS
  • If you installed the product as part of a standalone installation, run the <advisor-install-dir>\env\vars.bat command.
    The default Intel Advisor installation path, <advisor-install-dir>, is inside C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).
  • If you installed the product as part of an Intel® oneAPI Base Toolkit installation, run the <oneapi-install-dir>\env\vars.bat command.
    The default Intel® Advisor Beta installation folder, <oneapi-install-dir>, is inside C:\Program Files (x86)\inteloneapi\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).
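For reference, a typical Linux* OS environment setup session for the Intel Parallel Studio XE package might look like the following. This is a minimal sketch: the installation path, editor, browser, and Intel TBB location are placeholders you should replace with values for your system.
  # Source the Intel Advisor environment script (bash); substitute your <advisor-install-dir>
  source /opt/intel/advisor_2020/advixe-vars.sh
  # Optional: external editor and browser used by the Intel Advisor GUI
  export VISUAL=gvim
  export BROWSER=firefox
  # Optional: help the compiler locate Intel TBB headers (example path)
  export TBBROOT=/opt/intel/tbb
  # Verify the command line interface is on your path
  advixe-cl --version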
Launch Intel Advisor and Create a Project
To launch the:
  • Intel Parallel Studio XE / Intel Advisor standalone GUI:
    • On Linux* OS: Run the advixe-gui command.
    • On Windows* OS: From the Microsoft Windows* All Apps screen, select Intel Parallel Studio XE [version] > Intel Advisor [version].
  • Intel System Studio / Intel Advisor standalone GUI: Choose Tools > Intel Advisor > Launch Intel Advisor from the IDE menu.
  • Intel Advisor plug-in to the Visual Studio* IDE: Open your solution in the Visual Studio* IDE.
To create an Intel Advisor project:
  1. Do one of the following:
    • In the standalone GUI: Choose File > New > Project… to open the Create a Project dialog box. Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.
    • In the Visual Studio* IDE: Choose Project > Intel Advisor [version] Project Properties... to open the Project Properties dialog box.
  2. On the left side of the Analysis Target tab, ensure the Survey Hotspots Analysis type is selected and set appropriate parameters.
  3. Set appropriate parameters for other analysis types and tabs. (Setting the binary/symbol search and source search directories is optional for the Vectorization Advisor.)
  • If possible, use the Inherit settings from Survey Hotspots Analysis Type checkbox for other analysis types.
  • The Trip Counts and FLOP Analysis type has similar parameters to the Survey Hotspots Analysis type.
  • The Dependencies Analysis and Memory Access Patterns Analysis types consume more resources than the Survey Hotspots Analysis type. If these Refinement analyses take too long, consider decreasing the workload.
  • Select Track stack variables in the Dependencies Analysis type to detect all possible dependencies.
Run Roofline Analysis
Intel Advisor Vectorization Workflow Tab: Run Roofline
Under Run Roofline in the Vectorization Workflow tab, click the run analysis control to execute your target application. Upon completion, the Intel Advisor displays a Roofline chart.
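You can also collect the same Roofline data from the command line with advixe-cl. The following is a minimal sketch, assuming the Intel Advisor environment is already set up; the project directory and application name are placeholders, and option spellings may differ slightly between Intel Advisor versions:
  # Run the Survey and Trip Counts/FLOP collections needed for a Roofline chart
  advixe-cl --collect=roofline --project-dir=./advi_results -- ./my_application
  # Add --stacks to also gather the data used by Roofline with Callstacks
  advixe-cl --collect=roofline --stacks --project-dir=./advi_results -- ./my_application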
To implement the Roofline with Callstacks feature:
Intel Advisor: Roofline with Callstacks
  1. Run the Roofline analysis with the With Callstacks checkbox enabled. Upon completion, the Intel Advisor displays a Roofline chart.
  2. Enable the With Callstacks checkbox in the Roofline chart.
If the Workflow tab is not displayed in the Visual Studio IDE: Click the Intel Advisor icon on the Intel Advisor toolbar.
Investigate Loops
If all loops are vectorizing properly and performance is satisfactory, you are done! Congratulations!
If one or more loops is not vectorizing properly and performance is unsatisfactory:
  1. Check data in associated Intel Advisor views to support your Roofline chart interpretation. For example: Check the Vectorized Loops/Efficiency values in the Survey Report or the data in the Code Analytics tab. (A command-line report sketch follows this list.)
  2. Improve application performance using various Intel Advisor features to guide your efforts, such as:
    • Information in the Performance Issues column and the associated Recommendations tab.
      Intel Advisor Recommendations
      Table of contents on the right, showing recommendations for each issue relevant to the loop. Expandable/collapsible recommendations on the left (some reference details specific to the analyzed loop, such as vector length or trip count). The number of bars on a recommendation icon shows the confidence that the recommendation is the appropriate fix.
    • Information in the Why No Vectorization? column and the associated Why No Vectorization? tab.
    • Suggestions in Next Steps: After Running Survey Analysis in the Intel Advisor User Guide.
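If you work from the command line, you can also dump the Survey Report data mentioned in step 1 to a text file for review. This is a sketch; the project directory and output path are placeholders, and report option spellings may vary by Intel Advisor version:
  advixe-cl --report=survey --project-dir=./advi_results --format=text --report-output=./survey.txt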
If you need more information, continue your investigation by:
  1. Marking one or more loops/functions for deeper analysis in the Survey Report checkbox column, AND
  2. Running a Dependencies analysis to discover why the compiler assumed a dependency and did not vectorize a loop/function, and/or running a Memory Access Patterns (MAP) analysis to identify expensive memory instructions.
Run Dependencies Analysis
To run a Dependencies analysis:
  1. Mark one or more un-vectorized loops for deeper analysis in the Survey Report checkbox column.
  2. Under Check Dependencies in the Vectorization Workflow tab, click the run analysis control to collect Dependencies data while your application executes. (A command-line sketch follows these steps.)
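From the command line, a roughly equivalent collection might look like the following. This is a sketch: the loop IDs in --mark-up-list, the project directory, and the application name are placeholders (loop IDs come from a prior Survey run):
  # Survey first so loops have IDs, then check the marked loops for real dependencies
  advixe-cl --collect=survey --project-dir=./advi_results -- ./my_application
  advixe-cl --collect=dependencies --mark-up-list=5,10 --project-dir=./advi_results -- ./my_application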
After the Intel Advisor collects the data, it displays a Dependencies-focused Refinement Report similar to the following:
Intel Advisor: Dependencies Report
There are many controls available to help you focus on the data most important to you, including the following:
1
To display more information in the Dependencies Report about a loop you selected for deeper analysis: Click the associated data row.
2
To display instruction addresses and code snippets for associated code locations in the Code Locations pane: Click a data row.
To choose a problem of interest to display in the Dependencies Source window: Right-click a data row, then choose View Source.
To open your default editor in another tab/window: Right-click a data row, then choose Edit Source.
3
To choose a code location of interest to display in the Dependencies Source window: Right-click a data row, then choose View Source.
To open your default editor in another tab/window: Right-click a data row, then choose Edit Source.
4
Use the Filter pane to:
  • Temporarily limit the items displayed in the Problems and Messages pane by clicking filter criteria in one or more filter categories.
  • Deselect filter criteria in one filter category, or deselect filter criteria in all filter categories.
  • Sort all filter criteria by name in ascending alphabetical order or by count in descending numerical order. (You cannot change the order in which filter categories are presented.)
5
To populate these columns and the Memory Access Patterns Report with data, run a Memory Access Patterns analysis.
If the Dependencies Report shows:
  • There is no real dependency in the loop for the given workload: Follow Intel Advisor guidance to tell the compiler it is safe to vectorize.
  • There is an anti-dependency (often called a write-after-read or WAR dependency): Follow Intel Advisor guidance to enable vectorization.
Intel Advisor code improvement guidance is available in the Recommendations tab and in Next Steps: After Running Survey Analysis in the Intel Advisor User Guide. After you finish improving your code:
  1. Run a Memory Access Patterns (MAP) analysis if desired.
  2. Rebuild your modified code.
  3. Run another Roofline analysis to verify your application still runs correctly, all test cases pass, all loops are vectorizing properly, and performance is satisfactory.
Run Memory Access Patterns (MAP) Analysis
To run a Memory Access Patterns (MAP) analysis:
  1. Mark one or more un-vectorized loops for deeper analysis in the Survey Report checkbox column.
  2. Under Check Memory Access Patterns in the Vectorization Workflow tab, click the run analysis control to collect MAP data while your application executes. (A command-line sketch follows these steps.)
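As with the Dependencies analysis, a command-line collection might look roughly like this (a sketch; the loop IDs, project directory, and application name are placeholders):
  advixe-cl --collect=map --mark-up-list=5,10 --project-dir=./advi_results -- ./my_application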
After the Intel Advisor collects the data, it displays a MAP-focused Refinement Report similar to the following:
Intel Advisor: Memory Access Patterns (MAP) Report
Intel Advisor code improvement guidance is available in the Recommendations tab and in Next Steps: After Running Survey Analysis in the Intel Advisor User Guide. After you finish improving your code:
  1. Rebuild your modified code.
  2. Run another Roofline analysis to verify your application still runs correctly, all test cases pass, all loops are vectorizing properly, and performance is satisfactory.
