Managing overhead of Intel® Advisor analyses

By Igor Vorobtsov

Published: 10/24/2017   Last updated: 12/05/2017

There are several ways to manage the overhead of Intel® Advisor analyses, depending on your application's code and the data you want to collect. Most of these methods work by minimizing the amount of data collected. It is also important to avoid placing the Intel Advisor project directory on a slow file system, e.g. a slow remote shared file system.

Collection overhead

Stitch stacks and Stackwalk mode

The overhead of Survey Hotspots analysis should not exceed 10% in most cases. However, it depends on the application code, and the overhead may be significant in some cases. Intel Advisor has two options enabled by default for Survey Hotspots analysis. The first is the Stitch stacks option, which restores a logical call tree for Intel® TBB or OpenMP* applications by catching notifications from the runtime and attaching stacks to the point that introduces a parallel workload. The second is the Stackwalk mode, which selects whether stacks are analyzed online (during collection) or offline (after collection); the default is offline.

In most cases, the default settings are optimal. However, if Survey overhead exceeds 10%, you may want to disable stack stitching and change the stackwalk mode to online. Disabling stack stitching may decrease the overhead for applications using Intel TBB. Online stackwalk mode is useful when a lot of data is allocated on the stack, which is common for Fortran applications and for applications with large numbers of small parallel OpenMP regions. For these cases, changing the default settings is recommended.

Use the following options in the CLI: -no-stack-stitching -stackwalk-mode=online

You may also change these settings in GUI (Project Properties -> Survey Hotspots Analysis -> Advanced):

Survey Hotspots Analysis Advanced settings

Uncheck the Stitch stacks box and set Stack unwinding mode to the online (during collection) mode to minimize the collection overhead.

CLI example: advixe-cl -collect survey -stackwalk-mode=online -no-stack-stitching -- ./my_app.exe

Modules filter

There is a setting to include only the listed application modules in collection or to exclude specific modules from it. Excluding unnecessary modules decreases collection and finalization time for Trip Counts, FLOP and other analyses. You can set it using Project Properties in the GUI:

Modules filter

The CLI options for the filtering feature are -module-filter and -module-filter-mode. For example, -module-filter=C:\test\test.dll excludes test.dll from collection (the default mode is exclude). Adding -module-filter-mode=include to the command line collects data for test.dll only.

CLI example:  advixe-cl -collect [survey|tripcounts] -module-filter-mode=include -module-filter=foo1.so -module-filter=foo2.so

Minimize data set

Intel Advisor offers two refinement analyses: Check Dependencies and Check Memory Access Patterns. These analyses are expensive and consume many more resources than Survey. A program may take from 50 to several hundred times longer to run than it does normally. For example, if you run your program with an input data set that would normally take 25 minutes to process, the Dependencies tool may take a day or more to run your program. The Dependencies tool only collects data while the program executes within the selected loop or site. To limit the increase in run time, choose input data sets that minimize the number of instructions executed within a loop while still thoroughly exercising your program’s control flow paths.
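To get a feel for these numbers, the arithmetic can be sketched in a few lines of C. The helper names estimate_minutes and print_estimate are hypothetical, and the slowdown factors you pass in are illustrative values taken from the range quoted above, not measurements.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: estimated analysis run time, in minutes, for a
   program that normally runs for base_minutes at a given slowdown factor. */
double estimate_minutes(double base_minutes, double slowdown)
{
    assert(base_minutes >= 0.0 && slowdown >= 1.0);
    return base_minutes * slowdown;
}

/* Prints the same estimate converted to hours. */
void print_estimate(double base_minutes, double slowdown)
{
    printf("%.0fx slowdown: about %.1f hours\n",
           slowdown, estimate_minutes(base_minutes, slowdown) / 60.0);
}
```

Calling print_estimate(25.0, 50.0) reports about 20.8 hours and print_estimate(25.0, 100.0) about 41.7 hours, which is why a smaller input data set pays off quickly.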

Using the Stop button to minimize data collection

While a Dependencies or Memory Access Patterns analysis is running, you can stop the current analysis in the Workflow tab and display the data collected so far:

Stop Current Analysis

You can manually stop collection once all loops are analyzed at least once.

In the CLI, use the --stop command to interrupt the collection while retaining and finalizing the collected data.

The Site Coverage widget is a new GUI feature available for Dependencies and Memory Access Patterns analyses. It is a progress bar that shows when all marked loops have been analyzed at least once. In most cases, you can stop the collection after the target loop appears in the list:

Site coverage

There is also an option to stop the collection after a specified time limit, in seconds, for Dependencies analysis (new feature in 2018 Update 1). You may set it in the GUI (Project Properties -> Dependencies Analysis -> Advanced):

Set the duration of data collection

In CLI use option -stop-after=<sec>.

Run application without analysis and enable it later with resume command

There is an option to run the application without analysis and enable it later using the resume command:

Run Application without Analysis and then Enable Analysis Later

You may resume a paused analysis or stop data collection to skip uninteresting parts of the target program's execution. This minimizes the data collected, speeds up the analysis of applications, and minimizes execution time. Note that the new Intel Advisor 2018 Update 1 features (collection control APIs and loop selection for Trip Counts and FLOP and Roofline analysis) give you better control over the data collection process.

New in 2018 Update 1: mark loops for Trip Counts and FLOP and Roofline analysis

With Intel Advisor 2018 Update 1, you may mark loops using the checkbox column in the Survey & Roofline tab and collect trip counts and FLOP for the selected loops only.

Mark Loops for Deeper Analysis

Note that by default all loops are analyzed, which increases the overhead. In the CLI, you may use the -mark-up-list option to select the loops you want to inspect.

New in 2018 Update 1:  collection control APIs for Trip Counts and FLOP analysis

You can now use the instrumentation and tracing technology (ITT) APIs in your code to control the way Intel Advisor collects data for your application and to minimize overhead for Trip Counts and FLOP analysis. Note that for Survey Hotspots analysis, these collection control APIs are also available in older versions of Intel Advisor. There are two primitives available for all analysis types.

  • __itt_pause runs the application without collecting data. Intel Advisor reduces the overhead by collecting only critical information.
  • __itt_resume resumes collecting all data.

Calling the Pause/Resume API frequently for small workloads is not recommended.

Before instrumenting your application, you need to configure your build system to be able to reach the API headers and libraries:

  • Add <install_dir>/include to your INCLUDE path for C/C++ applications, or <install_dir>/include/intel64 (or ia32) to your INCLUDE path for Fortran applications
  • Add <install_dir>/lib32 to your 32-bit LIBRARIES path
  • Add <install_dir>/lib64 to your 64-bit LIBRARIES path

where <install_dir> is the Intel® Advisor installation directory. In the Visual Studio environment, you may add these paths in Project->Properties->VC++ Directories (‘Include directories’ and ‘Library directories’).

You also need to link the static library, libittnotify.a (Linux*) or libittnotify.lib (Windows*), to your application. If tracing is enabled, this static library loads the ITT API implementation and forwards ITT API instrumentation data to Intel Advisor. In the Visual Studio environment, you may add the library in Project->Properties->Linker->Input->Additional Dependencies.

#include <ittnotify.h>
int main(int argc, char* argv[])
{
  // Do work here
  __itt_pause();
  // Do uninteresting work here
  __itt_resume();
  // Do work here
  __itt_pause();
  // Do uninteresting work here
  return 0;
}
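As a minimal sketch of how the advice about infrequent Pause/Resume calls might look in practice, the block below resumes collection once around a whole loop instead of once per iteration. The USE_ITT build flag, the COLLECTION_PAUSE/COLLECTION_RESUME wrapper macros, and the function sum_region_of_interest are hypothetical names introduced for illustration; only __itt_pause and __itt_resume come from the ITT API. Without USE_ITT, the wrappers merely count calls so the pattern can be built and checked without Intel Advisor installed.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ITT                      /* hypothetical build flag, not part of ITT */
#include <ittnotify.h>
#define COLLECTION_PAUSE()  __itt_pause()
#define COLLECTION_RESUME() __itt_resume()
#else
/* Fallback: count the calls so the pattern can be checked without the tool. */
static int pause_calls, resume_calls;
#define COLLECTION_PAUSE()  ((void)++pause_calls)
#define COLLECTION_RESUME() ((void)++resume_calls)
#endif

/* Sums n values. Collection is resumed once around the whole loop,
   not once per iteration, to keep Pause/Resume calls infrequent. */
double sum_region_of_interest(const double *data, size_t n)
{
    double total = 0.0;
    assert(data != NULL || n == 0);

    COLLECTION_RESUME();            /* start collecting for the loop of interest */
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    COLLECTION_PAUSE();             /* back to low-overhead execution */

    return total;
}
```

This pattern pairs naturally with starting the application without analysis, so that only the region between the resume and pause calls is collected.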

ITT APIs are also available for Fortran code. Add the ‘USE ITTNOTIFY’ statement to call the ITT_PAUSE() and ITT_RESUME() subroutines within your program.

For more details on the instrumentation and tracing technology (ITT) APIs, see the documentation, e.g. the topics on Configuring Your Build System and on the collection control APIs.

Pause Collection and Resume Collection Annotations

You can also control data collection using annotations. The Pause Collection and Resume Collection annotations let you stop and resume data collection to skip uninteresting parts of the target program's execution. They are essentially wrappers around the ITT APIs with additional features.
If you pause data collection, the target executable continues to execute until you resume data collection. Pausing data collection minimizes the amount of data collected and speeds up the analysis of large applications.

Pause Collection annotation completely stops the analysis of your program until the matching Resume Collection (disable-collection-pop) annotation is executed.
The syntax is:

C/C++:   ANNOTATE_DISABLE_COLLECTION_PUSH;
Fortran: call annotate_disable_collection_push()

Resume Collection annotation resumes the analysis previously stopped by a Pause Collection (disable-collection-push) annotation. The syntax is:

C/C++:   ANNOTATE_DISABLE_COLLECTION_POP;
Fortran:  call annotate_disable_collection_pop()
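The following sketch shows one way the two annotations might be paired around an uninteresting setup phase. It assumes the annotation macros come from the advisor-annotate.h header shipped with Intel Advisor; the USE_ADVISOR_ANNOTATIONS build flag, the fallback macro definitions, and the function load_input are hypothetical and exist only so the example builds without the tool.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ADVISOR_ANNOTATIONS      /* hypothetical build flag */
#include <advisor-annotate.h>       /* assumed header providing the macros */
#else
/* Fallback so the sketch builds without the Intel Advisor headers:
   track the push/pop nesting depth instead of talking to the tool. */
static int disable_depth;
#define ANNOTATE_DISABLE_COLLECTION_PUSH ((void)++disable_depth)
#define ANNOTATE_DISABLE_COLLECTION_POP  ((void)--disable_depth)
#endif

/* Hypothetical setup phase that is not interesting to analyze. */
int load_input(int *buf, int n)
{
    assert(buf != NULL && n >= 0);

    ANNOTATE_DISABLE_COLLECTION_PUSH;   /* pause collection */
    for (int i = 0; i < n; ++i)
        buf[i] = i;
    ANNOTATE_DISABLE_COLLECTION_POP;    /* matching resume */

    return n;
}
```

Keeping each push and its matching pop in the same function makes it harder to leave collection disabled by accident, for example on an early return.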

Please refer to the documentation for more details on annotations.

Note that for Dependencies or Memory Access Patterns analysis, the ANNOTATE_DISABLE_COLLECTION_{PUSH,POP} directives are not honored. Use ANNOTATE_{SITE,TASK,ITERATION_TASK} instead if you want to select a small number of loops of interest by annotating the source code.
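As a rough illustration of selecting a single loop with site and task annotations, the sketch below marks one loop as a site with one task per iteration. It assumes the ANNOTATE_SITE_BEGIN, ANNOTATE_ITERATION_TASK and ANNOTATE_SITE_END macros from the advisor-annotate.h header; the USE_ADVISOR_ANNOTATIONS build flag, the no-op fallbacks, the function scale, and the site and task names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ADVISOR_ANNOTATIONS      /* hypothetical build flag */
#include <advisor-annotate.h>       /* assumed header providing the macros */
#else
/* No-op fallbacks so the sketch builds without the Intel Advisor headers. */
#define ANNOTATE_SITE_BEGIN(name)      ((void)0)
#define ANNOTATE_ITERATION_TASK(name)  ((void)0)
#define ANNOTATE_SITE_END()            ((void)0)
#endif

/* Scales a vector in place; only this loop is marked for analysis. */
void scale(double *x, int n, double factor)
{
    assert(x != NULL && n >= 0);

    ANNOTATE_SITE_BEGIN(scale_site);            /* hypothetical site name */
    for (int i = 0; i < n; ++i) {
        ANNOTATE_ITERATION_TASK(scale_iter);    /* hypothetical task name */
        x[i] *= factor;
    }
    ANNOTATE_SITE_END();
}
```

Because data is collected only while execution is inside an annotated site, annotating just the loops of interest keeps the rest of the program out of the analysis.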

Disable stacks collection for Trip Counts and FLOP analysis

Note that stack collection for Trip Counts and FLOP analysis is disabled by default. If it is enabled and overhead is too high, you may disable stacks collection (Project Properties -> Trip Counts and FLOP analysis):

Uncheck Collect stacks option

In the CLI, make sure the -stacks option is not specified to avoid stacks collection. Note that stacks collection is also enabled if you run Roofline analysis with the ‘Enable Roofline with Callstacks’ checkbox selected.

Finalization overhead

Modules filter

The same modules filter feature also helps decrease the finalization overhead of Survey Hotspots analysis. You may set it in Project Properties in the GUI or use the CLI option -module-filter.

Disable additional analysis

Enabling the Analyze loops that reside in non-executed code paths option may increase finalization overhead. The same is true for the Analyze Python, MKL, Enable register spill/fill analysis and Enable static instruction mix analysis options on the general project properties page and in the Advanced section. Make sure these additional analyses are turned off if you do not need their data.

Collection without finalization

Finalization of collected data on slow targets, e.g. Intel® Xeon Phi™ processors, may take a long time. You can collect the data without finalization and then finalize it on a different host, e.g. one with a Xeon processor. Specify the -no-auto-finalize option in the CLI to turn off automatic finalization during collection. Then you may pack the results (optional) and copy them to the host. The results are finalized when you open them in the GUI on the host.

See this article for more details on how to collect with no finalization on Intel Xeon Phi processors.

CLI example: advixe-cl -collect survey -no-auto-finalize -- ./my_app.exe
