Intel® Inspector: Controlling Analysis Cost

Introduction  

Intel® Inspector detects challenging threading and memory errors and provides guidance to help ensure application reliability. When you run an analysis, Intel Inspector executes the target against a data set you specify. Data set size and workload have a direct impact on target execution time and analysis speed. To get a complete and accurate analysis within a reasonable timeframe, the target application needs to execute as many code paths as is practically possible, while keeping redundant computation within each task to the minimum needed for good code coverage.

It is usually possible to reduce workload or data set size to maximize the efficiency of Intel Inspector. For example:

  • If the target application processes images, reducing the test image size when collecting correctness data can shorten the run time without diminishing the completeness or accuracy of the analysis.
  • If the target application executes large loops, reducing loop trip counts can speed up the analysis.
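One way to make this kind of reduction routine is to let the test setup choose a scale factor at run time instead of editing the code for each analysis run. The C sketch below assumes a hypothetical WORKLOAD_SCALE environment variable and hypothetical helper names (parse_scale, workload_scale, scaled_trips); it shrinks a loop trip count or image dimension but clamps it to a floor so the loop body still executes several times.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a scale factor in (0, 1]; a missing or invalid value means
 * "full workload" (1.0). */
double parse_scale(const char *s)
{
    if (s == NULL)
        return 1.0;
    char *end;
    double v = strtod(s, &end);
    if (end == s || *end != '\0' || !(v > 0.0 && v <= 1.0))
        return 1.0;
    return v;
}

/* Hypothetical knob: WORKLOAD_SCALE=0.01 runs about 1% of the normal
 * work, for example when the binary runs under Intel Inspector. */
double workload_scale(void)
{
    return parse_scale(getenv("WORKLOAD_SCALE"));
}

/* Scale a loop trip count (or an image dimension), keeping at least
 * min_trips iterations and never exceeding the full count. */
long scaled_trips(long full_trips, long min_trips, double scale)
{
    assert(full_trips >= 0 && min_trips >= 0);
    long n = (long)((double)full_trips * scale);
    if (n < min_trips)
        n = min_trips;
    if (n > full_trips)
        n = full_trips;
    return n;
}
```

Compute the scaled count once, for example long n = scaled_trips(N, 16, workload_scale());, and use n as the loop bound. Choose the floor large enough to reach the code paths you care about; scaling down too far can hide errors that only appear with larger data.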

However, for some classes of applications it might be too difficult to reduce the workload, which can lead to excessively slow analysis. This article discusses several approaches for reducing analysis cost and making Intel Inspector work more efficiently on larger applications and data sets.

Note on compiler options: This article describes several techniques for improving the efficiency of Intel Inspector by changing the analysis configuration. However, the compiler options used to build the target application can affect performance as well.
Some popular compilers provide basic run-time memory checking functionality that slows down application execution (it might be enabled by default for debug builds). Consider disabling that functionality in builds you analyze with Intel Inspector.

Stack Frame Depth  

Intel Inspector collects call stacks for each memory or threading error it finds. For example, each memory leak report shows not only the size of the leaked block and the module that caused the leak, but also the application's call path at the time of the leak. Examining call stacks is useful when analyzing a problem and helps you implement an efficient fix, but this convenience comes at a cost. Even if the application has no threading or memory errors, the cost of call stacks may still be high: Intel Inspector must collect stack information for every observation so that it can later determine whether a set of observations actually documents an error.

To reduce this cost, lower the depth of collected call stacks using the Stack frame depth setting in the analysis configuration.

The default Stack frame depth for the most aggressive and resource-consuming analysis types is 16. In many cases 8, or even 1, might suffice. You could use an iterative approach: start with a depth of 1, and then increase the stack frame depth when analyzing a reduced workload. However, the performance improvement is not linear, and applications that don't have deep call stacks might see little to no improvement.

Exclude Modules  

Most nontrivial applications link with many shared libraries (DLLs or .so files). Intel Inspector detects all dynamic dependencies and analyzes them at the same level as the executable itself. Many of the shared libraries that complex applications use are run-time libraries or other third-party libraries (e.g., Microsoft DirectX* or Silicon Graphics Inc. OpenGL* libraries), whose source code you may not be able to access or control. Excluding these libraries, along with any other libraries that you believe cannot be a source of memory or threading errors, might greatly improve the speed of Intel Inspector. You can exclude modules in the Advanced section of the Project Properties.

Custom Analysis Configuration

Most of the preset memory and threading analysis configurations incorporate several types of data collection. It is possible to speed up the analysis by creating a custom analysis configuration and collecting just the data you care about at the moment. When you need to switch to another analysis type, just create a new custom configuration.

There are two ways to create custom configurations:

  • Copy a preset analysis configuration and modify the copy
  • Edit an existing custom configuration from the Custom Analysis Types page

Change the settings to reduce the scope of analysis. Hovering over any particular setting displays information about what that option does and how expensive it is.

By default, Locate Memory Problems includes two kinds of data collection: Detect memory leaks and Detect resource leaks. If you are interested only in invalid memory accesses, you could disable both and enable Detect invalid/uninitialized accesses instead. Leave Eliminate duplicates enabled and Analyze stack accesses disabled. Enhanced dangling pointer and Guard zones have little impact on speed, but each adds a small amount of memory overhead. Accumulated over a large run, that overhead can itself cause memory-related problems, so leave both disabled as well. Note that you can also set the Stack frame depth in a custom analysis configuration.

Previously created custom configurations are available for use via the Custom Analysis Types page of the Configure Analysis Type window.

API Control of Collection

Intel Inspector also provides Collection Control APIs that you can call from your code to turn analysis off and on per thread or per address range. To tell Intel Inspector to stop analyzing the current thread for errors, call __itt_suppress_push with one of the following masks, and call __itt_suppress_pop to restart:

  • __itt_suppress_memory_errors to stop analyzing for memory errors
  • __itt_suppress_threading_errors to stop analyzing for threading errors
  • __itt_suppress_memory_errors|__itt_suppress_threading_errors to stop analyzing for memory or threading errors
  • __itt_suppress_pop to undo the most recent matching __itt_suppress_push call

Note that Intel Inspector still generates race diagnostics based on the time of the conflicting access, so a suppressed thread can still appear in diagnostics for conflicts that started before the suppression took effect.
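A minimal C sketch of per-thread suppression, assuming the ittnotify.h header shipped with Intel Inspector (you would typically also link against its ittnotify library). The SUPPRESS_* wrapper macros and the sum_private kernel are hypothetical; when ittnotify.h is not found (detected with __has_include), the macros compile to no-ops so the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real ITT API when ittnotify.h is available; otherwise the
 * wrappers below compile to no-ops. */
#if defined __has_include
#  if __has_include(<ittnotify.h>)
#    include <ittnotify.h>
#    define HAVE_ITTNOTIFY 1
#  endif
#endif

#ifdef HAVE_ITTNOTIFY
   /* Stop analyzing the current thread for memory and threading errors. */
#  define SUPPRESS_PUSH_ALL() \
       __itt_suppress_push(__itt_suppress_memory_errors | __itt_suppress_threading_errors)
   /* Undo the most recent matching push on this thread. */
#  define SUPPRESS_POP() __itt_suppress_pop()
#else
#  define SUPPRESS_PUSH_ALL() ((void)0)
#  define SUPPRESS_POP()      ((void)0)
#endif

/* Hypothetical hot kernel that only reads thread-private data and has
 * already been checked, so analyzing it again adds cost but no value. */
double sum_private(const double *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    SUPPRESS_PUSH_ALL();              /* analysis off for this thread */
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += buf[i];
    SUPPRESS_POP();                   /* analysis back on */

    return sum;
}
```

Keep every push paired with a pop on the same thread, including on early-return paths; otherwise the suppression remains in effect for the code that follows.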

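To tell Intel Inspector to stop or restart analyzing a given address range for errors, call __itt_suppress_mark_range with the mode __itt_suppress_range or __itt_unsuppress_range and one of the following masks (__itt_suppress_clear_range undoes a matching __itt_suppress_mark_range call):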

  • __itt_suppress_memory_errors to stop analyzing for memory errors
  • __itt_suppress_threading_errors to stop analyzing for threading errors
  • __itt_suppress_memory_errors|__itt_suppress_threading_errors to stop analyzing for memory or threading errors
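The address-range variant can be sketched the same way. This version assumes the ittnotify.h signatures __itt_suppress_mark_range(mode, mask, address, size) and __itt_suppress_clear_range(mode, mask, address, size); scratch_alloc, scratch_free, and the wrapper macros are hypothetical names, and the macros again fall back to no-ops when the header is absent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Use the real ITT API when ittnotify.h is available. */
#if defined __has_include
#  if __has_include(<ittnotify.h>)
#    include <ittnotify.h>
#    define HAVE_ITTNOTIFY 1
#  endif
#endif

#ifdef HAVE_ITTNOTIFY
#  define RANGE_MASK (__itt_suppress_memory_errors | __itt_suppress_threading_errors)
   /* Stop analyzing accesses to [p, p + n). */
#  define SUPPRESS_RANGE(p, n) \
       __itt_suppress_mark_range(__itt_suppress_range, RANGE_MASK, (p), (n))
   /* Undo the matching mark_range call with the same arguments. */
#  define CLEAR_RANGE(p, n) \
       __itt_suppress_clear_range(__itt_suppress_range, RANGE_MASK, (p), (n))
#else
#  define SUPPRESS_RANGE(p, n) ((void)(p), (void)(n))
#  define CLEAR_RANGE(p, n)    ((void)(p), (void)(n))
#endif

/* Hypothetical helper: allocate a large scratch buffer whose accesses
 * Intel Inspector should not analyze. */
void *scratch_alloc(size_t size)
{
    assert(size > 0);
    void *p = malloc(size);
    if (p != NULL) {
        memset(p, 0, size);           /* initialize so later reads are defined */
        SUPPRESS_RANGE(p, size);
    }
    return p;
}

/* Clear the suppression before the memory is released and reused. */
void scratch_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    CLEAR_RANGE(p, size);
    free(p);
}
```

Clearing the range before free() is a design choice in this sketch: it keeps the suppression from applying to unrelated objects that the allocator later places at the same addresses.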

Detailed information and examples of how to use these APIs with both C/C++ code and Fortran code can be found in the product documentation.

Conclusion

We discussed four techniques that might help speed up the analysis with Intel Inspector even when reducing a workload or data set is impossible. These techniques can be used independently or in combination.
