Intel® Inspector XE 2011: Controlling Analysis Cost
Find memory and threading errors faster
The Intel® Inspector XE 2011 detects challenging threading and memory errors and provides guidance to help ensure application reliability. When you run an analysis, the Intel Inspector XE 2011 executes the target against a data set that you specify. Data set size and workload have a direct impact on target execution time and analysis speed. In order to get a complete and accurate report within a reasonable timeframe, the target application needs to execute as many code paths as is practically possible, while minimizing the redundant computation within each task to the bare minimum needed for good code coverage.
It is usually possible to reduce workload or data set size to maximize the efficiency of Intel® Inspector XE 2011. For example:
• If the target application processes images, then cutting the test image size when collecting correctness data can reduce the run time without diminishing the completeness or accuracy of the reports.
• If the target application executes large loops, then reducing the loop trip counts can speed up the analysis.
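One way to make such reductions repeatable is to derive the analysis workload from the production workload with a single scale factor. The following is a minimal sketch, not part of Intel Inspector XE 2011: the `Workload` structure, its field names, and the `reduce_for_analysis` helper are all hypothetical and stand in for whatever parameters your own application exposes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical description of one test run; the field names are illustrative
// and would map to your application's own input parameters.
struct Workload {
    std::size_t image_width;
    std::size_t image_height;
    std::size_t loop_trip_count;
};

// Shrink every dimension by `divisor`, but never below 1, so that each
// loop body and each image-processing path still runs at least once.
Workload reduce_for_analysis(const Workload& full, std::size_t divisor) {
    assert(divisor > 0 && "divisor must be positive");
    Workload reduced;
    reduced.image_width = std::max<std::size_t>(1, full.image_width / divisor);
    reduced.image_height = std::max<std::size_t>(1, full.image_height / divisor);
    reduced.loop_trip_count = std::max<std::size_t>(1, full.loop_trip_count / divisor);
    return reduced;
}
```

Clamping at 1 rather than 0 reflects the goal stated above: keep the code paths covered while removing redundant computation. Whether a uniform divisor preserves coverage depends on the application; code that only runs above a certain input size would need that size kept.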
However, for some classes of applications it might be too difficult to reduce the workload, which could lead to excessively slow analysis. This article discusses several approaches to reducing the analysis cost and making Intel Inspector XE 2011 work more efficiently on larger applications and data sets.
Note on compiler options:
This article describes several techniques for improving efficiency of Intel Inspector XE 2011 by changing the analysis configuration. However, compiler options that are used for building the target application might affect performance as well:
- Although a debug build of the application gives the most complete and accurate report, Intel Inspector XE 2011 also works with an application compiled with some basic optimizations turned on. Intel Inspector XE 2011 might not be able to display precise source locations of the errors, but this type of "express" analysis might give you an idea of which parts of the target application to focus on.
- Some popular compilers provide basic run-time memory checking functionality which slows down the execution of an application when it's enabled (it might be enabled by default for debug builds). For the purpose of analysis with Intel Inspector XE 2011 that functionality could be disabled (for example, read this article: Using the Microsoft* debug heap manager with memory error analysis of Intel® Parallel Inspector).
Stack Frame Depth
Intel® Inspector XE 2011 collects call stacks for each memory or threading error it finds. For example, each memory leak displays not only the size of the leaked block and the module that caused the leak, but also the call path of the application at the time of the leak. Even if the application doesn't have any threading or memory errors, the cost of call stacks may still be high: Intel Inspector XE 2011 needs to collect data speculatively for every observation, so that at some later point it can determine whether a set of observations actually documents an error. Examining the call stacks is useful during the analysis of a problem and helps to implement an efficient solution. The convenience, of course, comes at a cost.
To reduce the cost of call stacks, you can reduce the depth of the call stacks collected via the "Stack frame depth" setting in the target configuration (see Fig. 1).
The default stack frame depth for the most aggressive and resource-consuming analysis types is 16. In many cases 8 or 1 might suffice. You could use an iterative approach: start at 1 and then increase the stack frame depth, using the reduced workload. However, the expected performance improvement is not linear, and for applications structured so that they don't execute deep call stacks you might see little to no performance improvement.
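To see what a shallow stack frame depth gives up, it can help to model a call stack as a list of frames and keep only the innermost ones. The sketch below is a toy model under that assumption, not a description of how Intel Inspector XE 2011 stores stacks; `CallStack`, `truncate_stack`, and `smallest_distinguishing_depth` are hypothetical names introduced for illustration. It also mirrors the iterative approach: start at depth 1 and grow until two error sites can be told apart.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A call path listed from the outermost frame (e.g. main) to the innermost.
using CallStack = std::vector<std::string>;

// Keep only the `depth` innermost frames, as a depth-limited collector would.
CallStack truncate_stack(const CallStack& full, std::size_t depth) {
    const std::size_t keep = std::min(depth, full.size());
    return CallStack(full.end() - static_cast<std::ptrdiff_t>(keep), full.end());
}

// Iterative approach: start at depth 1 and increase until the two truncated
// stacks differ. Returns 0 if they stay identical up to `max_depth`.
std::size_t smallest_distinguishing_depth(const CallStack& a,
                                          const CallStack& b,
                                          std::size_t max_depth) {
    for (std::size_t depth = 1; depth <= max_depth; ++depth) {
        if (truncate_stack(a, depth) != truncate_stack(b, depth)) {
            return depth;
        }
    }
    return 0;
}
```

For example, two leaks that both go through a shared allocation wrapper look identical at depth 1, because the only recorded frame is the wrapper itself; they become distinguishable only once the depth reaches the first frame where the two call paths differ.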
Module Exclusion
Most nontrivial applications link with many shared libraries (DLL or SO). Intel® Inspector XE 2011 detects all the dynamic dependencies and performs the same level of analysis for them as for the actual executable. Many of the shared libraries that complex applications use are run-time libraries or other third-party libraries that the application's developers have no control over and whose source files they cannot access (e.g. Microsoft* DirectX or OpenGL*). Excluding these libraries from the analysis, along with any other libraries that you think cannot be a source of memory or threading errors, might greatly improve the speed of Intel Inspector XE 2011. The option for modifying the list of excluded modules is available on the Project Properties page, in the Advanced options section (see Fig. 2).
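When an application loads dozens of modules, a first pass at the exclusion list can be drafted mechanically by separating modules that live in your own build tree from everything else. The following sketch assumes you already have the list of module paths from some other source; `ModuleSplit`, `split_modules`, and the example paths are hypothetical and are not part of Intel Inspector XE 2011.

```cpp
#include <string>
#include <vector>

// Result of sorting modules into those to keep analyzing and those that
// are candidates for the excluded-modules list.
struct ModuleSplit {
    std::vector<std::string> analyze;
    std::vector<std::string> exclusion_candidates;
};

// A module is treated as "owned" if its path starts with one of the given
// directory prefixes (for example, the project's build output directory).
ModuleSplit split_modules(const std::vector<std::string>& module_paths,
                          const std::vector<std::string>& owned_prefixes) {
    ModuleSplit result;
    for (const std::string& path : module_paths) {
        bool owned = false;
        for (const std::string& prefix : owned_prefixes) {
            if (path.compare(0, prefix.size(), prefix) == 0) {
                owned = true;
                break;
            }
        }
        (owned ? result.analyze : result.exclusion_candidates).push_back(path);
    }
    return result;
}
```

The output is only a list of candidates. A third-party library can still be the place where an error in your own code surfaces, so the candidates deserve a manual review before they are entered in the Advanced options section.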
Custom Analysis Configuration
Most of the memory and threading analysis types incorporate several types of analysis. It is possible to speed up the analysis by creating a custom analysis configuration that performs only the one type of analysis you care about at the moment. When you need to switch to another type of analysis, just create a new custom configuration.
There are two ways to create a custom configuration: 1) start from a new blank configuration, or 2) take a snapshot of one of the existing configurations (see Fig. 3).
For example, you could create your own custom Memory Errors Analysis type by copying the preset configuration "Locate Memory Problems" and then reducing the scope of analysis by changing its settings (see Fig. 4).
By default, "Locate Memory Problems" includes two types of analysis: Detect memory leaks and Detect resource leaks. You could turn them both off and enable "Detect invalid/uninitialized accesses" instead. You should leave "Duplicate elimination" checked and "Analyze stack accesses" unchecked. "Enhanced dangling pointer" and "Guard zones" don't impact speed much, but both of them cause a small amount of memory bloat that, accumulated over a large run, can itself cause memory-related problems, so you should leave them unchecked as well. Note that you can also set the Stack frame depth in the custom analysis configuration (Fig. 4 shows an example of a custom analysis type that only detects invalid/uninitialized accesses).
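The settings described above can be summarized as a small record, which is handy when you want to document or review the custom configurations your team uses. This is only an illustrative sketch: `MemoryAnalysisSettings` and `invalid_access_only` are hypothetical names that mirror the check boxes in the dialog, not an Intel Inspector XE 2011 API, and the default member values are placeholders rather than the product's defaults.

```cpp
#include <cstddef>

// Hypothetical mirror of the check boxes discussed above. The initial
// values here are placeholders, not the defaults of any preset.
struct MemoryAnalysisSettings {
    bool detect_memory_leaks = false;
    bool detect_resource_leaks = false;
    bool detect_invalid_uninitialized_accesses = false;
    bool duplicate_elimination = false;
    bool analyze_stack_accesses = false;
    bool enhanced_dangling_pointer = false;
    bool guard_zones = false;
    std::size_t stack_frame_depth = 1;
};

// Take a copy of an existing configuration and narrow it to the
// invalid/uninitialized-access check only, as described in the text.
MemoryAnalysisSettings invalid_access_only(MemoryAnalysisSettings copied,
                                           std::size_t stack_frame_depth) {
    copied.detect_memory_leaks = false;
    copied.detect_resource_leaks = false;
    copied.detect_invalid_uninitialized_accesses = true;
    copied.duplicate_elimination = true;       // leave checked
    copied.analyze_stack_accesses = false;     // leave unchecked
    copied.enhanced_dangling_pointer = false;  // avoids extra memory bloat
    copied.guard_zones = false;                // avoids extra memory bloat
    copied.stack_frame_depth = stack_frame_depth;
    return copied;
}
```

Passing the configuration by value mirrors the copy step in the user interface: the preset you started from is left untouched, and only the copy is narrowed.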
For correctness analysis with Intel® Inspector XE 2011, it is highly recommended that workloads and data sets be smaller than production workloads. This article discussed three techniques that might help speed up the analysis with Intel Inspector XE 2011 when reducing a workload is impossible. These techniques can be used independently or in combination.