Intel® Inspector XE 2013: Controlling Analysis Cost

Find memory and threading errors faster



Introduction
  

Intel® Inspector XE 2013 detects challenging threading and memory errors and provides guidance to help ensure application reliability. When you run an analysis, the Intel Inspector XE executes the target against a data set you specify. Data set size and workload have a direct impact on target execution time and analysis speed. To get a complete and accurate analysis within a reasonable timeframe, the target application needs to execute as many code paths as is practical, while keeping the redundant computation within each task to the minimum needed for good code coverage.

It is usually possible to reduce workload or data set size to maximize the efficiency of the Intel Inspector XE. For example:

• If the target application processes images, then cutting the test image size when collecting correctness data can reduce the run time without diminishing the completeness or accuracy of the analysis.

• If the target application executes large loops, then it might be beneficial to control analysis speed by reducing loop trip-counts.
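As a rough illustration of the second point, the sketch below lets a test run shrink a loop trip-count without rebuilding. Everything in it is hypothetical: the helper names and the `ANALYSIS_SCALE` environment variable are assumptions made for this example and are not Intel Inspector XE features.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: divide the full trip count by the divisor given in
   scale_text. A missing, malformed, or <= 1 divisor leaves the count
   unchanged. A non-empty workload never shrinks below one iteration, so
   the loop body is still exercised at least once. */
long scaled_trip_count(long full_count, const char *scale_text)
{
    assert(full_count >= 0);
    if (scale_text == NULL)
        return full_count;

    char *end = NULL;
    long divisor = strtol(scale_text, &end, 10);
    if (end == scale_text || *end != '\0' || divisor <= 1)
        return full_count;

    long reduced = full_count / divisor;
    return (reduced > 0 || full_count == 0) ? reduced : 1;
}

/* Hypothetical workload: the loop body stands in for the real per-item work.
   ANALYSIS_SCALE is an assumed variable name chosen for this sketch. */
long process_items(long full_count)
{
    long n = scaled_trip_count(full_count, getenv("ANALYSIS_SCALE"));
    long visited = 0;
    for (long i = 0; i < n; ++i)
        ++visited;
    return visited;
}
```

With a scheme like this, a correctness run could set the divisor to 10 or 100 while production runs leave the variable unset. The reduced loop should still reach the same code paths as the full one, otherwise coverage suffers.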

However, for some classes of applications it might be too difficult to reduce the workload, which could lead to excessively slow analysis. This article discusses several approaches for reducing analysis cost and making the Intel Inspector XE work more efficiently on larger applications and data sets.


Note on compiler options:
This article describes several techniques for improving efficiency of the Intel Inspector XE by changing the analysis configuration. However, the compiler options that are used for building the target application might affect performance as well:

- Although the Intel Inspector XE provides more complete and accurate reports for debug builds, it still works on applications compiled with some basic optimizations enabled. In that case the Intel Inspector XE might not be able to display precise source locations for errors, but this type of express analysis can give you an idea of which parts of the target application to focus on.
- Some popular compilers provide basic run-time memory checking functionality that slows down application execution (it might be enabled by default for debug builds). For analysis with the Intel Inspector XE, that functionality can be disabled (for an example, see the article "Using the Microsoft* debug heap manager with memory error analysis of Intel® Parallel Inspector").


Stack Frame Depth  

Intel Inspector XE collects call stacks for each memory or threading error it finds. For example, each memory leak displays not only the size of the leaked block and the module that caused the leak, but also the call path of the application at the time of the leak. Even if the application doesn't have any threading or memory errors, the cost of collecting call stacks may still be high: the Intel Inspector XE needs to collect information for every observation, so that at some later point it can determine whether a set of observations actually documents an error. Examining the call stacks is useful during the analysis of a problem and helps to implement an efficient solution. The convenience, of course, comes at a cost.

To reduce the cost of call stacks, try reducing the depth of the call stacks collected by using the Stack frame depth setting in the analysis configuration (see Fig. 1).

Fig. 1



The default Stack frame depth for the most aggressive and resource-consuming analysis types is 16. In many cases 8 or 1 might suffice. You could use an iterative approach on the reduced workload, starting at 1 and then increasing the stack frame depth as needed. However, the expected performance improvement is not linear, and you might see little to no improvement for applications that don't execute deep call stacks.

 

Exclude Modules  

Most large applications link with many shared libraries (DLL or SO). Intel Inspector XE detects all the dynamic dependencies and performs the same level of analysis for them as for the actual executable. Many of the shared libraries that complex applications use are run-time libraries or other 3rd party libraries (e.g. Microsoft DirectX* or Silicon Graphics Inc. OpenGL* libraries). You may have no control over or access to their source files. Excluding these libraries, along with any libraries that you think cannot be a source of memory or threading errors, might greatly improve the speed of the Intel Inspector XE. See the Advanced section of the Project Properties page to exclude modules (see Fig. 2).

Fig. 2


 

Custom Analysis Configuration

Most of the preset memory and threading analysis configurations incorporate several types of analysis. It is possible to speed up the analysis by creating a custom analysis configuration and doing just the one analysis type you care about at the moment. When you need to switch to another analysis type, just create a new custom configuration.

There are two ways to create custom configurations:

  • Create a custom configuration by editing a copy of a preset configuration 
  • Edit an existing custom configuration.

To edit a copy of a preset configuration, use the Copy button (see Fig. 3).

Fig. 3

 

Change the settings to reduce the scope of analysis. Hovering over any particular setting displays information about what that option does and how expensive it is (see Fig. 4).

Fig. 4 


 

By default, Locate Memory Problems includes two kinds of analysis: Detect memory leaks and Detect resource leaks. You could disable both and enable Detect invalid/uninitialized accesses instead. Leave Duplicate elimination enabled and Analyze stack accesses disabled. Enhanced dangling pointer and Guard zones don't impact speed much, but each of them causes a small amount of memory bloat. The combination, accumulated over a large run, can itself cause memory-related problems, so you should leave them disabled as well. Note that you can also set the Stack frame depth in the custom analysis configuration.

Previously created custom configurations are available for use via the Custom Analysis Types tab on the Configure Analysis Type window (see Fig. 5).

Fig. 5

 

API Control of Collection

Intel Inspector XE also provides Collection Control APIs you can use in the code to turn analysis on and off on a per-thread or per-object basis.

To tell the Intel Inspector XE to stop/restart analyzing for errors on the current thread, use the following masks with the push call, and __itt_suppress_pop to resume:

• __itt_suppress_memory_errors to stop analyzing for memory errors

• __itt_suppress_threading_errors to stop analyzing for threading errors

• __itt_suppress_memory_errors|__itt_suppress_threading_errors to stop analyzing for memory or threading errors

• __itt_suppress_pop to undo the most recent matching push call

 

Please note that the Intel Inspector XE still generates race diagnostics based on the time of the conflicting access, so it is possible to suppress analysis on a thread and still get diagnostics for conflicts that started before the suppression was initiated.
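A minimal sketch of per-thread suppression around a trusted region follows. It assumes that the push call is `__itt_suppress_push(mask)` as declared in `ittnotify.h`, which this article does not spell out. The function `checksum_trusted` and the `USE_ITTNOTIFY` build macro are hypothetical. When that macro is not defined, the sketch uses no-op stand-ins with placeholder mask values so that it compiles without the tool's headers.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ITTNOTIFY
#include <ittnotify.h>   /* real declarations shipped with the tool */
#else
/* No-op stand-ins (assumption) so the sketch builds without the ITT headers.
   The mask values here are placeholders; the real ones come from the header. */
#define __itt_suppress_threading_errors 0x000000ffu
#define __itt_suppress_memory_errors    0x0000ff00u
#define __itt_suppress_push(mask) ((void)(mask))
#define __itt_suppress_pop()      ((void)0)
#endif

/* Hypothetical hot loop that has already been reviewed: analysis is
   suppressed on the calling thread only for the duration of the loop. */
long checksum_trusted(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    long sum = 0;

    /* Stop analyzing for memory and threading errors on this thread. */
    __itt_suppress_push(__itt_suppress_memory_errors |
                        __itt_suppress_threading_errors);
    for (size_t i = 0; i < len; ++i)
        sum += data[i];
    /* Undo the push above so analysis resumes for the rest of the thread. */
    __itt_suppress_pop();

    return sum;
}
```

Keeping each push and its matching pop in the same function, on every exit path, makes it harder to leave a thread suppressed by accident.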

To tell the Intel Inspector XE to stop/restart analyzing for errors on a given address range, use the __itt_suppress_range or __itt_unsuppress_range mode together with one of the following masks:

• __itt_suppress_memory_errors to stop analyzing for memory errors

• __itt_suppress_threading_errors to stop analyzing for threading errors

• __itt_suppress_memory_errors|__itt_suppress_threading_errors to stop analyzing for memory or threading errors
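The address-range form might look like the sketch below. It assumes that the range calls are `__itt_suppress_mark_range(mode, mask, address, size)` and `__itt_suppress_clear_range(mode, mask, address, size)` as declared in `ittnotify.h`. The article itself names only the modes and masks, so check the product documentation for the exact signatures. The statistics counters and the `USE_ITTNOTIFY` macro are hypothetical, and the stand-ins used when that macro is undefined do nothing.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ITTNOTIFY
#include <ittnotify.h>   /* real declarations shipped with the tool */
#else
/* No-op stand-ins (assumption) so the sketch builds without the ITT headers.
   The mask value is a placeholder; the real one comes from the header. */
enum { __itt_unsuppress_range, __itt_suppress_range };
#define __itt_suppress_threading_errors 0x000000ffu
#define __itt_suppress_mark_range(mode, mask, addr, size) \
    ((void)(mode), (void)(mask), (void)(addr), (void)(size))
#define __itt_suppress_clear_range(mode, mask, addr, size) \
    ((void)(mode), (void)(mask), (void)(addr), (void)(size))
#endif

#define STAT_SLOTS 8

/* Hypothetical approximate counters that threads update without locking.
   The races on them are accepted, so reporting them only adds cost. */
static unsigned long stats[STAT_SLOTS];

void stats_init(void)
{
    /* Stop threading analysis for accesses that fall inside the array. */
    __itt_suppress_mark_range(__itt_suppress_range,
                              __itt_suppress_threading_errors,
                              stats, sizeof stats);
}

void stats_bump(size_t slot)
{
    stats[slot % STAT_SLOTS]++;
}

unsigned long stats_get(size_t slot)
{
    assert(slot < STAT_SLOTS);
    return stats[slot];
}

void stats_shutdown(void)
{
    /* Remove the mark made in stats_init so the range is analyzed again. */
    __itt_suppress_clear_range(__itt_suppress_range,
                               __itt_suppress_threading_errors,
                               stats, sizeof stats);
}
```

Suppressing only threading errors on the array leaves memory analysis active for it, so an out-of-bounds write to the counters could still be reported.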

 

Detailed information and examples of how to use these APIs with both C/C++ code and Fortran code can be found in the product documentation.

Conclusion

We discussed four techniques that might help speed up the analysis with the Intel Inspector XE even when reducing a workload or data set is impossible. These techniques can be used independently or in combination.
