Intel® Inspector for Systems: Tips and Tricks for Improving Time and Space Factors

This article covers some tips and tricks you can use in Intel® Inspector for Systems to improve the time and space factors of an analysis. Refer to the product help for more detailed information on these and other related features.

  1. Using Intel® Compiler Code Coverage feature

 

  • First, use the Intel® Compiler Code Coverage feature (enabled through compiler options) to find the sections of code that your project never exercises. Then suppress the profiling, memory, and threading error analysis for the sections that the Code Coverage run reports as not covered.

 

Please refer to this article on how to use the Intel® Compiler Code Coverage feature: http://software.intel.com/en-us/articles/how-to-use-intel-compiler-code-coverage-to-determine-the-code-exercised

 

Once we have identified the code that is not covered, we can use several user APIs that are provided with Intel® Inspector for Systems.

 

  2. Using user Application Programming Interfaces (APIs) for collection control

 

  • Intel® Inspector for Systems: Before getting into the details of the various user APIs, let’s look at the advantages of using them.

 

Inspector for Systems uses instrumentation to analyze memory and threading issues, and therefore adds significant overhead. To reduce the overall inspection time of your application, you can wrap code regions with the collection control APIs so that only the code you wish to inspect is analyzed.

 

Inspector has two kinds of user API:

  1. Time-oriented APIs to control which time periods of thread execution to analyze/not analyze

There are two such APIs for collection control in Inspector:

 

  • void __itt_suppress_push(unsigned int etype)

Tells the Intel® Inspector to stop analyzing for errors on the current thread.

etype is:

__itt_suppress_memory_errors to stop analyzing for memory errors

__itt_suppress_threading_errors to stop analyzing for threading errors

__itt_suppress_memory_errors | __itt_suppress_threading_errors to stop analyzing for both memory and threading errors

  • void __itt_suppress_pop (void)

 

Tells the Intel® Inspector to undo the action corresponding to the most recent matching nested push call.

 

Push calls nest and are additive, so the Intel Inspector does not resume analyzing for:

 

  1. Memory errors until there are an equal number of pop calls corresponding to push calls for memory errors

  2. Threading errors until there are an equal number of pop calls corresponding to push calls for threading errors
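The nesting rule above can be sketched in a few lines of C. The `suppress_begin` and `suppress_end` wrappers and the depth counters are hypothetical helpers, not part of the Inspector API; they only mirror the bookkeeping described in the text. The stand-in definitions let the sketch build without the tool installed, and `USE_ITTNOTIFY` is an assumed build switch for calling the real API instead.

```c
#include <assert.h>

#ifdef USE_ITTNOTIFY                 /* hypothetical build switch */
#include <ittnotify.h>
#else
/* Stand-ins so the sketch builds without the Inspector headers.
   The numeric values are placeholders, not taken from ittnotify.h. */
#define __itt_suppress_threading_errors 0x1u
#define __itt_suppress_memory_errors    0x2u
static void __itt_suppress_push(unsigned int etype) { (void)etype; }
static void __itt_suppress_pop(void) { }
#endif

/* Single-threaded bookkeeping for illustration only. Suppression is
   per thread, so real code would keep this state per thread. */
#define MAX_DEPTH 16
static unsigned int pushed[MAX_DEPTH];   /* etype of each outstanding push */
static int depth = 0;
static int memory_depth = 0;             /* outstanding pushes for memory errors */
static int threading_depth = 0;          /* outstanding pushes for threading errors */

static void suppress_begin(unsigned int etype)
{
    assert(depth < MAX_DEPTH);
    pushed[depth++] = etype;
    if (etype & __itt_suppress_memory_errors)    memory_depth++;
    if (etype & __itt_suppress_threading_errors) threading_depth++;
    __itt_suppress_push(etype);
}

static void suppress_end(void)
{
    unsigned int etype;
    assert(depth > 0);                   /* every pop needs a matching push */
    etype = pushed[--depth];
    if (etype & __itt_suppress_memory_errors)    memory_depth--;
    if (etype & __itt_suppress_threading_errors) threading_depth--;
    __itt_suppress_pop();
}

/* Analysis for an error type resumes only when its depth is back to zero. */
static int memory_analysis_active(void)    { return memory_depth == 0; }
static int threading_analysis_active(void) { return threading_depth == 0; }
```

For example, after `suppress_begin(__itt_suppress_memory_errors)` followed by `suppress_begin(__itt_suppress_memory_errors | __itt_suppress_threading_errors)`, a single `suppress_end()` brings threading analysis back, while memory analysis stays off until the second `suppress_end()`.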

 

  2. Class-oriented APIs to control which data objects to analyze/not analyze

 

  • void __itt_suppress_mark_range(__itt_suppress_mode_t mode, unsigned int etype, void * address, size_t size);

 

Tells the Intel® Inspector to mark the memory range defined by address and size with the flags mode and etype.

 

mode is __itt_suppress_range or __itt_unsuppress_range.

 

etype is:

 

__itt_suppress_memory_errors to stop analyzing for memory errors

 

__itt_suppress_threading_errors to stop analyzing for threading errors

 

__itt_suppress_memory_errors | __itt_suppress_threading_errors to stop analyzing for both memory and threading errors

 

 

address is the address of the first byte to suppress.

 

size is the number of bytes to suppress.

 

  • void __itt_suppress_clear_range( __itt_suppress_mode_t mode, unsigned int etype, void * address, size_t size);

 

Tells the Intel® Inspector to clear the previously marked range defined by address and size with the flags mode and etype.

 

mode is __itt_suppress_range or __itt_unsuppress_range.

 

etype is:

 

__itt_suppress_memory_errors to stop analyzing for memory errors

 

__itt_suppress_threading_errors to stop analyzing for threading errors

 

__itt_suppress_memory_errors | __itt_suppress_threading_errors to stop analyzing for both memory and threading errors
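As a minimal sketch of how a mark call pairs with its clear call, the C fragment below ties a suppressed range to the lifetime of a heap buffer. The `scratch_t` type and the `scratch_open` and `scratch_close` functions are hypothetical names introduced for illustration, and the stand-in definitions (with placeholder values) only exist so the fragment builds without the Inspector headers; `USE_ITTNOTIFY` is an assumed build switch.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef USE_ITTNOTIFY                 /* hypothetical build switch */
#include <ittnotify.h>
#else
/* Stand-ins so the sketch builds without the Inspector headers.
   The values are placeholders, not taken from ittnotify.h. */
typedef enum { __itt_unsuppress_range, __itt_suppress_range } __itt_suppress_mode_t;
#define __itt_suppress_threading_errors 0x1u
#define __itt_suppress_memory_errors    0x2u
static void __itt_suppress_mark_range(__itt_suppress_mode_t mode, unsigned int etype,
                                      void *address, size_t size)
{ (void)mode; (void)etype; (void)address; (void)size; }
static void __itt_suppress_clear_range(__itt_suppress_mode_t mode, unsigned int etype,
                                       void *address, size_t size)
{ (void)mode; (void)etype; (void)address; (void)size; }
#endif

/* Hypothetical scratch buffer whose races are known to be benign. */
typedef struct {
    unsigned char *bytes;
    size_t size;
} scratch_t;

/* Allocate the buffer and suppress threading errors on exactly its bytes.
   Returns 0 on success, -1 if the allocation fails. */
static int scratch_open(scratch_t *s, size_t size)
{
    s->bytes = (unsigned char *)calloc(size, 1);
    if (s->bytes == NULL) {
        s->size = 0;
        return -1;
    }
    s->size = size;
    __itt_suppress_mark_range(__itt_suppress_range, __itt_suppress_threading_errors,
                              s->bytes, s->size);
    return 0;
}

/* Clear the record with the same mode, etype, address and size that were
   used to mark it, then release the memory. */
static void scratch_close(scratch_t *s)
{
    __itt_suppress_clear_range(__itt_suppress_range, __itt_suppress_threading_errors,
                               s->bytes, s->size);
    free(s->bytes);
    s->bytes = NULL;
    s->size = 0;
}
```

Clearing the range before freeing the buffer is a design choice in this sketch: it keeps a stale record from covering whatever the allocator later places at the same address.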

Here are some small code snippets that show how to use these collection control APIs:

  1. Time-oriented Collection Control

#include <ittnotify.h>

...

#pragma omp parallel
{
    __itt_suppress_push(__itt_suppress_threading_errors);
    /* Any threading errors here will be ignored by the calling thread.
       In this case, each thread in the region */

    __itt_suppress_pop();
    /* Any threading errors here will be
       seen by Inspector */
}
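One way to keep each push paired with its pop is to route the unanalyzed work through a small helper. The sketch below assumes a hypothetical `run_suppressed` function and a hypothetical `sum_job` workload; neither comes from the Inspector API. The stand-in definitions use placeholder values so the fragment builds without the tool, and `USE_ITTNOTIFY` is an assumed build switch.

```c
#include <stddef.h>

#ifdef USE_ITTNOTIFY                 /* hypothetical build switch */
#include <ittnotify.h>
#else
/* Stand-ins so the sketch builds without the Inspector headers.
   The numeric values are placeholders, not taken from ittnotify.h. */
#define __itt_suppress_threading_errors 0x1u
#define __itt_suppress_memory_errors    0x2u
static void __itt_suppress_push(unsigned int etype) { (void)etype; }
static void __itt_suppress_pop(void) { }
#endif

typedef void (*region_fn)(void *arg);

/* Run fn(arg) on the calling thread with the given error types suppressed.
   The pop always follows the call, so the pair cannot get out of step
   as long as fn returns normally. */
static void run_suppressed(unsigned int etype, region_fn fn, void *arg)
{
    __itt_suppress_push(etype);
    fn(arg);
    __itt_suppress_pop();
}

/* Hypothetical workload: sum an array of integers. */
struct sum_job {
    const int *values;
    size_t count;
    long total;
};

static void sum_values(void *arg)
{
    struct sum_job *job = (struct sum_job *)arg;
    size_t i;
    job->total = 0;
    for (i = 0; i < job->count; i++)
        job->total += job->values[i];
}
```

A call such as `run_suppressed(__itt_suppress_memory_errors | __itt_suppress_threading_errors, sum_values, &job)` then runs the summation without analysis on the calling thread. Because suppression applies per thread, any threads that `fn` itself starts are not covered by this push.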

 

  2. Object-oriented Collection Control

 

#include <ittnotify.h>

int variable_to_watch;
int other_variable;

    // change the default mode by using NULL and 0 as address and size
    __itt_suppress_mark_range(__itt_suppress_range, __itt_suppress_threading_errors, NULL, 0);
    // ensure we see errors on variable_to_watch
    __itt_suppress_mark_range(__itt_unsuppress_range, __itt_suppress_threading_errors, &variable_to_watch, sizeof(variable_to_watch));

#pragma omp parallel
{
    …
    variable_to_watch++; // race will be reported
    other_variable++;    // race will not be reported
}
    // clear the record for all of memory
    __itt_suppress_clear_range(__itt_suppress_range, __itt_suppress_threading_errors, NULL, 0);

    // clear the record for variable_to_watch
    __itt_suppress_clear_range(__itt_unsuppress_range, __itt_suppress_threading_errors, &variable_to_watch, sizeof(variable_to_watch));

    // mark the range for other_variable so we don’t see errors
    __itt_suppress_mark_range(__itt_suppress_range, __itt_suppress_threading_errors, &other_variable, sizeof(other_variable));

#pragma omp parallel
{
    …
    variable_to_watch++; // race will be reported
    other_variable++;    // race will not be reported
}

 

Note: The code snippets in this article only demonstrate usage and are not complete programs. Using these collection control APIs, we can reduce analysis time by skipping the analysis of unused or uncovered code. I will address the collection control APIs available in Intel® VTune Amplifier for Systems in a separate article.

 
