Suppose you have an application that makes heavy use of libraries or software components which might be developed independently of the application itself. As an application developer, the relevant parts of the trace are the events inside the application and the top-level calls the application makes into the libraries, but not the events inside the libraries. As a library developer, the interesting parts of a trace are the events inside one's own library and the way the library's functions are called by the application.
Here you can see the calling dependencies in a hypothetical application, shown from the application developer's perspective on improving performance:
lib1, lib2, and lib4 are called directly by the application. The application developer codes these calls and can change their sequence and parameters to improve performance (arrows marked 1).
lib3 is never called directly by the application, so the application developer has no way to tailor the use of lib3. These calls (arrows marked 3) are therefore of no interest to the application developer, and detailed performance data for them is unnecessary.
lib4 is called both directly by the application and indirectly through lib2. Only the direct use of lib4 can be influenced by the application developer, so the information about the indirect calls (arrows marked 4) is equally uninteresting.
For the library developer, the performance analysis model is significantly different. Here, the inner workings of the application are of no concern, apart perhaps from the call paths that lead into the library. The developer of, say, lib2 will need detailed information about the workings of lib2, including the calls from the application, the calls to component libraries (lib3 and lib4), and the calls to system-level services (MPI). The developer of lib2 will have no interest in performance data for lib1, and likewise the developer of lib1 will have no interest in data from lib2, lib3, and lib4.
If the application and the involved libraries are instrumented to log function calls (either manually or by a compiler), then Intel® Trace Collector supports tracing the application in such a way that only the interesting data is recorded. This is done by writing a filter rule that turns tracing off once a certain function entry has been logged, and turns it back on when that function is left. This effectively hides all events inside the function.

By analogy with the same operation in a graphical tree view, this is called FOLDING in Intel® Trace Collector. UNFOLDING is the corresponding operation that resumes tracing inside a section that would otherwise be hidden. In contrast to turning tracing on and off with the API calls VT_traceon() and VT_traceoff(), folding does not log a pseudo-call to VT_API:TRACEOFF; otherwise, folding a function that calls no other functions would log more data, not less. It is also not necessary to turn tracing back on explicitly; this happens automatically.
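The folding rule can be made concrete with a small standalone sketch. This is plain C, not Intel® Trace Collector code: the event representation and the apply_fold helper are assumptions made for illustration, and POSIX fnmatch(3) stands in for the shell-wildcard matching. It shows the essential behavior: the entry into and exit from a matching function are kept, everything nested inside is dropped, and recording resumes by itself at the exit, without any trace-off marker.

```c
#include <fnmatch.h>

enum kind { ENTER, LEAVE };
struct event { enum kind kind; const char *func; };

/* Illustrative sketch only: apply one FOLD pattern to a recorded event
   stream.  The entry into a matching function and the corresponding
   exit are kept; all events nested inside it are suppressed.  No
   pseudo "trace off" event is emitted, and recording resumes
   automatically at the exit.  Returns the number of events copied
   into out. */
int apply_fold(const char *pattern, const struct event *in, int n,
               struct event *out)
{
    int kept = 0;
    int depth = 0;              /* nesting depth inside a folded call */
    for (int i = 0; i < n; i++) {
        if (depth > 0) {        /* currently folded: track nesting only */
            if (in[i].kind == ENTER)
                depth++;
            else if (--depth == 0)
                out[kept++] = in[i];   /* exit of the folded function */
        } else if (in[i].kind == ENTER &&
                   fnmatch(pattern, in[i].func, 0) == 0) {
            out[kept++] = in[i];       /* the top-level call itself */
            depth = 1;
        } else {
            out[kept++] = in[i];
        }
    }
    return kept;
}
```

With the pattern lib*, a stream User_Code → lib2_setup → lib3_get collapses to the User_Code events plus the entry and exit of lib2_setup; lib3_get disappears entirely.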
Folding is specified with the STATE, SYMBOL, or ACTIVITY configuration options. Shell wildcards select functions by matching against their name (SYMBOL), their class (ACTIVITY), or both (STATE). FOLD and UNFOLD are keywords that trigger folding or unfolding when a matching function is entered. With the CALLER keyword one can specify, as an additional criterion, that the calling function must match a pattern before folding or unfolding takes place. How to Use the Filtering Facility has a detailed description of the syntax.
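As a rough illustration of the shell-wildcard semantics (this is not Intel® Trace Collector source code; the state_matches helper and the use of POSIX fnmatch(3) are assumptions), a STATE pattern can be thought of as matching against a "class:function" string:

```c
#include <fnmatch.h>
#include <stdio.h>

/* Illustrative only: check whether a STATE pattern such as "lib*:*"
   matches a function given by its class and name.  fnmatch(3) is used
   here merely to mimic shell-wildcard matching for simple patterns. */
int state_matches(const char *pattern, const char *class_name,
                  const char *func_name)
{
    char qualified[256];
    snprintf(qualified, sizeof qualified, "%s:%s", class_name, func_name);
    return fnmatch(pattern, qualified, 0) == 0;
}
```

With the classes used in the example below, the pattern lib*:* would match lib2:lib2_setup but not Application:work.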
In this section, folding is illustrated with configurations that apply to the example above. The C program in examples/libraries.c contains instrumentation calls that log a call tree as it might occur in a program run with the library dependencies from 3.1. Here is the call tree for the complete trace (calls were aggregated and sorted by name, so the order is not sequential):
\->User_Code
   +->finalize
   |  \->lib2_end
   +->init
   |  +->lib1_fini
   |  \->lib1_main
   |     +->close
   |     +->lib1_util
   |     +->open
   |     \->read
   +->lib4_log
   |  \->write
   \->work
      +->lib2_setup
      |  +->lib3_get
      |  |  \->read
      |  \->lib4_log
      |     \->write
      \->lib4_log
         \->write
By using the configuration options listed below, different parties can run the same instrumented executable to get different traces:
Application Developer: trace the application with only the top-level calls into lib1, lib2, and lib4
STATE lib*:* FOLD
\->User_Code
   +->finalize
   |  \->lib2_end
   +->init
   |  +->lib1_fini
   |  \->lib1_main
   +->lib4_log
   \->work
      +->lib2_setup
      \->lib4_log
lib2 Developer: trace everything in lib2, plus just the top-level calls it makes
STATE *:* FOLD
STATE lib2:* UNFOLD
\->User_Code
   +->finalize
   |  \->lib2_end
   \->work
      \->lib2_setup
         +->lib3_get
         \->lib4_log
lib2 Developer, detailed view: trace the top-level calls to lib2 and all lib2, lib3, lib4, and system services invoked by them
STATE Application:* FOLD
STATE lib2:* UNFOLD
\->User_Code
   +->finalize
   |  \->lib2_end
   \->work
      \->lib2_setup
         +->lib3_get
         |  \->read
         \->lib4_log
            \->write
Application and lib4 Developers: trace just the calls in lib4 issued by the application
STATE *:* FOLD
STATE lib4:* UNFOLD CALLER Application:*
\->User_Code
   +->lib4_log
   |  \->write
   \->work
      \->lib4_log
         \->write
It is assumed that the application, the libraries, and the system calls are instrumented with different class names. Alternatively, you could match against a function name prefix that is shared by all calls in the same library.
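For example, since every lib2 function in examples/libraries.c shares the lib2_ name prefix, the lib2 developer's view could also be selected by name alone via SYMBOL, without class instrumentation (a hypothetical variant of the STATE configurations above):

SYMBOL * FOLD
SYMBOL lib2_* UNFOLD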