by Corey Alsamariae
VTune™ Analyzer Product Support Engineer
Use the Intel® VTune™ Performance Analyzer to profile your Win32* executable files or Java* applications and generate a call graph of active functions. Call Graph profiling lets you analyze your Windows* or Java application and identify critical functions and call sequences. It shows the threads created, the functions executed, and the caller and callee functions of a specific function. Call Graph profiling also generates a Call Graph spreadsheet and a call list, all shown in the Call Graph View window.
Call Graph profiling reveals:
- The structure of your program on a function level;
- The number of times a function is called from a particular location;
- The time spent in each function; and
- Functions on a critical path.
Each node (box) in the call graph (above) represents a function. Each edge (line with an arrow) connecting two nodes represents the call from the caller to the callee function. Call Graph profiling implements the following conventions to represent the computing activity:
- Nodes colored red designate functions that are on the critical path from the root (thread).
- The edge to the caller with the highest call time is colored in blue.
- The number next to the edge (line) indicates the number of calls to that function.
- To show which function or edge you are inspecting, a function turns blue when you click it, and an edge turns red when you click its arrow.
- If you select one or more functions with the selection tool, all selected nodes turn blue and edges turn red.
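The quantities described above (call counts on edges, time per call, and a critical path from the root) can be sketched with a small data structure. This is only an illustration, not how the VTune analyzer stores or computes its data: the class name, the method names, and the rule used for the critical path (from the root, keep following the outgoing edge with the highest accumulated time) are all assumptions made for this example.

```python
from collections import defaultdict


class CallGraph:
    """Toy call graph: nodes are function names, edges carry call count and time."""

    def __init__(self):
        # edges[caller][callee] = [number_of_calls, total_time_in_microseconds]
        self.edges = defaultdict(dict)

    def record_call(self, caller, callee, elapsed_us):
        """Record one call from caller to callee that took elapsed_us microseconds."""
        stats = self.edges[caller].setdefault(callee, [0, 0])
        stats[0] += 1
        stats[1] += elapsed_us

    def call_count(self, caller, callee):
        """Number shown next to the edge: how many times caller called callee."""
        return self.edges[caller][callee][0]

    def critical_path(self, root):
        """Assumed rule: from the root, follow the edge with the highest total time."""
        path = [root]
        seen = {root}
        current = root
        while self.edges.get(current):
            callees = self.edges[current]
            hottest = max(callees, key=lambda name: callees[name][1])
            if hottest in seen:  # stop on recursion instead of looping forever
                break
            path.append(hottest)
            seen.add(hottest)
            current = hottest
        return path


graph = CallGraph()
graph.record_call("thread_main", "parse", 120)
graph.record_call("thread_main", "render", 900)
graph.record_call("thread_main", "render", 850)
graph.record_call("render", "draw_line", 1400)
graph.record_call("render", "flush", 200)

print(graph.call_count("thread_main", "render"))
print(graph.critical_path("thread_main"))
```

In this sketch the nodes on the returned path correspond to the nodes that the Call Graph view would color red.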
The window has a Call List tab at the bottom of the Call Graph view. The Call List view lists all the callers and callees of the function selected in the spreadsheet (also referred to as the function in question) and displayed in the Call Graph view. In addition, the Call List has a View by Call Sites, in which call information is organized by call site: the address from which the function was actually called.
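The difference between listing callers by function and listing them by call site can be shown with a few lines of code. The call records, addresses, and function names below are invented for the example and do not reflect the analyzer's internal format.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, address of the call site).
CALLS = [
    ("main", "parse", 0x401010),
    ("main", "render", 0x401024),
    ("main", "render", 0x401060),
    ("render", "draw_line", 0x402008),
    ("render", "draw_line", 0x402008),
]


def call_list(calls, function):
    """Return (callers, callees) of the selected function with call counts."""
    callers = Counter(caller for caller, callee, _ in calls if callee == function)
    callees = Counter(callee for caller, callee, _ in calls if caller == function)
    return dict(callers), dict(callees)


def callers_by_call_site(calls, function):
    """Same caller information, but split by the address the call came from."""
    sites = Counter(
        (caller, site) for caller, callee, site in calls if callee == function
    )
    return dict(sites)


callers, callees = call_list(CALLS, "render")
print(callers, callees)
for (caller, site), count in callers_by_call_site(CALLS, "render").items():
    print(f"{caller} @ {site:#x}: {count} call(s)")
```

Here `main` calls `render` twice in total, but from two different addresses, so the by-call-site view shows two entries with one call each.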
Please note that with Microsoft Visual C++* and the Intel® C++ Compiler, you must manually add the /fixed:no switch to the linker options in the Project/Settings/Link/Project Options box. This switch makes the linker build the base relocations that instrumentation requires.
Instrumentation for call graph profiling of Win32 applications is the process of modifying (a copy of) a program so that dynamic information is recorded during program execution. Data collection routines, invoked at specific points in the execution of the target program, record run-time information: the time spent in each function and the call sequence that led to a specific function.
There are two ways to check for base relocations in .exe or .dll files:
- Using Windows Quick View: right-click the file and choose Quick View. Look at the Characteristics field in the Image File Header section. If the executable has no base relocations, you will find the message "Relocation info stripped from file" under the Characteristics field.
- Using dumpbin.exe (part of Microsoft Visual C++, located in the VC/Bin directory): run dumpbin /headers <image_name>. Look at the Characteristics field in the Image File Header section. If the image has no base relocations, you will find the "Relocation info stripped from file" message.
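Both checks above read the same bit in the image's file header, so the test can also be scripted. The sketch below relies on the published PE/COFF layout: the offset of the PE signature is stored at 0x3C, and the 2-byte Characteristics field is the last field of the 20-byte COFF file header, where bit 0x0001 (IMAGE_FILE_RELOCS_STRIPPED) means the relocations were stripped. The function names are made up for this example, and the script is not part of the VTune analyzer.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001  # flag bit in the COFF Characteristics field


def relocations_stripped(image: bytes) -> bool:
    """Return True if the PE image header says base relocations were stripped."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The COFF header follows the 4-byte signature; Characteristics is at byte 18.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED)


def relocations_stripped_in_file(path: str) -> bool:
    """Convenience wrapper (hypothetical name) that reads the image from disk."""
    with open(path, "rb") as handle:
        return relocations_stripped(handle.read())
```

A result of True for the .exe or .dll you intend to profile suggests the image was linked without /fixed:no and should be relinked before instrumentation.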
If a function was instrumented, it should appear in the graph whether it was called directly or through a pointer. If it does not appear, check the instrumentation level of the module. With "full instrumentation" the function should definitely appear in the graph; with "export" it appears only if it is exported.
For multithreaded applications, the PRF file that holds the Call Graph data stores the information per thread, broken down by chain; each thread has its own chain. However, there is no way to see this information in the VTune analyzer, which breaks the data down by direct caller only. The PRF file itself is a text file, so you can open it and view the information, but the format is quite complex, and the data needs further processing to yield meaningful numbers.
Some Known Issues and Limitations:
Issue 1: Call Graph fails during run time (when the instrumented application is running).
Solution 1: Selective instrumentation can fix Call Graph failures. Try isolating the module(s) causing problems and turn off instrumentation for those modules.
- See VTune Performance Analyzer Online help.
- Under the 'Find' tab enter "DLLs".
- Select the "To specify the instrumentation and detail levels for Call Graph profiling Win32* applications" topic.
Issue 2: Instrumentation fails for a specific function.
Solution 2: Instrumentation could fail for the following reasons:
- The first basic block in the function is smaller than five bytes. It must be at least five bytes long.
- A decode failure, which can be caused by an unrecognized instruction or by decoding data as if it were code. Unsupported compilers or architectures can cause this, and an old map file that does not match the .exe can cause data to be decoded.
- Try clearing the cache (using the Clear Cache button in the Win32 Call Graph Setup dialog box) and re-instrument.
Issue 3: Error: "Internal Error. Instrumentation is not possible"
- Only Microsoft Visual C++*/Intel® Compiler plug-in apps are supported with Call Graph.
- With Microsoft Visual C++/Intel Compiler plug-in, the /fixed:no switch must be manually added to the linker options in the Project/Settings/Link/Project Options box. Make sure you are using this switch. This allows building base relocations required for instrumentation.
- Call Graph might fail on applications that use Win32 Fibers. There is no workaround at present.
- To instrument an application that consists of more than one process (e.g. client-server), the other process should be added to the module list by using the Add Module button in the Win32 Call Graph Setup dialog box.
- If an executable contains a Shared Section, which it uses to communicate with another process, the other process must be instrumented (in order to force both processes to use the same DLL with the shared section).
- It is strongly recommended that you not use in-place instrumentation (same name, same directory).
- Transitioning from any place in the Source view to the Call Graph view for Java code will always bring you to the previously selected method.
- The VTune analyzer may get confused when there are .prf files in the working directory that are not created by the VTune analyzer itself. Renaming .prf files manually can result in a loss of consistency between Call Graph and sampling data.
- The function names of Intel C/C++ compiler-generated Intel® Pentium® III Processor 128-bit stack alignment functions with multiple entry points will appear at least twice in the Call Graph grid (with the same name) if the entry points are called during execution.