Use the __itt_frame_ APIs from Intel® VTune™ Amplifier XE 2011 to analyze each frame in a critical code area

Frame Analysis is a feature of Intel® VTune™ Amplifier XE 2011 for developers who, in addition to identifying hotspots in their code, want per-frame information such as elapsed time. A frame here is one execution of a critical (time-sensitive) region of code, for example one iteration of a rendering loop.
This feature is typically used for video playback and graphics: game developers, for example, care about the elapsed time of each frame when rendering with GDI or DirectX.
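
To make "elapsed time per frame" concrete, the sketch below computes frame durations from begin/end timestamps and picks out the slowest frame. It only illustrates the idea and is not how VTune Amplifier records frames internally; the frame_span type, the function names, and the tick units are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one frame: timestamps in arbitrary ticks. */
typedef struct {
    unsigned long begin_ticks;
    unsigned long end_ticks;
} frame_span;

/* Elapsed time of a single frame, in the same tick units. */
unsigned long frame_elapsed(const frame_span *f)
{
    assert(f->end_ticks >= f->begin_ticks);
    return f->end_ticks - f->begin_ticks;
}

/* Index of the frame with the largest elapsed time (first one on ties).
   The caller must pass at least one frame. */
size_t slowest_frame(const frame_span *frames, size_t count)
{
    assert(count > 0);
    size_t slowest = 0;
    for (size_t i = 1; i < count; i++) {
        if (frame_elapsed(&frames[i]) > frame_elapsed(&frames[slowest]))
            slowest = i;
    }
    return slowest;
}
```

A profiler that groups data by frame answers the same question automatically, and additionally attributes function-level CPU time to each frame.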

The steps below walk through Frame Analysis on an example program that renders a fractal image using multiple threads, with the work split by interleaved scan-line decomposition:

1. Set the include and library paths for libittnotify (a library shipped with Intel® VTune™ Amplifier XE 2011)

Microsoft Windows*:
The default include file (full path name) is C:\Program Files\Intel\Amplifier XE\include\ittnotify.h. Verify that this INCLUDE path is set in your project or environment.

itt_frame1.png 

The default library file (full path name) is C:\Program Files\Intel\Amplifier XE\lib\libittnotify.lib.
Verify that this LIB path is set in your project or environment. Note: if VTune™ Performance Analyzer is also installed, place the VTune Amplifier LIB path above the VTune Analyzer one in the list, because both products ship a library with the same name.
 
itt_frame2.png 

Linux*:
The default include file (full path name) is /opt/intel/vtune_amplifier_xe/include/ittnotify.h. Build the application with the argument "-I$path", where $path is that include directory.
The default library file (full path name) is /opt/intel/vtune_amplifier_xe/lib/libittnotify.a. Link the application against this static library.

2. Find the hot function(s)
Run Hotspot Analysis to find the hot function to use for frame analysis, here GenScanLine. The reasoning is as follows:

• The top function, GenColors, consumes the most CPU time
• But GenColors is called by GenScanLine
• GenScanLine is itself among the top two functions by CPU time

So should the code be inserted in GenScanLine? Not quite: GenScanLine is very simple code, and it is called in a loop by PaintLine, so the loop in PaintLine is the better place to mark frames.

itt_frame3.png 

 

3. Insert the __itt_frame_ API calls in the loop, and add libittnotify.lib to the project properties
Insert the code in function PaintLine, which calls function GenScanLine in a loop:

...

#include "ittnotify.h"

...

void PaintLine ( void* pMandelThreadParams, float realScale, float imagScale, float minreal, int pixelSizeInBytes )
{
    mandelThreadParams* p = (mandelThreadParams*)pMandelThreadParams;

    // Paint each line in mandelbrot display for the current thread.
    // Interleave the lines for each thread to get better thread balancing
    //     for (unsigned int y=startLine; y < endLine; y++) // Many threads unbalanced

    DWORD* pBitmap32 = p->pImageSpec->pBitmap;
    WORD*  pBitmap16 = (WORD *)p->pImageSpec->pBitmap;

    // Create (or look up) the frame domain that groups these frames
    __itt_domain *my_itt_frame = __itt_domain_create("PaintLine");
    my_itt_frame->flags = 1; //enable it

    for (unsigned int y=p->threadNum; y < p->pImageSpec->height;
       y+=p->numThreads) // Many threads balanced
    {
       // One frame = one loop iteration (one scan line)
       __itt_frame_begin_v3(my_itt_frame, NULL);

       //Initialize GenColors
       GenColors ( 0, 0, 0, 0, 0 );

       GenScanLine( (p->pMandelSpec->min.imag + y*imagScale),
           (y*p->pImageSpec->pitch/pixelSizeInBytes),
           p->pImageSpec->width, p->threadNum,
           minreal, realScale );

       // Copy Line buffer to video card
       CopyBufferToVideo (&(pBitmap32 [y*p->pImageSpec->pitch/pixelSizeInBytes]),
           &(pBitmap16 [y*p->pImageSpec->pitch/pixelSizeInBytes]), p->pImageSpec->width, p->threadNum);

       // Update the screen while still calculating pixels
       if ((p->threadNum == 0) && (((y - p->threadNum) / p->numThreads) % 20) == 0) {
           Calculating = FALSE;
           DD_UnlockSurface((BYTE*)(p->pImageSpec->pBitmap));
           Paint (hWnd);
           Calculating = TRUE;
       }

       // End of the frame started at the top of this iteration
       __itt_frame_end_v3(my_itt_frame, NULL);
    }
}
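
The begin/end pairing above can also be written so that the source still builds when libittnotify is not available. The following is a minimal sketch under assumptions: the USE_ITTNOTIFY macro, the FRAME_* wrapper macros, and render_lines are hypothetical names, not part of the ITT API; only __itt_domain_create, __itt_frame_begin_v3, and __itt_frame_end_v3 come from the code above. The loop uses the same interleaved scan-line scheme as PaintLine and returns the number of frames it marked.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ITTNOTIFY            /* hypothetical build switch */
#include "ittnotify.h"
typedef __itt_domain frame_domain;
#define FRAME_DOMAIN(name) __itt_domain_create(name)
#define FRAME_BEGIN(d)     __itt_frame_begin_v3((d), NULL)
#define FRAME_END(d)       __itt_frame_end_v3((d), NULL)
#else                           /* no-op stubs when the library is absent */
typedef struct { int unused; } frame_domain;
static frame_domain stub_domain;
#define FRAME_DOMAIN(name) (&stub_domain)
#define FRAME_BEGIN(d)     ((void)(d))
#define FRAME_END(d)       ((void)(d))
#endif

/* Process the scan lines owned by one thread: line thread_num, then every
   num_threads-th line after it. Each line is marked as one frame.
   Returns how many frames (lines) this thread handled. */
unsigned int render_lines(unsigned int thread_num, unsigned int num_threads,
                          unsigned int height)
{
    assert(num_threads > 0);
    frame_domain *domain = FRAME_DOMAIN("RenderLines");
    unsigned int frames = 0;

    for (unsigned int y = thread_num; y < height; y += num_threads) {
        FRAME_BEGIN(domain);
        /* ... per-line work (GenScanLine and friends) would go here ... */
        frames++;
        FRAME_END(domain);
    }
    return frames;
}
```

With four threads and ten lines, thread 0 handles lines 0, 4, and 8 while thread 2 handles lines 2 and 6, so the calls return 3 and 2 frames respectively.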

 

Add libittnotify.lib to the project as an additional library, then build the project.

itt_frame4.png 

 

4. Interpret results
Run Hotspot Analysis again, open the Bottom-up view of the result, and change the view type to "/Frame Domain/Frames/Function/Call Stack".
Note that the view lists many frame IDs, each containing the performance data of the functions that ran in that frame. All function performance data is thus organized by frame.


itt_frame5.png 


Use "/Frame Domain/Frame Type/Frames/Function/Call Stack" to separate fast frames from slow frames. This view type organizes the results by frame rate (frames per second): all slow frames are displayed under "Slow" and all fast frames under "Fast".

itt_frame6.png 

 

Note: The Fast frame threshold and Slow frame threshold are defined in Project Properties. The user can change the default values as desired.

itt_frame7.png
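
As a rough illustration of how two such thresholds can partition frames, the sketch below classifies a frame by its frame rate. It is a guess at the logic, for explanation only: the middle "good" category, the boundary comparisons, and the function and type names are assumptions, not taken from VTune Amplifier.

```c
#include <assert.h>

typedef enum { FRAME_SLOW, FRAME_GOOD, FRAME_FAST } frame_type;

/* Classify one frame from its elapsed time in seconds.
   slow_fps and fast_fps are the two thresholds in frames per second. */
frame_type classify_frame(double elapsed_seconds, double slow_fps, double fast_fps)
{
    assert(elapsed_seconds > 0.0);
    assert(slow_fps <= fast_fps);

    double fps = 1.0 / elapsed_seconds;   /* rate this frame alone would sustain */
    if (fps <= slow_fps) return FRAME_SLOW;
    if (fps >= fast_fps) return FRAME_FAST;
    return FRAME_GOOD;                    /* between the thresholds (assumed category) */
}
```

For example, with hypothetical thresholds of 40 and 100 frames per second, a frame that took 50 ms (20 frames per second) is classified as slow, and one that took 5 ms (200 frames per second) as fast.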

 
