Recognize User Synchronization Objects in Intel(R) Thread Profiler

Last Modified On :   October 10, 2009 2:00 AM PDT

Intel(R) Thread Profiler identifies the threads and synchronization objects that affect performance. In most cases, an application uses the synchronization objects defined by Windows*, such as events, mutexes, semaphores, and critical sections.

However, an application sometimes uses synchronization objects that it defines itself. Intel® Parallel Amplifier provides libittnotify.dll/libittnotify.h, which can report user synchronization objects to Intel® Thread Profiler at run time. See the examples below.

Here is a test case: the function is executed by many threads. Each thread accumulates a value in the local variable "lpot" inside a loop, then adds that local value to the global variable "pot".

1)

   for( i=start; i<end; i++ ) {
      for( j=0; j<i-1; j++ ) {
         distx = pow( (r[0][j] - r[0][i]), 2 );
         disty = pow( (r[1][j] - r[1][i]), 2 );
         distz = pow( (r[2][j] - r[2][i]), 2 );
         dist = sqrt( distx + disty + distz );
         lpot += 1.0 / dist;
      }
   }

   EnterCriticalSection(&cs);
   pot += lpot;
   LeaveCriticalSection(&cs);

 

In this implementation, the time spent on the system object "CRITICAL_SECTION cs;" is analyzed in the Intel® Thread Profiler results.

TP_CS.bmp

2)

   for( i=start; i<end; i++ ) {
      for( j=0; j<i-1; j++ ) {
         distx = pow( (r[0][j] - r[0][i]), 2 );
         disty = pow( (r[1][j] - r[1][i]), 2 );
         distz = pow( (r[2][j] - r[2][i]), 2 );
         dist = sqrt( distx + disty + distz );
         lpot += 1.0 / dist;
      }
   }

   while (!spin) {
      spin = 1;
      pot += lpot;
   }
   spin = 0;

In this implementation, the time spent on the user-defined "spin" object is NOT analyzed, because Intel® Thread Profiler does not recognize it as a synchronization object.

TP_NONE.bmp 
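The "while (!spin)" loop above tests and sets "spin" in two separate steps, so two threads can both read 0 and enter together, and a thread that finds "spin" already set skips its update entirely. The following is a minimal sketch of an atomic variant, assuming a C11 compiler with <stdatomic.h> instead of the Windows* API used in this article. The names add_to_pot, spin_flag, and total are hypothetical and were introduced only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-defined lock; clear means "free". */
atomic_flag spin_flag = ATOMIC_FLAG_INIT;

/* Add one thread's partial sum (lpot) to the shared total under the lock. */
void add_to_pot(double *total, double lpot)
{
    assert(total != NULL);

    /* test_and_set returns the previous state in one atomic step:
       keep looping while another thread already holds the lock. */
    while (atomic_flag_test_and_set_explicit(&spin_flag, memory_order_acquire)) {
        /* busy-wait */
    }

    *total += lpot;   /* protected update */

    /* Release the lock so that a waiting thread can proceed. */
    atomic_flag_clear_explicit(&spin_flag, memory_order_release);
}
```

Like the original loop, such a lock is a user synchronization object, so its wait time would not be expected to appear in the results unless it is reported explicitly.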

3)

   for( i=start; i<end; i++ ) {
      for( j=0; j<i-1; j++ ) {
         distx = pow( (r[0][j] - r[0][i]), 2 );
         disty = pow( (r[1][j] - r[1][i]), 2 );
         distz = pow( (r[2][j] - r[2][i]), 2 );
         dist = sqrt( distx + disty + distz );
         lpot += 1.0 / dist;
      }
   }

   sync_prepare(&spin);
   while (!spin) {
      spin = 1;
      sync_acquired(&spin);
      pot += lpot;
   }
   sync_releasing(&spin);
   spin = 0;

 

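In this implementation, the time spent on the user-defined "spin" object is analyzed again, because the sync_prepare, sync_acquired, and sync_releasing calls report it to Intel® Thread Profiler.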

TP_US.bmp 
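The ordering of the three notifications in example 3 can be sketched in portable form: prepare before waiting, acquired once the lock is held, releasing just before it is given up. This sketch assumes a C11 compiler with <stdatomic.h> and uses an atomic flag in place of the plain "spin" variable. The hooks are ordinary function pointers that stay NULL until they are resolved. The log_* functions, trace, notify, and locked_add are hypothetical names used only to show the call order in a single-threaded demo.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*sync_hook)(void *);

/* Hooks stay NULL until the notification functions have been resolved. */
sync_hook sync_prepare   = NULL;
sync_hook sync_acquired  = NULL;
sync_hook sync_releasing = NULL;

/* Hypothetical demo hooks that record the call order (single-threaded use only). */
char   trace[8];
size_t trace_len = 0;
static void log_event(char c) { if (trace_len < sizeof trace) trace[trace_len++] = c; }
void log_prepare(void *obj)   { (void)obj; log_event('P'); }
void log_acquired(void *obj)  { (void)obj; log_event('A'); }
void log_releasing(void *obj) { (void)obj; log_event('R'); }

/* Call a hook only if it is set, so the lock also works without the library. */
static void notify(sync_hook hook, void *obj) { if (hook != NULL) hook(obj); }

atomic_flag spin = ATOMIC_FLAG_INIT;

void locked_add(double *total, double lpot)
{
    assert(total != NULL);
    notify(sync_prepare, &spin);      /* about to wait for the lock */
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire)) {
        /* busy-wait */
    }
    notify(sync_acquired, &spin);     /* lock is now held */
    *total += lpot;
    notify(sync_releasing, &spin);    /* about to release the lock */
    atomic_flag_clear_explicit(&spin, memory_order_release);
}
```

Passing the same address (&spin) to all three calls matters, because the address is what ties the prepare, acquired, and releasing events to one synchronization object.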

Note that the application can obtain the libittnotify API functions at run time as shown below:

 

#include <ittnotify.h>
......
typedef void (*itt_notify_sync_prepare)(void *);
typedef void (*itt_notify_sync_acquired)(void *);
typedef void (*itt_notify_releasing)(void *);

HMODULE hMod;
itt_notify_sync_prepare sync_prepare;
itt_notify_sync_acquired sync_acquired;
itt_notify_releasing sync_releasing;
......
hMod = LoadLibrary("libittnotify.dll");

sync_prepare = (itt_notify_sync_prepare) GetProcAddress(hMod, "__itt_notify_sync_prepare");
sync_acquired = (itt_notify_sync_acquired) GetProcAddress(hMod, "__itt_notify_sync_acquired");
sync_releasing = (itt_notify_releasing) GetProcAddress(hMod, "__itt_notify_sync_releasing");
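The snippet above does not check whether LoadLibrary or GetProcAddress succeeded, so the function pointers may be NULL when the DLL is not present. One possible way to guard against that is to fall back to a no-op, sketched here in portable C. The lookup_fn type is an assumption that stands in for a small wrapper around GetProcAddress(hMod, name), and resolve_hooks, sync_hooks, demo_lookup, and demo_hook are hypothetical names, not part of libittnotify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*sync_hook)(void *);

/* Stand-in for a symbol lookup: returns NULL when the name is not found. */
typedef sync_hook (*lookup_fn)(const char *name);

struct sync_hooks {
    sync_hook prepare;
    sync_hook acquired;
    sync_hook releasing;
};

/* Used whenever a symbol cannot be resolved, so callers never see NULL. */
static void no_op(void *obj) { (void)obj; }

static sync_hook resolve_one(lookup_fn lookup, const char *name, int *found)
{
    /* A NULL lookup models the case where the library failed to load. */
    sync_hook hook = (lookup != NULL) ? lookup(name) : NULL;
    if (hook == NULL)
        return no_op;
    ++*found;
    return hook;
}

/* Fill all three hooks; returns how many were really resolved (0 to 3). */
int resolve_hooks(lookup_fn lookup, struct sync_hooks *out)
{
    int found = 0;
    assert(out != NULL);
    out->prepare   = resolve_one(lookup, "__itt_notify_sync_prepare", &found);
    out->acquired  = resolve_one(lookup, "__itt_notify_sync_acquired", &found);
    out->releasing = resolve_one(lookup, "__itt_notify_sync_releasing", &found);
    return found;
}

/* Hypothetical lookup for demonstration: it only knows one symbol. */
int demo_calls = 0;
void demo_hook(void *obj) { (void)obj; ++demo_calls; }
sync_hook demo_lookup(const char *name)
{
    return strcmp(name, "__itt_notify_sync_prepare") == 0 ? demo_hook : NULL;
}
```

With this arrangement the instrumented lock can call the hooks unconditionally, and the application still runs, without analysis of the user object, on a machine where the library is missing.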





This article applies to: Intel® Thread Profiler for Windows* Knowledge Base