Recognize User Synchronization Objects in Intel® Parallel Amplifier

Many developers write their own synchronization primitives, but the Locks and Waits analysis in Intel® Parallel Amplifier recognizes only Windows*-defined synchronization objects, such as events, mutexes, semaphores, and critical sections.

Intel® Parallel Amplifier provides libittnotify.dll/libittnotify.h, which an application can use to report user-defined synchronization objects to Intel® Parallel Amplifier at runtime.

Here is a test case. The function below is run by many threads. Each thread accumulates its partial result in the local variable "lpot" inside the loop, and then adds that local value to the global variable "pot".

1)  for( i=start; i<end; i++ ) {
      for( j=0; j<i-1; j++ ) {
        distx = pow( (r[0][j] - r[0][i]), 2 );
        disty = pow( (r[1][j] - r[1][i]), 2 );
        distz = pow( (r[2][j] - r[2][i]), 2 );
        dist = sqrt( distx + disty + distz );
        lpot += 1.0 / dist;
      }
    }

    EnterCriticalSection(&cs);
    pot += lpot;
    LeaveCriticalSection(&cs);

 

In this implementation, the time spent on the system object "CRITICAL_SECTION cs;" is analyzed and reported in the Intel® Parallel Amplifier result.


2)  for( i=start; i<end; i++ ) {
      for( j=0; j<i-1; j++ ) {
        distx = pow( (r[0][j] - r[0][i]), 2 );
        disty = pow( (r[1][j] - r[1][i]), 2 );
        distz = pow( (r[2][j] - r[2][i]), 2 );
        dist = sqrt( distx + disty + distz );
        lpot += 1.0 / dist;
      }
    }

 

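    /* Assumes spin is declared as: volatile LONG spin = 0; */
    while (InterlockedExchange(&spin, 1) != 0) {
      ;  /* busy-wait until another thread releases the lock */
    }
    pot += lpot;
    InterlockedExchange(&spin, 0);  /* release the lock */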

 

In this implementation, the time spent on the user-defined "spin" lock is NOT analyzed.

 

3)  for( i=start; i<end; i++ ) {
      for( j=0; j<i-1; j++ ) {
        distx = pow( (r[0][j] - r[0][i]), 2 );
        disty = pow( (r[1][j] - r[1][i]), 2 );
        distz = pow( (r[2][j] - r[2][i]), 2 );
        dist = sqrt( distx + disty + distz );
        lpot += 1.0 / dist;
      }
    }

  

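    /* Assumes spin is declared as: volatile LONG spin = 0; */
    sync_prepare((void *)&spin);      /* about to wait on the lock */
    while (InterlockedExchange(&spin, 1) != 0) {
      ;  /* busy-wait until another thread releases the lock */
    }
    sync_acquired((void *)&spin);     /* lock obtained */
    pot += lpot;
    sync_releasing((void *)&spin);    /* about to release the lock */
    InterlockedExchange(&spin, 0);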

 

In this implementation, the time spent on the user-defined "spin" lock is analyzed.
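The prepare / acquired / releasing ordering used in case 3 can be packaged into a small reusable lock. The following is only a sketch under stated assumptions: it uses portable C11 atomics (atomic_flag) instead of the Win32 API, and the names user_spinlock, sync_hook, and the count_* hooks are hypothetical and are not part of libittnotify. In a real build, the three hooks would be the function pointers obtained from libittnotify.dll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical hook type: same shape as the void (*)(void *) typedefs
   used for the libittnotify entry points in this article. */
typedef void (*sync_hook)(void *);

/* Hypothetical wrapper type; not part of libittnotify. */
typedef struct {
    atomic_flag busy;     /* set while some thread holds the lock */
    sync_hook prepare;    /* called before the thread starts waiting */
    sync_hook acquired;   /* called once the lock has been obtained */
    sync_hook releasing;  /* called just before the lock is released */
} user_spinlock;

void user_spinlock_init(user_spinlock *lock, sync_hook prepare,
                        sync_hook acquired, sync_hook releasing)
{
    assert(lock != NULL);
    atomic_flag_clear(&lock->busy);
    lock->prepare = prepare;
    lock->acquired = acquired;
    lock->releasing = releasing;
}

void user_spinlock_lock(user_spinlock *lock)
{
    if (lock->prepare) lock->prepare(lock);
    /* test-and-set returns the previous value: true means "already held" */
    while (atomic_flag_test_and_set_explicit(&lock->busy,
                                             memory_order_acquire)) {
        /* busy-wait */
    }
    if (lock->acquired) lock->acquired(lock);
}

void user_spinlock_unlock(user_spinlock *lock)
{
    if (lock->releasing) lock->releasing(lock);
    atomic_flag_clear_explicit(&lock->busy, memory_order_release);
}

/* Example hooks that only count calls, for illustration. */
int prepare_calls, acquired_calls, releasing_calls;
void count_prepare(void *obj)   { (void)obj; ++prepare_calls; }
void count_acquired(void *obj)  { (void)obj; ++acquired_calls; }
void count_releasing(void *obj) { (void)obj; ++releasing_calls; }
```

A thread would then call user_spinlock_lock(&lock), execute "pot += lpot;", and call user_spinlock_unlock(&lock). The address of the lock object is what gets passed to the hooks, so every wait on the same lock is reported against the same object.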

 

Note that the libittnotify API functions can be obtained at runtime as shown below.

 

#include <windows.h>
#include <ittnotify.h>
......
/* Function pointer types for the three notification entry points */
typedef void (*itt_notify_sync_prepare)(void *);
typedef void (*itt_notify_sync_acquired)(void *);
typedef void (*itt_notify_releasing)(void *);

HMODULE hMod;
itt_notify_sync_prepare sync_prepare;
itt_notify_sync_acquired sync_acquired;
itt_notify_releasing sync_releasing;
......
   /* Load the library, then look up each entry point by name */
   hMod = LoadLibrary("libittnotify.dll");

   sync_prepare = (itt_notify_sync_prepare) GetProcAddress(hMod, "__itt_notify_sync_prepare");
   sync_acquired = (itt_notify_sync_acquired) GetProcAddress(hMod, "__itt_notify_sync_acquired");
   sync_releasing = (itt_notify_releasing) GetProcAddress(hMod, "__itt_notify_sync_releasing");
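The lookup above does not check whether LoadLibrary or GetProcAddress succeeded, so calling the pointers would crash if the DLL is not on the search path. One possible way to guard against that, shown here only as a sketch, is to start with no-op functions and replace them only when all three symbols resolve. The helper name load_itt_notify, the typedef itt_sync_fn, and the function itt_noop are hypothetical. The exported symbol names are the ones used in this article. On non-Windows builds the sketch simply keeps the no-ops.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

typedef void (*itt_sync_fn)(void *);  /* hypothetical typedef name */

/* No-op used when the library or one of its symbols is unavailable. */
static void itt_noop(void *obj) { (void)obj; }

itt_sync_fn sync_prepare   = itt_noop;
itt_sync_fn sync_acquired  = itt_noop;
itt_sync_fn sync_releasing = itt_noop;

/* Hypothetical helper: returns 1 if all three entry points were resolved,
   0 otherwise (in which case the pointers remain safe no-ops). */
int load_itt_notify(const char *dll_name)
{
    assert(dll_name != NULL);
#ifdef _WIN32
    HMODULE hMod = LoadLibraryA(dll_name);
    if (hMod == NULL)
        return 0;

    itt_sync_fn p = (itt_sync_fn)GetProcAddress(hMod, "__itt_notify_sync_prepare");
    itt_sync_fn a = (itt_sync_fn)GetProcAddress(hMod, "__itt_notify_sync_acquired");
    itt_sync_fn r = (itt_sync_fn)GetProcAddress(hMod, "__itt_notify_sync_releasing");
    if (p == NULL || a == NULL || r == NULL) {
        FreeLibrary(hMod);
        return 0;
    }

    sync_prepare = p;
    sync_acquired = a;
    sync_releasing = r;
    return 1;
#else
    (void)dll_name;  /* nothing to load on this platform */
    return 0;
#endif
}
```

With this arrangement the instrumented lock code can call sync_prepare, sync_acquired, and sync_releasing unconditionally, whether or not the application is running under the analyzer.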

 


Finally, the user can view the Intel® Parallel Amplifier (Locks and Waits) result:

[Figure: ittnotify.bmp]

For more complete information about compiler optimizations, see the Optimization Notice.