User Guide

  • 2020
  • 10/21/2020
  • Public Content

APIs for Custom Synchronization

While the Intel Inspector supports a significant portion of the Windows* OS and POSIX* APIs, it is often useful to define your own synchronization constructs. Any specially built constructs that you create are not normally tracked by the Intel Inspector; however, the Intel Inspector supports synchronization APIs to help you gather semantic information related to your custom synchronization constructs.
Synchronization constructs may generally be modeled as a series of signals. One thread, or many threads, may wait for a signal from another group of threads before proceeding with some action. Synchronization APIs track when a thread begins waiting for a signal and when the signal occurs.
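The signal model can be sketched with a one-shot flag handed from one thread to another. This is a minimal illustration, not code from the Intel Inspector documentation: `USE_ITTNOTIFY` is an assumed build flag, `SYNC_RELEASING` and `SYNC_ACQUIRED` are hypothetical wrapper macros, and `ready`, `payload`, `sender`, `waiter`, and `handoff_demo` are names invented for the example. C11 atomics and POSIX threads stand in for whatever primitives your construct uses.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Assumption: USE_ITTNOTIFY is a hypothetical build flag for this sketch.
   Without it the annotations compile to no-ops so the file builds anywhere. */
#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#define SYNC_RELEASING(p) __itt_sync_releasing((void *)(p))
#define SYNC_ACQUIRED(p)  __itt_sync_acquired((void *)(p))
#else
#define SYNC_RELEASING(p) ((void)(p))
#define SYNC_ACQUIRED(p)  ((void)(p))
#endif

static atomic_int ready;  /* custom signal: 0 = not sent, 1 = sent */
static int payload;       /* data handed from the sender to the waiter */

static void *sender(void *arg) {
    (void)arg;
    payload = 42;                    /* prepare the data first */
    SYNC_RELEASING(&ready);          /* about to send the signal */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *waiter(void *arg) {
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        sched_yield();               /* still waiting for the signal */
    SYNC_ACQUIRED(&ready);           /* the wait has just ended */
    *out = payload;
    return NULL;
}

/* Runs one sender and one waiter; returns the value the waiter observed. */
int handoff_demo(void) {
    pthread_t s, w;
    int seen = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&w, NULL, waiter, &seen);
    assert(rc == 0);
    rc = pthread_create(&s, NULL, sender, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(s, NULL);
    pthread_join(w, NULL);
    return seen;
}
```

Both calls pass the address of `ready`, so the sending side and the waiting side are tied to the same synchronization object.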

Using User-Defined Synchronization APIs in Your Code

Use This in C/C++ Code:
    void __itt_sync_acquired(void *addr)
Use This in Fortran Code:
    subroutine itt_sync_acquired(addr)
      integer(kind=itt_ptr), intent(in), value :: addr
    end subroutine itt_sync_acquired
To Do This:
    Tell the Intel Inspector that the code received a signal on the specified synchronization object.

Use This in C/C++ Code:
    void __itt_sync_releasing(void *addr)
Use This in Fortran Code:
    subroutine itt_sync_releasing(addr)
      integer(kind=itt_ptr), intent(in), value :: addr
    end subroutine itt_sync_releasing
To Do This:
    Tell the Intel Inspector that the code is about to send a signal on the specified synchronization object.

Use This in C/C++ Code:
    void __itt_sync_destroy(void *addr)
Use This in Fortran Code:
    subroutine itt_sync_destroy(addr)
      integer(kind=itt_ptr), intent(in), value :: addr
    end subroutine itt_sync_destroy
To Do This:
    Tell the Intel Inspector that the synchronization object will not be used again, so the Intel Inspector can dispose of bookkeeping information associated with this object.

The addr parameter is simply a value that uniquely identifies the synchronization object to be modeled. Unique values allow the Intel Inspector to track distinct custom synchronization objects. To use the same custom object to protect access in different parts of your code, pass the same addr value in the calls around each part.
Since each custom synchronization construct may involve any number of synchronization objects, each synchronization object must be identified by a unique memory address, which the synchronization APIs use to track the object. You can track any number of synchronization objects at one time using synchronization APIs, as long as each object uses a unique memory pointer. You can think of this as modeling objects similar to those passed to the WaitForMultipleObjects function in the Windows* OS API. You can create more complex synchronization constructs from a group of synchronization objects.
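As a rough illustration of how addresses identify objects, the sketch below annotates two custom lock words inside one structure. It is single-threaded on purpose, so the lock words are only set and cleared; the point is which address each call passes. Everything except the three `__itt_sync_*` calls is hypothetical: the `USE_ITTNOTIFY` build flag, the `sync_*` wrappers, the `note_addr` bookkeeping, and the `Pipeline` type are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef USE_ITTNOTIFY          /* hypothetical build flag, an assumption */
#include <ittnotify.h>
#endif

/* Stand-in bookkeeping so the sketch can report which addresses were used. */
static void *seen_addr[8];
static size_t seen_count;

static void note_addr(void *p) {
    for (size_t i = 0; i < seen_count; i++)
        if (seen_addr[i] == p) return;          /* object already known */
    if (seen_count < 8) seen_addr[seen_count++] = p;
}

/* Hypothetical wrappers: record the address, then forward if enabled. */
static void sync_acquired(void *p) {
    note_addr(p);
#ifdef USE_ITTNOTIFY
    __itt_sync_acquired(p);
#endif
}
static void sync_releasing(void *p) {
    note_addr(p);
#ifdef USE_ITTNOTIFY
    __itt_sync_releasing(p);
#endif
}
static void sync_destroy(void *p) {
    note_addr(p);
#ifdef USE_ITTNOTIFY
    __itt_sync_destroy(p);
#endif
}

typedef struct {
    int input_lock;   /* custom lock word for the input side  */
    int output_lock;  /* custom lock word for the output side */
    int items;
} Pipeline;

/* push() and refill() are different code paths guarded by the SAME object,
   so both pass &p->input_lock. */
static void push(Pipeline *p) {
    p->input_lock = 1;
    sync_acquired(&p->input_lock);
    p->items += 1;
    sync_releasing(&p->input_lock);
    p->input_lock = 0;
}
static void refill(Pipeline *p) {
    p->input_lock = 1;
    sync_acquired(&p->input_lock);
    p->items += 10;
    sync_releasing(&p->input_lock);
    p->input_lock = 0;
}
/* pop() is guarded by a DIFFERENT object, so it passes &p->output_lock. */
static void pop(Pipeline *p) {
    p->output_lock = 1;
    sync_acquired(&p->output_lock);
    p->items -= 1;
    sync_releasing(&p->output_lock);
    p->output_lock = 0;
}

/* Returns how many distinct synchronization objects were annotated. */
size_t distinct_objects_demo(void) {
    Pipeline *p = calloc(1, sizeof *p);
    assert(p != NULL);
    seen_count = 0;
    push(p);
    refill(p);
    pop(p);
    sync_destroy(&p->input_lock);   /* objects will not be used again */
    sync_destroy(&p->output_lock);
    free(p);
    return seen_count;
}
```

The destroy calls come before `free`, so that a later allocation reusing the same memory is not confused with the retired objects.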

API Usage Tips

Follow these guidelines to properly insert synchronization APIs within your code:
  • Insert an acquired API immediately after your code stops waiting for a synchronization object.
  • Insert a releasing API immediately before the code signals that it no longer holds a synchronization object.
If you place the synchronization APIs improperly, the Intel Inspector may report threading problems where there are none or fail to detect real threading problems.

Usage Example: User-Defined Synchronized Critical Section

The following code snippets show how to create a critical section construct that can be tracked with synchronization APIs:
C/C++ Example

    #include <ittnotify.h>

    /* MyCriticalSection and its LockIsUsed, LockIsFree, and LockIsMine
       fields are placeholders for your own lock state. */
    void CSEnter(MyCriticalSection *cs)
    {
        while (cs->LockIsUsed) {
            if (cs->LockIsFree) {
                // Code to acquire the lock goes here
                __itt_sync_acquired((void *) cs);
            }
        }
    }

    void CSLeave(MyCriticalSection *cs)
    {
        if (cs->LockIsMine) {
            __itt_sync_releasing((void *) cs);
            // Code to release the lock goes here
        }
    }

Fortran Example

    ! LockIsUsed, LockIsFree, and LockIsMine are placeholders
    ! for your own lock-state functions.
    subroutine CSEnter(cs)
      use ittnotify
      integer cs
      do while (LockIsUsed(cs) .ne. 1)
        if (LockIsFree(cs) .eq. 1) then
          ! Code to acquire the lock goes here
          call itt_sync_acquired(LOC(cs))
        end if
      end do
    end subroutine

    subroutine CSLeave(cs)
      use ittnotify
      integer cs
      if (LockIsMine(cs) .eq. 1) then
        call itt_sync_releasing(LOC(cs))
        ! Code to release the lock goes here
      end if
    end subroutine

Note the following when looking at this simple critical section example:
  • The acquired API is placed immediately after the code obtains the user lock.
  • The releasing API is placed before the code releases the user lock. This ensures another thread does not call the acquired API before the Intel Inspector realizes this thread has released the lock.
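For readers who want something that compiles, here is one possible way to fill in the placeholders, using a C11 `atomic_flag` spin lock and POSIX threads. Treat it as a sketch under assumptions rather than the documented implementation: the `held` field, the `worker` and `critical_section_demo` functions, the `SYNC_*` wrapper macros, and the `USE_ITTNOTIFY` build flag are all invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Assumption: USE_ITTNOTIFY is a hypothetical build flag for this sketch.
   Without it the annotations compile to no-ops. */
#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#define SYNC_RELEASING(p) __itt_sync_releasing((void *)(p))
#define SYNC_ACQUIRED(p)  __itt_sync_acquired((void *)(p))
#else
#define SYNC_RELEASING(p) ((void)(p))
#define SYNC_ACQUIRED(p)  ((void)(p))
#endif

enum { NUM_THREADS = 4, ITERATIONS = 10000 };

typedef struct {
    atomic_flag held;             /* set while some thread owns the lock */
} MyCriticalSection;

static MyCriticalSection cs = { ATOMIC_FLAG_INIT };
static long shared_total;         /* protected by cs */

static void CSEnter(MyCriticalSection *c) {
    /* Spin until test-and-set reports the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&c->held, memory_order_acquire))
        sched_yield();
    SYNC_ACQUIRED(c);             /* immediately after the wait ends */
}

static void CSLeave(MyCriticalSection *c) {
    SYNC_RELEASING(c);            /* immediately before giving up the lock */
    atomic_flag_clear_explicit(&c->held, memory_order_release);
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        CSEnter(&cs);
        shared_total++;           /* plain increment, safe only under the lock */
        CSLeave(&cs);
    }
    return NULL;
}

/* Starts NUM_THREADS workers and returns the final shared total. */
long critical_section_demo(void) {
    pthread_t t[NUM_THREADS];
    shared_total = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_total;
}
```

If the lock is correct, the total equals `NUM_THREADS * ITERATIONS`; the annotations do not change program behavior, they only describe the lock to the analysis tool.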

Usage Example: User-Level Synchronized Barrier

Higher-level constructs, such as barriers, are also easy to model using synchronization APIs. The following code snippets show how to create a barrier construct that can be tracked using synchronization APIs:
C/C++ Example

    #include <ittnotify.h>

    /* teamflag, counter, and thread_count are shared by all threads. */
    void Barrier()
    {
        teamflag = false;
        __itt_sync_releasing((void *) &counter);
        // Use the atomic increment primitive appropriate to your OS and compiler
        InterlockedIncrement(&counter);
        if (counter == thread_count) {
            __itt_sync_acquired((void *) &counter);
            __itt_sync_releasing((void *) &teamflag);
            counter = 0;
            teamflag = true;
        } else {
            // Wait for team flag
            __itt_sync_acquired((void *) &teamflag);
        }
    }

Fortran Example

    subroutine barrier()
      use ittnotify
      common /x/ teamflag, counter, thread_count
      integer teamflag
      integer thread_count
      integer counter
      teamflag = 0
      call itt_sync_releasing(LOC(counter))
      ! Atomically update counter here;
      ! use the atomic increment primitive
      ! appropriate to your OS and compiler
      if (counter .eq. thread_count) then
        call itt_sync_acquired(LOC(counter))
        call itt_sync_releasing(LOC(teamflag))
        counter = 0
        teamflag = 1
      else
        ! Wait for team flag
        call itt_sync_acquired(LOC(teamflag))
      end if
    end subroutine

Note the following when looking at this example:
  • There are two synchronization objects in this barrier code. The counter object is used to do a gather-like signaling from all the threads to the final thread indicating that each thread has entered the barrier. Once the last thread hits the barrier, it uses the teamflag object to signal all the other threads that they may proceed.
  • As each thread enters the barrier, it calls the releasing API to tell the Intel Inspector it is about to signal the last thread by incrementing counter.
  • The last thread to enter the barrier calls the acquired API to tell the Intel Inspector it was successfully signaled by all the other threads.
  • The last thread to enter the barrier then calls the releasing API to tell the Intel Inspector it is going to signal the barrier completion to all the other threads by setting teamflag.
  • Finally, before leaving the barrier, each thread calls the acquired API to tell the Intel Inspector it successfully received the end-of-barrier signal.
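The same sequence of calls can be tried in a compilable form. The sketch below is a simplified, single-use variant: unlike the snippet above, it never resets counter or teamflag, which sidesteps reuse concerns and keeps the example short. It assumes C11 atomics and POSIX threads; the `SYNC_*` macros, the `USE_ITTNOTIFY` build flag, and the `slot`, `sum_seen`, `worker`, and `barrier_demo` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Assumption: USE_ITTNOTIFY is a hypothetical build flag for this sketch.
   Without it the annotations compile to no-ops. */
#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#define SYNC_RELEASING(p) __itt_sync_releasing((void *)(p))
#define SYNC_ACQUIRED(p)  __itt_sync_acquired((void *)(p))
#else
#define SYNC_RELEASING(p) ((void)(p))
#define SYNC_ACQUIRED(p)  ((void)(p))
#endif

enum { THREAD_COUNT = 4 };

static atomic_int counter;            /* gather object: arrivals so far */
static atomic_int teamflag;           /* broadcast object: 1 = proceed  */
static int slot[THREAD_COUNT];        /* written before the barrier     */
static int sum_seen[THREAD_COUNT];    /* what each thread saw afterward */

/* Single-use barrier: must be called exactly once by each thread. */
static void barrier_once(void) {
    SYNC_RELEASING(&counter);                 /* about to signal arrival */
    int arrived = atomic_fetch_add(&counter, 1) + 1;
    if (arrived == THREAD_COUNT) {
        SYNC_ACQUIRED(&counter);              /* signaled by all the others */
        SYNC_RELEASING(&teamflag);            /* about to release the team */
        atomic_store(&teamflag, 1);
    } else {
        while (atomic_load(&teamflag) == 0)
            sched_yield();                    /* wait for team flag */
        SYNC_ACQUIRED(&teamflag);             /* end-of-barrier signal received */
    }
}

static void *worker(void *arg) {
    int id = *(int *)arg;
    slot[id] = 1;                             /* work done before the barrier */
    barrier_once();
    int sum = 0;
    for (int i = 0; i < THREAD_COUNT; i++)
        sum += slot[i];                       /* every slot should be visible */
    sum_seen[id] = sum;
    return NULL;
}

/* Returns how many threads observed all pre-barrier writes. */
int barrier_demo(void) {
    pthread_t t[THREAD_COUNT];
    int ids[THREAD_COUNT];
    atomic_store(&counter, 0);
    atomic_store(&teamflag, 0);
    for (int i = 0; i < THREAD_COUNT; i++) {
        slot[i] = 0;
        sum_seen[i] = 0;
        ids[i] = i;
    }
    for (int i = 0; i < THREAD_COUNT; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    int ok = 0;
    for (int i = 0; i < THREAD_COUNT; i++) {
        pthread_join(t[i], NULL);
        if (sum_seen[i] == THREAD_COUNT)
            ok++;
    }
    return ok;
}
```

The value returned by `atomic_fetch_add` decides which thread is last, so exactly one thread takes the first branch. A reusable barrier would need extra state, for example a sense-reversing flag, which is outside the scope of this sketch.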
