VTune is unusable in lock and waits


I have a program with as many threads as there are logical CPUs, plus 2. These threads act like agents: each one blocks on a semaphore until it is triggered to run. The Locks and Waits analysis also counts the time spent waiting on the non-signalled semaphore (which is a fairly long time), but I'd like to analyse the time spent on mutex or critical-section locks inside those agents while they are running (after they have been triggered). What should I do?

-- With best regards, VooDooMan - If you find my post helpful, please rate it and/or select it as a best answer where applies. Thank you.
Peter Wang (Intel)

> But lock and waits analysis counts time spent on non-signalled semaphore too (which is fairly long time), but I'd like to analyse time spent on mutexes or critical sections locks...

I am not sure I understand exactly what you mean by a "non-signaled" semaphore. If one thread waits for a signal from another thread, it will probably spend most of that time asleep. If you need the CPU time of the code that runs once the signal is triggered, change the viewpoint to "Hotspots by Thread Concurrency"; it shows what code executed after the semaphore was signaled.

Based on your description, I wrote a very simple example that uses a semaphore: the main thread creates two child threads, one to send requests and the other to process them.
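The attached sem.c is not reproduced in this thread, so the following is only a minimal sketch of the structure described above, assuming POSIX threads and an unnamed POSIX semaphore. The names `run_requests`, `request_sem`, `sender` and `worker` are hypothetical and are not taken from the attachment.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical names; the real sem.c may be organised differently. */
static sem_t request_sem;      /* counts pending requests */
static int total_requests;     /* how many requests to send */
static unsigned interval_sec;  /* pause before each request */
static int processed;          /* requests handled by the worker */

/* Sender thread: posts one request per interval. */
static void *sender(void *arg)
{
    (void)arg;
    for (int i = 0; i < total_requests; i++) {
        sleep(interval_sec);
        sem_post(&request_sem);
    }
    return NULL;
}

/* Worker thread: sleeps in sem_wait() until a request arrives,
   then does a tiny amount of work. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < total_requests; i++) {
        /* Retry if the wait is interrupted by a signal handler. */
        while (sem_wait(&request_sem) == -1 && errno == EINTR)
            continue;
        processed++;
    }
    return NULL;
}

/* Creates both child threads, waits for them to finish and returns
   the number of processed requests, or -1 on a setup error. */
int run_requests(int count, unsigned interval)
{
    pthread_t s, w;

    total_requests = count;
    interval_sec = interval;
    processed = 0;

    if (sem_init(&request_sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&w, NULL, worker, NULL) != 0) {
        sem_destroy(&request_sem);
        return -1;
    }
    if (pthread_create(&s, NULL, sender, NULL) != 0) {
        /* Release the worker so it can finish and be joined. */
        for (int i = 0; i < count; i++)
            sem_post(&request_sem);
        pthread_join(w, NULL);
        sem_destroy(&request_sem);
        return -1;
    }
    pthread_join(s, NULL);
    pthread_join(w, NULL);
    sem_destroy(&request_sem);
    return processed;
}
```

Calling `run_requests(36, 10)` from a `main` function would mirror a 360-second run with one request every 10 seconds. In a sketch like this the worker spends almost all of its time blocked in `sem_wait`, which is the wait time a Locks and Waits collection would attribute to the semaphore.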

Locks and Waits analyzed it without problems. I used a duration of 360 seconds with one request every 10 seconds, for 36 requests in total. Since the workload is tiny, hotspot time was almost zero across the 36 wake-ups, as you can see if you change the viewpoint to Hotspots by Thread Concurrency in the L&W report.

> gcc -g sem.c -o sem -lpthread

> amplxe-cl -collect locksandwaits -duration 360 -- ./sem &

The result shows a wait count of 36 for the semaphore. Each wake-up did only a tiny amount of work, so the semaphore wait time was close to the elapsed time.

Attachments:

sem.c (617 bytes)
sem_locks.png (65.26 KB)

Ah, now I have found the "Concurrency" analysis, and I can see the "bottom-up" view and the "function/call stack" list of functions. Most of the time was spent in WaitForSingleObjectEx (it is in the first row of the table), which I use only for waiting on the semaphore until it is released (this is sometimes called the semaphore being "signalled": the thread is woken up from its waiting/blocked state).

And I can see all the other routines (with call stacks), so I am now able to analyse things in depth.

Thank you.

-- With best regards, VooDooMan - If you find my post helpful, please rate it and/or select it as a best answer where applies. Thank you.
Peter Wang (Intel)

Great to know that you have found the CPU time spent in a specific function. If you select "User/System function" as the call stack mode, WaitForSingleObjectEx (which is a system function) should appear.
