Help with understanding the VTune results.

I ran a lightweight hotspot analysis on my code and got the result attached as a CSV file. Can you please give me some pointers on what I can do now to improve the speed of the program? The major portion of the time is spent in zgemm3m for matrix multiplications and in matrix inversion using zgesv (or getrf and getri). I am not able to understand the timing information obtained.

My computer has dual quad-core (E5240) processors at 2.493 GHz.

Attachment: LightweightHotspot_RunG.csv (text/csv, 44.66 KB)

Thanks for your lightweight-hotspots results. Usually you can identify performance issues from the CPI (cycles per instruction) value of the top hot functions: the smaller the better. However, functions that use SSE3/SSE4/AVX instructions will have a large CPI value, which is reasonable (single instruction, multiple data).

So you may investigate the source lines that cause a high CPI value (few instructions retired, many CPU cycles spent). MKL functions are already well tuned for performance. You only need to make sure that you are using them in the right usage mode.
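As a rough illustration of the CPI idea above, here is a minimal Python sketch that computes CPI as cycles divided by instructions retired and ranks functions by it. The counter values and the `user_loop` name are made up for illustration and are not taken from the attached CSV; `cpi` and `rank_by_cpi` are hypothetical helper names.

```python
# Hypothetical (function, unhalted cycles, instructions retired) rows.
# The numbers are invented for illustration; they are NOT from the attached CSV.
SAMPLES = [
    ("zgemm3m", 12_000_000_000, 8_000_000_000),
    ("zgetrf", 3_000_000_000, 3_000_000_000),
    ("user_loop", 4_000_000_000, 1_000_000_000),
]


def cpi(cycles, instructions):
    """Cycles per instruction: many cycles for few retired instructions gives a high CPI."""
    if instructions == 0:
        return float("inf")
    return cycles / instructions


def rank_by_cpi(rows):
    """Return (name, cpi) pairs sorted from highest CPI to lowest."""
    ranked = [(name, cpi(cycles, instr)) for name, cycles, instr in rows]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_by_cpi(SAMPLES):
        print(f"{name}: CPI = {value:.2f}")
```

A high CPI in a vectorized MKL routine is not necessarily a problem, as noted above, so a ranking like this is probably most useful for your own functions.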

You may use Concurrency Analysis to see the parallelism of your program, the work balance across threads, the cores' utilization, etc.

You may use LocksAndWaits Analysis to know wait time, which may cause stalls between threads.

Regards, Peter

"..I am not able to understand the timing information obtained." The time shown in the report is calculated using this formula:
"CPU Unhalted Cycles" Event / CPU Frequency

The overhead of profiling is not accounted for in this time, I guess.
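To make the formula concrete, here is a small Python sketch of the conversion. It assumes the 2.493 GHz frequency quoted in the original post; the cycle count in the example is invented, and `cpu_time_seconds` is a hypothetical helper name, not a VTune API.

```python
# Frequency quoted in the original post (2.493 GHz); treat it as an assumption.
CPU_FREQUENCY_HZ = 2.493e9


def cpu_time_seconds(unhalted_cycles, frequency_hz=CPU_FREQUENCY_HZ):
    """Convert a "CPU Unhalted Cycles" event count into seconds of CPU time."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return unhalted_cycles / frequency_hz


if __name__ == "__main__":
    # Invented example count: 24.93 billion unhalted cycles attributed to one function.
    example_cycles = 24_930_000_000
    print(f"{cpu_time_seconds(example_cycles):.2f} s of CPU time")
```

Since cycle counts are collected on every core, the time obtained this way is CPU time summed over threads, so for a multithreaded run it can exceed the wall-clock time.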

Thanks Peter,
I think I need to take another look at the algorithm I am using. As you suggested, I will run the other analyses and see whether there are any issues that may be bottlenecks.

