I am using VTune Amplifier XE for Windows to support the parallelization of a given program.

VTune is a good help in showing me the hotspots of the program; however, I am curious how it can help me measure the improvement after parallelization.

For example: I have a function A which is identified as a hotspot. After parallelization it is executed concurrently on multiple processors, which speeds everything up. What the VTune analysis then shows me is the CPU Time summed over all busy processors, which is more or less the same as in the sequential case. This is not a surprise, as the actual work was not reduced by parallelization.

I guess measuring the (inclusive/exclusive) time of a given function is just not possible with sampling... am I right here?

One more thing: in your VTune tutorial (https://wiki.engr.illinois.edu/download/attachments/114688007/amplifier_xe_linux.pdf?version=1&modificationDate=1296056455000), on page 27 the author mentions two options for improving the code:

* sequential tuning
* parallelization

and in the tutorial they choose the first option. This leaves the impression that VTune could also have been used to support the second option, which seems not to be true, as I have described above.

Or did I miss something, and you can use VTune to measure the speedup of a function after parallelization?

Constantin
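P.S. To make the distinction between CPU time and elapsed time concrete, here is a minimal sketch of what I mean. The numbers are invented for illustration (they are not VTune output), and the names `Run` and `speedup` are hypothetical helpers, not part of any VTune API. It assumes an ideal case where function A's work splits evenly over four threads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """One profiled run of function A (hypothetical, hand-entered numbers)."""
    elapsed_s: float                # wall-clock time spent in the region
    cpu_s_per_thread: List[float]   # CPU time each thread spent in function A

    @property
    def total_cpu_s(self) -> float:
        # CPU time summed over all threads, which is the figure I see reported.
        return sum(self.cpu_s_per_thread)


def speedup(sequential: Run, parallel: Run) -> float:
    # Speedup is a ratio of elapsed (wall-clock) times, not of CPU times.
    return sequential.elapsed_s / parallel.elapsed_s


# Assumed numbers: 8 s of work, run on 1 thread and then split over 4 threads.
sequential = Run(elapsed_s=8.0, cpu_s_per_thread=[8.0])
parallel = Run(elapsed_s=2.0, cpu_s_per_thread=[2.0, 2.0, 2.0, 2.0])

print("total CPU time:", sequential.total_cpu_s, "vs", parallel.total_cpu_s)
print("speedup:", speedup(sequential, parallel))
```

In this sketch the summed CPU time is identical for both runs, so the improvement only becomes visible when comparing the elapsed times. That elapsed-time figure per function is what I am looking for.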