vectorization intensity

Hi, I am measuring the vectorization intensity (VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED) of a kernel similar to matrix addition. On a single core of a Xeon Phi 5110P, with double-precision data elements, I get a VI of around 8.7. Since a 512-bit vector holds only 8 doubles, it should be impossible to achieve a VI larger than 8, right? Does anybody have an explanation?


VPU_ELEMENTS_ACTIVE simply counts the active vector elements across all VPU instructions executed. It does not differentiate between operations based on the data type of the operands. Even though your code predominantly uses double-precision data elements, I believe there are also some vector computations on operands that are not double precision, for example 32-bit elements, which fill 16 lanes per instruction. This could result in a vectorization intensity greater than 8.
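To make the arithmetic behind this explanation concrete, here is a minimal sketch that computes the ratio for a mixed instruction stream. The instruction mix is made up for illustration and is not a measurement from the kernel in question; `vectorization_intensity` is a hypothetical helper, not a VTune API.

```python
def vectorization_intensity(mix):
    """Return elements-active / instructions-executed for an instruction mix.

    mix is a list of (instruction_count, active_elements_per_instruction)
    pairs. This is a toy model of the ratio, not the hardware counter itself.
    """
    elements = sum(count * lanes for count, lanes in mix)
    instructions = sum(count for count, _ in mix)
    return elements / instructions


# Assumed mix (hypothetical numbers): 90% of the vector instructions work on
# 8 doubles, 10% work on 16 32-bit elements (e.g. index or integer arithmetic).
mix = [(900, 8), (100, 16)]
vi = vectorization_intensity(mix)
print(vi)  # 8.8
```

Under this assumption, a 10% share of 16-lane instructions is already enough to lift the ratio to 8.8, which is in the same range as the 8.7 reported above.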

Just to chime in with a similar observation: for an inner loop composed entirely of fp32 vector operations with 4 active lanes, the VPU_ELEMENTS_ACTIVE-based intensity was reported above 8 in my first MIC VTune session today. So I am still confused about the semantics behind this counter. Perhaps a small code primer, annotated with the expected VPU_ELEMENTS_ACTIVE contribution of each instruction, could clear up most questions on this subject?

I suspect that even "inactive" lanes are counted, possibly all the lanes of an instruction even though several are masked off. As pointed out earlier, not every vector operation in such code is 8 wide: there are double-precision operations, such as divide and sqrt, which expand into code whose initial step uses a 16-wide approximation. A high value is still a good sign that certain kinds of serial instructions don't dominate. I'm not ready to be depressed when I don't see a satisfactory number, or overly impressed by a high one.
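The masked-lane hypothesis can be checked against the 4-active-lane fp32 loop mentioned earlier in the thread with a small toy model. This is only a sketch of the two possible counting rules; it does not reflect documented counter behaviour, and `vi_under_hypothesis` and the instruction counts are hypothetical.

```python
def vi_under_hypothesis(instrs, count_masked_lanes):
    """Elements per instruction under one of two assumed counting rules.

    instrs is a list of (count, vector_width, active_lanes) tuples.
    If count_masked_lanes is True, every lane of the vector is counted;
    otherwise only the lanes enabled by the mask are counted.
    """
    elements = sum(
        count * (width if count_masked_lanes else active)
        for count, width, active in instrs
    )
    instructions = sum(count for count, _, _ in instrs)
    return elements / instructions


# Hypothetical loop: 1000 fp32 vector instructions, 16 lanes wide, 4 lanes active.
loop = [(1000, 16, 4)]
vi_active_only = vi_under_hypothesis(loop, count_masked_lanes=False)
vi_all_lanes = vi_under_hypothesis(loop, count_masked_lanes=True)
print(vi_active_only, vi_all_lanes)  # 4.0 16.0
```

If only active lanes were counted, such a loop could not exceed 4, so a reading above 8 fits the all-lanes rule better, at least within this simplified model.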