Can Advisor help me thread my code… even if I use Templates?

I often find that template programming in C++ helps me create concise, extensible code quickly. However, templates can sometimes restrict my ability to optimize for performance, and debugging templated code can seem much more complicated than debugging non-templated code. Intel® Parallel Advisor (Advisor) can help you reclaim these missed performance opportunities by helping you add threads to your application, even if it uses templates.

Advisor is designed to give you detailed yet easily understandable profile information about your serial application. Take a look at the Survey Report (Figure 1) that I generated by running Advisor on my application, which uses templates.


Figure 1

I can see in Figure 1 that my application spends most of its time in the template function “fillVals”, and more specifically in the instantiation of “fillVals” with the template parameter “r_and_iN”. This instantiation accounts for 91% of my runtime, while the instantiation with the template parameter “sums” accounts for only 4.5%.

From here I could go two ways: I could specialize the fillVals function on “r_and_iN” and try to optimize the specialization, or I could try to optimize the generic fillVals function. I will first try optimizing the generic function, because this could boost performance in both instantiations and in future uses with different template parameters.

After I add my Advisor annotations to the generic template function, I run the Suitability and Correctness tools. The Correctness Report (Figure 2) tells me that my annotated code has issues.
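To make this step concrete, here is a minimal sketch of what annotating the generic function might look like. The original source is not shown in this post, so the body of fillVals, the size() helper, and the body of r_and_iN::setVal are assumptions made for illustration. The macro names come from Advisor's advisor-annotate.h header; their exact argument forms can differ between Advisor versions, and the no-op fallback definitions are my own addition so that the sketch compiles without Advisor installed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use the real Advisor header when it is on the include path. Otherwise fall
// back to no-op macros (an assumption, for illustration only).
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END(...)
#  define ANNOTATE_TASK_BEGIN(name)
#  define ANNOTATE_TASK_END(...)
#endif

// Hypothetical stand-in for the post's "r_and_iN" type: each call to
// setVal(idx) writes only m_arr[idx].
struct r_and_iN {
    std::vector<double> m_arr;
    explicit r_and_iN(std::size_t n) : m_arr(n, 0.0) {}
    std::size_t size() const { return m_arr.size(); }
    void setVal(std::size_t idx) {
        assert(idx < m_arr.size());
        m_arr[idx] = static_cast<double>(idx) * static_cast<double>(idx);
    }
};

// Generic function with annotations: the site marks the proposed parallel
// region, and each loop iteration is marked as one proposed task.
template <typename T>
void fillVals(T& vals) {
    ANNOTATE_SITE_BEGIN(fillVals_site);
    for (std::size_t idx = 0; idx < vals.size(); ++idx) {
        ANNOTATE_TASK_BEGIN(fillVals_task);
        vals.setVal(idx);
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
}
```

Because the annotations sit in the generic template, every instantiation of fillVals is modeled as parallel, whatever type it is called with.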


Figure 2

Next, I navigate to the Correctness Source view (Figure 3) for one of these issues. Looking at the call stacks on the right-hand side, I can see that the issue occurs in the instantiation of fillVals that has “sums” as the template parameter. The problem is that a call to setVal(idx) can reference both m_arr[idx] and m_arr[idx-1], so one task may access an element that a neighboring task is writing. The second issue reports the same conflict.
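A small sketch shows why this access pattern matters. The body of setVal below is an assumption (a running total), chosen only because it touches m_arr[idx] and m_arr[idx-1] as described above; the real “sums” code is not shown in this post. The hypothetical helper runInOrder executes the iterations serially in a chosen order, which mimics one possible task schedule without using real threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the post's "sums" type: setVal(idx) reads the
// previous element and writes the current one.
struct sums {
    std::vector<double> m_arr;
    explicit sums(std::size_t n) : m_arr(n, 0.0) {}
    void setVal(std::size_t idx) {
        assert(idx < m_arr.size());
        const double prev = (idx == 0) ? 0.0 : m_arr[idx - 1];  // neighbor read
        m_arr[idx] = prev + static_cast<double>(idx);           // own-slot write
    }
};

// Runs setVal once per index in the given order and returns the filled array.
// A different order stands in for a different task schedule.
std::vector<double> runInOrder(std::size_t n,
                               const std::vector<std::size_t>& order) {
    sums s(n);
    for (std::size_t idx : order) {
        s.setVal(idx);
    }
    return s.m_arr;
}
```

With four elements, running the iterations as 0, 1, 2, 3 yields 0, 1, 3, 6, while running them as 3, 2, 1, 0 yields 0, 1, 2, 3. The result depends on execution order, which is exactly what independent tasks cannot guarantee.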


Figure 3

These correctness issues tell me that it may be tough to parallelize the generic version of fillVals. However, remember that I had two choices after I ran the Survey tool, and now I can pursue the other one: optimizing a specialized version of the template function. I add my annotations to the specialized template function (Figure 4) and rerun the Advisor Suitability and Correctness tools.
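In code, this change might look like the sketch below: the generic fillVals stays serial and unannotated, and an explicit specialization for r_and_iN carries the annotations. As before, the function bodies, the size() helper, and both setVal implementations are assumptions made for illustration, and the no-op macro fallback exists only so the sketch compiles without Advisor's advisor-annotate.h header.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE  // assumed no-op fallback, for illustration only
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END(...)
#  define ANNOTATE_TASK_BEGIN(name)
#  define ANNOTATE_TASK_END(...)
#endif

// Hypothetical stand-ins for the post's two types.
struct r_and_iN {
    std::vector<double> m_arr;
    explicit r_and_iN(std::size_t n) : m_arr(n, 0.0) {}
    std::size_t size() const { return m_arr.size(); }
    void setVal(std::size_t idx) {  // touches only m_arr[idx]
        assert(idx < m_arr.size());
        m_arr[idx] = static_cast<double>(idx) * static_cast<double>(idx);
    }
};

struct sums {
    std::vector<double> m_arr;
    explicit sums(std::size_t n) : m_arr(n, 0.0) {}
    std::size_t size() const { return m_arr.size(); }
    void setVal(std::size_t idx) {  // also reads m_arr[idx - 1]
        assert(idx < m_arr.size());
        m_arr[idx] = (idx == 0 ? 0.0 : m_arr[idx - 1]) + static_cast<double>(idx);
    }
};

// Generic version: left serial, so order-dependent types such as sums
// keep working.
template <typename T>
void fillVals(T& vals) {
    for (std::size_t idx = 0; idx < vals.size(); ++idx) {
        vals.setVal(idx);
    }
}

// Explicit specialization for r_and_iN: only this instantiation is modeled
// as parallel.
template <>
void fillVals<r_and_iN>(r_and_iN& vals) {
    ANNOTATE_SITE_BEGIN(fillVals_r_and_iN_site);
    for (std::size_t idx = 0; idx < vals.size(); ++idx) {
        ANNOTATE_TASK_BEGIN(fillVals_r_and_iN_task);
        vals.setVal(idx);
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
}
```

Callers do not change: fillVals(x) picks the specialization when x is an r_and_iN and the generic version otherwise.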


Figure 4

The correctness issues are gone because setVal for r_and_iN accesses only a single element per call. The Suitability tool, which estimates parallel performance, predicts a 2.5X speedup. These results make me confident that adding real threads in place of the annotations will provide performance gains that are well worth the effort.
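As a rough idea of what replacing the annotations with real threads could look like, here is a sketch that splits the index range into contiguous chunks and hands each chunk to a std::thread. This is my own choice of threading mechanism for illustration, not something the post prescribes; a task-based library would be an equally reasonable fit. The function name fillValsThreaded, the numThreads parameter, and the setVal body are hypothetical. On some toolchains the code needs to be built with thread support enabled (for example, -pthread).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the post's "r_and_iN" type: each call to
// setVal(idx) writes only m_arr[idx], so distinct indices never conflict.
struct r_and_iN {
    std::vector<double> m_arr;
    explicit r_and_iN(std::size_t n) : m_arr(n, 0.0) {}
    std::size_t size() const { return m_arr.size(); }
    void setVal(std::size_t idx) {
        assert(idx < m_arr.size());
        m_arr[idx] = static_cast<double>(idx) * static_cast<double>(idx);
    }
};

// Fills vals using up to numThreads worker threads, one contiguous chunk of
// indices per thread.
void fillValsThreaded(r_and_iN& vals, unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = vals.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;  // ceiling

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) {
            break;  // no work left for this or any later thread
        }
        workers.emplace_back([&vals, begin, end] {
            for (std::size_t idx = begin; idx < end; ++idx) {
                vals.setVal(idx);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();  // wait for every chunk before returning
    }
}
```

This is safe only because each iteration writes its own element. The same transformation applied to the “sums” instantiation would reintroduce the conflicts that the Correctness tool reported.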

C++ templates add a new level of complexity to applications, but you shouldn’t let this deter you from optimizing and parallelizing. Whether or not your code uses templates, Intel Parallel Advisor can help you find the best ways to thread your code and boost your performance.

You can find more information about Advisor here, and talk with other users in our forum here.
