Intel® Advisor XE, along with the other Intel® Parallel Studio XE tools, lays out a multi-step process to help developers transition their serial code to efficient and correct parallel code. This blog post focuses on the first step of that process: determining where to add parallelism in an application.
Intel Advisor XE provides an easy-to-use GUI on Windows* and Linux* as well as a plug-in for Microsoft Visual Studio*. The first step in using Intel Advisor XE is to run the Survey tool, which helps determine where the application spends most of its time. These “Hotspots” are good starting points when deciding where to add parallelism. Figure 1 shows a screenshot of profile data generated by Intel Advisor XE for a K-Nearest Neighbors application.
Figure 1 shows that most of the program's run time is spent in STL copy and Tree functions (61.9% and 18.5%, respectively). The program also spends 9.2% of its time in a loop in a method called KNN::distance. The distance function is the only one of these that I have control over, so I can start my investigation there. With one mouse click, Intel Advisor XE navigates to the source code (definition) of this method. Figure 2 shows the breakdown of time spent in KNN::distance.
Looking at this method, it can be seen that a single call should not take very long. It only calculates the distance between two points and it spends quite a bit of time returning the value in addition to the calculation.
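To make the point concrete, here is a minimal sketch of what a point-to-point distance routine might look like. The source of KNN::distance is not reproduced in this post, so the Euclidean metric, the free-function name `distance`, and the `std::vector<double>` point type are all assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for KNN::distance (the real method is not shown here).
// Assumes a Euclidean metric and that both points have the same dimension.
double distance(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;  // accumulate the squared difference per coordinate
    }
    return std::sqrt(sum);
}
```

A routine like this does a handful of arithmetic operations per coordinate, so a single call is too small to be worth splitting across threads on its own.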
If this were all the information that Intel Advisor XE presented, it might be difficult to go any further, because the bulk of the work doesn’t appear to be very conducive to parallelism. However, the Survey Report provides much more detailed information.
Figure 3 reveals that the KNN::predict method calls distance, and the Survey Report also shows that these calls are made in the body of a loop (see the highlighted "loop" line).
This loop identifies another possible site for parallelism that still focuses on the Hotspots of the code. This site appears to be a loop with independent iterations (the execution of one iteration isn’t dependent on a previous iteration), which is a prime candidate for parallelization.
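The kind of loop described here can be sketched as follows. This is not the actual body of KNN::predict, which this post does not show; the names `Point`, `point_distance`, and `distances_to_query` are hypothetical, and the Euclidean metric is assumed. What matters is the shape: each iteration reads shared, unmodified inputs and writes only its own output slot.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;  // hypothetical point representation

// Assumed Euclidean distance between two points of equal dimension.
double point_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Hypothetical shape of the loop in a predict-style method.
// Iteration i reads training[i] and query and writes only result[i],
// so no iteration depends on the outcome of another.
std::vector<double> distances_to_query(const std::vector<Point>& training,
                                       const Point& query) {
    std::vector<double> result(training.size());
    for (std::size_t i = 0; i < training.size(); ++i) {
        result[i] = point_distance(training[i], query);
    }
    return result;
}
```

Because the iterations share no mutable state, they could in principle run in any order, which is what makes a loop like this a candidate for parallelization.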
The way Intel Advisor XE encourages the use of profile information can be very useful in uncovering the not-so-obvious locations where introducing parallelism may greatly improve performance. The Survey Report can present call path information that may reveal better locations to introduce parallelism while still focusing on the Hotspots. The timing breakdowns also reveal interesting characteristics about applications that may not be obvious, such as the heavy overhead of vector operations.
After locating ideal sites for parallelism, the next step in the Intel Advisor XE Workflow is to insert Intel Advisor XE Annotations at those sites. The annotations let the tool gather Suitability and Correctness data about the proposed parallelism, which helps determine the easiest and most efficient way to parallelize an application.
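As a rough preview of that step, the sketch below marks a loop as a proposed parallel site using the annotation macros from Intel Advisor's `advisor-annotate.h` header. The function `squared_lengths` and the site and task names are hypothetical, and the no-op fallback macros are my own addition so that the sketch still compiles where the header is not on the include path. Check the Intel Advisor documentation for the exact macro usage in your version.

```cpp
#include <cstddef>
#include <vector>

// Use the real annotation header when it is available.
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#  endif
#endif

// Fallback (an assumption for this sketch): no-op macros when the header is absent.
#ifndef ANNOTATE_SITE_BEGIN
#  define ANNOTATE_SITE_BEGIN(name) ((void)0)
#  define ANNOTATE_ITERATION_TASK(name) ((void)0)
#  define ANNOTATE_SITE_END() ((void)0)
#endif

// Hypothetical hotspot loop: computes the squared length of each point.
std::vector<double> squared_lengths(const std::vector<std::vector<double>>& points) {
    std::vector<double> result(points.size());
    ANNOTATE_SITE_BEGIN(lengths_site);         // start of the proposed parallel region
    for (std::size_t i = 0; i < points.size(); ++i) {
        ANNOTATE_ITERATION_TASK(lengths_task); // each iteration is a candidate task
        double sum = 0.0;
        for (double x : points[i]) {
            sum += x * x;
        }
        result[i] = sum;
    }
    ANNOTATE_SITE_END();                       // end of the proposed parallel region
    return result;
}
```

The annotations do not change how the program runs. They only tell the analysis tools where the proposed parallel region and its tasks would be, so the serial behavior stays intact while the proposal is evaluated.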